CN118228320B

CN118228320B - Data analysis method based on privacy calculation

Info

Publication number: CN118228320B
Application number: CN202410663983.0A
Authority: CN
Inventors: 程烨; 黄仁华
Original assignee: Yanchen Shumeng Hangzhou Technology Co ltd
Current assignee: Yanchen Shumeng Hangzhou Technology Co ltd
Priority date: 2024-05-27
Filing date: 2024-05-27
Publication date: 2024-08-13
Anticipated expiration: 2044-05-27
Also published as: CN118228320A

Abstract

The application discloses a data analysis method based on privacy calculation, belonging to the technical field of computers. According to the technical scheme provided by the embodiments of the application, a plurality of service data of a target service and the data attributes of each service data are acquired. The data value of each service data is determined from its data attributes, and a plurality of target service data with higher data value are screened out from the plurality of service data according to the data value. The plurality of target service data are input into a trusted execution environment, and the data sensitivity of each target service data is determined in the trusted execution environment based on that target service data. Each target service data is then analyzed in a data analysis mode corresponding to its data sensitivity to obtain its data analysis result, so that data analysis is achieved while data security is protected.

Description

Data analysis method based on privacy calculation

Technical Field

The application relates to the technical field of computers, in particular to a data analysis method based on privacy calculation.

Background

With the development of big data and cloud computing, the value of data has become increasingly prominent, but data security and privacy problems have grown as well. Traditional data processing methods struggle to meet strict privacy protection requirements, and how to perform data analysis while protecting data security remains a major challenge.

Disclosure of Invention

The embodiment of the application provides a data analysis method based on privacy calculation, which can realize data analysis while protecting data security.

In one aspect, a method for analyzing data based on privacy computation is provided, the method comprising:

Acquiring a plurality of service data of a target service and data attributes of each service data, wherein the data attributes comprise a data source, a data type and a data acquisition cost, and the data acquisition cost is the resource consumption for acquiring corresponding service data;

Determining the data value of each business data based on the data attribute of each business data;

Acquiring a plurality of target service data from the plurality of service data based on the data value of each service data, wherein the data value of the target service data is greater than or equal to a preset data value;

Inputting the plurality of target service data into a trusted execution environment, and determining the data sensitivity of each target service data based on each target service data in the trusted execution environment;

Analyzing each target service data in the trusted execution environment by adopting a data analysis mode corresponding to the data sensitivity of each target service data, and outputting a data analysis result of each target service data, wherein the data analysis result is associated with a service target of the target service.

In one aspect, there is provided a data analysis apparatus based on privacy calculations, the apparatus comprising:

an acquisition module, configured to acquire a plurality of service data of a target service and data attributes of each service data, wherein the data attributes comprise a data source, a data type and a data acquisition cost, and the data acquisition cost is the amount of resources consumed in acquiring the corresponding service data;

the data value determining module is used for determining the data value of each business data based on the data attribute of each business data;

The screening module is used for acquiring a plurality of target service data from the plurality of service data based on the data value of each service data, wherein the data value of the target service data is greater than or equal to a preset data value;

the sensitivity determining module is used for inputting the plurality of target service data into a trusted execution environment, and determining the data sensitivity of each target service data based on each target service data in the trusted execution environment;

the analysis module is used for analyzing each target service data in the trusted execution environment by adopting a data analysis mode corresponding to the data sensitivity of each target service data, and outputting a data analysis result of each target service data, wherein the data analysis result is associated with a service target of the target service.

In a possible implementation manner, the acquiring module is configured to acquire a plurality of initial service data of the target service; determining the association degree between the plurality of initial service data and the service targets of the target service; screening a plurality of reference service data from the plurality of initial service data based on the association degree, wherein the reference service data is the initial service data with the association degree with the service target of the target service being greater than or equal to a preset association degree; desensitizing the plurality of reference service data to obtain the plurality of service data; and acquiring the data attribute of each service data.

In a possible implementation manner, the acquiring module is configured to perform field identification on the multiple reference service data to obtain field types of multiple data fields in each reference service data; marking data fields of which the field types belong to a preset type set in each piece of reference service data as sensitive fields, wherein the preset type set comprises a plurality of preset types; and adjusting the data fields marked as the sensitive fields based on the field types of the data fields marked as the sensitive fields in the plurality of reference service data to obtain the plurality of service data.

In a possible implementation manner, the data value determining module is configured to determine, for any service data of the plurality of service data, a source data value of the service data based on a data source of the service data; determining the type data value and the reference data acquisition cost of the service data based on the data source and the data type of the service data; determining the cost data value of the service data based on the data type, the data acquisition cost and the reference data acquisition cost of the service data; and determining the data value of the service data based on the source data value, the type data value and the cost data value of the service data.

In a possible implementation manner, the data value determining module is configured to determine a reference data value corresponding to a data type of the service data; determining a ratio between a data acquisition cost of the service data and the reference data acquisition cost; and multiplying the ratio by the reference data value to obtain the cost data value of the service data.
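The cost data value computation described above can be illustrated with a minimal sketch. The function name and parameter names are hypothetical and are not taken from the application; the check for a non-positive reference cost is likewise an added assumption so that the ratio is well defined.

```python
def cost_data_value(acquisition_cost: float,
                    reference_cost: float,
                    reference_value: float) -> float:
    """Return the cost data value of one piece of service data.

    acquisition_cost: resources actually consumed to acquire the data.
    reference_cost:   reference data acquisition cost for this source and type.
    reference_value:  reference data value for this data type.
    """
    if reference_cost <= 0:
        # Assumption: a reference cost must be positive for the ratio to make sense.
        raise ValueError("reference_cost must be positive")
    ratio = acquisition_cost / reference_cost
    return ratio * reference_value


if __name__ == "__main__":
    # Data that cost 1.5 times the reference cost is valued at
    # 1.5 times the reference data value.
    print(cost_data_value(150.0, 100.0, 2.0))
```

Under this formulation, service data that was more expensive to acquire than the reference receives a proportionally higher cost data value.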

In a possible implementation manner, the sensitivity determining module is configured to perform feature extraction on each target service data in the trusted execution environment to obtain a data content feature and a data transmission feature of each target service data; determining data content sensitivity and data transmission risk of each target service data based on the data content characteristics and data transmission characteristics of each target service data in the trusted execution environment; and determining the data sensitivity of each target service data based on the data content sensitivity and the data transmission risk of each target service data in the trusted execution environment.

In one possible implementation manner, the sensitivity determining module is configured to perform feature extraction on any one of the plurality of target service data in the trusted execution environment by using a first feature extractor and a second feature extractor, where the first feature extractor focuses attention on data content with high importance when performing feature extraction, and the second feature extractor focuses attention on data content related to data propagation when performing feature extraction.

In a possible implementation manner, the analysis module is configured to group the target service data based on the data sensitivity of each target service data in the trusted execution environment to obtain a plurality of service data groups, where the data sensitivity of the target service data in each service data group is the same; loading a plurality of data analyzers corresponding to the service target of the target service in the trusted execution environment, wherein different data analyzers correspond to different data sensitivities; and analyzing the target service data in the plurality of service data groups by adopting the plurality of data analyzers in the trusted execution environment according to the correspondence between the service data groups and the data analyzers, and outputting data analysis results of the target service data.
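The grouping and dispatch logic described above can be sketched as follows. This is only an illustration in ordinary Python: it does not run inside a trusted execution environment, and the names `group_by_sensitivity` and `analyze_groups`, the record layout, and the representation of analyzers as plain callables are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Record = dict
Analyzer = Callable[[List[Record]], List[str]]


def group_by_sensitivity(records: List[Record],
                         sensitivity_of: Callable[[Record], Hashable]
                         ) -> Dict[Hashable, List[Record]]:
    """Group records so that every group shares one data sensitivity."""
    groups: Dict[Hashable, List[Record]] = defaultdict(list)
    for record in records:
        groups[sensitivity_of(record)].append(record)
    return dict(groups)


def analyze_groups(groups: Dict[Hashable, List[Record]],
                   analyzers: Dict[Hashable, Analyzer]
                   ) -> Dict[Hashable, List[str]]:
    """Apply to each group the analyzer registered for its sensitivity."""
    results: Dict[Hashable, List[str]] = {}
    for sensitivity, members in groups.items():
        if sensitivity not in analyzers:
            raise KeyError(f"no analyzer registered for sensitivity {sensitivity!r}")
        results[sensitivity] = analyzers[sensitivity](members)
    return results
```

Keeping the correspondence between sensitivities and analyzers in a lookup table means a new sensitivity level only requires registering one more analyzer.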

In a possible implementation manner, the analysis module is configured to determine, for any one of the plurality of service data groups, a target data analyzer corresponding to the service data group from the plurality of data analyzers in the trusted execution environment; adopting the target data analyzer in the trusted execution environment to perform feature extraction on the target service data in the service data group to obtain data features of the target service data in the service data group; and adopting the target data analyzer in the trusted execution environment to perform mapping based on the data features of the target service data in the service data group, and outputting the data analysis result of the target service data in the service data group.

In one possible implementation, the target service is any one of a financial service, an electronic commerce service, a content recommendation service, and a communication service.

In one aspect, a computer device is provided that includes one or more processors and one or more memories having at least one computer program stored therein, the computer program loaded and executed by the one or more processors to implement the privacy calculation based data analysis method.

In one aspect, a computer readable storage medium having at least one computer program stored therein is provided, the computer program being loaded and executed by a processor to implement the privacy calculation based data analysis method.

In one aspect, a computer program product or computer program is provided, the computer program product or computer program comprising a program code, the program code being stored in a computer readable storage medium, the program code being read from the computer readable storage medium by a processor of a computer device, the program code being executed by the processor, causing the computer device to perform the above-described data analysis method based on privacy calculations.

According to the technical scheme provided by the embodiments of the application, a plurality of service data of a target service and the data attributes of each service data are acquired. The data value of each service data is determined from its data attributes, and a plurality of target service data with higher data value are screened out from the plurality of service data according to the data value. The plurality of target service data are input into a trusted execution environment, and the data sensitivity of each target service data is determined in the trusted execution environment based on that target service data. Each target service data is then analyzed in a data analysis mode corresponding to its data sensitivity to obtain its data analysis result, so that data analysis is achieved while data security is protected.

Drawings

To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of an implementation environment of a data analysis method based on privacy calculation according to an embodiment of the present application;

Fig. 2 is a flowchart of a data analysis method based on privacy calculation according to an embodiment of the present application;

Fig. 3 is a flowchart of another data analysis method based on privacy calculation according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a data analysis device based on privacy calculation according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail with reference to the accompanying drawings.

The terms "first," "second," and the like in this application are used to distinguish between identical or similar items having substantially the same function and effect. It should be understood that there is no logical or chronological dependency among "first," "second," and "n-th," and that these terms place no limitation on quantity or order of execution.

Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain better results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Machine learning (Machine Learning, ML) is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

Privacy calculation: a computing theory and method for protecting private information over its whole life cycle. It mainly covers operations such as describing, measuring, evaluating, and fusing the private information involved when processing data in various forms (such as video, audio, images, graphics, text, numerical values, and network behavior information streams). The embodiments of the application mainly adopt one means of privacy calculation, namely the trusted execution environment.

Trusted execution environment: this is a hardware-based privacy protection technique that creates a protected environment in which data can be handled securely without being stolen or tampered with by external attackers.

Data governance: data governance is a management activity concerning the use of data, which manages the business applications and technologies of data across an entire enterprise by formulating and enforcing a series of policies and procedures. Data governance is the set of activities that exercise authority and control over data asset management. Its purpose is to increase the value of data; it is the foundation on which an enterprise realizes its digital strategy, and it is a management system comprising organization, rules, processes, and tools.

Normalization: number sequences with different value ranges are mapped to the (0, 1) interval, which facilitates data processing. In some cases, the normalized values can be used directly as probabilities.
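As an illustration of normalization, the sketch below uses min-max scaling, which is one common choice and is an assumption here, since the application does not name a specific normalization formula. Note that min-max scaling maps the smallest and largest values to exactly 0 and 1, so it covers the closed interval. The function name is hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    if not values:
        return []
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant sequence carries no ordering information,
        # so every element is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    span = highest - lowest
    return [(v - lowest) / span for v in values]


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```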

Embedded coding (Embedded Coding): embedded coding expresses a mathematical correspondence in which data in a space X is mapped to a space Y by a function F. The function F is injective and the mapping is structure-preserving. Injective means that each piece of mapped data corresponds uniquely to a piece of pre-mapping data; structure-preserving means that the order relationship of the data before mapping is the same as the order relationship of the data after mapping. For example, data X₁ and X₂ exist before mapping, and after mapping Y₁ corresponding to X₁ and Y₂ corresponding to X₂ are obtained. If X₁ > X₂ before mapping, then correspondingly Y₁ > Y₂ after mapping. For words, embedded coding maps the words to another space to facilitate subsequent machine learning and processing.

Attention weight: expresses the importance of certain data in the training or prediction process, where importance represents the influence of the input data on the output data. Data of high importance has a higher attention weight, and data of low importance has a lower attention weight. The importance of data differs between scenarios, and the process of training a model's attention weights is the process of determining the importance of the data.

It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, in the embodiment of the present application, the collection and use of the service data may be performed after the permission of the user.

Fig. 1 is a schematic diagram of an implementation environment of a data analysis method based on privacy calculation according to an embodiment of the present application. Referring to Fig. 1, the implementation environment may include a terminal 110 and a server 140.

The terminal 110 is connected to the server 140 via a wireless network or a wired network. Optionally, the terminal 110 is a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, but is not limited thereto. An application program supporting data analysis based on privacy calculation is installed and runs on the terminal 110.

The server 140 is an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and big data and artificial intelligence platforms. The server 140 is capable of providing background services for applications running on the terminal 110. In the embodiments of the present application, the server 140 is also referred to as a privacy calculation server.

The data analysis method based on privacy calculation provided by the embodiments of the application is explained below. Fig. 2 is a flowchart of a data analysis method based on privacy calculation according to an embodiment of the present application. Referring to Fig. 2, and taking the server as the execution subject as an example, the method includes the following steps.

201. The server acquires a plurality of service data of the target service and data attributes of each service data, wherein the data attributes comprise a data source, a data type and a data acquisition cost, and the data acquisition cost is the resource amount consumed by acquiring the corresponding service data.

The target service is any one of a financial service, an e-commerce service, a content recommendation service, and a communication service. In the case that the target service is a financial service, the service data is data related to the financial service; in the case that the target service is an e-commerce service, the service data is data related to the e-commerce service; in the case that the target service is a content recommendation service, the service data is data related to the content recommendation service; in the case that the target service is a communication service, the service data is data related to the communication service. The data source comprises a data acquisition channel and a data acquisition mode, and the data type is associated with the system used to classify data under the target service. The amount of resources consumed in acquiring the corresponding service data comprises the amount of virtual resources, the amount of network resources, and the amount of computing resources consumed in acquiring that service data.

202. The server determines the data value of each service data based on the data attributes of each service data.

Wherein the data value of the service data is used for reflecting the importance degree of the service data.

203. The server acquires a plurality of target service data from the plurality of service data based on the data value of each service data, wherein the data value of the target service data is greater than or equal to a preset data value.

The preset data value is set by a technician according to actual situations, which is not limited in the embodiment of the application. The target service data is service data with higher data value in the plurality of service data, namely service data with higher importance.
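Step 203 amounts to a threshold filter over the data values determined in step 202. A minimal sketch follows; the function name, the use of string identifiers for service data, and the example values are hypothetical.

```python
from typing import Dict, List


def select_target_data(data_values: Dict[str, float],
                       preset_value: float) -> List[str]:
    """Keep the identifiers whose data value is at least the preset data value."""
    return [data_id for data_id, value in data_values.items()
            if value >= preset_value]


if __name__ == "__main__":
    values = {"a": 0.9, "b": 0.4, "c": 0.7}
    # "c" is kept because the comparison is "greater than or equal to".
    print(select_target_data(values, 0.7))  # ['a', 'c']
```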

204. The server inputs the plurality of target business data into a trusted execution environment in which data sensitivity of each target business data is determined based on each target business data.

The trusted execution environment is a protected environment, created by combining hardware and software, in which data can be processed securely without being stolen or tampered with by an external attacker. The data sensitivity of the target service data is positively correlated with the confidentiality of the target service data; that is, the higher the data sensitivity of the target service data, the higher its confidentiality, and the less widely the target service data should be propagated.

205. The server analyzes each target service data in the trusted execution environment by adopting a data analysis mode corresponding to the data sensitivity of each target service data, and outputs the data analysis result of each target service data, wherein the data analysis result is associated with the service target of the target service.

The service target of the target service is set by a technician according to actual situations. For example, in the case that the target service is a financial service, the service target may be that the non-performing asset rate is lower than a preset non-performing asset rate, or that the newly added loan amount reaches a preset amount, or the like; in the case that the target service is an e-commerce service, the service target may be that the conversion rate reaches a preset conversion rate, or that the user growth rate reaches a preset growth rate, or the like; in the case that the target service is a content recommendation service, the service target may be that the user growth rate reaches a preset growth rate, or that the number of daily active users reaches a preset number, or the like; in the case that the target service is a communication service, the service target may be that the user growth rate reaches a preset growth rate, or that the average per-user consumption amount reaches a preset consumption amount, or the like.

The foregoing steps 201 to 205 are a brief introduction to the data analysis method based on privacy calculation provided by the embodiments of the present application. The method is described in more detail below with reference to Fig. 3. Taking the server as the execution subject as an example, the method includes the following steps.

301. The server acquires a plurality of service data of the target service and data attributes of each service data, wherein the data attributes comprise a data source, a data type and a data acquisition cost, and the data acquisition cost is the resource amount consumed by acquiring the corresponding service data.

The target service is any one of a financial service, an e-commerce service, a content recommendation service, and a communication service. In the case that the target service is a financial service, the service data is data related to the financial service. In the case that the target service is an e-commerce service, the service data is data related to the e-commerce service. In the case that the target service is a content recommendation service, the service data is data related to the content recommendation service. In the case that the target service is a communication service, the service data is data related to the communication service. The data source comprises a data acquisition channel and a data acquisition mode, and the data type is associated with the system used to classify data under the target service. The amount of resources consumed in acquiring the corresponding service data comprises the amount of virtual resources, the amount of network resources, and the amount of computing resources consumed in acquiring that service data.

In one possible implementation, the server obtains a plurality of initial service data for the target service. The server determines a degree of association between the plurality of initial business data and a business target of the target business. The server screens a plurality of reference service data from the plurality of initial service data based on the association degree, wherein the reference service data is the initial service data with the association degree with the service target of the target service being greater than or equal to the preset association degree. The server desensitizes the plurality of reference service data to obtain the plurality of service data. The server acquires the data attribute of each service data.

The initial service data is service data that is acquired directly. The association degree between initial service data and the service target of the target service reflects how much the initial service data contributes to describing the progress toward achieving the service target: the higher the association degree, the greater that contribution; the lower the association degree, the smaller that contribution. The preset association degree is set by a technician according to actual situations, which is not limited in the embodiments of the application. Desensitizing the reference service data eliminates the sensitive information in the reference service data so as to protect the security of the service data.

In order to more clearly describe the above embodiments, the above embodiments will be described below in sections.

In the first part, the server acquires a plurality of initial service data of the target service.

In one possible implementation, the server obtains, from a plurality of data sources associated with a target service, a plurality of initial service data of the target service, where the initial service data is used to reflect an operation condition of the target service within a preset duration.

Optionally, after the plurality of initial service data are acquired, the server performs preprocessing on the plurality of initial service data to eliminate errors in the plurality of initial service data, so as to facilitate subsequent analysis and processing.

In the second part, the server determines the association degree between the plurality of initial service data and the service target of the target service.

In one possible implementation manner, the server inputs the plurality of initial service data and the service target of the target service into a relevance prediction model, and performs feature extraction on each initial service data and the service target through the relevance prediction model to obtain a first relevance prediction feature of each initial service data and a second relevance prediction feature of the service target. And the server determines the association degree between each initial service data and the service target of the target service based on the similarity between the first association degree prediction feature and the second association degree prediction feature of each initial service data through the association degree prediction model.

The relevance prediction model is trained in a supervised manner on a plurality of sample service data, a plurality of sample service targets, and the label association degree between each sample service data and its corresponding sample service target, and is capable of predicting the association degree from the input initial service data and service target. The relevance prediction model may be a prediction model of any structure, which is not limited in the embodiments of the present application.
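The similarity step of the second part can be sketched as follows. The application only states that the association degree is based on the similarity between the two prediction features; cosine similarity is used here as an assumed, commonly used measure, and the feature vectors are assumed to have already been produced by the model. The function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(first: Sequence[float], second: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors, in the range [-1, 1]."""
    if len(first) != len(second):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(first, second))
    norm_first = math.sqrt(sum(a * a for a in first))
    norm_second = math.sqrt(sum(b * b for b in second))
    if norm_first == 0.0 or norm_second == 0.0:
        # Assumption: an all-zero feature is treated as unrelated.
        return 0.0
    return dot / (norm_first * norm_second)


if __name__ == "__main__":
    data_feature = [1.0, 0.0]    # first relevance prediction feature (illustrative)
    target_feature = [1.0, 0.0]  # second relevance prediction feature (illustrative)
    print(cosine_similarity(data_feature, target_feature))
```

A higher similarity would then be read as a higher association degree between the initial service data and the service target.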

In the third part, the server screens a plurality of reference service data from the plurality of initial service data based on the association degree.

In one possible implementation manner, the server determines, among the plurality of initial service data, the N initial service data having the highest association degree with the service target as the plurality of reference service data, where N is a positive integer.
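The top-N selection in the third part can be sketched as below. The function name and the use of string identifiers are hypothetical, and breaking ties by input order is an assumption, since the application does not say how ties are resolved.

```python
from typing import Dict, List


def top_n_reference_data(association: Dict[str, float], n: int) -> List[str]:
    """Return the identifiers of the n items with the highest association degree."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # sorted() is stable, so items with equal association keep their input order.
    ranked = sorted(association.items(), key=lambda item: item[1], reverse=True)
    return [data_id for data_id, _ in ranked[:n]]


if __name__ == "__main__":
    degrees = {"a": 0.2, "b": 0.9, "c": 0.5}
    print(top_n_reference_data(degrees, 2))  # ['b', 'c']
```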

In the fourth part, the server desensitizes the plurality of reference service data to obtain the plurality of service data.

In one possible implementation manner, the server performs field identification on the plurality of reference service data to obtain field types of a plurality of data fields in each reference service data. The server marks data fields with field types belonging to a preset type set in each piece of reference service data as sensitive fields, wherein the preset type set comprises a plurality of preset types. The server adjusts the data fields marked as the sensitive fields based on the field types of the data fields marked as the sensitive fields in the plurality of reference service data to obtain the plurality of service data.

The preset types included in the preset type set are set by a technician according to the situation of the target service, for example, in the case that the target service is a financial service, the preset types include a user name, a user asset, a user income, a place where the user transacts the service, and the like, which is not limited in the embodiment of the present application.

In order to more clearly describe the above embodiment, a manner in which the server adjusts the data field marked as the sensitive field based on the field type of the data field marked as the sensitive field in the plurality of reference service data in the above embodiment will be described below.

In some embodiments, for any data field marked as a sensitive field, the server determines the field adjustment model corresponding to the field type of the data field. The server inputs the data field into the field adjustment model, and the field adjustment model processes the data field to obtain an adjusted data field. The server replaces the data field with the adjusted data field.

In the embodiment of the present application, different field types correspond to different field adjustment models, and a specific field adjustment mode is set by a technician according to a situation of a target service, which is not limited in the embodiment of the present application.
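A minimal sketch of this type-dependent desensitization is given below. The field adjustment models are left open by the embodiment, so two simple rule-based functions stand in for them; the field type names, the masking rule and the bucketing rule are all assumptions made for illustration.

```python
def mask_name(value):
    """Hypothetical adjuster: keep the first character and mask the rest."""
    return value[:1] + "*" * (len(value) - 1)

def bucket_amount(value):
    """Hypothetical adjuster: replace an exact amount with a coarse range label."""
    lower = (int(value) // 10000) * 10000
    return f"{lower}-{lower + 9999}"

# Assumed mapping from preset field types to their field adjustment functions.
FIELD_ADJUSTERS = {"user_name": mask_name, "user_income": bucket_amount}
PRESET_TYPES = set(FIELD_ADJUSTERS)

def desensitize(record, field_types):
    """Replace every field whose type is in the preset type set with its adjusted value."""
    adjusted = {}
    for field, value in record.items():
        field_type = field_types.get(field)
        if field_type in PRESET_TYPES:
            adjusted[field] = FIELD_ADJUSTERS[field_type](value)
        else:
            adjusted[field] = value
    return adjusted

record = {"name": "Alice", "income": 53200, "branch_visits": 3}
field_types = {"name": "user_name", "income": "user_income", "branch_visits": "count"}
clean = desensitize(record, field_types)
```

Fields whose type is not in the preset type set pass through unchanged, which matches the marking step described above.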

In the fifth part, the server acquires the data attributes of each service data.

In one possible implementation, the server obtains the data source, the data type, and the data acquisition cost of each business data.

302. The server determines the data value of each service data based on the data attributes of each service data.

In one possible implementation, for any one of the plurality of service data, the server determines a source data value of the service data based on a data source of the service data. The server determines a type data value and a reference data acquisition cost of the service data based on the data source and the data type of the service data. The server determines the cost data value of the service data based on the data type, the data acquisition cost and the reference data acquisition cost of the service data. The server determines the data value of the service data based on the source data value, the type data value, and the cost data value of the service data.

In the first part, the server determines the source data value of the service data based on the data source of the service data.

In one possible implementation, the data sources include data collection channels and data collection modes, and the server determines a first source value corresponding to the data collection channels and a second source value corresponding to the data collection modes. And the server performs weighted fusion on the first source value and the second source value to obtain the source data value of the business data.

The data acquisition channel refers to a data source for acquiring service data, and the data acquisition mode is a mode for acquiring the service data from the data source, for example, the data acquisition mode includes active uploading of the data source, active crawler acquisition, timing scanning acquisition and the like, which is not limited in the embodiment of the application.

For example, the server queries using the data collection channel to obtain the first source value, and queries using the data collection mode to obtain the second source value. The server then fuses the first source value and the second source value using a first weight corresponding to the data collection channel and a second weight corresponding to the data collection mode, so as to obtain the source data value of the service data.

The corresponding relationship between the data collection channel and the first source value, the corresponding relationship between the data collection mode and the second source value, the corresponding relationship between the data collection channel and the first weight, and the corresponding relationship between the data collection mode and the second weight are set by a technician according to actual conditions, which is not limited by the embodiment of the application.
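The lookup and weighted fusion described above can be sketched as follows. All table entries are invented placeholder values, and dividing by the sum of the two weights is an assumption; the embodiment only states that the two source values are fused with the first weight and the second weight.

```python
# Hypothetical lookup tables set by a technician (values are placeholders).
CHANNEL_VALUE = {"partner_api": 0.9, "public_web": 0.4}
MODE_VALUE = {"active_upload": 0.8, "crawler": 0.5, "timed_scan": 0.6}
CHANNEL_WEIGHT = {"partner_api": 0.7, "public_web": 0.5}
MODE_WEIGHT = {"active_upload": 0.3, "crawler": 0.5, "timed_scan": 0.4}

def source_data_value(channel, mode):
    """Fuse the first and second source values with their corresponding weights."""
    first_value = CHANNEL_VALUE[channel]
    second_value = MODE_VALUE[mode]
    first_weight = CHANNEL_WEIGHT[channel]
    second_weight = MODE_WEIGHT[mode]
    # Normalizing by the weight sum is an assumption; it keeps the result on the
    # same scale as the two source values.
    weighted = first_weight * first_value + second_weight * second_value
    return weighted / (first_weight + second_weight)

value = source_data_value("partner_api", "active_upload")
```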

In the second part, the server determines the type data value and the reference data acquisition cost of the service data based on the data source and the data type of the service data.

In one possible implementation, the server uses the data source and the data type of the service data to query, and obtains the type data value of the service data. And the server queries by adopting the data source and the data type of the service data to obtain the reference data acquisition cost of the service data.

The type data value is used for reflecting the data value of service data under a certain data type of a certain data source, is a macroscopic data value based on the data type, and the corresponding relation between the type data value and the data source and the data type is set by a technician according to the actual situation, which is not limited by the embodiment of the application. The reference data acquisition cost is used for reflecting the average acquisition cost of service data under a certain data type of a certain data source, can reflect the acquisition difficulty of the service data to a certain extent, and may have different reference data acquisition costs for the same data type of different data sources. In addition, the reference data acquisition cost may vary over time.

In the third part, the server determines the cost data value of the service data based on the data type, the data acquisition cost and the reference data acquisition cost of the service data.

In one possible implementation, the server determines a reference data value corresponding to the data type of the service data. The server determines a ratio between the data acquisition cost of the service data and the reference data acquisition cost. The server multiplies the ratio by the reference data value to obtain the cost data value of the service data.

The correspondence between the data type and the reference data value is set by a technician according to the actual situation, which is not limited by the embodiment of the present application.
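As a small worked sketch of this calculation, the cost data value is the ratio of the actual acquisition cost to the reference acquisition cost, multiplied by the reference data value. The function name and the sample numbers are hypothetical, and rejecting a non-positive reference cost is an added safeguard not stated in the embodiment.

```python
def cost_data_value(acquisition_cost, reference_cost, reference_value):
    """Cost data value = (acquisition cost / reference acquisition cost) * reference data value."""
    if reference_cost <= 0:
        # Safeguard added for the sketch; the embodiment does not discuss this case.
        raise ValueError("reference data acquisition cost must be positive")
    ratio = acquisition_cost / reference_cost
    return ratio * reference_value

# Service data that cost 150 units to acquire, against an average of 100 units
# for its data source and data type, with a reference data value of 0.6.
example = cost_data_value(150.0, 100.0, 0.6)
```

Under this formula, data that was more expensive to acquire than the reference cost receives a proportionally higher cost data value.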

In the fourth part, the server determines the data value of the service data based on the source data value, the type data value and the cost data value of the service data.

In one possible implementation, the server performs weighted fusion on the source data value, the type data value and the cost data value of the service data to obtain the data value of the service data.

The weights used in the weighted fusion are determined based on the credibility of the source data value, the type data value and the cost data value of the service data, that is, the weights are obtained by normalizing the credibility of these three values. The credibility is set by a technician according to the actual situation, which is not limited by the embodiment of the application.

303. The server acquires a plurality of target service data from the plurality of service data based on the data value of each service data, wherein the data value of the target service data is greater than or equal to a preset data value.
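The credibility-weighted fusion of step 302 and the threshold screening of step 303 can be sketched together as follows. The credibility numbers, the preset data value and the function names are assumptions made for illustration; normalization is done by dividing each credibility by their sum.

```python
def fuse_data_value(values, credibilities):
    """Weighted fusion where the weights are the credibilities normalized to sum to 1."""
    total = sum(credibilities.values())
    weights = {key: cred / total for key, cred in credibilities.items()}
    return sum(weights[key] * values[key] for key in values)

def screen_target_data(data_values, preset_value):
    """Keep the service data whose data value is greater than or equal to the preset value."""
    return [key for key, value in data_values.items() if value >= preset_value]

# Hypothetical component values and credibilities for one service data.
components = {"source": 0.8, "type": 0.6, "cost": 0.9}
credibility = {"source": 2.0, "type": 1.0, "cost": 1.0}
fused = fuse_data_value(components, credibility)

# Hypothetical data values of three service data and a preset data value of 0.7.
targets = screen_target_data({"d1": fused, "d2": 0.40, "d3": 0.70}, preset_value=0.7)
```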

304. The server inputs the plurality of target service data into a trusted execution environment, and performs feature extraction on each target service data in the trusted execution environment to obtain data content features and data propagation features of each target service data.

The trusted execution environment is a protected environment created by combining hardware and software, in which data can be safely processed without being stolen or tampered with by an external attacker. The data sensitivity of the target service data is positively correlated with the confidentiality of the target service data, that is, the higher the data sensitivity of the target service data, the higher its confidentiality and the less it should be propagated over a large range. The data content features are used for representing the target service data from the dimension of the data content, and the data propagation features are used for representing the target service data from the dimension of the data propagation, wherein the dimension of the data propagation comprises the data propagation speed, the data propagation range and the like.

In one possible implementation manner, the server inputs the plurality of target service data into a trusted execution environment, and for any one of the plurality of target service data, the server performs feature extraction on the target service data by using a first feature extractor and a second feature extractor in the trusted execution environment, so as to obtain a data content feature and a data propagation feature of the target service data, wherein the first feature extractor focuses attention on data content with high importance when performing feature extraction, and the second feature extractor focuses attention on data content related to data propagation when performing feature extraction.

The first feature extractor can concentrate attention on data content with higher importance degree to finish feature extraction, namely, set higher attention weight on the data content with higher importance degree, so as to finish feature extraction. The second feature extractor is capable of focusing attention on the data content related to the data propagation to accomplish feature extraction, i.e., to set a higher attention weight for the data content related to the data propagation to accomplish feature extraction.

For example, the server inputs the plurality of target service data into a trusted execution environment, and for any one of the plurality of target service data, the server performs data segmentation on the target service data in the trusted execution environment to obtain a plurality of data fields of the target service data. The server performs embedding encoding on the plurality of data fields in the trusted execution environment to obtain embedding features of each data field. The server inputs the embedding features of each data field into a first feature extractor in the trusted execution environment, and the first feature extractor determines the importance degree of each data field based on the embedding features of each data field. Through the first feature extractor in the trusted execution environment, the server marks data fields whose importance degree is greater than or equal to an importance degree threshold as important data fields. Through the first feature extractor in the trusted execution environment, the server encodes the embedding features of the plurality of marked data fields based on the attention mechanism to obtain the data content features of the target service data. The server inputs the embedding features of each data field into a second feature extractor in the trusted execution environment, and the second feature extractor determines the propagation association degree of each data field based on the embedding features of each data field. Through the second feature extractor in the trusted execution environment, the server marks data fields whose propagation association degree is greater than or equal to a propagation association degree threshold as propagation association data fields. Through the second feature extractor in the trusted execution environment, the server encodes the embedding features of the plurality of marked data fields based on the attention mechanism to obtain the data propagation features of the target service data.

The first feature extractor and the second feature extractor are both encoders based on the BERT model, and the specific structures of the first feature extractor and the second feature extractor are not limited in the embodiments of the present application. The importance degree threshold and the propagation association degree threshold are set by a technician according to the actual situation, which is not limited in the embodiment of the application.
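The idea of marking fields above a threshold and giving them a higher attention weight can be illustrated with the simplified sketch below. It is not a BERT encoder: a single softmax attention pooling step stands in for the attention-based encoding, and the `boost` added to marked fields, the scores, the threshold and the two-dimensional embeddings are all assumptions.

```python
import math

def attention_pool(embeddings, scores, threshold, boost=1.0):
    """Pool field embeddings with softmax attention; fields whose score reaches
    the threshold are treated as marked and receive a boosted attention logit."""
    logits = [s + (boost if s >= threshold else 0.0) for s in scores]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return pooled, weights

# Two hypothetical data fields with embedding features and importance degrees.
field_embeddings = [[1.0, 0.0], [0.0, 1.0]]
importance = [0.9, 0.2]
content_feature, attn = attention_pool(field_embeddings, importance, threshold=0.5)
```

The same function could be called with propagation association degrees and the propagation association degree threshold to obtain a stand-in for the data propagation feature.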

305. The server determines the data content sensitivity and the data propagation risk of each target service data based on the data content features and the data propagation features of each target service data in the trusted execution environment.

The data content sensitivity is used to represent the sensitivity level of the target service data from the dimension of the data content, and the data propagation risk is used to represent the sensitivity level of the target service data from the dimension of the data propagation. In general, the higher the data content sensitivity and the data propagation risk, the higher the data sensitivity of the target service data.

In one possible implementation manner, for any one of the plurality of target service data, the server performs full connection and normalization on the data content features of the target service data in the trusted execution environment to obtain the data content sensitivity of the target service data. The server performs full connection and normalization on the data propagation features of the target service data in the trusted execution environment to obtain the data propagation risk of the target service data.
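A minimal sketch of "full connection and normalization" is a single linear layer followed by a squashing function. The sigmoid is assumed here as the normalization, and the weights, bias and feature values are placeholders rather than trained parameters.

```python
import math

def fully_connected_score(feature, weights, bias):
    """One fully connected unit followed by sigmoid normalization into (0, 1)."""
    logit = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical features and placeholder parameters.
content_feature = [0.85, 0.15]
propagation_feature = [0.30, 0.70]
content_sensitivity = fully_connected_score(content_feature, [2.0, -1.0], 0.0)
propagation_risk = fully_connected_score(propagation_feature, [-1.0, 1.5], 0.1)
```

Separate parameters are used for the two scores because the data content sensitivity and the data propagation risk are computed from different features.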

306. The server determines the data sensitivity of each target service data based on the data content sensitivity and the data propagation risk of each target service data in the trusted execution environment.

In one possible implementation manner, the server inputs the data content sensitivity and the data propagation risk of each target service data into a data sensitivity determination model in the trusted execution environment, and fuses the data content sensitivity and the data propagation risk of each target service data through the data sensitivity determination model to output the data sensitivity of each target service data.

For example, for any one of the plurality of target service data, the server inputs the data content sensitivity and the data propagation risk of the target service data into the data sensitivity determination model in the trusted execution environment, and splices the data content sensitivity and the data propagation risk of the target service data through the data sensitivity determination model to obtain a spliced feature. The server performs multiple downsampling and multiple upsampling on the spliced feature in the trusted execution environment through the data sensitivity determination model to obtain a sensitivity determination feature. The server performs full connection and normalization on the sensitivity determination feature through the data sensitivity determination model in the trusted execution environment, and outputs the data sensitivity of the target service data.

The multiple downsampling and multiple upsampling of the spliced feature are realized through convolution kernels and deconvolution kernels, that is, the data sensitivity determination model comprises a plurality of downsampling convolution kernels (convolution kernels) and a plurality of upsampling convolution kernels (deconvolution kernels). A skip connection is formed between each downsampling convolution kernel and the corresponding upsampling convolution kernel, so that the feature before convolution by the downsampling convolution kernel is passed to the corresponding upsampling convolution kernel, which preserves the completeness of the spliced feature and improves the accuracy of the data sensitivity. In the embodiment of the application, the data sensitivity comprises primary sensitivity, secondary sensitivity, tertiary sensitivity and quaternary sensitivity, wherein the primary sensitivity is the most sensitive and the quaternary sensitivity is the least sensitive. The data sensitivity determination model may be regarded as a classification model, that is, the target service data is classified by using the data content sensitivity and the data propagation risk, and any type of classification model may be used as the data sensitivity determination model, which is not limited in the embodiment of the present application.

307. The server analyzes each target service data in the trusted execution environment by adopting a data analysis mode corresponding to the data sensitivity of each target service data, and outputs the data analysis result of each target service data, wherein the data analysis result is associated with the service target of the target service.

The service target of the target service is set by a technician according to the actual situation. For example, in the case where the target service is a financial service, the service target may be configured as the bad asset rate being lower than a preset bad asset rate, or as the newly added loan amount reaching a preset amount, or the like. In the case where the target service is a business service, the service target may be configured as the conversion rate reaching a preset conversion rate, or as the user growth rate reaching a preset growth rate, or the like. In the case where the target service is a content recommendation service, the service target may be configured as the user growth rate reaching a preset growth rate, or as the number of active users reaching a preset number, or the like. In the case where the target service is a communication service, the service target may be configured as the user growth rate reaching a preset growth rate, or as the per-capita consumption amount reaching a preset consumption amount, or the like.

In one possible implementation, the server groups the plurality of target service data into a plurality of service data groups based on the data sensitivity of each target service data in the trusted execution environment, so that the target service data in each service data group have the same data sensitivity. The server loads a plurality of data analyzers corresponding to the service target of the target service in the trusted execution environment, and different data analyzers correspond to different data sensitivities. According to the correspondence between the service data groups and the data analyzers, the server uses the plurality of data analyzers in the trusted execution environment to analyze the target service data in the plurality of service data groups, and outputs the data analysis result of each target service data.

The data analyzer is configured to analyze the target service data to obtain a data analysis result related to the service target. For example, where the target service is a financial service and the service target is a bad asset rate lower than a preset bad asset rate, the data analysis result obtained by analyzing the target service data is the influence on the bad asset rate of the service operation, service stock, service increment, service change condition and other aspects described by the target service data. Of course, if the service target of the financial service becomes the newly added loan amount reaching a preset amount, the data analysis result obtained by analyzing the target service data is the influence on the newly added loan amount of the service operation, service stock, service increment, service change condition and other aspects described by the target service data. The data analyzer may be regarded as a function set encapsulating a group of data analysis functions, where the data analysis functions are used to analyze the target service data in a manner corresponding to the service target, and the specific data analysis functions are designed by a technician according to the actual situation, which is not limited by the embodiment of the present application. In addition, different data sensitivities correspond to different data analyzers, and the specific correspondence is set by a technician according to the actual situation, which is not limited by the embodiment of the present application.

In order to explain the above embodiment, a description will be given below of a manner in which the server analyzes target service data in the plurality of service data groups by using the plurality of data analyzers in the trusted execution environment according to the correspondence between the service data groups and the data analyzers, and outputs data analysis results of the respective target service data.

In some embodiments, for any one of the plurality of service data groups, the server determines a target data analyzer corresponding to the service data group from the plurality of data analyzers in the trusted execution environment. The server uses the target data analyzer in the trusted execution environment to perform feature extraction on the target service data in the service data group to obtain the data features of the target service data in the service data group. The server uses the target data analyzer in the trusted execution environment to perform mapping based on the data features of the target service data in the service data group, and outputs the data analysis result of the target service data in the service data group.
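The grouping and dispatch logic can be sketched as follows. The two analyzers, the `amount` field, the sensitivity levels used as keys and the output format are hypothetical; each analyzer first derives a feature from the record and then maps it to a result, mirroring the two stages described above. Execution inside a trusted execution environment is not modeled here.

```python
from collections import defaultdict

def group_by_sensitivity(records):
    """Group target service data so that each group shares one data sensitivity."""
    groups = defaultdict(list)
    for record in records:
        groups[record["sensitivity"]].append(record)
    return dict(groups)

def coarse_analyzer(record):
    """Hypothetical analyzer for highly sensitive data: uses only a coarse feature."""
    feature = 1 if record["amount"] >= 10000 else 0           # feature extraction
    return {"id": record["id"], "impact": "high" if feature else "low"}  # mapping

def detailed_analyzer(record):
    """Hypothetical analyzer for low-sensitivity data: uses a finer-grained feature."""
    feature = record["amount"] / 10000                         # feature extraction
    return {"id": record["id"], "impact_score": round(feature, 2)}       # mapping

# Assumed correspondence between data sensitivity levels and data analyzers.
ANALYZERS = {1: coarse_analyzer, 4: detailed_analyzer}

def analyze(records):
    """Analyze each group with the analyzer that corresponds to its sensitivity."""
    results = []
    for sensitivity, group in group_by_sensitivity(records).items():
        analyzer = ANALYZERS[sensitivity]
        results.extend(analyzer(record) for record in group)
    return results

records = [
    {"id": "a", "sensitivity": 1, "amount": 25000},
    {"id": "b", "sensitivity": 4, "amount": 5000},
]
results = analyze(records)
```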

Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.

Fig. 4 is a schematic structural diagram of a data analysis device based on privacy calculation according to an embodiment of the present application. Referring to Fig. 4, the device includes: an acquisition module 401, a data value determination module 402, a screening module 403, a sensitivity determination module 404, and an analysis module 405.

The acquiring module 401 is configured to acquire a plurality of service data of a target service and data attributes of each service data, where the data attributes include a data source, a data type, and a data acquisition cost, and the data acquisition cost is an amount of resources consumed for acquiring corresponding service data.

The data value determining module 402 is configured to determine the data value of each service data based on the data attribute of each service data.

The filtering module 403 is configured to obtain a plurality of target service data from the plurality of service data based on the data value of each service data, where the data value of the target service data is greater than or equal to a preset data value.

The sensitivity determining module 404 is configured to input the plurality of target service data into a trusted execution environment, and to determine, in the trusted execution environment, the data sensitivity of each target service data based on each target service data.

And the analysis module 405 is configured to analyze each target service data in the trusted execution environment by adopting a data analysis manner corresponding to the data sensitivity of each target service data, and output a data analysis result of each target service data, where the data analysis result is associated with a service target of the target service.

In a possible implementation manner, the acquiring module 401 is configured to acquire a plurality of initial service data of the target service. And determining the association degree between the plurality of initial service data and the service targets of the target service. And screening a plurality of reference service data from the plurality of initial service data based on the association degree, wherein the reference service data is the initial service data with the association degree with the service target of the target service being greater than or equal to the preset association degree. Desensitizing the plurality of reference service data to obtain the plurality of service data. And acquiring the data attribute of each service data.

In a possible implementation manner, the obtaining module 401 is configured to perform field identification on the plurality of reference service data to obtain field types of a plurality of data fields in each reference service data. And marking data fields of field types belonging to a preset type set in each piece of reference service data as sensitive fields, wherein the preset type set comprises a plurality of preset types. And adjusting the data fields marked as the sensitive fields based on the field types of the data fields marked as the sensitive fields in the plurality of reference service data to obtain the plurality of service data.

In one possible implementation, the data value determining module 402 is configured to determine, for any one of the plurality of service data, a source data value of the service data based on a data source of the service data. Based on the data source and data type of the service data, the type data value and the reference data acquisition cost of the service data are determined. The cost data value of the service data is determined based on the data type, the data acquisition cost and the reference data acquisition cost of the service data. The data value of the service data is determined based on the source data value, the type data value, and the cost data value of the service data.

In one possible implementation, the data value determining module 402 is configured to determine a reference data value corresponding to a data type of the service data. A ratio between the data acquisition cost of the service data and the reference data acquisition cost is determined. The ratio is multiplied by the reference data value to obtain the cost data value of the service data.

In a possible implementation manner, the sensitivity determining module 404 is configured to perform feature extraction on each target service data in the trusted execution environment, so as to obtain a data content feature and a data propagation feature of each target service data. The data content sensitivity and the data propagation risk of each target service data are determined based on the data content features and the data propagation features of each target service data in the trusted execution environment. The data sensitivity of each target service data is determined in the trusted execution environment based on the data content sensitivity and the data propagation risk of each target service data.

In a possible implementation manner, the sensitivity determining module 404 is configured to perform, for any one of the plurality of target service data, feature extraction on the target service data in the trusted execution environment by using a first feature extractor and a second feature extractor, so as to obtain a data content feature and a data propagation feature of the target service data, where the first feature extractor focuses on data content with high importance when performing feature extraction, and the second feature extractor focuses on data content related to data propagation when performing feature extraction.

In a possible implementation manner, the analysis module 405 is configured to group the plurality of target service data into a plurality of service data groups based on the data sensitivity of each target service data in the trusted execution environment, where the target service data in each service data group have the same data sensitivity. A plurality of data analyzers corresponding to the service target of the target service are loaded in the trusted execution environment, and different data analyzers correspond to different data sensitivities. According to the correspondence between the service data groups and the data analyzers, the plurality of data analyzers are used in the trusted execution environment to analyze the target service data in the plurality of service data groups, and the data analysis result of each target service data is output.

In a possible implementation manner, the analysis module 405 is configured to determine, for any one of the plurality of service data groups, a target data analyzer corresponding to the service data group from the plurality of data analyzers in the trusted execution environment. The target data analyzer is used in the trusted execution environment to perform feature extraction on the target service data in the service data group to obtain the data features of the target service data in the service data group. The target data analyzer is used in the trusted execution environment to perform mapping based on the data features of the target service data in the service data group, and the data analysis result of the target service data in the service data group is output.

It should be noted that: in the data analysis device based on privacy calculation provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the data analysis device based on privacy calculation provided in the above embodiment and the data analysis method embodiment based on privacy calculation belong to the same concept, and detailed implementation processes of the data analysis device based on privacy calculation are detailed in the method embodiment, and are not described herein.

Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present application. The server 500 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 501 and one or more memories 502, where the one or more memories 502 store at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 501 to implement the methods provided in the foregoing method embodiments. Of course, the server 500 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 500 may also include other components for implementing the functions of the device, which are not described herein.

In an exemplary embodiment, a computer readable storage medium is also provided, for example a memory storing a computer program, where the computer program is executable by a processor to perform the data analysis method based on privacy calculation in the above embodiments. For example, the computer readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises a program code, which is stored in a computer readable storage medium, from which the processor of the computer device reads the program code, which is executed by the processor, such that the computer device performs the above-mentioned data analysis method based on privacy calculations.

In some embodiments, a computer program according to an embodiment of the present application may be deployed to be executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network. The multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.

It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The foregoing description of the preferred embodiments of the present application is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements falling within the spirit and principles of the present application.

Claims

1. A data analysis method based on privacy calculation, the method comprising:

the determining the data value of each service data based on the data attribute of each service data comprises the following steps:

for any one of the plurality of service data, determining a source data value of the service data based on a data source of the service data; determining a type data value and a reference data acquisition cost of the service data based on the data source and the data type of the service data; determining a cost data value of the service data based on the data type, the data acquisition cost and the reference data acquisition cost of the service data; determining the data value of the service data based on the source data value, the type data value and the cost data value of the service data;

2. The method of claim 1, wherein the obtaining the plurality of service data of the target service and the data attribute of each of the service data comprises:

acquiring a plurality of initial service data of the target service;

determining the association degree between each of the plurality of initial service data and the service target of the target service;

screening a plurality of reference service data from the plurality of initial service data based on the association degree, wherein the reference service data is initial service data whose association degree with the service target of the target service is greater than or equal to a preset association degree;

desensitizing the plurality of reference service data to obtain the plurality of service data;

and acquiring the data attribute of each service data.
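
By way of illustration only, the screening step of claim 2 could be sketched as follows. The claim does not specify how the association degree is computed, so the keyword-overlap score, the function names `association_degree` and `screen_reference_data`, and the record layout are all hypothetical assumptions rather than the claimed implementation.

```python
def association_degree(record: dict, target_keywords: set) -> float:
    """Hypothetical association degree: share of target keywords that
    appear as field names in the record (not defined by the patent)."""
    if not target_keywords:
        return 0.0
    return len(target_keywords & set(record)) / len(target_keywords)


def screen_reference_data(initial_data: list, target_keywords: set,
                          preset_degree: float) -> list:
    """Keep initial service data whose association degree is greater than
    or equal to the preset association degree."""
    return [
        record for record in initial_data
        if association_degree(record, target_keywords) >= preset_degree
    ]


if __name__ == "__main__":
    records = [{"amount": 10, "merchant": "m1"}, {"color": "red"}]
    print(screen_reference_data(records, {"amount", "merchant"}, 0.5))
```

Any scoring function that returns a comparable number could replace the keyword overlap; only the threshold comparison is taken from the claim.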

3. The method of claim 2, wherein said desensitizing said plurality of reference service data to obtain said plurality of service data comprises:

performing field identification on the plurality of reference service data to obtain field types of a plurality of data fields in each reference service data;

marking, as sensitive fields, the data fields in each reference service data whose field types belong to a preset type set, wherein the preset type set comprises a plurality of preset types;

and adjusting the data fields marked as sensitive fields in the plurality of reference service data based on the field types of those data fields to obtain the plurality of service data.
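
A minimal sketch of the desensitization steps of claim 3 is given below, under stated assumptions. The regular-expression field identification, the preset type set `PRESET_SENSITIVE_TYPES`, and the masking rules are hypothetical choices made for illustration; the patent does not specify them.

```python
import re

# Hypothetical detection patterns and preset type set (assumptions).
FIELD_PATTERNS = {
    "phone": re.compile(r"^\d{11}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}
PRESET_SENSITIVE_TYPES = {"phone", "email"}


def identify_field_type(value) -> str:
    """Field identification: return the first matching field type, else 'other'."""
    text = str(value)
    for field_type, pattern in FIELD_PATTERNS.items():
        if pattern.match(text):
            return field_type
    return "other"


def adjust(value, field_type: str) -> str:
    """Adjust a sensitive field according to its field type (masking here)."""
    text = str(value)
    if field_type == "phone":
        return text[:3] + "****" + text[-4:]
    if field_type == "email":
        local, _, domain = text.partition("@")
        return local[:1] + "***@" + domain
    return text


def desensitize(record: dict) -> dict:
    """Mark fields whose type is in the preset type set and adjust only those."""
    result = {}
    for name, value in record.items():
        field_type = identify_field_type(value)
        is_sensitive = field_type in PRESET_SENSITIVE_TYPES
        result[name] = adjust(value, field_type) if is_sensitive else value
    return result
```

For example, under these assumed rules `desensitize({"contact": "13812345678", "amount": 42})` masks the `contact` value and leaves `amount` unchanged.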

4. The method of claim 1, wherein the determining the cost data value of the business data based on the data type, the data acquisition cost, and the reference data acquisition cost of the business data comprises:

determining a reference data value corresponding to the data type of the service data;

determining a ratio between a data acquisition cost of the service data and the reference data acquisition cost;

and multiplying the ratio by the reference data value to obtain the cost data value of the service data.
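
The arithmetic of claim 4 can be sketched as below. The function and parameter names are hypothetical, and the guard against a non-positive reference cost is an added assumption that the claim does not state.

```python
def cost_data_value(acquisition_cost: float,
                    reference_acquisition_cost: float,
                    reference_data_value: float) -> float:
    """Cost data value = (acquisition cost / reference acquisition cost)
    multiplied by the reference data value for the data type."""
    if reference_acquisition_cost <= 0:
        # Assumption: a non-positive reference cost is treated as invalid input.
        raise ValueError("reference acquisition cost must be positive")
    ratio = acquisition_cost / reference_acquisition_cost
    return ratio * reference_data_value


if __name__ == "__main__":
    # Data that cost 1.5 times the reference cost to acquire receives
    # 1.5 times the reference data value.
    print(cost_data_value(150.0, 100.0, 20.0))
```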

5. The method of claim 1, wherein said determining, in said trusted execution environment, a data sensitivity of each of said target business data based on each of said target business data comprises:

performing feature extraction on each target service data in the trusted execution environment to obtain data content features and data transmission features of each target service data;

determining, in the trusted execution environment, data content sensitivity and data transmission risk of each target service data based on the data content features and the data transmission features of each target service data;

and determining, in the trusted execution environment, the data sensitivity of each target service data based on the data content sensitivity and the data transmission risk of each target service data.
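
The last step of claim 5 combines two quantities into one data sensitivity, but the claim does not say how. The sketch below assumes, purely for illustration, that both inputs are scores in [0, 1], that they are combined by a weighted sum, and that the result is bucketed into three levels; the weight, the thresholds, and the level names are hypothetical. The sketch also does not model the trusted execution environment itself.

```python
def data_sensitivity(content_sensitivity: float,
                     transmission_risk: float,
                     content_weight: float = 0.5) -> str:
    """Combine content sensitivity and transmission risk (both assumed to be
    in [0, 1]) into a discrete sensitivity level using an assumed scheme."""
    score = (content_weight * content_sensitivity
             + (1 - content_weight) * transmission_risk)
    if score >= 0.66:   # hypothetical threshold
        return "high"
    if score >= 0.33:   # hypothetical threshold
        return "medium"
    return "low"
```

A discrete level is used here only because it makes it straightforward to place data with the same sensitivity together for later processing.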

6. The method according to claim 5, wherein the feature extraction of each of the target service data in the trusted execution environment, to obtain a data content feature and a data transmission feature of each of the target service data, includes:

for any one of the plurality of target service data, performing feature extraction on the target service data in the trusted execution environment using a first feature extractor and a second feature extractor, each employing an attention mechanism, to obtain the data content features and the data transmission features of the target service data, respectively, wherein the first feature extractor focuses attention on data content of high importance during feature extraction, and the second feature extractor focuses attention on data content related to data transmission during feature extraction.

7. The method according to claim 1, wherein the analyzing each target service data in the trusted execution environment by adopting a data analysis mode corresponding to a data sensitivity of each target service data, and outputting a data analysis result of each target service data includes:

grouping the target service data based on the data sensitivity of each target service data in the trusted execution environment to obtain a plurality of service data groups, wherein the data sensitivity of the target service data in each service data group is the same;

loading a plurality of data analyzers corresponding to the service target of the target service in the trusted execution environment, wherein different data analyzers correspond to different data sensitivities;

and analyzing the target service data in the plurality of service data groups by adopting the plurality of data analyzers in the trusted execution environment according to the correspondence between the service data groups and the data analyzers, and outputting the data analysis result of each target service data.
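
The grouping and dispatch described in claim 7 might look roughly like the following sketch. It runs as ordinary Python and does not model a trusted execution environment; the two analyzers and the sensitivity labels `"low"` and `"high"` are hypothetical stand-ins for whatever analyzers a deployment would load.

```python
from collections import defaultdict


def group_by_sensitivity(items):
    """Group (data, sensitivity) pairs so that each group shares one sensitivity."""
    groups = defaultdict(list)
    for data, sensitivity in items:
        groups[sensitivity].append(data)
    return dict(groups)


def summarize_only(record: dict) -> dict:
    """Hypothetical analyzer for high sensitivity: expose only a field count."""
    return {"field_count": len(record)}


def full_analysis(record: dict) -> dict:
    """Hypothetical analyzer for low sensitivity: sum the numeric field values."""
    return {"total": sum(v for v in record.values() if isinstance(v, (int, float)))}


def analyze_groups(groups: dict, analyzers: dict) -> dict:
    """Apply to each group the analyzer registered for that group's sensitivity."""
    results = {}
    for sensitivity, group in groups.items():
        analyzer = analyzers[sensitivity]  # correspondence: group -> analyzer
        results[sensitivity] = [analyzer(record) for record in group]
    return results


if __name__ == "__main__":
    items = [({"a": 1, "b": 2}, "low"), ({"x": 5}, "high")]
    groups = group_by_sensitivity(items)
    print(analyze_groups(groups, {"low": full_analysis, "high": summarize_only}))
```

Keying the analyzers by sensitivity keeps the correspondence between groups and analyzers explicit, so a missing analyzer surfaces as a `KeyError` instead of silently falling back to a less restrictive one.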

8. The method of claim 7, wherein the analyzing, in the trusted execution environment, the target service data in the plurality of service data groups by using the plurality of data analyzers according to the correspondence between the service data groups and the data analyzers, and outputting the data analysis result of each target service data, includes:

for any one of the plurality of service data groups, determining a target data analyzer corresponding to the service data group from the plurality of data analyzers in the trusted execution environment;

adopting the target data analyzer in the trusted execution environment to perform feature extraction on the target service data in the service data group to obtain data features of the target service data in the service data group;

and adopting the target data analyzer in the trusted execution environment to perform mapping based on the data features of the target service data in the service data group, and outputting the data analysis result of the target service data in the service data group.

9. The method of any one of claims 1-8, wherein the target service is any one of a financial service, an electronic commerce service, a content recommendation service, and a communication service.