CN113660263A - Data processing method and device, storage medium, user equipment and server - Google Patents


Info

Publication number
CN113660263A
Authority
CN
China
Legal status
Pending
Application number
CN202110936861.0A
Other languages
Chinese (zh)
Inventor
侯宪龙
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0407: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643: Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
    • H04L9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861: Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0869: Generation of secret information including derivation or calculation of cryptographic keys or passwords involving random numbers or seeds

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data processing method, a data processing device, a storage medium, user equipment and a server. The method comprises the following steps: acquiring buried point (event-tracking) data of a target application program; mapping the buried point data to obtain an m-dimensional vector to be processed; performing disturbance processing on the m-dimensional vector to be processed to obtain a processed m-dimensional vector; and sending the processed m-dimensional vector to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain the unbiased use times of the target application program. Because noise is added on the user side before sending, the scheme improves data privacy.

Description

Data processing method and device, storage medium, user equipment and server
Technical Field
The present application belongs to the field of electronic technologies, and in particular, to a data processing method and apparatus, a storage medium, a user equipment, and a server.
Background
The server usually collects a large amount of buried point data from clients and performs unbiased estimation on it to obtain a corresponding unbiased estimation result. In the related art, the user side collects the buried point data and sends it directly to the server side for subsequent unbiased estimation. As a result, data privacy in the related art is poor.
Disclosure of Invention
The embodiments of the present application provide a data processing method and device, a storage medium, user equipment and a server, which can improve data privacy.
In a first aspect, an embodiment of the present application provides a data processing method, applied to a user side, including:
acquiring buried point data of a target application program;
mapping the buried point data to obtain an m-dimensional vector to be processed;
performing disturbance processing on the m-dimensional vector to be processed to obtain a processed m-dimensional vector;
and sending the processed m-dimensional vector to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain unbiased use times of the target application program.
In a second aspect, an embodiment of the present application provides a data processing method, which is applied to a server and includes:
acquiring a processed m-dimensional vector sent by a user side, wherein the processed m-dimensional vector is obtained by performing disturbance processing on a to-be-processed m-dimensional vector by the user side, and the to-be-processed m-dimensional vector is obtained by mapping the obtained buried point data of a target application program by the user side;
performing disturbance conversion processing on the processed m-dimensional vector to obtain a first converted m-dimensional vector, and acquiring second converted m-dimensional vectors corresponding to other clients;
generating a k × m-dimensional iteration matrix according to the first converted m-dimensional vector and the second converted m-dimensional vectors;
and carrying out unbiased estimation on the target application program according to the k × m-dimensional iteration matrix to obtain unbiased use times of the target application program.
In a third aspect, an embodiment of the present application provides a data processing apparatus, applied to a user side, including:
the acquisition module is used for acquiring buried point data of the target application program;
the mapping processing module is used for mapping the buried point data to obtain an m-dimensional vector to be processed;
the disturbance processing module is used for carrying out disturbance processing on the m-dimensional vector to be processed to obtain a processed m-dimensional vector;
and the sending module is used for sending the processed m-dimensional vector to a server, so that the server carries out unbiased estimation according to the processed m-dimensional vector to obtain unbiased use times of the target application program.
In a fourth aspect, an embodiment of the present application provides a data processing apparatus, which is applied to a server and includes:
the acquisition module is used for acquiring a processed m-dimensional vector sent by a user side, wherein the processed m-dimensional vector is obtained by the user side performing disturbance processing on a to-be-processed m-dimensional vector, and the to-be-processed m-dimensional vector is obtained by the user side mapping the acquired buried point data of a target application program;
the conversion processing module is used for performing disturbance conversion processing on the processed m-dimensional vector to obtain a first converted m-dimensional vector and acquiring second converted m-dimensional vectors corresponding to other clients;
a generating module, configured to generate a k × m-dimensional iterative matrix according to the first converted m-dimensional vector and the second converted m-dimensional vector;
and the unbiased estimation module is used for carrying out unbiased estimation on the target application program according to the k × m-dimensional iteration matrix to obtain the unbiased use times of the target application program.
In a fifth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed on a computer, the computer is caused to execute the data processing method provided by the present application.
In a sixth aspect, an embodiment of the present application further provides a user equipment, where the user equipment includes a memory and a processor, and the processor is configured to execute the data processing method provided in the embodiment of the present application by calling a computer program stored in the memory.
In a seventh aspect, an embodiment of the present application further provides a server, where the server includes a memory and a processor, and the processor is configured to execute the data processing method provided in the embodiment of the present application by calling a computer program stored in the memory.
In the embodiments of the present application, buried point data of a target application program is acquired; the buried point data is mapped to obtain an m-dimensional vector; disturbance processing is performed on the m-dimensional vector to obtain a processed m-dimensional vector; and the processed m-dimensional vector is sent to the server, so that the server performs unbiased estimation according to it to obtain the unbiased use times of the target application program. Because noise is added to the buried point data of the target application program on the user side, only the noisy data (the processed m-dimensional vector) is sent when data needs to be sent to the server, which gives better data privacy than a scheme that sends the original data (the buried point data) directly to the server.
Drawings
The technical solutions and advantages of the present application will become apparent from the following detailed description of specific embodiments of the present application when taken in conjunction with the accompanying drawings.
Fig. 1 is a schematic flowchart illustrating a data processing method applied to a user side according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a data processing method applied to a server according to an embodiment of the present application;
fig. 3 is an interaction diagram of a client and a server according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a data processing apparatus applied to a user side according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a data processing apparatus applied to a server according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a user equipment provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
It can be understood that the data processing method provided in the embodiments of the present application can be applied to a client (user side) or a server. The server corresponds to the client, and together they form a C/S (Client-Server) architecture. In practical applications, the server may be, for example, a computer, a general-purpose server, a cloud server, a super personal computer, a notebook computer, a tablet computer, or the like. The user side may run on user equipment such as a mobile phone, a tablet computer, a virtual reality device, an augmented reality device, a wearable device, or other user equipment. The embodiments of the present application do not limit the specific types of the server and the client.
Referring to fig. 1, fig. 1 is a schematic flowchart of a data processing method applied to a user side according to an embodiment of the present application, where the flow may include:
101. Acquiring buried point data of the target application program.
The target application program is an application program whose unbiased use times (an unbiased estimate of the number of times it is used) need to be obtained through unbiased estimation. For example, when the unbiased use times of each application in a certain application set need to be determined, each application in the set can be treated as a target application program.
Buried point data (event-tracking data) is the data needed for unbiased estimation. For example, the buried point data of the target application program may include: the application package name, a timestamp, whether this is the first open within a period of time, the length of use, and so on.
For example, a collection program for application buried point data may be set up on the user side; when the user uses a target application program, the collection program records the buried point data of that program. The data collected from the moment the user starts using the target application program until the user exits it constitutes one piece of buried point data. It will be appreciated that multiple pieces of buried point data may be obtained either by collecting data as the user uses different target application programs or by collecting data as the user uses the same target application program multiple times.
In the embodiments of the present application, each time the collection program collects one piece of buried point data, the user side acquires that piece of buried point data.
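One piece of buried point data, as described above, can be pictured as a small immutable record. The sketch below is illustrative only: the class name `BuriedPointRecord`, its field names and units, and the choice of the package name as the value that is later fed to the hash function are all assumptions, not details taken from the filing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuriedPointRecord:
    """Hypothetical record for one use of an app, captured from launch to exit."""
    package_name: str           # application package name, e.g. "com.example.app"
    timestamp_ms: int           # launch time in epoch milliseconds (assumed unit)
    first_open_in_period: bool  # whether this is the first open within the period
    usage_duration_s: float     # length of use in seconds (assumed unit)

    def mapping_key(self) -> str:
        """Value later mapped by a hash function; assumed to be the package name,
        because the statistic of interest is the number of uses per application."""
        return self.package_name


record = BuriedPointRecord("com.example.app", 1628150400000, True, 93.5)
```

Freezing the dataclass keeps a collected record from being modified after the collection program has written it.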
102. Mapping the buried point data to obtain an m-dimensional vector to be processed.
For example, after the buried point data is obtained, the user side may perform mapping processing on the buried point data to obtain an m-dimensional vector to be processed, so that preliminary noise addition may be performed on the buried point data.
Wherein m is a positive integer greater than 0. For example, m may be 10, 25, 100, 500, or 1000, and so on.
103. Performing disturbance processing on the m-dimensional vector to be processed to obtain the processed m-dimensional vector.
For example, after obtaining the m-dimensional vector to be processed, the user side may perform perturbation processing on the m-dimensional vector to be processed to obtain the processed m-dimensional vector, so that the noise may be further added to the buried point data.
104. Sending the processed m-dimensional vector to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain the unbiased use times of the target application program.
For example, after the processed m-dimensional vector is obtained, the client may send the processed m-dimensional vector to the server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain unbiased usage times of the target application program.
In the embodiments of the present application, buried point data of a target application program is acquired; the buried point data is mapped to obtain an m-dimensional vector; disturbance processing is performed on the m-dimensional vector to obtain a processed m-dimensional vector; and the processed m-dimensional vector is sent to the server, so that the server performs unbiased estimation according to it to obtain the unbiased use times of the target application program. Because noise is added to the buried point data of the target application program on the user side, only the noisy data (the processed m-dimensional vector) is sent when data needs to be sent to the server, which gives better data privacy than a scheme that sends the original data (the buried point data) directly to the server.
It can be understood that the user side obtains one processed m-dimensional vector for each piece of buried point data it acquires. The user side may send each processed m-dimensional vector to the server side as soon as it is obtained, or it may send all the processed m-dimensional vectors obtained within a certain time period to the server side in one batch.
It can also be understood that, when the unbiased use times of each application program in a certain application program set within a certain time period need to be obtained, multiple user sides may each send to the server side the processed m-dimensional vectors corresponding to the buried point data they acquired for the application programs in that set during the time period, so that the server side processes the received vectors to obtain the unbiased use times of each application program in the set for that time period. The application program set may include all applications, some applications, all social applications, all game applications, all audio applications, all video applications, and so on.
In some embodiments, mapping the buried point data to obtain an m-dimensional vector to be processed may include:
(1) selecting a target hash function from the k hash functions;
(2) mapping the buried point data by using a target hash function to obtain a target hash value;
(3) and determining the m-dimensional vector to be processed according to the target hash value, wherein in the m-dimensional vector to be processed, the element in the dimension indexed by the target hash value is 1 and the elements in all other dimensions are -1.
The k hash functions may be generated by the server; each hash function yields a hash value (the value it maps the data to), and each hash function corresponds to a different index value. For example, the server may randomly generate k independent hash functions that map raw data into the space m, determine the corresponding hash value and index value for each of the k hash functions, and issue the k hash functions to the user side (client). The space m is the range into which a hash function maps data, namely the integers 1, 2, 3, 4, ..., m. k is a positive integer. The hash values corresponding to different hash functions may be the same or different, and each is a positive integer from 1 to m. For example, assuming that m is 5 and k is 5, the hash value corresponding to hash function h_1 may be 2, the hash value corresponding to hash function h_2 may be 3, and the hash value corresponding to hash function h_5 may be 2. The index values corresponding to different hash functions are all different, and each is a positive integer from 1 to k. For example, assuming that k is 4, the index values corresponding to hash functions h_1, h_2, h_3 and h_4 may be 1, 2, 3 and 4, respectively.
After the target hash value is obtained, the user side can determine the m-dimensional vector to be processed according to it. The target hash value is a positive integer from 1 to m; for example, assuming that m is 20, the target hash value may be 2, 5, 10, and so on. In the m-dimensional vector to be processed, the element in the dimension indexed by the target hash value is 1 and the elements in all other dimensions are -1. For example, assuming that m is 5 and the target hash value is 1, the m-dimensional vector is [1, -1, -1, -1, -1]. As another example, assuming that m is 4 and the target hash value is 2, the m-dimensional vector is [-1, 1, -1, -1].
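The mapping step can be sketched in Python as follows. The filing does not specify which hash family the server generates, so the SHA-256 construction salted with the index value j in `make_hash_functions` is a hypothetical stand-in for the k independent hash functions. `encode` then builds the m-dimensional vector to be processed, with 1 in the dimension given by the target hash value and -1 elsewhere, and reproduces the two examples in the text.

```python
import hashlib


def make_hash_functions(k, m):
    """Build k hypothetical hash functions h_1..h_k mapping a string into {1, ..., m}.

    Each function salts SHA-256 with its index value j; this is an illustrative
    stand-in, not the hash family used in the filing."""
    def make(j):
        def h(value):
            digest = hashlib.sha256(f"{j}:{value}".encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big") % m + 1  # 1-based hash value
        return h
    return {j: make(j) for j in range(1, k + 1)}  # index value j -> hash function


def encode(hash_value, m):
    """Return the m-dimensional vector to be processed: +1 in dimension
    `hash_value` (1-based) and -1 in every other dimension."""
    if not 1 <= hash_value <= m:
        raise ValueError("hash value must lie in 1..m")
    return [1 if i == hash_value else -1 for i in range(1, m + 1)]


print(encode(1, 5))  # [1, -1, -1, -1, -1]
print(encode(2, 4))  # [-1, 1, -1, -1]
```

With `hs = make_hash_functions(4, 20)`, a client using the function with index value 2 would compute `encode(hs[2]("com.example.app"), 20)`.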
In some embodiments, performing perturbation processing on the m-dimensional vector to be processed to obtain a processed m-dimensional vector may include:
performing disturbance processing on the value of each dimension element of the m-dimensional vector to be processed according to a target probability to obtain the processed m-dimensional vector: if the value of a dimension element of the m-dimensional vector to be processed is 1, it is flipped to -1 with the target probability, and if the value is -1, it is flipped to 1 with the target probability, where the target probability is:
Figure BDA0003213560140000061
where ε is a preset privacy budget.
ε is usually chosen a posteriori: the usability of downstream data is measured experimentally for different values of ε, and a value is chosen that balances data usability against the degree of privacy protection. For example, ε may be 1, 1.3, 2, or 3.
For example, when the value of an element in a certain dimension of the m-dimensional vector to be processed is 1, the user side may flip it to -1 with the target probability; when the value is -1, the user side may flip it to 1 with the target probability. Doing this for every dimension yields the processed m-dimensional vector.
For example, assuming that the m-dimensional vector to be processed is [1, -1, -1, -1, -1], the processed m-dimensional vector may be [1, 1, -1, -1, 1]; it may also be [-1, -1, -1, 1, -1].
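The per-element flipping can be sketched as below. Because the target-probability formula survives only as an image reference, this sketch assumes the flip probability 1/(1 + e^(ε/2)) used in count-mean-sketch style local differential privacy; `flip_probability` and `perturb` are hypothetical names under that assumption. The loop at the end prints the assumed probability for the example values of ε, showing that a larger privacy budget flips fewer elements.

```python
import math
import random


def flip_probability(epsilon):
    """Assumed target probability 1 / (1 + e^(eps/2)); not confirmed by the filing."""
    return 1.0 / (1.0 + math.exp(epsilon / 2.0))


def perturb(vector, epsilon, rng=random):
    """Flip each +1/-1 element independently with the (assumed) target probability."""
    p = flip_probability(epsilon)
    return [-x if rng.random() < p else x for x in vector]


for eps in (1, 1.3, 2, 3):
    print(eps, round(flip_probability(eps), 3))
# 1 0.378
# 1.3 0.343
# 2 0.269
# 3 0.182
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is convenient for testing.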
In some embodiments, each hash function corresponds to a different index value, and the sending the processed m-dimensional vector to the server, so that the server performs unbiased estimation on the processed m-dimensional vector to obtain unbiased usage times of the target application program, may include:
(1) acquiring a target index value corresponding to the target hash function;
(2) and sending the processed m-dimensional vector and the target index value to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector and the target index value to obtain unbiased use times of the target application program.
For example, the user side may further obtain a target index value corresponding to the target hash function. For example, assuming that the index value corresponding to the target hash function is 2, the target index value is 2.
And then, the client can send the processed m-dimensional vector and the target index value to the server, so that the server performs unbiased estimation according to the processed m-dimensional vector and the target index value to obtain unbiased use times of the target application program.
It can be understood that, for each piece of buried point data it acquires, the user side obtains one processed m-dimensional vector and the target index value corresponding to it. The user side may send each processed m-dimensional vector, together with its target index value, to the server side as soon as they are obtained, or it may send all the processed m-dimensional vectors obtained within a certain time period, together with their target index values, to the server side in one batch.
It can also be understood that, when the unbiased use times of each application program in a certain application program set within a certain time period need to be obtained, multiple user sides may each send to the server side the processed m-dimensional vectors corresponding to the buried point data they acquired for the application programs in that set during the time period, together with the target index value of each vector, so that the server side processes the received vectors and index values to obtain the unbiased use times of each application program in the set for that time period. The application program set may include all applications, some applications, all social applications, all game applications, all audio applications, all video applications, and so on.
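Putting the user-side steps together, one report per piece of buried point data could look like the sketch below. It assumes that the target hash function is chosen uniformly at random from the k functions, uses a hypothetical SHA-256 hash family salted with the index value, assumes a flip probability of 1/(1 + e^(ε/2)), and represents the message to the server as a dictionary with hypothetical keys `vector` and `index`.

```python
import hashlib
import math
import random


def client_report(value, k, m, epsilon, rng=random):
    """Hypothetical user-side pipeline for one piece of buried point data."""
    j = rng.randint(1, k)  # target index value, assumed chosen uniformly at random
    digest = hashlib.sha256(f"{j}:{value}".encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big") % m + 1  # target hash value in 1..m
    vector = [1 if i == hash_value else -1 for i in range(1, m + 1)]  # to be processed
    p = 1.0 / (1.0 + math.exp(epsilon / 2.0))  # assumed target probability
    noisy = [-x if rng.random() < p else x for x in vector]  # processed vector
    return {"vector": noisy, "index": j}  # payload sent to the server side


# Reports gathered over a time period, ready to be sent in one batch.
batch = [client_report("com.example.app", k=4, m=20, epsilon=2.0) for _ in range(3)]
```

Whether each report is sent immediately or a list such as `batch` is sent at the end of a time period is a transport choice; the payload per report is the same either way.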
Referring to fig. 2, fig. 2 is a schematic flowchart of a data processing method applied to a server according to an embodiment of the present application, where the flowchart may include:
201. Acquiring the processed m-dimensional vector sent by the user side.
The processed m-dimensional vector is obtained by performing disturbance processing on the m-dimensional vector to be processed by the user side, and the m-dimensional vector to be processed is obtained by performing mapping processing on the obtained buried point data of the target application program by the user side. m is a positive integer greater than 0. For example, m may be 10, 25, 100, 500, or 1000, and so on. The target application program is an application program which needs to be subjected to unbiased estimation to obtain unbiased use times. For example, when the number of unbiased uses of any application in a certain application set needs to be determined, any application in the application set can be determined as a target application.
For example, the client A1 may obtain the processed m-dimensional vector through processes 101 to 104 and send it to the server. The server receives the processed m-dimensional vector sent by the client A1.
202. Performing disturbance conversion processing on the processed m-dimensional vector to obtain a first converted m-dimensional vector, and acquiring second converted m-dimensional vectors corresponding to other clients.
After obtaining the processed m-dimensional vector sent by the client A1, the server may perform disturbance conversion processing on it to obtain a first converted m-dimensional vector.
In some embodiments, the client A1 sends each processed m-dimensional vector to the server as soon as it is obtained, so that the server can perform disturbance conversion processing on it to obtain a first converted m-dimensional vector.
In some embodiments, the client A1 may send the processed m-dimensional vectors obtained within a certain time period to the server in one batch, so that the server can perform disturbance conversion processing on them, simultaneously or one after another, to obtain a plurality of first converted m-dimensional vectors.
Other clients, such as the clients A2, A3, A4, ..., An, can likewise obtain processed m-dimensional vectors through processes 101 to 104 and send them to the server, which receives the processed m-dimensional vectors sent by these other clients. Here, n is a positive integer greater than 1; for example, n may be 10, 100, 500, and so on.
After obtaining the processed m-dimensional vectors sent by the other clients, the server may perform disturbance conversion processing on them to obtain second converted m-dimensional vectors. For example, after obtaining the processed m-dimensional vector sent by the client A2, the server may perform disturbance conversion processing on it to obtain a second converted m-dimensional vector.
In some embodiments, each time another user side obtains a processed m-dimensional vector, it sends that vector to the server side, so that the server side can perform disturbance conversion processing on it to obtain a second converted m-dimensional vector.
In some embodiments, another user side may send the processed m-dimensional vectors obtained within a certain time period to the server side in one batch, so that the server side can perform disturbance conversion processing on them, simultaneously or one after another, to obtain a plurality of second converted m-dimensional vectors.
203. Generating a k × m-dimensional iteration matrix according to the first converted m-dimensional vector and the second converted m-dimensional vectors.
For example, after obtaining the first converted m-dimensional vector and the second converted m-dimensional vectors, the server may generate a k × m-dimensional iteration matrix from them. Here, k is a positive integer.
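Step 203 can be read as aggregating reports row by row. The sketch below assumes, as in count-mean-sketch designs, that each converted m-dimensional vector is added to the row of the k × m matrix selected by the target index value the client sent with it; the function name `build_matrix` and the (vector, index value) input format are hypothetical.

```python
def build_matrix(converted_reports, k, m):
    """Accumulate converted m-dimensional vectors into a k x m iteration matrix.

    converted_reports: iterable of (converted_vector, index_value) pairs, where
    index_value in 1..k identifies the hash function the client used (assumed)."""
    matrix = [[0.0] * m for _ in range(k)]
    for vector, j in converted_reports:
        if len(vector) != m or not 1 <= j <= k:
            raise ValueError("bad report shape or index value")
        row = matrix[j - 1]  # row selected by the 1-based index value
        for i, x in enumerate(vector):
            row[i] += x
    return matrix
```

Here the first converted vector (from the client A1) and the second converted vectors (from the other clients) are treated identically; only their index values decide which row they land in.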
204. Performing unbiased estimation on the target application program according to the k × m-dimensional iteration matrix to obtain the unbiased use times of the target application program.
For example, after the k × m dimensional iteration matrix is obtained, the server may perform unbiased estimation on the target application program according to the k × m dimensional iteration matrix to obtain unbiased usage times of the target application program.
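A possible estimator for step 204 is sketched below, assuming the count-mean-sketch formula rather than anything confirmed by the filing: average the matrix entries M[j][h_j(d)] over the k rows for the target application d, then remove the expected contribution of hash collisions from the other reports. In symbols, f(d) = m/(m - 1) · ((1/k) Σ_j M[j][h_j(d)] - n/m), where n is the total number of reports. `estimate_count` and its parameters are hypothetical names.

```python
def estimate_count(matrix, hash_values, n):
    """Assumed count-mean-sketch estimate of the use times of one application.

    matrix: k x m iteration matrix (list of rows);
    hash_values[j]: 1-based hash value h_j(d) of the target application under
        the hash function with index value j (j = 1..k);
    n: total number of reports aggregated into the matrix."""
    k, m = len(matrix), len(matrix[0])
    mean = sum(matrix[j - 1][hash_values[j] - 1] for j in range(1, k + 1)) / k
    return m / (m - 1) * (mean - n / m)  # correct for expected collisions
```

The estimate is unbiased only in expectation, so with few reports it can be far from the true count or even negative; it becomes useful once many clients have reported.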
In the embodiments of the present application, a processed m-dimensional vector sent by a user side is acquired, where the processed m-dimensional vector is obtained by the user side performing disturbance processing on an m-dimensional vector to be processed, and the m-dimensional vector to be processed is obtained by the user side mapping the acquired buried point data of a target application program; disturbance conversion processing is performed on the processed m-dimensional vector to obtain a first converted m-dimensional vector, and second converted m-dimensional vectors corresponding to other clients are acquired; a k × m-dimensional iteration matrix is generated according to the first converted m-dimensional vector and the second converted m-dimensional vectors; and unbiased estimation is performed on the target application program according to the k × m-dimensional iteration matrix to obtain the unbiased use times of the target application program. Because noise is added to the buried point data of the target application program on the user side, the user side sends only the noisy data (the processed m-dimensional vector) to the server side, which gives better data privacy than a scheme that sends the original data (the buried point data) directly to the server side.
In some embodiments, the m-dimensional vector to be processed is obtained by mapping the buried point data with a target hash function selected from k hash functions, and performing perturbation conversion on the processed m-dimensional vector to obtain the first converted m-dimensional vector may include:
performing perturbation conversion on the processed m-dimensional vector according to the following formula to obtain the first converted m-dimensional vector:
$$x_i = k\left(\frac{1}{2}\cdot\frac{e^{\varepsilon/2}+1}{e^{\varepsilon/2}-1}\,v_i+\frac{1}{2}\right)$$
where x_i is the value of the i-th element of the first converted m-dimensional vector, k is the number of hash functions, ε is the preset privacy budget, and v_i is the value of the i-th element of the processed m-dimensional vector.
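The conversion can be sketched in Python as below. This is a minimal illustration, assuming the formula has the Count Mean Sketch form x_i = k·(½·c_ε·v_i + ½) with c_ε = (e^{ε/2}+1)/(e^{ε/2}−1), and that every v_i is −1 or 1; the function name `convert_vector` is hypothetical. The property that makes the scheme unbiased is that, averaged over the user side's random flips, an element that was originally 1 converts to k and an element that was originally −1 converts to 0.

```python
import math


def convert_vector(v, k, epsilon):
    """Server-side perturbation conversion of one processed m-dimensional vector.

    Assumes the Count Mean Sketch form x_i = k * (c/2 * v_i + 1/2) with
    c = (e^(eps/2) + 1) / (e^(eps/2) - 1); v is a list of -1/1 values.
    """
    c = (math.exp(epsilon / 2) + 1) / (math.exp(epsilon / 2) - 1)
    return [k * (c / 2 * vi + 0.5) for vi in v]


# Usage: convert a processed vector with k = 4 hash functions and epsilon = 1.
x = convert_vector([1, -1, -1, -1, -1], k=4, epsilon=1.0)
```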
The user side obtains the m-dimensional vector to be processed from a target hash value, which it produces by mapping the buried point data with a target hash function selected from the k hash functions. The k hash functions may be generated by the server; each hash function corresponds to a hash value and to a distinct index value. For example, the server may randomly generate k independent hash functions that map raw data into the space m, determine the hash value and index value corresponding to each of the k hash functions, and send the k hash functions to the user side. The space m is the range into which a hash function maps data, namely the values 1, 2, 3, 4, …, m; k is a positive integer.
The hash values corresponding to different hash functions may be the same or different, and are positive integers from 1 to m. For example, with m = 5 and k = 5, hash function h_1 may correspond to hash value 2, h_2 to hash value 3, and h_5 to hash value 2. The index values corresponding to different hash functions are all different and are positive integers from 1 to k. For example, with k = 4, hash functions h_1, h_2, h_3 and h_4 may correspond to index values 1, 2, 3 and 4, respectively.
The privacy budget ε is usually determined a posteriori: experiments measure the availability of downstream data for different values of ε, and ε is chosen by trading off data availability against the degree of privacy protection. For example, ε may be 1, 1.3, 2, or 3.
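One way to realise the k hash functions and their index values is sketched below. It is an assumption for illustration only: the text does not say how the server generates the functions, so this sketch derives h_j from SHA-256 salted with a hypothetical seed and the index value j. This approximates independent hash functions and lets the server distribute just the seed, k and m. Hash values are 1-based (1 to m) and index values run from 1 to k, matching the conventions above.

```python
import hashlib


def make_hash_family(k, m, seed="example-seed"):
    """Return {index_value: hash_function} for index values 1..k.

    Each function maps a string (e.g. an app identifier) to a hash value in
    1..m. Salting SHA-256 with the seed and index value is an illustrative
    assumption, not the method specified in the text.
    """
    def make(j):
        def h(data):
            digest = hashlib.sha256(f"{seed}:{j}:{data}".encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big") % m + 1
        return h
    return {j: make(j) for j in range(1, k + 1)}


family = make_hash_family(k=4, m=5)
print({j: h("W1") for j, h in family.items()})  # hash value of "W1" under h_1..h_4
```

Because the functions are fully determined by the seed, a user side that receives the same seed, k and m computes exactly the same hash values as the server.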
For example, after obtaining the processed m-dimensional vector, the server may compute the value x_i of the i-th element of the first converted m-dimensional vector from the value v_i of the i-th element of the processed m-dimensional vector using the formula
$$x_i = k\left(\frac{1}{2}\cdot\frac{e^{\varepsilon/2}+1}{e^{\varepsilon/2}-1}\,v_i+\frac{1}{2}\right)$$
and thereby obtain the first converted m-dimensional vector.
Likewise, the server may obtain the second converted m-dimensional vectors in the same way from the processed m-dimensional vectors sent by other user sides.
In some embodiments, each hash function corresponds to a different index value, and generating the k × m-dimensional iteration matrix from the first converted m-dimensional vector and the second converted m-dimensional vector includes:
(1) obtaining a k × m-dimensional summary matrix whose elements are all 0, a first target index value corresponding to the target hash function, and a second target index value corresponding to the hash function associated with the second converted m-dimensional vector;
(2) iteratively processing the k × m-dimensional summary matrix according to the first converted m-dimensional vector, the first target index value, the second converted m-dimensional vector and the second target index value to generate the k × m-dimensional iteration matrix.
For example, the server may obtain the k × m-dimensional summary matrix, the first target index value corresponding to the target hash function, and the second target index value corresponding to the hash function associated with the second converted m-dimensional vector, and then iteratively process the summary matrix according to the first converted m-dimensional vector, the first target index value, the second converted m-dimensional vector and the second target index value to generate the k × m-dimensional iteration matrix.
The target hash function and the hash function associated with the second converted m-dimensional vector are each one of the k hash functions, and each of the k hash functions corresponds to a different index value, a positive integer from 1 to k. If the target hash function is h_2, whose index value is 2, the first target index value is 2. If the hash function associated with the second converted m-dimensional vector is h_5, whose index value is 5, the second target index value is 5. In other words, whichever hash function the user side used to obtain a processed m-dimensional vector, the server uses the index value of that hash function when generating the k × m-dimensional iteration matrix. All elements of the k × m-dimensional summary matrix are initially 0.
In some embodiments, iteratively processing the k × m-dimensional summary matrix according to the first converted m-dimensional vector, the first target index value, the second converted m-dimensional vector and the second target index value to generate the k × m-dimensional iteration matrix may include:
adding the value of each element of the first converted m-dimensional vector to the value of the corresponding element in the row of the k × m-dimensional summary matrix given by the first target index value, and adding the value of each element of the second converted m-dimensional vector to the value of the corresponding element in the row given by the second target index value, to generate the k × m-dimensional iteration matrix.
For example, assume that the server obtains first converted m-dimensional vectors v11 and v12 corresponding to user side A1, and second converted m-dimensional vectors v21, v22 and v23 corresponding to other user sides (e.g., user sides A2 and A3). The first target index values of v11 and v12 are j11 and j12, respectively, and the second target index values of v21, v22 and v23 are j21, j22 and j23, respectively. With m = 3, the server adds the 1st element of v11 to the element in row j11, column 1 of the k × m-dimensional summary matrix, the 2nd element of v11 to the element in row j11, column 2, and the 3rd element of v11 to the element in row j11, column 3. In the same way, it adds each element of v12 to the corresponding element of row j12, each element of v21 to the corresponding element of row j21, each element of v22 to the corresponding element of row j22, and each element of v23 to the corresponding element of row j23, thereby generating the k × m-dimensional iteration matrix.
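The row-wise accumulation can be sketched as follows. This is a minimal illustration; the function name `build_iteration_matrix` and the (index value, converted vector) pair format are assumptions. Index values are 1-based as in the text, so row j of the matrix is stored at list position j − 1.

```python
def build_iteration_matrix(k, m, reports):
    """Accumulate converted m-dimensional vectors into a k x m matrix.

    reports: iterable of (index_value, converted_vector) pairs, where
    index_value is in 1..k and converted_vector has length m.
    """
    matrix = [[0.0] * m for _ in range(k)]  # k x m summary matrix, all zeros
    for j, x in reports:
        if not 1 <= j <= k or len(x) != m:
            raise ValueError("index value or vector length out of range")
        row = matrix[j - 1]
        for i, xi in enumerate(x):
            row[i] += xi  # add the i-th element to the i-th column of row j
    return matrix


# Usage (k = 3, m = 3): two vectors with index value 1 and one with index value 3.
M = build_iteration_matrix(
    3, 3, [(1, [1.0, 2.0, 3.0]), (1, [1.0, 1.0, 1.0]), (3, [0.5, 0.0, 0.0])]
)
```

Each report touches only one row, so the server can process reports as a stream and never needs to keep individual vectors.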
In some embodiments, performing unbiased estimation for the target application program according to the k × m-dimensional iteration matrix to obtain its unbiased usage count may include:
performing unbiased estimation for the target application program according to the k × m-dimensional iteration matrix, the k hash functions, and the index value and hash value corresponding to each hash function, to obtain the unbiased usage count of the target application program.
For example, the server may obtain the k hash functions and the hash value and index value corresponding to each hash function, and perform unbiased estimation for the target application program according to the k × m-dimensional iteration matrix, the k hash functions, and those hash values and index values, to obtain the unbiased usage count of the target application program.
In some embodiments, performing unbiased estimation for the target application program according to the k × m-dimensional iteration matrix, the k hash functions, and the index value and hash value corresponding to each hash function to obtain its unbiased usage count may include:
performing unbiased estimation for the target application program according to the following formula, using the k × m-dimensional iteration matrix, the k hash functions, and the index value and hash value corresponding to each hash function, to obtain the unbiased usage count of the target application program:
$$F(APP) = \frac{m}{m-1}\left(\frac{1}{k}\sum_{j=1}^{k} M_{j,\,h_j(APP)} - \frac{N}{m}\right)$$
where F(APP) is the unbiased usage count of the target application program; m is the dimension of the processed m-dimensional vector; k is the number of hash functions; h_j(APP) is the hash value obtained by mapping the target application program with the j-th of the k hash functions; j is the index value corresponding to the j-th hash function, a positive integer from 1 to k; N is the total number of processed m-dimensional vectors sent by the user side and the other user sides; and M is the k × m-dimensional iteration matrix, with M_{j, h_j(APP)} its element in row j, column h_j(APP).
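A sketch of this estimate in Python, assuming the formula has the Count Mean Sketch form F(APP) = m/(m−1)·((1/k)·Σ_j M[j, h_j(APP)] − N/m) and that `hash_family` is a dict from index value j (1 to k) to a function returning a 1-based hash value; both the function name and that dict format are hypothetical. Under random hashing, every other application lands on the target's column with probability 1/m, so subtracting N/m and rescaling by m/(m−1) together remove that expected collision contribution.

```python
def estimate_count(matrix, hash_family, app, n_reports):
    """Unbiased usage-count estimate for `app` from the k x m iteration matrix.

    hash_family: {j: h_j} for j = 1..k, each h_j returning a value in 1..m.
    n_reports: N, the total number of processed vectors received.
    """
    k, m = len(matrix), len(matrix[0])
    # Sum M[j, h_j(app)] over all index values j = 1..k (rows/columns 0-based here).
    total = sum(matrix[j - 1][hash_family[j](app) - 1] for j in range(1, k + 1))
    return m / (m - 1) * (total / k - n_reports / m)
```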
For example, after obtaining the k × m-dimensional iteration matrix, the server may perform unbiased estimation for the target application program using the iteration matrix, the k hash functions, and the hash value and index value corresponding to each hash function, according to the formula
$$F(APP) = \frac{m}{m-1}\left(\frac{1}{k}\sum_{j=1}^{k} M_{j,\,h_j(APP)} - \frac{N}{m}\right)$$
to obtain the unbiased usage count of the target application program.
Referring to fig. 3, fig. 3 is an interaction diagram of a client and a server according to an embodiment of the present disclosure.
Take as an example the estimation, with m = 5, of the unbiased usage count of any application in an application set during a given period. First, the server randomly generates k independent hash functions h_1, h_2, h_3, …, h_k that map raw data into the space m, and sends them to multiple user sides, such as user sides A1, A2, A3, …, An. The hash values of h_1, h_2, h_3, …, h_k are positive integers from 1 to m, and their index values are 1, 2, 3, …, k, respectively. Here n is greater than 5000.
At time t1 in the period, user C1 starts using application W1 in the application set on user side A1, and quits W1 at time t2; user side A1 may therefore collect, through a collection program, the data of W1 from t1 to t2 as one piece of buried point data. At time t3, user C1 starts using W1 again on user side A1 and quits again at time t4, so user side A1 may collect the data of W1 from t3 to t4 as another piece of buried point data. At time t5, user C1 starts using application W2 in the application set on user side A1 and quits at time t6, so user side A1 may collect the data of W2 from t5 to t6 as another piece of buried point data.
Taking one piece of buried point data as an example: after obtaining it, user side A1 may randomly select a hash function from h_1, h_2, h_3, …, h_k. Suppose h_3 is selected and its hash value is 2; the index value corresponding to h_3 is then 3, and mapping the buried point data with h_3 yields the m-dimensional vector to be processed [-1, 1, -1, -1, -1]. Next, user side A1 perturbs the m-dimensional vector to be processed [-1, 1, -1, -1, -1] according to the target probability
$$\frac{1}{1+e^{\varepsilon/2}}$$
to obtain a processed m-dimensional vector, which may be, for example, [-1, 1, 1, -1]. User side A1 then sends the processed m-dimensional vector [-1, 1, 1, -1] and the target index value "3" to the server. By analogy, user side A1 may obtain its other processed m-dimensional vectors and their corresponding index values in the same way and send them to the server.
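The user-side steps in this example can be sketched as follows. It is an illustration under stated assumptions: the target probability is taken to be 1/(1+e^{ε/2}) (the Count Mean Sketch choice), `hash_family` is a hypothetical dict from index value to a hash function returning a value in 1 to m, and the function names are illustrative. With hash value 2 and m = 5, `encode` reproduces the vector [-1, 1, -1, -1, -1] from the text.

```python
import math
import random


def encode(hash_value, m):
    """m-dimensional vector to be processed: 1 at the target hash value, -1 elsewhere."""
    return [1 if i == hash_value else -1 for i in range(1, m + 1)]


def perturb(v, epsilon, rng=random):
    """Flip each element independently with the (assumed) target probability."""
    p = 1 / (1 + math.exp(epsilon / 2))
    return [-vi if rng.random() < p else vi for vi in v]


def client_report(data, hash_family, m, epsilon, rng=random):
    """Pick a random hash function, encode, perturb; return (index value, processed vector)."""
    j = rng.choice(sorted(hash_family))
    return j, perturb(encode(hash_family[j](data), m), epsilon, rng)


print(encode(2, 5))  # [-1, 1, -1, -1, -1]
```

Only the index value and the perturbed vector leave the device; the raw buried point data and the unperturbed vector stay on the user side.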
Other user sides, such as user sides A2, A3, …, An, may likewise obtain their buried point data and derive the corresponding processed m-dimensional vectors and index values in the same way. Each user side obtains one processed m-dimensional vector, with its index value, for each piece of buried point data it collects.
After the server receives the processed m-dimensional vectors and their corresponding index values sent by user sides A1, A2, A3, …, take as an example one processed m-dimensional vector of user side A1 whose index value is "3". First, the server performs perturbation conversion on this processed m-dimensional vector according to
$$x_i = k\left(\frac{1}{2}\cdot\frac{e^{\varepsilon/2}+1}{e^{\varepsilon/2}-1}\,v_i+\frac{1}{2}\right)$$
to obtain a first converted m-dimensional vector. The server then adds the value of each element of the first converted m-dimensional vector to the value of the corresponding element in row 3 of the k × m-dimensional summary matrix, completing one iteration of the summary matrix. In the same way, the server obtains the other converted m-dimensional vectors (the other first converted m-dimensional vectors of user side A1 and the second converted m-dimensional vectors of other user sides such as A2, A3, …, An) and adds each of their elements to the corresponding element of the corresponding row of the summary matrix. Once all converted m-dimensional vectors (first and second) of user sides A1, A2, A3, …, An for the period have been added to the corresponding rows, the server has generated the k × m-dimensional iteration matrix. Finally, using the formula
$$F(APP) = \frac{m}{m-1}\left(\frac{1}{k}\sum_{j=1}^{k} M_{j,\,h_j(APP)} - \frac{N}{m}\right)$$
the server obtains the unbiased usage count of any application in the application set for that period.
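The end-to-end flow of this example can be simulated as below. This is a self-contained sketch, assuming Count Mean Sketch formulas throughout: flip probability 1/(1+e^{ε/2}), conversion x_i = k·(½·c_ε·v_i + ½) with c_ε = (e^{ε/2}+1)/(e^{ε/2}−1), and estimate F = m/(m−1)·((1/k)·Σ_j M[j, h_j(APP)] − N/m). The function `simulate`, the `app_counts` argument and the hash-family format (a dict from index value 1..k to a function returning 1..m) are hypothetical. The estimate is unbiased only on average over hash functions and noise, so a single run deviates from the true counts.

```python
import math
import random


def simulate(app_counts, hash_family, m, epsilon, seed=0):
    """Simulate user-side reports and server-side estimation.

    app_counts: {app_id: true usage count}; hash_family: {j: h_j} for j = 1..k,
    each h_j returning a hash value in 1..m. Returns {app_id: estimated count}.
    """
    rng = random.Random(seed)
    k = len(hash_family)
    e = math.exp(epsilon / 2)
    p = 1 / (1 + e)          # user-side flip probability
    c = (e + 1) / (e - 1)    # server-side conversion constant
    matrix = [[0.0] * m for _ in range(k)]  # k x m summary matrix
    n = 0
    for app, count in app_counts.items():
        for _ in range(count):  # one report per piece of buried point data
            # User side: pick an index value, encode as +/-1, flip each element.
            j = rng.randint(1, k)
            pos = hash_family[j](app)
            v = []
            for i in range(1, m + 1):
                bit = 1 if i == pos else -1
                v.append(-bit if rng.random() < p else bit)
            # Server side: convert and add to row j of the summary matrix.
            row = matrix[j - 1]
            for i, vi in enumerate(v):
                row[i] += k * (c / 2 * vi + 0.5)
            n += 1
    # Unbiased estimation for every application.
    return {
        app: m / (m - 1)
        * (sum(matrix[j - 1][hash_family[j](app) - 1] for j in range(1, k + 1)) / k - n / m)
        for app in app_counts
    }
```

A larger ε lowers the flip probability and hence the noise variance, and a larger m makes hash collisions between applications rarer, at the cost of longer vectors per report.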
It should be noted that, under the same privacy budget, the data processing method provided in the embodiments of the present application can achieve the same downstream data availability as a centralized differential privacy algorithm. Experiments show that, when estimating the unbiased usage count of any application in an application set of 10 different applications during a given period, with a population size greater than 5k, m > 500, k > 5 and ε = 1, the variance of the method provided in the embodiments of the present application is O(10), which is 1/100 of that of the corresponding differential privacy algorithm.
For example, assume that 10 experiments are performed:
[Formula: variance O(10) computed over the ten estimates F(APP_i)]
where O(10) denotes the variance, and F(APP_i) denotes the unbiased usage count of a given application obtained in the i-th experiment by the data processing method provided in the embodiments of the present application, i = 1, 2, 3, …, 10.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a data processing apparatus applied to a user side according to an embodiment of the present application. The data processing apparatus includes: an obtaining module 301, a mapping processing module 302, a perturbation processing module 303 and a sending module 304.
The obtaining module 301 is configured to obtain buried point data of a target application program.
The mapping processing module 302 is configured to map the buried point data to obtain an m-dimensional vector to be processed.
The perturbation processing module 303 is configured to perturb the m-dimensional vector to be processed to obtain a processed m-dimensional vector.
The sending module 304 is configured to send the processed m-dimensional vector to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain the unbiased usage count of the target application program.
In some embodiments, the mapping processing module 302 may be configured to: select a target hash function from the k hash functions; map the buried point data with the target hash function to obtain a target hash value; and determine the m-dimensional vector to be processed according to the target hash value, where the element in the dimension given by the target hash value is 1 and the elements in all other dimensions are -1.
In some embodiments, the perturbation processing module 303 may be configured to: perturb the value of each element of the m-dimensional vector to be processed according to a target probability to obtain the processed m-dimensional vector, where an element whose value is 1 is flipped to -1 with the target probability, and an element whose value is -1 is flipped to 1 with the target probability. The target probability is:
$$\frac{1}{1+e^{\varepsilon/2}}$$
where ε is a preset privacy budget.
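To make the privacy/utility trade-off concrete, the sketch below evaluates the target probability for the example budgets ε = 1, 1.3, 2 and 3 given earlier. It assumes the target probability has the Count Mean Sketch form 1/(1+e^{ε/2}); `flip_probability` is a hypothetical name. A larger ε flips fewer elements, so the server receives less noisy data at the cost of weaker privacy protection.

```python
import math


def flip_probability(epsilon):
    """Assumed target probability for flipping one element: 1 / (1 + e^(epsilon/2))."""
    return 1 / (1 + math.exp(epsilon / 2))


for eps in (1, 1.3, 2, 3):
    # Approximately 0.3775, 0.3430, 0.2689 and 0.1824, respectively.
    print(f"epsilon = {eps}: flip probability = {flip_probability(eps):.4f}")
```

At ε = 0 the probability is exactly 0.5, meaning every element is a fair coin flip and the report carries no information about the original data.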
In some embodiments, each hash function corresponds to a different index value, and the sending module 304 may be configured to: obtain the target index value corresponding to the target hash function; and send the processed m-dimensional vector and the target index value to the server, so that the server performs unbiased estimation according to them to obtain the unbiased usage count of the target application program.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a data processing apparatus applied to a server according to an embodiment of the present application. The data processing apparatus includes: an obtaining module 401, a conversion processing module 402, a generating module 403 and an unbiased estimation module 404.
The obtaining module 401 is configured to obtain a processed m-dimensional vector sent by a user side, where the user side obtains the processed m-dimensional vector by perturbing an m-dimensional vector to be processed, and obtains the m-dimensional vector to be processed by mapping the acquired buried point data of a target application program.
The conversion processing module 402 is configured to perform perturbation conversion on the processed m-dimensional vector to obtain a first converted m-dimensional vector, and to obtain second converted m-dimensional vectors corresponding to other user sides.
A generating module 403, configured to generate a k × m-dimensional iterative matrix according to the first converted m-dimensional vector and the second converted m-dimensional vector.
The unbiased estimation module 404 is configured to perform unbiased estimation for the target application program according to the k × m-dimensional iteration matrix to obtain the unbiased usage count of the target application program.
In some embodiments, the m-dimensional vector to be processed is obtained by mapping the buried point data with a target hash function selected from k hash functions, and the conversion processing module 402 may be configured to: perform perturbation conversion on the processed m-dimensional vector according to the following formula to obtain the first converted m-dimensional vector:
$$x_i = k\left(\frac{1}{2}\cdot\frac{e^{\varepsilon/2}+1}{e^{\varepsilon/2}-1}\,v_i+\frac{1}{2}\right)$$
where x_i is the value of the i-th element of the first converted m-dimensional vector, k is the number of hash functions, ε is the preset privacy budget, and v_i is the value of the i-th element of the processed m-dimensional vector.
In some embodiments, each hash function corresponds to a different index value, and the generating module 403 may be configured to: acquiring a k × m-dimensional summary matrix, a first target index value corresponding to the target hash function and a second target index value corresponding to the hash function corresponding to the second converted m-dimensional vector, wherein values of elements in the k × m summary matrix are all 0; and performing iterative processing on the k × m-dimensional summary matrix according to the first converted m-dimensional vector, the first target index value, the second converted m-dimensional vector and the second target index value to generate a k × m-dimensional iterative matrix.
In some embodiments, the unbiased estimation module 404 may be configured to: perform unbiased estimation for the target application program according to the k × m-dimensional iteration matrix, the k hash functions, and the index value and hash value corresponding to each hash function, to obtain the unbiased usage count of the target application program.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed on a computer, causes the computer to perform the data processing method provided by the embodiments.
The embodiment of the application provides user equipment, which comprises a memory and a processor, wherein the processor is used for executing the data processing method provided by the embodiment of the application by calling a computer program stored in the memory.
The embodiment of the application provides a server, which comprises a processor and a memory, wherein a computer program is stored in the memory, and the processor is used for executing the data processing method provided by the embodiment of the application by calling the computer program stored in the memory.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a user equipment according to an embodiment of the present disclosure.
The user equipment 500 may include components such as a processor 501 of one or more processing cores, memory 502 of one or more computer-readable storage media, a communication unit 503, a power supply 504, an input unit 505, and a display unit 506. Those skilled in the art will appreciate that the user equipment configuration shown in fig. 6 does not constitute a limitation of the user equipment and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 501 is a control center of the user equipment, connects various parts of the entire user equipment by using various interfaces and lines, and performs various functions of the user equipment and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby performing overall monitoring of the user equipment. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created through use of the user equipment, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The communication unit 503 may be used for receiving and transmitting signals during information transmission and reception, and in particular, the communication unit 503 receives signals transmitted by a terminal and provides the signals to one or more processors 501 for processing.
The user equipment also includes a power supply 504 (e.g., a battery) for powering the various components. The power supply 504 may preferably be logically connected to the processor 501 through a power management system, so that charging, discharging and power consumption are managed through the power management system. The power supply 504 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The user device may further include an input unit 505, and the input unit 505 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The user equipment may also include a display unit 506, which may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the user equipment; these interfaces may be made up of graphics, text, icons, video, or any combination thereof. The display unit 506 may include a display panel, which may optionally be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.
Specifically, in this embodiment, the processor 501 in the user equipment loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application program stored in the memory 502, thereby implementing various functions as follows:
acquiring buried point data of a target application program;
mapping the buried point data to obtain m-dimensional vectors to be processed;
performing disturbance processing on the m-dimensional vector to be processed to obtain a processed m-dimensional vector;
and sending the processed m-dimensional vector to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain the unbiased usage count of the target application program.
In some embodiments, when mapping the buried point data to obtain the m-dimensional vector to be processed, the processor 501 may perform: selecting a target hash function from the k hash functions; mapping the buried point data with the target hash function to obtain a target hash value; and determining the m-dimensional vector to be processed according to the target hash value, where the element in the dimension given by the target hash value is 1 and the elements in all other dimensions are -1.
In some embodiments, when perturbing the m-dimensional vector to be processed to obtain the processed m-dimensional vector, the processor 501 may perform: perturbing the value of each element of the m-dimensional vector to be processed according to a target probability to obtain the processed m-dimensional vector, where an element whose value is 1 is flipped to -1 with the target probability, and an element whose value is -1 is flipped to 1 with the target probability. The target probability is:
$$\frac{1}{1+e^{\varepsilon/2}}$$
where ε is a preset privacy budget.
In some embodiments, each hash function corresponds to a different index value, and when sending the processed m-dimensional vector to the server so that the server performs unbiased estimation to obtain the unbiased usage count of the target application program, the processor 501 may perform: obtaining the target index value corresponding to the target hash function; and sending the processed m-dimensional vector and the target index value to the server, so that the server performs unbiased estimation according to them to obtain the unbiased usage count of the target application program.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
The server 600 may include components such as a processor 601 of one or more processing cores, memory 602 of one or more computer-readable storage media, a communication unit 603, a power supply 604, an input unit 605, and a display unit 606. Those skilled in the art will appreciate that the server architecture shown in FIG. 7 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
The processor 601 is the control center of the server. It connects all parts of the server through various interfaces and lines, and performs the server's functions and processes data by running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the server as a whole. Optionally, the processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor, which mainly handles the operating system, user interfaces, and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor 601.
The memory 602 may be used to store software programs and modules, and the processor 601 performs various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the server, and the like. Further, the memory 602 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller that gives the processor 601 access to the memory 602.
The communication unit 603 may be used to receive and send signals while information is being transmitted and received; in particular, the communication unit 603 receives signals sent by a terminal and passes them to one or more processors 601 for processing.
The server also includes a power supply 604 (e.g., a battery) that powers the components. Preferably, the power supply 604 is logically connected to the processor 601 through a power management system, so that charging, discharging, and power consumption are managed by the power management system. The power supply 604 may further include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
The server may also include an input unit 605, which may be used to receive entered numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The server may also include a display unit 606, which may be used to display information entered by the user or provided to the user, as well as the server's various graphical user interfaces, which may consist of graphics, text, icons, video, or any combination thereof. The display unit 606 may include a display panel, which may optionally take the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
Specifically, in this embodiment, the processor 601 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and runs the application programs stored in the memory 602, thereby implementing the following functions:
acquiring a processed m-dimensional vector sent by a user side, where the user side obtained the processed m-dimensional vector by perturbing an m-dimensional vector to be processed, and obtained the m-dimensional vector to be processed by mapping the acquired buried point (event-tracking) data of a target application program;
performing perturbation conversion processing on the processed m-dimensional vector to obtain a first converted m-dimensional vector, and acquiring second converted m-dimensional vectors corresponding to other clients;
generating a k × m iteration matrix according to the first converted m-dimensional vector and the second converted m-dimensional vectors;
performing unbiased estimation for the target application program according to the k × m iteration matrix to obtain the unbiased number of uses of the target application program.
In some embodiments, the m-dimensional vector to be processed is obtained by mapping the buried point data with a target hash function selected from k hash functions. When performing the perturbation conversion processing on the processed m-dimensional vector to obtain the first converted m-dimensional vector, the processor 601 may convert the processed m-dimensional vector according to the following formula:
[Formula image not reproduced in the source text: x_i expressed in terms of k, ε, and v_i]
where x_i is the value of the i-th element of the first converted m-dimensional vector, k is the number of hash functions, ε is the preset privacy budget, and v_i is the value of the i-th element of the processed m-dimensional vector.
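The conversion formula is also only an image in the source. Assuming it follows the server step of the count mean sketch, x_i = k · (c_ε/2 · v_i + 1/2) with c_ε = (e^(ε/2) + 1) / (e^(ε/2) - 1), the conversion cancels the bias introduced by flipping: if an element originally has value u_i and flips with probability p = 1 / (1 + e^(ε/2)), then E[v_i] = (1 - 2p) · u_i = u_i / c_ε, so E[x_i] = k · (u_i + 1) / 2. That is k at the target hash value and 0 elsewhere. The sketch below encodes that assumption; `c_epsilon`, `convert`, and `expected_converted` are hypothetical names.

```python
import math


def c_epsilon(epsilon: float) -> float:
    """Assumed scaling constant c_eps = (e^(eps/2) + 1) / (e^(eps/2) - 1); requires eps > 0."""
    if epsilon <= 0:
        raise ValueError("privacy budget epsilon must be positive")
    e = math.exp(epsilon / 2.0)
    return (e + 1.0) / (e - 1.0)


def convert(processed: list[int], k: int, epsilon: float) -> list[float]:
    """Assumed perturbation conversion: x_i = k * (c_eps / 2 * v_i + 1 / 2) per element."""
    c = c_epsilon(epsilon)
    return [k * (c / 2.0 * v + 0.5) for v in processed]


def expected_converted(original: int, k: int, epsilon: float) -> float:
    """Exact E[x_i] for an element with original value +1/-1 flipped with p = 1/(1+e^(eps/2))."""
    p = 1.0 / (1.0 + math.exp(epsilon / 2.0))
    kept, flipped = convert([original, -original], k, epsilon)
    return (1.0 - p) * kept + p * flipped


if __name__ == "__main__":
    print(convert([1, -1, -1], k=4, epsilon=2.0))
    print(expected_converted(1, k=4, epsilon=2.0), expected_converted(-1, k=4, epsilon=2.0))
```

A single converted vector is very noisy, since individual values can be negative or exceed k, but the expectation is exact, which is what makes summing many reports unbiased.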
In some embodiments, each hash function corresponds to a different index value. When generating the k × m iteration matrix according to the first converted m-dimensional vector and the second converted m-dimensional vectors, the processor 601 may: acquire a k × m summary matrix whose elements are all 0, a first target index value corresponding to the target hash function, and a second target index value corresponding to the hash function associated with each second converted m-dimensional vector; and iteratively update the k × m summary matrix according to the first converted m-dimensional vector, the first target index value, the second converted m-dimensional vectors, and the second target index values to generate the k × m iteration matrix.
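A minimal sketch of the iterative processing, assuming (as in the count mean sketch) that each converted vector is added to the row of the summary matrix selected by its report's target index value. Plain Python lists stand in for the matrix, and `new_summary_matrix` and `accumulate` are hypothetical names.

```python
def new_summary_matrix(k: int, m: int) -> list[list[float]]:
    """k x m summary matrix with every element initialised to 0."""
    return [[0.0] * m for _ in range(k)]


def accumulate(matrix: list[list[float]], index: int, converted: list[float]) -> None:
    """Add one converted m-dimensional vector to row `index` of the matrix, in place."""
    row = matrix[index]
    if len(converted) != len(row):
        raise ValueError("converted vector length must equal m")
    for i, value in enumerate(converted):
        row[i] += value


if __name__ == "__main__":
    # (index value, converted vector) pairs: this client's first converted vector plus
    # other clients' second converted vectors. Noise-free values with k = 2 for illustration.
    reports = [(0, [2.0, 0.0, 0.0]), (1, [0.0, 0.0, 2.0]), (0, [0.0, 2.0, 0.0])]
    iteration_matrix = new_summary_matrix(k=2, m=3)
    for index, converted in reports:
        accumulate(iteration_matrix, index, converted)
    print(iteration_matrix)  # [[2.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
```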
In some embodiments, when performing unbiased estimation for the target application program according to the k × m iteration matrix to obtain its unbiased number of uses, the processor 601 may perform the unbiased estimation according to the k × m iteration matrix, the k hash functions, and the index value and hash value corresponding to each hash function.
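Assuming the count mean sketch estimator, the unbiased count for candidate buried point data d over n reports is f(d) = m/(m - 1) · ((1/k) · Σ_j M[j][h_j(d)] - n/m). For each hash function j, the estimator reads the matrix element at row j (the index value) and column h_j(d) (the hash value), averages over the k rows, and removes the expected contribution of hash collisions. The SHA-256-based hash family below is hypothetical; the patent does not name its hash functions.

```python
import hashlib


def hash_index(j: int, data: str, m: int) -> int:
    """Hypothetical j-th hash function h_j, mapping data into [0, m)."""
    digest = hashlib.sha256(f"{j}:{data}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def estimate_count(matrix: list[list[float]], data: str, n: int) -> float:
    """Assumed unbiased estimate of how many of the n reports carried `data`."""
    k, m = len(matrix), len(matrix[0])
    # Read the element at (index value j, hash value h_j(data)) for every hash function.
    mean_hit = sum(matrix[j][hash_index(j, data, m)] for j in range(k)) / k
    # Subtract the collision baseline n/m and rescale by m/(m - 1).
    return (m / (m - 1)) * (mean_hit - n / m)
```

As a sanity check, if every one of n reports carried the same d and no noise were added, each report would contribute k at (j, h_j(d)). The mean hit would then be n, and the estimate would reduce to m/(m - 1) · (n - n/m) = n.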
In the above embodiments, "a plurality of" means two or more.
Each of the above embodiments has its own emphasis; for any part not described in detail in one embodiment, refer to the detailed description of the data processing method above, which is not repeated here.
The data processing apparatus provided in the embodiments of the present application is based on the same concept as the data processing method in the above embodiments. Any method provided in the data processing method embodiments can run on the data processing apparatus; for the specific implementation, see the data processing method embodiments, which are not repeated here.
It should be noted that, for the data processing method described in the embodiments of the present application, those skilled in the art will understand that all or part of its process can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and its execution may include the process of the data processing method embodiments. The computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In the data processing apparatus of the embodiments of the present application, the functional modules may be integrated into one processing chip, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The data processing method, data processing apparatus, storage medium, user equipment, and server provided by the embodiments of the present application have been described in detail above. Specific examples have been used to explain the principle and implementation of the present application, and the description of the embodiments is intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present application, make changes to the specific implementation and scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (13)

1. A data processing method, applied to a user side, characterized by comprising the following steps:
acquiring buried point (event-tracking) data of a target application program;
mapping the buried point data to obtain an m-dimensional vector to be processed;
performing perturbation processing on the m-dimensional vector to be processed to obtain a processed m-dimensional vector;
and sending the processed m-dimensional vector to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain an unbiased number of uses of the target application program.
2. The data processing method according to claim 1, wherein mapping the buried point data to obtain the m-dimensional vector to be processed comprises:
selecting a target hash function from k hash functions;
mapping the buried point data with the target hash function to obtain a target hash value;
and determining the m-dimensional vector to be processed according to the target hash value, wherein, in the m-dimensional vector to be processed, the element in the dimension given by the target hash value is 1 and the elements in all other dimensions are -1.
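The mapping of claim 2 can be sketched as follows. The hash family is an assumption: the claim only requires k hash functions, so the hypothetical `hash_index` derives the j-th function from SHA-256 over a `"j:data"` string, and the target hash function is drawn uniformly at random, as in the count mean sketch. The buried point string in the usage line is likewise made up.

```python
import hashlib
import random


def hash_index(j: int, data: str, m: int) -> int:
    """Hypothetical j-th hash function h_j: maps buried point data to a position in [0, m)."""
    digest = hashlib.sha256(f"{j}:{data}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def encode(data: str, k: int, m: int, rng: random.Random) -> tuple[int, list[int]]:
    """Select a target hash function and build the m-dimensional vector to be processed."""
    j = rng.randrange(k)                  # target hash function, chosen uniformly from k
    target_hash = hash_index(j, data, m)  # target hash value
    vector = [-1] * m                     # every other dimension is -1
    vector[target_hash] = 1               # the dimension at the target hash value is 1
    return j, vector


if __name__ == "__main__":
    index, to_process = encode("com.example.app/launch", k=8, m=16, rng=random.Random(7))
    print(index, to_process)
```

Returning j alongside the vector lets the user side report the target index value together with the perturbed vector, as in claim 4.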
3. The data processing method according to claim 2, wherein performing the perturbation processing on the m-dimensional vector to be processed to obtain the processed m-dimensional vector comprises:
perturbing the value of each element of the m-dimensional vector to be processed according to a target probability to obtain the processed m-dimensional vector, wherein an element whose value is 1 is flipped to -1 with the target probability, and an element whose value is -1 is flipped to 1 with the target probability, the target probability being:
[Formula image not reproduced in the source text: the target probability as a function of ε]
where ε is a preset privacy budget.
4. The data processing method according to claim 2, wherein each hash function corresponds to a different index value, and sending the processed m-dimensional vector to the server so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain the unbiased number of uses of the target application program comprises:
acquiring a target index value corresponding to the target hash function;
and sending the processed m-dimensional vector and the target index value to the server, so that the server performs unbiased estimation according to the processed m-dimensional vector and the target index value to obtain the unbiased number of uses of the target application program.
5. A data processing method, applied to a server side, characterized by comprising the following steps:
acquiring a processed m-dimensional vector sent by a user side, wherein the user side obtains the processed m-dimensional vector by performing perturbation processing on an m-dimensional vector to be processed, and obtains the m-dimensional vector to be processed by mapping acquired buried point data of a target application program;
performing perturbation conversion processing on the processed m-dimensional vector to obtain a first converted m-dimensional vector, and acquiring second converted m-dimensional vectors corresponding to other clients;
generating a k × m iteration matrix according to the first converted m-dimensional vector and the second converted m-dimensional vectors;
and performing unbiased estimation for the target application program according to the k × m iteration matrix to obtain an unbiased number of uses of the target application program.
6. The data processing method according to claim 5, wherein the m-dimensional vector to be processed is obtained by mapping the buried point data with a target hash function selected from k hash functions, and performing the perturbation conversion processing on the processed m-dimensional vector to obtain the first converted m-dimensional vector comprises:
performing the perturbation conversion processing on the processed m-dimensional vector according to the following formula to obtain the first converted m-dimensional vector:
[Formula image not reproduced in the source text: x_i expressed in terms of k, ε, and v_i]
where x_i is the value of the i-th element of the first converted m-dimensional vector, k is the number of hash functions, ε is the preset privacy budget, and v_i is the value of the i-th element of the processed m-dimensional vector.
7. The data processing method according to claim 6, wherein each hash function corresponds to a different index value, and generating the k × m iteration matrix according to the first converted m-dimensional vector and the second converted m-dimensional vectors comprises:
acquiring a k × m summary matrix whose elements are all 0, a first target index value corresponding to the target hash function, and a second target index value corresponding to the hash function associated with each second converted m-dimensional vector;
and iteratively processing the k × m summary matrix according to the first converted m-dimensional vector, the first target index value, the second converted m-dimensional vectors, and the second target index values to generate the k × m iteration matrix.
8. The data processing method according to claim 7, wherein performing unbiased estimation for the target application program according to the k × m iteration matrix to obtain the unbiased number of uses of the target application program comprises:
performing unbiased estimation for the target application program according to the k × m iteration matrix, the k hash functions, and the index value and hash value corresponding to each hash function, to obtain the unbiased number of uses of the target application program.
9. A data processing device, applied to a user side, characterized by comprising:
an acquisition module, configured to acquire buried point data of a target application program;
a mapping processing module, configured to map the buried point data to obtain an m-dimensional vector to be processed;
a perturbation processing module, configured to perform perturbation processing on the m-dimensional vector to be processed to obtain a processed m-dimensional vector;
and a sending module, configured to send the processed m-dimensional vector to a server, so that the server performs unbiased estimation according to the processed m-dimensional vector to obtain an unbiased number of uses of the target application program.
10. A data processing device, applied to a server side, characterized by comprising:
an acquisition module, configured to acquire a processed m-dimensional vector sent by a user side, wherein the user side obtains the processed m-dimensional vector by performing perturbation processing on an m-dimensional vector to be processed, and obtains the m-dimensional vector to be processed by mapping acquired buried point data of a target application program;
a conversion processing module, configured to perform perturbation conversion processing on the processed m-dimensional vector to obtain a first converted m-dimensional vector, and to acquire second converted m-dimensional vectors corresponding to other clients;
a generating module, configured to generate a k × m iteration matrix according to the first converted m-dimensional vector and the second converted m-dimensional vectors;
and an unbiased estimation module, configured to perform unbiased estimation for the target application program according to the k × m iteration matrix to obtain an unbiased number of uses of the target application program.
11. A computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the data processing method of any one of claims 1 to 4 or the data processing method of any one of claims 5 to 8.
12. A user equipment, characterized by comprising a processor and a memory storing a computer program, wherein the processor is configured to perform the data processing method of any one of claims 1 to 4 by calling the computer program stored in the memory.
13. A server, characterized by comprising a processor and a memory storing a computer program, wherein the processor is configured to perform the data processing method of any one of claims 5 to 8 by calling the computer program stored in the memory.
CN202110936861.0A 2021-08-16 2021-08-16 Data processing method and device, storage medium, user equipment and server Pending CN113660263A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110936861.0A CN113660263A (en) 2021-08-16 2021-08-16 Data processing method and device, storage medium, user equipment and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110936861.0A CN113660263A (en) 2021-08-16 2021-08-16 Data processing method and device, storage medium, user equipment and server

Publications (1)

Publication Number Publication Date
CN113660263A true CN113660263A (en) 2021-11-16

Family

ID=78479241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110936861.0A Pending CN113660263A (en) 2021-08-16 2021-08-16 Data processing method and device, storage medium, user equipment and server

Country Status (1)

Country Link
CN (1) CN113660263A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180349636A1 (en) * 2017-06-04 2018-12-06 Apple Inc. Differential privacy using a count mean sketch
CN109543842A (en) * 2018-11-02 2019-03-29 西安交通大学 The Distribution estimation method of higher-dimension intelligent perception data with local secret protection
CN111881469A (en) * 2020-07-07 2020-11-03 深圳市腾讯网域计算机网络有限公司 Data processing method and device
CN112329056A (en) * 2020-11-03 2021-02-05 石家庄铁道大学 Government affair data sharing-oriented localized differential privacy method

Similar Documents

Publication Publication Date Title
EP3502880B1 (en) Method for preloading application, storage medium, and terminal device
US9537957B2 (en) Seamless application session reconstruction between devices
KR101503209B1 (en) Method and system for dynamically creating and servicing master-slave pairs within and across switch fabrics of a portable computing device
WO2016040211A1 (en) Modified matrix factorization of content-based model for recommendation system
JP5880101B2 (en) Information processing apparatus, information processing method, and program
CN106534281A (en) Data request responding method, apparatus and system
CN110334091A (en) A kind of data fragmentation distributed approach, system, medium and electronic equipment
CN112734498B (en) Task rewarding acquisition method, device, terminal and storage medium
CN106537386A (en) Identifying files for data write operations
CN111090877B (en) Data generation and acquisition methods, corresponding devices and storage medium
CN114694226B (en) Face recognition method, system and storage medium
Thakkar et al. Renda: resource and network aware data placement algorithm for periodic workloads in cloud
CN105027155A (en) Unifying cloud services for online sharing
CN112131457A (en) Information recommendation method, device and system and storage medium
US11308063B2 (en) Data structure to array conversion
CN112818080A (en) Search method, device, equipment and storage medium
CN113656272A (en) Data processing method and device, storage medium, user equipment and server
CN113660263A (en) Data processing method and device, storage medium, user equipment and server
CN111782933A (en) Method and device for recommending book list
Chang et al. PAGroup: Privacy-aware grouping framework for high-performance federated learning
CN115563160A (en) Data processing method, data processing device, computer equipment and computer readable storage medium
CN116955271A (en) Method and device for storing data copy, electronic equipment and storage medium
CN113900920A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN109561146A (en) Document down loading method, device, terminal device
CN112068976A (en) Data backup storage method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211116