CN116436704B - Data processing method and data processing equipment for user privacy data - Google Patents



Publication number
CN116436704B
Authority
CN
China
Prior art keywords
data
privacy
calculation
target
computing
Legal status
Active
Application number
CN202310692841.2A
Other languages
Chinese (zh)
Other versions
CN116436704A (en)
Inventor
Name withheld at the inventor's request
Current Assignee
Shencun Technology Wuxi Co ltd
Original Assignee
Shencun Technology Wuxi Co ltd
Application filed by Shencun Technology Wuxi Co ltd
Priority to CN202310692841.2A
Publication of CN116436704A
Application granted
Publication of CN116436704B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

The application discloses a data processing method and data processing equipment for user privacy data, in the field of data processing. A data access engine receives privacy data collected by terminal devices. A data storage engine stores that data by category in a storage medium. A service access engine receives a service request instruction from a data center, and the request authority of the data center and the requested target data type are determined. The data storage engine reads the target privacy data from the local storage medium. A data protection engine then desensitizes it, converting it into encrypted data from which the original information cannot be recovered. The service access engine sends the generated encrypted data back to the data center. In this scheme the privacy data stays local. When a manufacturer needs it for personalized analysis and recommendation, the data is encrypted and converted before it is returned to the manufacturer. This avoids the privacy leakage that occurs when manufacturers acquire private data directly, and it improves data security.

Description

Data processing method and data processing equipment for user privacy data
Technical Field
The embodiment of the application relates to the field of data processing, in particular to a data processing method and data processing equipment for user privacy data.
Background
With the rapid development of the internet, apps built around user services generate large amounts of user privacy data. The collection and computation services for this data are provided by manufacturers (edge devices plus data centers). Related information, such as app accounts and configurations, is stored in the data center (server) of the app's provider.
Service providers promise to protect user privacy. However, the security of this storage model depends mainly on how seriously each provider treats user data, and private data can leak when a provider is poorly managed. Edge devices also face leakage risks during data collection. Threats include hacking, insecure storage, and leaks by third parties or management personnel. Building and maintaining a data security system for a data center is also expensive, and in practice most manufacturers cannot deliver the privacy protection they promise. Moreover, edge equipment providers that collect user data may use it for machine learning even while protecting the original data. A trained public model can then reveal customers' private information when it serves others, and this hidden danger is especially pronounced in GPT and other generative models. Because manufacturers do not share user data, edge devices that collect information proliferate and raise users' costs. Small manufacturers with good products also cannot deliver high-quality services because their ecosystems are incomplete. Finally, when customers refuse to provide private data out of privacy concerns, service providers cannot offer them any personalized service.
Disclosure of Invention
The application provides a data processing method and data processing equipment for user privacy data. They address the risk of user privacy leakage, improve the security of user privacy data, and help service providers complete their ecosystems and intelligent services.
In one aspect, the present application provides a data processing method for user privacy data, including:
collecting the privacy data generated for a user through the communication protocols established with each terminal device, and storing it by category in a local storage medium, where the collection and storage of private data is a unidirectional transmission;
receiving a service request instruction from a data center, and determining the request authority of the data center and the requested target data type, where different data centers correspond to different platform service providers;
reading target privacy data from the local storage medium and, according to the privacy computing type indicated by the service request instruction, desensitizing it into encrypted data from which the original information cannot be obtained; when the privacy computing type is the first-class computing type, the target privacy data is desensitized directly, and when it is the second-class computing type, the target privacy data is first processed by a computing model issued by the data center and then desensitized; the computing content of the first-class computing type is executed by the data center, and that of the second-class computing type is executed locally;
and sending the generated encrypted data back to the data center.
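The claimed method hinges on branching by privacy computing type. The following Python sketch shows that branching under stated assumptions. `ComputeType`, `handle_request`, and the callables it accepts are hypothetical names. The desensitization and issued-model functions are stand-ins for whatever the algorithm library and the data center actually supply.

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence

Vector = List[float]


class ComputeType(Enum):
    FIRST_CLASS = 1   # follow-up computation is executed by the data center
    SECOND_CLASS = 2  # computation is executed locally with a model issued by the data center


def handle_request(
    compute_type: ComputeType,
    target_data: Sequence[float],
    desensitize: Callable[[Sequence[float]], Vector],
    issued_model: Optional[Callable[[Sequence[float]], Sequence[float]]] = None,
) -> Vector:
    """Return only desensitized output; the raw target_data is never returned."""
    if compute_type is ComputeType.FIRST_CLASS:
        # First class: desensitize the target privacy data directly.
        return desensitize(target_data)
    if issued_model is None:
        raise ValueError("a second-class request needs a model issued by the data center")
    # Second class: run the issued model locally, then desensitize its output.
    return desensitize(issued_model(target_data))


# Usage with placeholder functions (hypothetical):
double = lambda v: [2 * x for x in v]
print(handle_request(ComputeType.FIRST_CLASS, [1.0, 2.0], double))                      # [2.0, 4.0]
print(handle_request(ComputeType.SECOND_CLASS, [1.0, 2.0], double, lambda v: [sum(v)]))  # [6.0]
```

In both branches the caller only ever receives the output of `desensitize`. This mirrors the claim that nothing leaves the device before conversion.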
In another aspect, the application provides a data processing device for user privacy data, which applies the above data processing method. The device is built on a CSP architecture and comprises:
a data access engine with built-in communication interfaces/protocols, for receiving the privacy data collected and generated by each terminal device;
a data storage engine, which has a unidirectional transmission connection with the data access engine and stores the privacy data by category in the corresponding storage medium;
a service access engine, which communicates with each data center, receives the service request instructions initiated by the data centers, and sends the locally processed encrypted data back to the data center;
a data protection engine, which interacts with the service access engine and receives the service request instruction forwarded by it. According to the privacy computing type indicated by the instruction, it desensitizes the target privacy data into encrypted data from which the original information cannot be obtained, and sends the encrypted data back to the data center. When the privacy computing type is the first-class computing type, the target privacy data is desensitized directly. When it is the second-class computing type, the target privacy data is processed by a computing model issued by the data center and then desensitized. Computing content of the first-class computing type is executed by the data center, and that of the second-class computing type is executed locally. Different data centers correspond to different platform service providers, and the user side is the equipment that controls each terminal device's collection of private data; and
a storage medium, connected with the data storage engine through a transmission interface/protocol, which stores the privacy data collected by each terminal device together with storage parameters.
The technical solution provided by the embodiments has at least the following beneficial effects. Because private data is stored on the local device, a service provider that needs a user's private data sends a request to the data processing device. The private data is first converted locally, and the result (encrypted data) is returned to the data center through the service access engine. The data center can make personalized recommendations from the encrypted data. However, the original information, and therefore the private content, cannot be obtained from it directly. This process prevents the manufacturer from accessing the user's private data directly, while the user can still grant the manufacturer permission to use the private data for personalized analysis.
For manufacturers, this helps complete their ecosystems and deliver targeted, personalized intelligent services. For users, privacy data is stored and processed locally rather than disclosed to service providers in the cloud. This reduces the risk of leakage while still allowing personalized services.
Drawings
FIG. 1 is a block diagram of a data processing device for user privacy data according to an embodiment of the present application;
FIG. 2 is a block diagram of a data processing device for user privacy data according to another embodiment;
FIG. 3 is a flowchart of a data processing method for user privacy data according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
References herein to "a plurality" mean two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
FIG. 1 is a block diagram of a data processing device for user privacy data according to an embodiment of the present application. The device is designed on a CSP architecture, which can be implemented on an FPGA; an SoC can also be designed on this architecture. The device specifically comprises the following components:
The data access engine has built-in support for multiple mainstream communication interfaces/protocols, such as Ethernet, Wi-Fi, Bluetooth, and ZigBee. Through them it receives the privacy data collected and generated by each terminal device. The terminal devices may include audio and video monitoring devices in an office or home environment. They are not limited to single sensor devices and may also be smart products such as smartphones, bands, and other wearables. These products collect the corresponding privacy data with the user's authorization, such as usage habits, sleeping habits, browsing records, and account passwords.
The data storage engine handles privacy data that is sensitive but necessary for each manufacturer to customize personalized intelligent services. Examples include recommending related products based on shopping-site browsing records, or suggesting a work-rest schedule based on sleep patterns. Traditional intelligent service providers (vendors) must collect this private data in a data center (cloud server) and then analyze it to offer targeted services. Even where privacy protection is widely claimed, most user privacy data leaks to some degree at the service provider. As a result, some privacy-conscious people refuse to grant monitoring permissions to their terminal devices, which is especially evident with smartphones, mobile wearables, and IoT devices.
On this basis, the application adds a data storage engine and a high-capacity storage medium to the CSP architecture, connected to each other through a transmission interface/protocol. After the data access engine receives private data, it passes the data to the data storage engine, which calls the storage medium to store it locally. Managed storage media include eMMC, mechanical disks, solid-state disks, and mass storage devices accessed over NVMe-oF, among others. The data protection engine centrally schedules and manages the storage space, classification, and management of privacy data in the storage medium. Control authority over the data protection engine rests with the user side. The storage capacity, storage location, and control parameters of each terminal device are configured and allocated according to the control instructions it receives.
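As a rough illustration of classified local storage with user-configured capacity, the sketch below files records by (terminal, category) and enforces a per-terminal quota. `LocalStore` and its methods are hypothetical. A real device would persist to eMMC, SSD, or NVMe-oF media rather than to in-memory dictionaries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class LocalStore:
    """Hypothetical classified local store with per-terminal quotas set by the user side."""

    def __init__(self) -> None:
        self._quota: Dict[str, int] = {}               # terminal_id -> capacity in bytes
        self._used: Dict[str, int] = defaultdict(int)  # terminal_id -> bytes already used
        self._records: Dict[Tuple[str, str], List[bytes]] = defaultdict(list)

    def configure(self, terminal_id: str, quota_bytes: int) -> None:
        """Apply a user-side control instruction that allocates capacity to a terminal."""
        self._quota[terminal_id] = quota_bytes

    def store(self, terminal_id: str, category: str, record: bytes) -> bool:
        """File a record under (terminal, category); refuse it if the quota would be exceeded."""
        if self._used[terminal_id] + len(record) > self._quota.get(terminal_id, 0):
            return False
        self._records[(terminal_id, category)].append(record)
        self._used[terminal_id] += len(record)
        return True

    def read(self, terminal_id: str, category: str) -> List[bytes]:
        """Return a copy of the records stored under (terminal, category)."""
        return list(self._records.get((terminal_id, category), []))
```

Unconfigured terminals default to a quota of zero in this sketch. That is one possible reading of "configured and allocated according to the received control instruction".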
Note that transmission between the data access engine and the data storage engine is strictly unidirectional: data flows only from the data access engine to the data storage engine. This prevents terminal devices from reading the private data in the storage medium, whether through unintended authorization or maliciously, and avoids data leakage from the offline end. The unidirectional connection may be implemented with any unidirectional transmission device, and the embodiment does not limit this.
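The one-way constraint can be modelled in software by giving each engine only one end of a link. The data access engine receives an object that can only send, and the data storage engine receives one that can only receive. This is a minimal sketch with hypothetical names (`one_way_link`, `SendEnd`, `ReceiveEnd`). Python does not enforce attribute privacy, so the real guarantee would have to come from the unidirectional transmission hardware the text mentions.

```python
from collections import deque
from typing import Any, Deque, Optional, Tuple


class SendEnd:
    """Handed to the data access engine: it can only push data in."""

    def __init__(self, queue: Deque[Any]) -> None:
        self._queue = queue

    def send(self, item: Any) -> None:
        self._queue.append(item)


class ReceiveEnd:
    """Handed to the data storage engine: it can only take data out."""

    def __init__(self, queue: Deque[Any]) -> None:
        self._queue = queue

    def receive(self) -> Optional[Any]:
        # Returns None when nothing is pending.
        return self._queue.popleft() if self._queue else None


def one_way_link() -> Tuple[SendEnd, ReceiveEnd]:
    """Create a link whose two ends expose only send() or only receive()."""
    queue: Deque[Any] = deque()
    return SendEnd(queue), ReceiveEnd(queue)
```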
As illustrated in FIG. 1, the service access engine of the data processing device connects to each data center. Different data centers correspond to different app/platform service providers and offer different personalized intelligent services. By establishing communication with each data center, the service access engine receives the service request instructions they initiate and transmits the locally produced encrypted data to the designated data center. How the encrypted data is produced is described in the following embodiments.
The data protection engine interacts with the user side and with the service access engine. It receives the service request instruction forwarded by the service access engine and desensitizes the target privacy data retrieved through the data storage engine. It then forwards the resulting encrypted data to the user side or to the service access engine. Data centers correspond one-to-one with service providers. The user side is the equipment that controls each terminal device's collection of private data, such as a PC or mobile phone. When the privacy computing type is the first-class computing type, the target privacy data is desensitized directly. When it is the second-class computing type, the target privacy data is processed by a computing model issued by the data center and then desensitized.
Because private data is stored on the local device, a service provider that needs a user's private data sends a request to the data processing device. The private data is first converted locally, and the result (encrypted data) is returned to the data center through the service access engine. The data center can make personalized recommendations from the encrypted data, but the original information cannot be obtained from it directly, so the private content is not exposed. This process prevents the manufacturer from accessing the user's private data directly, while the user can still grant the manufacturer permission to use the private data for personalized analysis.
For manufacturers, this helps complete their ecosystems and deliver targeted, personalized intelligent services. For users, privacy data is stored and processed locally rather than disclosed to service providers in the cloud. This reduces the risk of leakage while still allowing personalized services.
FIG. 2 is a block diagram of a data processing device for user privacy data according to another embodiment. It focuses on how privacy data is processed when a data center initiates different types of service request instructions. In addition to the service access engine, data storage engine, and data protection engine of FIG. 1, the service access engine contains built-in clients that communicate with the data centers, and service request instructions are received through these clients. A client is installed when the user uses the smart product and the manufacturer is authorized. Installing the client connects the device with the manufacturer's data center so that dedicated intelligent services can be provided to the user.
A service request instruction includes a privacy computing application, initiated by the data center, for retrieving target privacy data and performing privacy computation. In other words, the privacy data must be analyzed in order to provide personalized services.
Consider a privacy computing request in which a data center retrieves a user's sleep, heart-rate, and pulse data in order to recommend services such as a work-rest schedule or diet. The relevant sleep, heart-rate, and pulse data must be read from the storage medium. To analyze such privacy data, the data protection engine has a built-in algorithm library containing algorithms for extracting vectors from various types of privacy data. The extracted vector data contains the target dimension information the data center needs for its analysis. The vector data can therefore be sent to the data center for subsequent analysis, without sending the original audio, video, or other privacy data to the manufacturer. Requests that perform analysis directly on the extracted vectors are called the first-class computing type.
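Here is a minimal sketch of first-class vector extraction for the sleep example. It assumes per-minute heart-rate samples paired with an asleep flag. The four output dimensions (total sleep minutes, mean and standard deviation of sleeping heart rate, maximum heart rate) are illustrative assumptions, not the patent's algorithm library. `extract_sleep_vector` is a hypothetical helper.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def extract_sleep_vector(
    heart_rate_bpm: Sequence[float],
    asleep: Sequence[bool],
    sample_minutes: float = 1.0,
) -> List[float]:
    """Reduce raw per-sample records to four hypothetical target dimensions:
    [total sleep minutes, mean sleeping HR, std-dev of sleeping HR, max HR]."""
    if not heart_rate_bpm or len(heart_rate_bpm) != len(asleep):
        raise ValueError("need equal-length, non-empty sample sequences")
    # Keep only the samples recorded while the user was asleep.
    sleeping_hr = [hr for hr, s in zip(heart_rate_bpm, asleep) if s]
    return [
        sum(asleep) * sample_minutes,
        float(mean(sleeping_hr)) if sleeping_hr else 0.0,
        float(pstdev(sleeping_hr)) if sleeping_hr else 0.0,
        float(max(heart_rate_bpm)),
    ]
```

Only these four numbers would leave the device. The raw sample series stays in local storage.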
Optionally, the algorithm library may also include an encryption algorithm for encrypting the extracted vectors. The specific algorithm and encryption parameters are agreed with the manufacturer in advance. The target privacy data is thus encrypted and converted into encrypted data, which the client in the service access engine returns to the data center for analysis.
Some private data, however, cannot be handled this way. It may be sensitive and important, it may not be analyzable after vector extraction, or it may lose so much information during extraction that analysis is insufficiently precise. Such data still needs to be served without being uploaded directly. For this purpose, a model library configured by the manufacturer is provided in the service access engine to store computing models that operate on private data. A computing model may be issued by the data center when a specific privacy computation is processed. The data protection engine processes the target privacy data with the computing model, so the data analysis step moves down to the local device, and the data center performs subsequent processing on the model's output. Requests that perform subsequent analysis on model output are called the second-class computing type. Its processing is more complex than that of the first-class computing type. A first-class request only requires local vector extraction. A second-class request requires more local computing power, because the privacy data is processed locally and only simple model data is returned to the data center. This satisfies the manufacturer's data-analysis needs without leaking user privacy data.
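For the second-class path, the sketch below assumes the data center issues a simple linear scoring model. `IssuedModel` and `ModelLibrary` are hypothetical names, and a real issued model could be any format the local computing acceleration hardware supports. The point it illustrates is that the model runs on the private features locally and only its output is returned.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class IssuedModel:
    """Hypothetical model package sent down by a data center: a linear scorer."""
    name: str
    weights: Tuple[float, ...]
    bias: float

    def run(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature length does not match the model")
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


class ModelLibrary:
    """Stores issued models and runs them on private features locally."""

    def __init__(self) -> None:
        self._models: Dict[str, IssuedModel] = {}

    def install(self, model: IssuedModel) -> None:
        self._models[model.name] = model

    def compute(self, name: str, private_features: Sequence[float]) -> Dict[str, object]:
        # Only the model name and its scalar output are returned to the data center.
        return {"model": name, "result": self._models[name].run(private_features)}
```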
Optionally, the data processing device in the embodiments may also be used for corpus analysis for large language models, such as conversational question-answering models including ERNIE Bot (Wenxin Yiyan) and ChatGPT. For a vendor's large language model to better serve a particular user or group of users, the model must be adapted by training on language data (privacy data). Such training requires far more computing power than a user has, so it must rely on the data center's computing power. The CSP architecture for private-data computation supports this function while keeping the original private data protected.
As shown in FIG. 2, each client also has a built-in common token dictionary, and the data protection engine has a built-in domain token dictionary for each data center. The common token dictionary holds common token IDs and translates common corpus into those IDs. The domain token dictionary holds domain token IDs and translates domain corpus into those IDs; it is formed from the user's private data. The common corpus and the domain corpus are language data relating to general and to professional/technical language, respectively, and are the privacy data needed to train a large language model. Data centers correspond one-to-one with domain token dictionaries, and the common and domain token dictionaries established by different data centers may differ.
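The translation from corpus to token IDs can be sketched as a dictionary lookup. The sketch assumes whole-word terms, whereas production LLM tokenizers typically split text into subword units. It also assumes that domain entries take precedence over common ones. `to_token_ids` and the `(dictionary, id)` output format are hypothetical.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int]


def to_token_ids(text: str, common: Dict[str, int], domain: Dict[str, int]) -> List[Token]:
    """Translate whitespace-separated terms into (dictionary, id) pairs.

    Domain terms take precedence over common terms; unknown terms map to ("unk", -1).
    """
    tokens: List[Token] = []
    for term in text.lower().split():
        if term in domain:
            tokens.append(("domain", domain[term]))
        elif term in common:
            tokens.append(("common", common[term]))
        else:
            tokens.append(("unk", -1))
    return tokens
```

Under this scheme only the ID sequence would be handed to the data center for training. The raw text and the private domain dictionary stay on the device.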
Through the service access engine, the data center (manufacturer) exercises its control rights. It can manage the model library and the common token dictionary as needed, for example through periodic data cleaning and upgrades that adapt to the latest collected corpora and training results. Upgrades to the domain token dictionary are performed by the data protection engine. It adds the latest corpus-to-token-ID entries and refines the corpus information for existing dictionary IDs while keeping the original data protected.
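One way to add new corpus-to-ID entries while keeping existing IDs stable is to append unseen terms under fresh IDs. This is a sketch under that assumption. `update_domain_dictionary` is a hypothetical helper, and the patent does not specify the ID allocation scheme.

```python
from typing import Dict, Iterable


def update_domain_dictionary(domain: Dict[str, int], new_terms: Iterable[str]) -> Dict[str, int]:
    """Return a new dictionary with unseen terms appended under fresh IDs.

    Existing term -> ID entries are never renumbered, so token IDs that were already
    sent to the data center keep their meaning after the upgrade.
    """
    updated = dict(domain)  # leave the caller's dictionary untouched
    next_id = max(updated.values(), default=-1) + 1
    for term in new_terms:
        if term not in updated:
            updated[term] = next_id
            next_id += 1
    return updated
```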
In addition, the data protection engine communicates with the user side and manages data according to user-side control instructions. Control authority over the data protection engine rests with the user side. The storage capacity, storage location, and control parameters of each terminal device are configured and managed according to user-side control instructions. In response to a data application request from the user side, the data protection engine retrieves the target privacy data and performs the requested operation. It then sends the retrieved privacy data, feedback data, and so on directly to the user side over a protocol. Such operations may include viewing, deleting, and editing private data.
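The rule that raw private data is viewed, edited, or deleted only from the user side can be sketched as a permission check in front of the record operations. `UserSideControl`, its operation names, and the requester IDs used below are hypothetical.

```python
from typing import Dict, Optional


class UserSideControl:
    """Hypothetical gate: only the owning user side may view, edit, or delete raw records."""

    def __init__(self, owner_id: str) -> None:
        self._owner_id = owner_id
        self._records: Dict[str, bytes] = {}

    def put(self, record_id: str, data: bytes) -> None:
        self._records[record_id] = data

    def handle(self, requester_id: str, op: str, record_id: str, data: Optional[bytes] = None):
        # Reject any requester other than the user side before touching the data.
        if requester_id != self._owner_id:
            raise PermissionError("raw private data is managed only from the user side")
        if op == "view":
            return self._records.get(record_id)
        if op == "delete":
            return self._records.pop(record_id, None) is not None
        if op == "edit":
            if record_id not in self._records or data is None:
                return False
            self._records[record_id] = data
            return True
        raise ValueError(f"unsupported operation: {op}")
```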
In some embodiments, the data processing device built on the CSP architecture mainly stores private data and performs various AI computations and pre-/post-processing on it, along with desensitization conversion and localized data processing. A computing acceleration engine is therefore added to the CSP architecture. It integrates common acceleration cores, such as a neural-network acceleration core, a database acceleration core, a compression/decompression core, an encryption/decryption core, and general-purpose computing cores. It supports common AI computation and pre-/post-processing, and the data protection engine calls it when executing the relevant data processing. This speeds up processing and improves response time.
The application also provides a data processing method for user privacy data, applied to the above data processing device built on a CSP architecture for privacy data computation. FIG. 3 is a flowchart of the method, which specifically comprises the following steps:
Step 301: collect the privacy data generated for a user through the communication protocols established between the data access engine and each terminal device, and store it by category in the local storage medium through the data storage engine.
The data access engine connects to multiple terminal devices, and the data storage engine classifies and stores the received data. Transmission from the data access engine to the data storage engine is unidirectional.
Step 302: receive a service request instruction from a data center through the service access engine, and determine the request authority of the instruction and the requested target data type.
The request authority is used to verify whether the user side has authorized the data center to use the private data, and subsequent steps are performed only if it has. The target data type determines which private data is needed and where it is stored.
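Step 302's authority check can be sketched as a grant table keyed by data center, which also maps the target data type to its storage location. `ServiceRequest`, `AuthorityTable`, and the storage paths in the test are hypothetical. The sketch assumes grants are recorded per (data center, data type) pair.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ServiceRequest:
    data_center_id: str
    target_data_type: str


class AuthorityTable:
    """Hypothetical grant table that the user side maintains per data center."""

    def __init__(self, storage_index: Dict[str, str]) -> None:
        self._grants: Dict[str, Set[str]] = {}
        self._storage_index = storage_index  # data type -> storage location

    def grant(self, data_center_id: str, data_type: str) -> None:
        self._grants.setdefault(data_center_id, set()).add(data_type)

    def revoke(self, data_center_id: str, data_type: str) -> None:
        self._grants.get(data_center_id, set()).discard(data_type)

    def resolve(self, req: ServiceRequest) -> Optional[str]:
        """Return the storage location if the request is authorized, otherwise None."""
        if req.target_data_type not in self._grants.get(req.data_center_id, set()):
            return None  # unauthorized: later steps must not run
        return self._storage_index.get(req.target_data_type)
```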
In step 303, the target private data is read from the local storage medium by the data storage engine, and is converted into encrypted data which cannot acquire the original information by invoking the data protection engine to perform desensitization conversion.
The local storage medium selects a particular desensitization conversion process based on the type of privacy calculations indicated by the service request instructions. The desensitization conversion step is carried out in local equipment, the converted encrypted data is data containing dimension information necessary for personalized analysis by a manufacturer or a data center, and the data is protected from the source directly according to the fact that the original data information can not be obtained by the encrypted data.
Step 304, the generated encrypted data is sent back to the data center through the service access engine.
The privacy data can be better protected by the cloud processing-free mode, and leakage caused by poor management of manufacturers is avoided. The data can be further encrypted by desensitization conversion, and the user side can provide the access authority of the manufacturer to the private data with confidence, so that the user side can experience personalized services of different terminal devices.
For the service request instruction sent by the data center, a specific privacy conversion mode for the target privacy data is specifically needed to be judged according to the actual privacy calculation content. Because the personalized services provided by different manufacturers are not consistent, and the content of the privacy data collected by different personalized functions and the calculation mode of the privacy data are not the same. Moreover, the subsequent calculation type is only determined if the data center is authorized. Determining the privacy computing content as a first computing type when it indicates that processing is performed on the vector; when the privacy calculation content indicates that the processing is performed on the model data, it is determined as the second type of calculation.
Under this division, for the first-class calculation type, vector data is returned to the data center, which performs the related operations on it; for the second-class calculation type, a local calculation model computes on the target privacy data, and the resulting model data is returned to the data center. The division can be based on vector extraction: vector extraction is a way to quickly extract useful dimension information and is chosen according to the specific personalized service, while privacy data that is not suited to vector extraction is handled with a localized-analysis strategy. Whether a calculation belongs to the first or the second class may be specified by the data center when it sends the service request instruction, or determined empirically by the local device.
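The classification rule described above can be sketched as a small dispatch function. This is a minimal illustration under assumptions: the names `CalcType` and `determine_calc_type`, and the encoding of the privacy calculation content as the strings `"vector"` and `"model"`, are hypothetical, since the method leaves the representation open. A type stated explicitly by the data center takes precedence, mirroring the option of specifying the type in the service request instruction.

```python
from enum import Enum
from typing import Optional


class CalcType(Enum):
    FIRST = 1   # vectors extracted locally, computation performed by the data center
    SECOND = 2  # model issued by the data center, computation performed locally


def determine_calc_type(authorized: bool, operates_on: str,
                        declared: Optional[CalcType] = None) -> CalcType:
    """Classify a privacy calculation request (hypothetical helper).

    authorized  -- whether the data center holds request authority for the data
    operates_on -- "vector" or "model", derived from the privacy calculation content
    declared    -- a type stated explicitly by the data center, if any
    """
    if not authorized:
        # The calculation type is only determined for an authorized data center.
        raise PermissionError("data center lacks request authority")
    if declared is not None:
        return declared
    if operates_on == "vector":
        return CalcType.FIRST
    if operates_on == "model":
        return CalcType.SECOND
    raise ValueError(f"unrecognized privacy calculation content: {operates_on!r}")
```

For example, `determine_calc_type(True, "model")` returns `CalcType.SECOND`, while any request from an unauthorized data center is rejected before classification.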
When the privacy calculation type is determined to be the first-class calculation type, the relevant target privacy data is identified and read based on the calculation content, and desensitization conversion is applied to it directly.
In one possible implementation, the desensitization conversion includes vector extraction and homomorphic encryption. After the data storage engine retrieves the target privacy data, the data protection engine performs vector extraction on it based on the target dimension information required for the calculation, obtaining first vector data containing that dimension information. The first vector data is then homomorphically encrypted with encryption parameters agreed with the data center, yielding a first-class encrypted vector. The specific algorithms and parameters for vector extraction and homomorphic encryption are taken from the algorithm library of the data protection engine.
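A minimal sketch of this first-class desensitization, assuming Paillier as the homomorphic scheme (the method does not name one) and fixed-point scaling of real-valued readings by a hypothetical factor `SCALE`. The key is built from two known Mersenne primes so the block runs instantly; it is illustrative only and provides no security. `extract_vector`, `desensitize`, and the record field names are hypothetical.

```python
import math
import secrets

# Toy Paillier key (ASSUMPTION): known Mersenne primes, insecure by design.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1
SCALE = 100           # fixed-point factor for real-valued readings


def extract_vector(record: dict, dims: list[str]) -> list[float]:
    """Vector extraction: keep only the target dimensions needed for the calculation."""
    return [float(record[d]) for d in dims]


def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1: c = (1 + m*N) * r^N mod N^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Paillier decryption; residues above N/2 are mapped back to negative values."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def desensitize(record: dict, dims: list[str]) -> list[int]:
    """First-class desensitization: vector extraction, then homomorphic encryption."""
    return [encrypt(round(v * SCALE)) for v in extract_vector(record, dims)]
```

Paillier is additively homomorphic: multiplying two ciphertexts modulo `N2` yields an encryption of the sum of their plaintexts, so a party holding only ciphertexts can aggregate values, while decryption requires the private values `LAM` and `MU`.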
Note that data encrypted with a conventional encryption algorithm must be decrypted before it can be computed on, whereas data encrypted with a homomorphic encryption algorithm can be computed on without decryption. Conventional encryption of the transmission therefore prevents leakage on the transmission link, while homomorphic encryption fundamentally prevents the recipient from learning the data before encryption.
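The distinction can be seen in the smallest possible example. Textbook RSA without padding happens to be multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts, so a receiver can compute on data it cannot read. This is a swapped-in illustration, not the scheme used by the method; the classic 12-bit toy key (p = 61, q = 53) is insecure, and RSA as deployed with OAEP padding is deliberately not homomorphic.

```python
# Textbook (unpadded) RSA with the classic toy key p = 61, q = 53.
N = 61 * 53                            # modulus, 3233
E = 17                                 # public exponent
D = pow(E, -1, (61 - 1) * (53 - 1))    # private exponent, 2753


def enc(m: int) -> int:
    return pow(m, E, N)


def dec(c: int) -> int:
    return pow(c, D, N)


def multiply_ciphertexts(c1: int, c2: int) -> int:
    """Run by the receiver, which holds neither the plaintexts nor D."""
    return c1 * c2 % N
```

Here `dec(multiply_ciphertexts(enc(12), enc(7)))` gives 84, although the party performing the multiplication only ever saw ciphertexts.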
When the privacy calculation type is determined to be the second-class calculation type, the data protection engine identifies and reads the target privacy data based on the required calculation content and requests a calculation model from the data center through the service access engine. After the data center issues the calculation model corresponding to the second-class calculation, model calculation is performed on the target privacy data with that model, and desensitization conversion is applied to the model calculation result, i.e. the model data.
Optionally, when the data center maintains a model library in the service access engine, the target calculation model can be looked up in that library; if it is not present, the data center is requested to issue it, and it is stored in the model library.
Note that control of the service access engine, including client upgrades and the management and maintenance of the model library, belongs to the manufacturer. In one possible implementation, after the data protection engine has finished using the target calculation model, the service access engine controls its deletion so that the vendor's technology is not revealed.
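The lookup-then-request behavior of the model library, together with the optional deletion after use, might look like the following sketch. `ModelLibrary`, `run_second_class`, and the `request_from_center` callback (standing in for whatever channel the service access engine uses to reach the data center) are hypothetical names, and a model is represented simply as a Python callable.

```python
from typing import Callable

Model = Callable[[list[float]], list[float]]


class ModelLibrary:
    """Hypothetical per-data-center model library held by the service access engine."""

    def __init__(self, request_from_center: Callable[[str], Model]) -> None:
        self._models: dict[str, Model] = {}
        self._request = request_from_center  # stand-in for the data-center channel

    def get(self, model_id: str) -> Model:
        # Look the model up locally; if it is missing, have the data center
        # issue it and store it in the library.
        if model_id not in self._models:
            self._models[model_id] = self._request(model_id)
        return self._models[model_id]

    def delete(self, model_id: str) -> None:
        # Remove the model so the vendor's technology does not stay on the device.
        self._models.pop(model_id, None)

    def __contains__(self, model_id: str) -> bool:
        return model_id in self._models


def run_second_class(lib: ModelLibrary, model_id: str, data: list[float],
                     delete_after_use: bool = True) -> list[float]:
    """Compute on the target privacy data locally, optionally deleting the model afterwards."""
    try:
        return lib.get(model_id)(data)
    finally:
        if delete_after_use:
            lib.delete(model_id)
```

Keeping a cached model trades fewer requests to the data center against leaving vendor models on the device; the `delete_after_use` flag makes that choice explicit per call.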
When processing privacy data, the data protection engine also schedules the calculation acceleration engine to accelerate AI and model computations, handling operations such as neural network acceleration, compression and decompression, and encryption and decryption.
The following embodiments describe how the CSP architecture for privacy data computing, together with the data processing method and device, is applied to real-time monitoring scenarios. For a data query initiated by the user side, the process includes the following steps:
1. The remote user side sends a data query request to the service provider's client in the service access engine;
2. The service provider's client sends a data read command to the data protection engine;
3. The data protection engine calls the data storage engine to read the target privacy data;
4. The data protection engine calls the calculation acceleration engine to encrypt and convert the target data;
5. The data protection engine forwards the encrypted data to the user side.
The specific process for the first-class calculation type includes the following steps:
1. The service provider's server side submits an application to the client in the CSP-architecture device;
2. The client forwards the command to the data protection engine;
3. The data protection engine retrieves the target privacy data through the data storage engine;
4. The data protection engine calls the calculation acceleration engine to perform vector calculation on the target privacy data;
5. The data protection engine encrypts the first vector data and returns it to the data center through the service access engine;
6. The data center performs model training;
7. The data center delivers the trained AI model to the terminal device to provide intelligent services to the user.
The specific process for the second-class calculation type includes the following steps:
1. The data center issues a calculation model and algorithm to the service access engine;
2. The client forwards them to the data protection engine;
3. The data protection engine loads the calculation model and algorithm parameters and calls the calculation acceleration engine to perform analysis and calculation;
4. The data protection engine returns the model calculation result (an embedding) to the client, which returns it to the data center;
5. The data center performs recall calculation based on the embedding;
6. The data center provides personalized recommendation services to the user.
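Steps 3 to 5 of the second-class flow split the work between the device and the data center. The sketch below assumes the issued model is a single linear layer producing an L2-normalized embedding and that recall ranks candidate items by cosine similarity; the method specifies neither, and all names and values are hypothetical. The desensitization of the embedding before it is returned is omitted here to keep the split visible.

```python
import math


def local_embedding(features: list[float], weights: list[list[float]]) -> list[float]:
    """On the device: apply the issued model (assumed to be one linear layer)
    to the privacy features and return an L2-normalized embedding."""
    emb = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in emb)) or 1.0
    return [v / norm for v in emb]


def recall(user_emb: list[float], items: dict[str, list[float]], k: int) -> list[str]:
    """At the data center: rank items by cosine similarity to the user embedding
    and keep the top k. Item vectors are assumed to be normalized already."""
    score = {name: sum(u * v for u, v in zip(user_emb, vec))
             for name, vec in items.items()}
    return sorted(score, key=score.get, reverse=True)[:k]
```

Only the low-dimensional embedding crosses the boundary; the raw features passed to `local_embedding` never leave the device.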
Recommendation is one of the most commercially valuable businesses for internet companies. The CSP architecture for privacy data computing can extend recommendation based on users' browsing habits to recommendation based on their living habits, while keeping user data private. For example, health status can be analyzed from information such as the user's activity and sleep, and corresponding improvement strategies and products recommended; driving strategies can be analyzed and recommended from the user's driving habits; and products of recent interest can be identified and recommended from the user's daily conversations.
The embedding computation in the flow above uses data of multiple dimensions, such as image, time, voice, health status, weather, and action information. Because the embedding is computed in a device based on the CSP architecture for privacy data computing, the service provider can make accurate recommendations to the user without obtaining the privacy data. Data analysis and prediction services can also be provided by purely local analysis, without uploading the embedding at all: for example, by analyzing the user's behavioral habits over a period of time to predict actions and preset devices accordingly, a new energy vehicle can be preheated automatically in winter, bathroom heating can be started in advance, environmental parameters can be adjusted dynamically, and sleep quality can be improved.
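As a hedged illustration of purely local analysis, the vehicle preheating example could be driven by a rule as simple as the one below: take the median of recent departure times and start preheating a fixed lead time earlier. The median rule, the 15-minute lead, and the function name `preheat_start` are assumptions for illustration; the method does not specify the analysis, and nothing here leaves the device.

```python
from datetime import time
from statistics import median


def preheat_start(departures: list[time], lead_minutes: int = 15) -> time:
    """Predict the next departure as the median of recent departure times and
    return the time at which preheating should begin."""
    minutes = [t.hour * 60 + t.minute for t in departures]
    start = int(median(minutes)) - lead_minutes
    start %= 24 * 60  # wrap around midnight
    return time(start // 60, start % 60)
```

The median is used rather than the mean so that a single unusually late or early day does not shift the schedule much.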
The method provided by the embodiments of the application can also be used for large language models. A large language model achieves its intelligence by training on a large amount of corpus (corresponding to the privacy data in this scheme), which has also led more devices to harvest user privacy. When a data center collects privacy information (corpus data) for a large language model, the following scheme is adopted:
after a service request instruction is received from the data center and the target corpus (target privacy data) is determined, the target corpus is translated into public token IDs through a public token dictionary, forming a public token ID stream;
when the public token dictionary does not contain a token ID for translating the target corpus, a domain token ID is assigned based on the target corpus, a domain token dictionary is generated for the data center, and the domain token ID is added to the domain token dictionary;
a token ID stream is formed from the public token IDs and/or the domain token IDs and fed back to the data center;
the data center trains a domain-specific generation model on the token ID stream and provides AI generation services to users.
Tokenization against the public dictionary converts tokens into ID numbers, i.e. it converts a segment of text into a sequence of IDs. For example, if the ID of "I am" is 0x0001 and the ID of "Chinese" is 0x0002, the sentence "I am Chinese" is uploaded as 0x00010002. Converting text into a token ID stream is a privacy conversion process similar to vector extraction; optionally, the token ID stream may be further encrypted before being sent to the data center.
A token is the smallest unit of text. In English, a token may be a word or a punctuation mark; in Chinese, a single character or a word is typically used as a token. ChatGPT splits input text into individual tokens so that the model can process and understand it.
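The public/domain dictionary translation described above can be sketched as follows. The function names, the `domain_base` offset of 0x8000 that keeps domain IDs apart from public ones, and the 16-bit hex packing are assumptions; the input is taken as already split into tokens, so the tokenizer itself is out of scope. One `domain` dictionary would be kept per data center.

```python
def to_token_ids(tokens: list[str], public: dict[str, int],
                 domain: dict[str, int], domain_base: int = 0x8000) -> list[int]:
    """Translate tokens into a token ID stream.

    Tokens in the public dictionary get their public ID; any other token is
    assigned a new ID in the data center's domain dictionary, starting at the
    hypothetical offset domain_base so the two ID ranges do not collide.
    """
    stream = []
    for tok in tokens:
        if tok in public:
            stream.append(public[tok])
        else:
            if tok not in domain:
                domain[tok] = domain_base + len(domain)
            stream.append(domain[tok])
    return stream


def pack_hex(stream: list[int]) -> str:
    """Concatenate 16-bit IDs into one hex string, e.g. [1, 2] -> '0x00010002'."""
    return "0x" + "".join(f"{i:04X}" for i in stream)
```

Passing the same `domain` dictionary across calls keeps each domain token's ID stable throughout a corpus.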
With this flow, the AI computing power offered by the service provider can be used without handing over the original domain data, which prevents internal leakage through operators or through the trained model.
In summary, the method provided by the embodiments of the application achieves the following technical effects:
1) Data privacy is protected at the source: fewer nodes and management entities handle the raw data, which reduces the risk of leakage;
2) Network data transmission is reduced, which lowers the cost pressure of bandwidth build-out, reduces service latency, and improves service quality;
3) Data can be shared across multiple devices and service providers, which reduces the cost of building data collection terminals and allows users to use good services from small vendors outside a given ecosystem;
4) Data privacy protection no longer depends entirely on the manufacturer's credibility, and separating the data acquisition terminal from the service provider eliminates the possibility of unauthorized background transmission;
5) A reliable guarantee of privacy data protection is established between user and manufacturer, so that both can enjoy high-quality AI services while the privacy data remains protected.
The foregoing describes preferred embodiments of the present application. It is to be understood that the application is not limited to the specific embodiments above; devices and structures not described in detail are to be understood as implemented in a manner common in the art. Any person skilled in the art may make many possible variations and modifications, or adapt them into equivalent embodiments, without departing from the technical solution of the present application and without affecting its essential content. Therefore, any simple modification, equivalent variation, or modification of the above embodiments according to the technical substance of the present application still falls within the scope of the technical solution of the present application.

Claims (9)

1. A data processing method for user privacy data, comprising:
collecting the privacy data of a user acquired and generated by each terminal device based on an established communication protocol, and storing the privacy data by category in a local storage medium, wherein the collection and storage of the privacy data is a unidirectional transmission;
receiving a service request instruction from a data center, and determining the request authority of the data center and the requested target data type, wherein different data centers correspond to different platform service providers;
reading target privacy data from the local storage medium, performing desensitization conversion on the target privacy data according to the privacy calculation type indicated by the service request instruction, and converting it into encrypted data from which the original information cannot be obtained; when the privacy calculation type is the first-class calculation type, desensitization conversion is performed directly on the target privacy data; when the privacy calculation type is the second-class calculation type, the target privacy data is processed by a calculation model issued by the data center and then desensitized and converted; wherein the calculation content of the first-class calculation type is executed by the data center, and the calculation content of the second-class calculation type is executed locally;
and sending the generated encrypted data back to the data center.
2. The data processing method for user privacy data according to claim 1, wherein when the data center has request authority for the target privacy data, the privacy calculation content to be performed by the data center on the target data is determined; when the privacy calculation content indicates processing on vectors, it is determined to be the first-class calculation type; when the privacy calculation content indicates processing on model data, it is determined to be the second-class calculation type; the privacy calculation content is determined by the personalized functions provided by manufacturers, and different personalized functions collect different privacy data content and compute on the privacy data in different ways.
3. The data processing method for user privacy data according to claim 2, wherein when the privacy calculation content is of the first-class calculation type, the desensitization conversion includes:
performing vector extraction on the target privacy data based on the target dimension information required by the calculation content to obtain first vector data containing the target dimension information; and homomorphically encrypting the first vector data according to encryption parameters agreed with the data center to obtain a first-class encrypted vector;
when the privacy calculation content is of the second-class calculation type, the desensitization conversion includes:
directly homomorphically encrypting the model calculation result to obtain second-class encrypted model data.
4. The data processing method for user privacy data according to claim 1, wherein the user side has control authority over the privacy data, and when a data application request initiated by the user side is received, the target privacy data to be invoked is determined based on the operation authority of the user side and the data application request;
the target privacy data is encrypted and sent back to the user side, which receives and decrypts the unconverted target privacy data and performs related operations on it.
5. The data processing method for user privacy data according to any one of claims 1 to 4, wherein when the privacy data is corpus data for training a large language model, after a service request instruction of a data center is received and a target corpus is determined, the target corpus is translated into public token IDs through a public token dictionary;
when the public token dictionary does not contain a token ID for translating the target corpus, a domain token ID is assigned based on the target corpus, a domain token dictionary is generated for the data center, and the domain token ID is added to the domain token dictionary;
and a token ID stream is formed from the public token IDs and/or the domain token IDs and fed back to the data center.
6. A data processing device for user privacy data, the device being built based on CSP architecture, comprising:
a data access engine, provided with built-in communication interfaces/protocols, for receiving the privacy data acquired and generated by each terminal device;
a data storage engine, which establishes a unidirectional transmission connection with the data access engine and stores the privacy data by category in the corresponding storage medium;
a service access engine, which establishes communication connections with each data center, receives service request instructions initiated by the data centers, and sends the locally processed encrypted data back to the data center;
a data protection engine, which interacts with the service access engine, receives the service request instruction forwarded by the service access engine, performs desensitization conversion on the target privacy data according to the privacy calculation type indicated by the service request instruction, converting it into encrypted data from which the original information cannot be obtained, and sends the encrypted data back to the data center; when the privacy calculation type is the first-class calculation type, desensitization conversion is performed directly on the target privacy data; when the privacy calculation type is the second-class calculation type, the target privacy data is processed by a calculation model issued by the data center and then desensitized and converted; the calculation content of the first-class calculation type is executed by the data center and that of the second-class calculation type is executed locally; different data centers correspond to different platform service providers, and the user side is the device that controls each terminal device to acquire privacy data;
the storage medium is connected with the data storage engine through a transmission interface/protocol and stores privacy data and storage parameters acquired by each terminal device.
7. The data processing device for user privacy data according to claim 6, wherein the service access engine embeds a client that communicates with the data center and receives the service request instruction through the client; the service request instruction comprises a calculation application, initiated by the data center, for invoking the target privacy data and performing privacy calculation; and clients correspond one-to-one with data centers.
8. The data processing device for user privacy data according to claim 7, wherein the service access engine performs data communication between the client and the user side and performs data management based on control commands of the user side; the data protection engine is controlled by the user side and configures and manages the storage capacity, storage location, and control parameters of each terminal device according to the control commands.
9. The data processing device for user privacy data according to claim 7, wherein the data protection engine comprises an algorithm library storing algorithms for performing desensitization conversion; the service access engine stores a model library for each data center, the model library containing calculation models for personalized analysis of privacy data;
when the service request instruction is a calculation application initiated by the data center and the privacy calculation type is determined to be the first-class calculation type, vector extraction and encryption are performed on the target privacy data using the corresponding algorithms;
when the privacy calculation type is determined to be the second-class calculation type, a calculation model is selected from the model library to perform model calculation; and when the model library does not contain a calculation model for the second-class calculation type, a model request is sent to the data center and the issued model is added to the model library.
CN202310692841.2A 2023-06-13 2023-06-13 Data processing method and data processing equipment for user privacy data Active CN116436704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310692841.2A CN116436704B (en) 2023-06-13 2023-06-13 Data processing method and data processing equipment for user privacy data

Publications (2)

Publication Number Publication Date
CN116436704A CN116436704A (en) 2023-07-14
CN116436704B true CN116436704B (en) 2023-08-18

Family

ID=87083631

Country Status (1)

Country Link
CN (1) CN116436704B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant