CN113094376B - Data main body request processing method and system based on distributed machine learning - Google Patents

Data main body request processing method and system based on distributed machine learning

Info

Publication number
CN113094376B
Authority
CN
China
Prior art keywords
data
personal information
index
main body
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110638898.5A
Other languages
Chinese (zh)
Other versions
CN113094376A (en)
Inventor
王文宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shuanhang Technology Co ltd
Original Assignee
Beijing Shuanhang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shuanhang Technology Co ltd
Priority to CN202110638898.5A
Publication of CN113094376A
Application granted
Publication of CN113094376B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method and a system for processing data subject ("data main body") requests based on distributed machine learning. The method comprises the following steps: analyzing data of a multi-source data center through distributed machine learning, and establishing an index of first personal information in the multi-source data center; establishing, according to the index of the first personal information, an index of second personal information on the Internet that is related to the first personal information; establishing an associated index of the index of the first personal information and the index of the second personal information; receiving a data processing request from a data subject, and verifying the authenticity of the data subject through the index of the first personal information; and confirming the request triggered by the data subject, obtaining in real time, through distributed machine learning, all data related to the request by means of the index of the first personal information, the index of the second personal information and the associated index, and executing the data processing request. The method and the system address the technical problem in the related art that the query and deletion operations required by data subject requests cannot be carried out.

Description

Data main body request processing method and system based on distributed machine learning
Technical Field
The application relates to the field of information security, in particular to a method and a system for processing a data main body request based on distributed machine learning.
Background
In digital transformation, personal information is an important type of data, and the subject of that information, i.e., the data subject, needs to retain control over it. With personal information widely used and personal information compliance a focus for all parties, ensuring the legal rights of the data subject while still realizing the value of the data is a key issue of digital transformation.
In the prior art, enterprises collect personal information such as names, identity card numbers, ages, genders, telephone numbers and hobbies to varying degrees, and store it in different ways. Even within a single enterprise, a data subject who wants to query or delete all of his or her personal information cannot do so. Traditional security measures focus on perimeter protection or protection of data at rest and do not address data subjects' requests concerning their personal information. An enterprise manager who wants to obtain value from personal information while guaranteeing the legal rights of the data subject has no effective means of doing so.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the application provide a data subject request processing method based on distributed machine learning, which aims to solve the technical problem in the related art that data subject requests concerning personal information cannot be answered.
According to an aspect of an embodiment of the present application, there is provided a method for data body request based on distributed machine learning, including:
analyzing data of the multi-source data center through distributed machine learning, and establishing an index of first personal information in the multi-source data center; according to the index of the first personal information, analyzing the data of the Internet through distributed machine learning, and establishing an index of second personal information related to the first personal information in the Internet; establishing an index of the first personal information and an associated index of the second personal information through distributed machine learning;
receiving a data processing request of a data main body, and identifying the authenticity of the data main body through the index of the first personal information; and confirming the request triggered by the data main body, acquiring all related data requested by the data main body in real time through the index of the first personal information, the index of the second personal information and the associated index through distributed machine learning, and executing the data processing request of the data main body.
Optionally, analyzing data of the multi-source data center through distributed machine learning, and establishing an index of first personal information in the multi-source data center, including:
analyzing data of a multi-source data center through distributed machine learning, extracting data features belonging to the personal information class from the data center containing various data, and establishing an index of first personal information for the data of the personal information class, wherein the index of the first personal information does not contain original data or attributes of the personal information and is only used for further analysis by the distributed machine learning, and no component can deduce the original data or attributes of the personal information class from the index of the first personal information.
Optionally, analyzing the data of the Internet through distributed machine learning according to the index of the first personal information, and establishing an index of second personal information on the Internet that is related to the first personal information, includes:
according to the index of the first personal information of the multi-source data center, the data of the internet are analyzed through distributed machine learning, the data characteristics which are related to the personal information of the multi-source data center and belong to the personal information class are extracted from a network containing various data, and the index of second personal information is established for the data of the personal information class, wherein the index of the second personal information does not contain the original data or attribute of the personal information and is only used for further analysis of the distributed machine learning, and any component cannot deduce the original data or attribute of the personal information class through the data characteristics of the second personal information class.
Optionally, establishing, by distributed machine learning, an associated index of the first personal information and the index of the second personal information includes:
the index of the first personal information extracted by the multi-source data center and the index of the second personal information extracted in the Internet are analyzed in an artificial intelligence mode through distributed machine learning, and an associated index is established for the index of the first personal information extracted by the multi-source data center and the index of the second personal information extracted in the Internet, wherein the associated index does not contain original data or attributes of the personal information and is only used for further management of the distributed machine learning and processing of a data main body request, and any component cannot deduce the original data or attributes of the personal information through the index.
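The three indexes described above can be illustrated with a minimal sketch. The patent does not disclose how its indexes are computed, so this example swaps in a simple stand-in: a keyed one-way hash (HMAC-SHA256) maps each personal-information value to an opaque token, and the index stores only tokens and storage locations. The names `INDEX_KEY`, `index_token`, `build_index` and `build_association_index`, and all sample locations, are hypothetical. Note that a keyed hash only approximates the "cannot deduce the original data" property: a party holding the key could still test guesses of low-entropy values.

```python
import hashlib
import hmac

# Hypothetical secret held only by the indexing component (an assumption,
# not something the patent specifies).
INDEX_KEY = b"demo-key-not-for-production"


def index_token(value: str) -> str:
    """Return a keyed one-way token for a personal-information value."""
    normalized = value.strip().lower()
    return hmac.new(INDEX_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(records):
    """Map token -> set of storage locations; raw values are never stored."""
    index = {}
    for location, value in records:
        index.setdefault(index_token(value), set()).add(location)
    return index


def build_association_index(first_index, second_index):
    """Link tokens that occur in both the data-center and the Internet index."""
    return {
        token: {"data_center": first_index[token], "internet": second_index[token]}
        for token in first_index.keys() & second_index.keys()
    }


# Index of first personal information (multi-source data center).
first = build_index([
    ("crm.customers:17", "Alice@example.com"),
    ("billing.accounts:4", "alice@example.com"),
])
# Index of second personal information (Internet).
second = build_index([("https://forum.example/u/alice", "alice@example.com ")])
# Associated index linking the two.
assoc = build_association_index(first, second)
```

In this sketch the associated index is a plain key intersection; the patent instead describes an artificial intelligence analysis, which could also link records that are related without being identical.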
Optionally, the data processing request of the data body includes a query request, and the processing method of the query request is as follows:
receiving a request of inquiring personal information from a data main body, carrying out identity authentication on the data main body, starting real-time personal information inquiry through distributed machine learning if the data main body is consistent with the personal information associated with the index of the first personal information, and feeding back an inquiry result to the data main body, wherein the inquiry result comprises the personal information storage condition of the data main body and the personal information storage condition associated with the data main body in the internet.
Optionally, the data processing request of the data body includes a deletion request, and a processing method of the deletion request is as follows:
receiving a request for deleting personal information submitted by a data main body, authenticating the data main body against the held index of the first personal information, starting a real-time personal information query through distributed machine learning if the data main body is consistent with the personal information associated with the index of the first personal information, and feeding back a query result to the data main body, wherein the query result comprises the storage status of the data main body's personal information and the storage status of personal information associated with the data main body on the Internet;
receiving a confirmation execution deleting instruction submitted by the data main body according to the query result, and deleting personal information requested by the data main body and stored in the multi-source data center through distributed machine learning; and simultaneously feeding back the personal information of the data main body stored in the Internet to the data main body.
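The deletion flow above (authenticate, query, wait for confirmation, delete from the data center, report what remains on the Internet) can be sketched as follows. This is an illustrative toy under stated assumptions: the `Store` class, `handle_delete` function, status strings and sample locations are all hypothetical, and a plain dictionary lookup stands in for the distributed machine learning query.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy stand-in for the multi-source data center plus known Internet locations."""
    data_center: dict = field(default_factory=dict)  # location -> subject id
    internet: dict = field(default_factory=dict)     # URL -> subject id

    def query(self, subject_id):
        """Report where the subject's personal information is stored."""
        return {
            "data_center": sorted(k for k, v in self.data_center.items() if v == subject_id),
            "internet": sorted(k for k, v in self.internet.items() if v == subject_id),
        }


def handle_delete(store, subject_id, authenticated, confirmed):
    """Query first, delete only after confirmation, always report Internet copies."""
    if not authenticated:
        return {"status": "rejected"}
    report = store.query(subject_id)
    if not confirmed:
        # The subject sees the query result before anything is deleted.
        return {"status": "awaiting_confirmation", "found": report}
    for location in report["data_center"]:
        del store.data_center[location]
    # Internet copies are outside the enterprise's control: report, do not delete.
    return {
        "status": "deleted",
        "deleted": report["data_center"],
        "remaining_on_internet": report["internet"],
    }
```

A query request follows the same path but stops after `store.query`, which is why the two request types share the authentication and lookup steps.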
Optionally, the distributed machine learning refers to the following: a separate machine learning component is started for each data center, which speeds up personal information learning and index establishment; a machine learning component is started independently for the data of the Internet, which likewise speeds up learning and index establishment and, because it is isolated from the index of the first personal information of the data centers, prevents the two from influencing each other while the index of the second personal information is built; and when the association index is established between the index of the first personal information of the data centers and the index of the second personal information of the Internet, an independent machine learning component performs a combined calculation over both indexes to form the personal information association index.
Optionally, the distributed machine learning obtains all the data related to the data main body request in real time in an iterative manner:
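The division of labor described here (one component per data center, an isolated component for the Internet, and a separate component that combines the results) can be sketched with Python's standard `concurrent.futures` module. The functions `opaque`, `index_source`, `index_all` and `associate` are hypothetical names, and an unkeyed SHA-256 digest is only a placeholder for whatever learned index the patent has in mind.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def opaque(value: str) -> str:
    """Placeholder one-way token so the index holds no raw values."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def index_source(source_name, rows):
    """Hypothetical per-source indexing component: token -> list of locations."""
    index = {}
    for row_id, value in rows:
        index.setdefault(opaque(value), []).append(f"{source_name}:{row_id}")
    return source_name, index


def index_all(sources):
    """Run one indexing component per source concurrently; results stay separate."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(index_source, name, rows) for name, rows in sources.items()]
        return dict(f.result() for f in futures)


def associate(data_center_indexes, internet_index):
    """Independent component that combines the isolated indexes."""
    merged = {}
    for index in data_center_indexes.values():
        for token, locations in index.items():
            merged.setdefault(token, []).extend(locations)
    return {
        token: {"data_center": sorted(merged[token]), "internet": sorted(internet_index[token])}
        for token in merged.keys() & internet_index.keys()
    }
```

Because each component returns its own index, the Internet indexing never writes into the data-center indexes; only `associate` reads both.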
performing machine learning on the changed data of the multi-source data center based on the established index of the first personal information of the data of the multi-source data center, establishing a new index of the first personal information, and replacing the old index of the first personal information by taking the new index of the first personal information as the index of the data of the multi-source data center; performing machine learning on the data changed in the internet based on the index of the second personal information established by the internet data, establishing a new index of the second personal information, and replacing the old index of the second personal information by using the new index as the index of the data of the internet; and analyzing the new index of the first personal information and the new index of the second personal information through distributed machine learning, and further establishing a new association index.
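The iterative update above can be pictured as building a new index from the old one plus the changed data, then swapping the new index in. The sketch below assumes a hypothetical change feed of `(location, old_value, new_value)` tuples and a caller-supplied `tokenize` function; neither is described in the patent.

```python
import hashlib


def sha_token(value: str) -> str:
    """Placeholder tokenizer for the example (an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def refresh_index(old_index, changed_records, tokenize):
    """Return a replacement index built from the old index and the changed records."""
    # Copy first so the old index stays usable until the swap.
    new_index = {token: set(locations) for token, locations in old_index.items()}
    for location, old_value, new_value in changed_records:
        if old_value is not None:
            old_token = tokenize(old_value)
            new_index.get(old_token, set()).discard(location)
            if old_token in new_index and not new_index[old_token]:
                del new_index[old_token]
        if new_value is not None:
            new_index.setdefault(tokenize(new_value), set()).add(location)
    return new_index
```

The same function could be applied to the data-center index and the Internet index in turn, after which the association index would be rebuilt from the two refreshed indexes.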
Optionally, the authenticating the data subject refers to:
the personal information submitted by the data main body is taken as input and, through distributed machine learning, is analyzed by artificial intelligence and compared against the index of first personal information established from the data of the multi-source data center, and a question that the data main body must answer is returned; the data main body answers the question, the answer is taken as input to the distributed machine learning and is again analyzed by artificial intelligence and compared against the index of the first personal information established from the data of the data center, and once verification passes, identity authentication of the data main body is complete.
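The two-round authentication above (submit personal information, receive a question, answer it, get verified against the index) resembles a challenge-response exchange. The sketch below is one possible reading, not the patent's method: the `Authenticator` class, its index layout, and the attribute names are hypothetical, and exact token comparison stands in for the artificial intelligence analysis.

```python
import hashlib
import hmac


def token(value: str) -> str:
    """Placeholder one-way token; the index stores tokens, never raw answers."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


class Authenticator:
    def __init__(self, subject_index):
        # subject_index: identity token -> {attribute name: token of expected answer}
        self.subject_index = subject_index

    def challenge(self, claimed_identity):
        """Return the name of an attribute the subject must supply, or None if unknown."""
        attributes = self.subject_index.get(token(claimed_identity))
        if not attributes:
            return None
        # Deterministic choice keeps the example simple; a real system
        # would likely vary the question.
        return sorted(attributes)[0]

    def verify(self, claimed_identity, attribute, answer):
        """Compare the answer's token with the indexed token in constant time."""
        expected = self.subject_index.get(token(claimed_identity), {}).get(attribute)
        return expected is not None and hmac.compare_digest(expected, token(answer))
```

For example, a caller would invoke `challenge("alice@example.com")`, present the returned attribute name to the subject as a question, and pass the reply to `verify`.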
According to another aspect of the embodiments of the present application, there is provided a data body request system based on distributed machine learning, including:
a data subject request interaction platform, comprising: a personal information index display unit, used for displaying the indexes of personal information, including the index of first personal information of the data center, the index of second personal information of the Internet, and the associated index of the first personal information and the second personal information; a data subject identity authentication unit, used for verifying the identity information submitted by the data subject and authenticating the identity of the data subject; and a data subject request result unit, used for auditing and confirming a request initiated by the data subject and, after the audit is complete, automatically obtaining the data requested by the data subject or the result of executing the deletion requested by the data subject;
a distributed machine learning subsystem, comprising: a data center personal information index unit, used for initiating a task of analyzing personal information in the data center, analyzing the data belonging to the personal information class in the data center, and establishing the index of the first personal information; an Internet personal information index unit, used for initiating, on the basis of the index of the first personal information of the data center, a task of analyzing personal information on the Internet to obtain the index of the second personal information of the Internet; and an associated personal information index unit, used for performing association analysis on the index of the first personal information of the data center and the index of the second personal information of the Internet to establish the associated index;
a data subject request execution subsystem, comprising: a data subject request query unit, which, when the data subject initiates a query request and has passed identity authentication on the data subject request interaction platform, automatically queries the data center and the Internet for the personal information requested by the data subject and displays the query result through the data subject request interaction platform; and a data subject request deletion unit, which, when the data subject initiates a deletion request and has passed identity authentication on the data subject request interaction platform, automatically deletes from the data center the personal information that the data subject requested to delete, obtains the related information about the data subject that is still stored on the Internet, and feeds that information back to the data subject through the data subject request interaction platform.
By adopting the technical scheme of the application, the following effects can be realized:
First, the distributed machine learning subsystem automatically analyzes and filters the data of the multi-source data center and the data of the Internet to obtain the data features of personal information, and establishes the index of first personal information of the data center, the index of second personal information of the Internet, and the associated index of the two. Second, on the data subject request interaction platform, the data subject submits his or her personal information for identity authentication when initiating a request to query personal information. Third, the distributed machine learning subsystem authenticates the identity of the data subject until authentication passes. Fourth, the distributed machine learning subsystem responds quickly to the query request initiated by the data subject and feeds the query result back to the data subject request interaction platform. Fifth, the distributed machine learning subsystem responds quickly to a deletion request initiated by the data subject, obtains from the data center the information the data subject asked to delete, executes the deletion, and feeds the deletion result back to the data subject request interaction platform. Sixth, the distributed machine learning subsystem obtains the information on the Internet that is related to the information the data subject asked to delete, and feeds the result back to the data subject request interaction platform.
These techniques, on the one hand, ensure the data subject's control over personal information; on the other hand, they ensure personal information compliance while allowing enterprises to continue to use data to create value. Compared with the related art, the application has the following advantages:
The method and the system are based on distributed machine learning and can help enterprises quickly process query and deletion requests initiated by data subjects. Through artificial intelligence analysis, an index of first personal information of the data center, an index of second personal information of the Internet, and an associated index covering the first personal information, the second personal information and unstructured data are established. When a data subject initiates a query or deletion request, the enterprise can feed back both the data it stores that is related to the requested information and the related data that exists on the Internet. By means of the index of the first personal information of the data center and its associated index with the second personal information of the Internet, the enterprise can provide more comprehensive compliance support, so that the data subject can quickly gain a complete picture of his or her personal information and then take control measures.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a system for distributed machine learning-based data body request, according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative method for creating a personal information index according to an embodiment of the present application;
FIG. 3 is a flow chart of an alternative data subject authentication according to embodiments of the present application;
FIG. 4 is a flow diagram of an alternative data body request query according to an embodiment of the present application;
FIG. 5 is a flow diagram of an alternative data body request deletion according to an embodiment of the application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Through analysis of the related art, the inventor recognized the following. As individuals become more aware of personal information protection and countries place more emphasis on personal information compliance, enterprises that hold personal information need to respond to personal information processing requests, including query and deletion requests, made by individuals, i.e., data subjects. Meanwhile, after long-term accumulation, such enterprises hold data of huge scale and in many formats, and finding all the information requested by a data subject within this mass of data is a very difficult or even impossible task. In particular, the state has made clear in the Personal Information Protection Law (draft) that the data subject has a right to know; when a data subject makes a request, the enterprise should respond to it in accordance with the law. Creating value from the personal information an enterprise holds while ensuring that data subjects enjoy their legal rights has therefore become a problem that enterprise managers urgently need to solve. Enterprises take different measures to process data subject requests, and these can be divided mainly into three types:
1) When a data subject submits a query or deletion request to an enterprise, the enterprise searches the storage locations one by one manually; the labor cost is high, and because the data subject's information is dispersed, manual searching cannot find all of it. 2) When a data subject submits a deletion request, the enterprise deletes the information directly without performing an overall association analysis, which may affect later analysis and use of associated data. 3) When a data subject submits a query or deletion request, the enterprise executes it directly without authenticating the data subject, which can lead to erroneous operations and even infringe the rights of the data subject whose data is wrongly processed. As described above, with the conventional methods for processing data subject requests, enterprise managers cannot find a suitable way to meet their business needs. Processing data subject requests based on distributed machine learning is therefore an important means for an enterprise to create value while handling data subject requests compliantly.
The large data volume, varied data formats, rich data content and high data value of enterprise data make the processing of data subject requests particularly difficult. At present, the processing of data subject requests has the following limitations:
1) Identifying the information requested by a data subject manually is feasible for a small amount of data, but when the data volume is large, it cannot be completed manually. 2) Identifying the requested information by conventional manual search and deletion is feasible for data with simple content, but when the content is extensive and rich, the reliability of this approach cannot be guaranteed. 3) Personal information is diverse, and the associations between personal information with different attributes cannot be found manually. 4) Data related to personal information is dispersed; when a data subject makes a request, a manager can hardly determine where all the data related to that data subject is located, and so, when the data subject asks for a query or deletion, the manager cannot provide the specific data, which defeats the goal of personal information compliance. 5) The amount of personal information in the data is huge, so quickly querying and deleting the personal information requested by a data subject is time-consuming, and even after lengthy processing only part of the personal information may have been handled rather than all of it.
The present application enables the enterprise to process data subject requests quickly and compliantly. The identification and association of personal information are completed automatically by artificial intelligence without manual participation by the user. The method analyzes and indexes all structured, semi-structured and unstructured data, and the data forms covered include data streams, data fragments and various documents. The application processes the data subject's data held by the enterprise in a compliant manner and also informs the data subject of the data related to him or her on the Internet, so that data subject requests are handled comprehensively.
According to an aspect of an embodiment of the present application, there is provided a method for processing data subject requests based on distributed machine learning, including:
1) The data of the multi-source data center is analyzed through distributed machine learning, data features belonging to the personal information class are extracted from the data center containing various data, and an index of the personal information class is established for that data. The index of the personal information class does not contain the original data or attributes of the personal information and is used only for further analysis by distributed machine learning; no component can deduce the original data or attributes of the personal information from the index.
2) According to the index of the first personal information of the multi-source data center, the data of the Internet is analyzed through distributed machine learning, data features that are related to the personal information of the multi-source data center and belong to the personal information class are extracted from the network containing various data, and an index of the personal information class is established for that data. The index of the personal information class does not contain the original data or attributes of the personal information and is used only for further analysis by distributed machine learning; no component can deduce the original data or attributes of the personal information from the data features of the personal information class.
3) The index of the first personal information extracted from the data center and the index of the second personal information extracted from the Internet are analyzed by artificial intelligence through distributed machine learning, and an association index is established between the two. The association index does not contain the original data or attributes of the personal information and is used only for further management by distributed machine learning and for processing data subject requests; no component can deduce the original data or attributes of the personal information from the index.
4) From the indexes of personal information, it is known where the personal information related to a data subject is stored in the data center and on the Internet, and the indexes are managed in a unified way.
5) When a data subject submits a request to query personal information, the data subject's identity is verified against the established index; if the data subject is consistent with the personal information associated with the index, a real-time personal information query is started through distributed machine learning, and the query result is fed back to the data subject.
6) When a data subject submits a request to delete personal information, the data subject's identity is authenticated against the established index; if the data subject is consistent with the personal information associated with the index, a real-time personal information query is started through distributed machine learning, and the query result is fed back to the data subject. Based on the query result, the data subject submits a confirmation to execute the deletion. The personal information requested by the data subject that is stored in the data center is then deleted through distributed machine learning. At the same time, the personal information requested by the data subject that is stored on the Internet is fed back to the data subject.
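One way to picture an index that locates personal information without holding the original values is sketched below. The application does not say how the index is encoded; the keyed digest (HMAC-SHA256), the `INDEX_KEY` secret, and the `IndexEntry`, `make_token`, `build_index`, and `lookup` names are all assumptions made for illustration only.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical secret held only by the indexing component (an assumption).
INDEX_KEY = b"example-key-not-for-production"


@dataclass(frozen=True)
class IndexEntry:
    token: str     # keyed digest of the normalized value, never the value itself
    location: str  # where the matching record is stored


def make_token(value: str, key: bytes = INDEX_KEY) -> str:
    """Normalize a value and return its HMAC-SHA256 hex digest."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(records):
    """records: iterable of (value, location) pairs found by the analysis step."""
    return [IndexEntry(make_token(value), location) for value, location in records]


def lookup(index, value):
    """Return the storage locations whose token matches the given value."""
    token = make_token(value)
    return [entry.location for entry in index if entry.token == token]


index = build_index([
    ("Alice@example.com", "db1/users/42"),
    ("alice@example.com", "files/report.docx"),
    ("bob@example.com", "db1/users/7"),
])
print(lookup(index, " ALICE@example.com "))
```

In this sketch the index stores only digests and locations. How hard it is to recover an original value from a digest depends on keeping the key secret, so this is an illustration of the idea rather than a guarantee of the property stated above.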
Optionally, the multi-source data center includes a file server, a database, a private cloud data center, and a public cloud data center; the data format comprises structured data, semi-structured data and unstructured data; the form of the data may be a data stream, a data fragment, and various documents.
Optionally, the data in the internet includes all visible data in the network, and the data format includes structured data, semi-structured data and unstructured data; the form of the data may be a data stream, a data fragment, and various documents.
Optionally, the distributed machine learning starts different machine learning components for different data centers, which speeds up personal information learning and index establishment. For Internet data, a machine learning component is started independently, which likewise speeds up personal information learning and index establishment; this component is isolated from the index of the first personal information of the data center, so that the two do not affect each other while the index of the second personal information is being built. When the association index is established between the index of the first personal information of the data center and the index of the second personal information of the Internet, an independent machine learning component performs a combined calculation on the two indexes to form the personal information association index.
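The arrangement of one component per data source can be sketched roughly as follows. Threads stand in for the separate machine learning components, and `index_source` is a hypothetical placeholder that merely flags documents containing an "@" sign; neither is taken from the application.

```python
from concurrent.futures import ThreadPoolExecutor


def index_source(name, documents):
    """Stand-in for one machine learning component working on one source.

    It only records which documents contain an '@'; a real component would
    run a trained model here.
    """
    return {name: [doc_id for doc_id, text in documents.items() if "@" in text]}


def index_all(sources):
    """Run one independent component per source and collect the per-source indexes."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(index_source, name, docs) for name, docs in sources.items()]
        for future in futures:
            results.update(future.result())
    return results


sources = {
    "file_server": {"a.txt": "contact: alice@example.com", "b.txt": "no personal data"},
    "database": {"row1": "bob@example.com"},
    "internet": {"page1": "alice@example.com posted here"},
}
per_source_index = index_all(sources)
print(per_source_index)
```

Each worker sees only its own source, so the indexes stay separate until a later step combines them into the association index.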
Optionally, all data related to the data subject's request is obtained in real time by distributed machine learning in an iterative manner: the relevant data requested by the data subject is quickly obtained through the index already established for the data center data; based on that index, machine learning is performed on the data that has changed in the data center, a new index is established, and data related to the data subject's request is obtained at the same time; the relevant data requested by the data subject is quickly obtained through the index already established for the Internet data; based on that index, machine learning is performed on the data that has changed on the Internet, a new index is established, and data related to the data subject's request is obtained at the same time. In this way, all relevant data requested by the data subject is obtained.
Optionally, based on the index already established for the data center data, machine learning is performed on the data that has changed in the data center and a new index is established, which replaces the old index as the index of the data center data. Likewise, based on the index already established for the Internet data, machine learning is performed on the data that has changed on the Internet and a new index is established, which replaces the old index as the index of the Internet data. The new indexes are then analyzed through distributed machine learning to establish a new association index.
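The iterative replacement of an old index by a new one can be illustrated with a small sketch. Detecting changed data by a content fingerprint is an assumption (the application does not say how changes are found), and `fingerprint`, `analyze`, and `refresh_index` are hypothetical names; `analyze` is a placeholder for the machine learning step.

```python
import hashlib


def fingerprint(text):
    """Content hash used here to detect that a document has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def analyze(text):
    """Placeholder for the machine learning step: flags text containing '@'."""
    return "@" in text


def refresh_index(old_index, documents):
    """Build a new index, re-analyzing only new or changed documents.

    old_index maps doc_id -> (fingerprint, has_personal_info).
    Returns (new_index, ids_that_were_reanalyzed).
    """
    new_index, reanalyzed = {}, []
    for doc_id, text in documents.items():
        fp = fingerprint(text)
        previous = old_index.get(doc_id)
        if previous is not None and previous[0] == fp:
            new_index[doc_id] = previous          # unchanged: reuse the old entry
        else:
            new_index[doc_id] = (fp, analyze(text))
            reanalyzed.append(doc_id)
    return new_index, reanalyzed


docs_v1 = {"a": "alice@example.com", "b": "nothing here"}
index_v1, first_pass = refresh_index({}, docs_v1)

docs_v2 = {"a": "alice@example.com", "b": "now bob@example.com", "c": "new file"}
index_v2, second_pass = refresh_index(index_v1, docs_v2)
print(second_pass)
```

Because the new index is built only from the current documents, entries for data that no longer exists drop out when the new index replaces the old one.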
Optionally, an index is established for the personal information in an iterative manner, so that the processing speed of the next data main body request is increased.
Optionally, identity authentication of the data subject proceeds as follows: the personal information submitted by the data subject is taken as input; through distributed machine learning, the input personal information is analyzed by artificial intelligence and compared with the index of the first personal information established from the data center data; and a question to be answered by the data subject is returned. The data subject answers the question; the answer is taken as input, and through distributed machine learning it is again analyzed and compared with the index of the first personal information established from the data center data. Once verification passes, identity authentication of the data subject is complete.
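The question-and-answer exchange can be pictured as a two-step check, sketched below under stated assumptions. The index layout, the `KEY` secret, the sample question, and the functions `digest`, `start_authentication`, and `finish_authentication` are all hypothetical; a keyed digest comparison stands in for the artificial intelligence analysis described above.

```python
import hashlib
import hmac

KEY = b"example-key-not-for-production"  # hypothetical secret (an assumption)


def digest(value):
    """Keyed digest of a normalized value, so the index holds no plain text."""
    return hmac.new(KEY, value.strip().lower().encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical index: identity token -> {question: digest of the expected answer}
subject_index = {
    digest("alice@example.com"): {
        "Last four digits of your phone number?": digest("1234"),
    },
}


def start_authentication(submitted_identity):
    """Return a question for a known identity, or None if nothing matches."""
    challenges = subject_index.get(digest(submitted_identity))
    if not challenges:
        return None
    return next(iter(challenges))


def finish_authentication(submitted_identity, question, answer):
    """Compare the answer with the indexed digest in constant time."""
    challenges = subject_index.get(digest(submitted_identity), {})
    expected = challenges.get(question)
    return expected is not None and hmac.compare_digest(expected, digest(answer))


question = start_authentication("Alice@example.com")
print(question)
print(finish_authentication("alice@example.com", question, "1234"))
```

The sketch asks a single fixed question; choosing which question to ask, and how many, is left open here as it is in the text above.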
Optionally, when the data subject requests a query of personal information, after identity authentication of the data subject is completed, the relevant personal information requested is quickly and automatically obtained through distributed machine learning, including information related to the request in the data center and information related to the request on the Internet.
Optionally, the data body requests deletion of personal information, after the identity authentication of the data body is completed, the relevant personal information requested by the data body is quickly and automatically obtained through distributed machine learning, the data body further confirms the personal information, and after the confirmation is completed, the deletion is further executed through distributed machine learning.
Optionally, the data body requests deletion of the personal information, and information related to the personal information requested to be deleted by the data body, which is stored in the internet, is fed back to the data body. The information fed back includes but is not limited to personal information content, quantity, network location (IP, URL), name of the business.
According to another aspect of the embodiments of the present application, there is also provided a data subject request system based on distributed machine learning, including: a distributed machine learning subsystem, used for analyzing the data of the data center and establishing an index of first personal information; analyzing the data of the Internet and establishing an index of second personal information; establishing an association index between the index of first personal information established for the data center data and the index of second personal information established for the Internet data; performing rapid real-time analysis and iteration for query and deletion requests initiated by a data subject; and feeding the indexes back to the data subject request interaction platform for display; a data subject request execution subsystem, which responds to query and deletion requests initiated by the data subject and feeds the results back to the data subject request interaction platform for display to the data subject; and a data subject request interaction platform, used for displaying the indexes of personal information, the data subject identity authentication interaction, and the processing results of the data subject's query and deletion requests.
Fig. 1 is a schematic diagram illustrating constituent units in a data body request system based on distributed machine learning. The application provides a data main body request system based on distributed machine learning, which comprises:
The data subject request interaction platform includes: a personal information index display unit, used for displaying the indexes of personal information, including the index of first personal information of the data center, the index of second personal information of the Internet, and the association index of the first and second personal information; a data subject identity authentication unit, used for verifying the identity information submitted by the data subject and authenticating the data subject's identity; and a data subject request result unit, used for reviewing and confirming the request initiated by the data subject and, after the review is complete, automatically obtaining the data requested by the data subject or the result of the deletion requested by the data subject.
The distributed machine learning subsystem includes: a data center personal information index unit, which initiates a task of analyzing personal information, analyzes data belonging to the personal information class in the data center, and establishes the index of first personal information; an Internet personal information index unit, which, based on the index of first personal information of the data center, initiates a task of analyzing personal information on the Internet and obtains the index of second personal information of the Internet; and an associated personal information index unit, which performs association analysis on the index of first personal information of the data center and the index of second personal information of the Internet to establish the association index. Whether the data is structured, semi-structured, or unstructured, and whether it takes the form of data streams, data fragments, or various documents, the index can be built through the distributed machine learning subsystem.
The data subject request execution subsystem includes: a data subject request query unit, which, when a data subject initiates a query request and after the data subject's identity has been authenticated by the data subject request interaction platform, automatically queries the data center and the Internet for the personal information requested by the data subject and displays the query result through the data subject request interaction platform; and a data subject request deletion unit, which, when a data subject initiates a deletion request and after the data subject's identity has been authenticated by the data subject request interaction platform, automatically deletes from the data center the personal information that the data subject requested to delete, obtains the related information still stored on the Internet, and feeds that information back to the data subject through the data subject request interaction platform.
The data subject request interaction platform, the distributed machine learning subsystem, and the data subject request execution subsystem are installed on different computers. The data subject request interaction platform is installed on a computer used for control, and the distributed machine learning subsystem and the data subject request execution subsystem are installed on computers in the user's computer center. The working method of the data subject request system based on distributed machine learning is described in detail below.
First, a personal information index is automatically created.
A personal information index is established through the distributed machine learning subsystem. As shown in Fig. 2, the working method comprises the following steps:
Step 201, starting an analysis task of the distributed machine learning subsystem.
Step 202, performing artificial intelligence analysis on the personal information of the multi-source data center through the distributed machine learning subsystem to obtain the index of the first personal information.
Step 203, starting an analysis task of the distributed machine learning subsystem.
Step 204, performing artificial intelligence analysis on the personal information of the Internet, through the distributed machine learning subsystem and according to the index of the first personal information of the multi-source data center, to obtain a related personal information index, namely the index of the second personal information.
Step 205, establishing an association index through the distributed machine learning subsystem according to the index of the first personal information of the multi-source data center and the index of the second personal information of the Internet.
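The data flow of steps 202, 204, and 205 can be sketched as three small functions. This is only an illustration of how the second index depends on the first and how the association index joins them; the function names, the opaque tokens such as "t1", and the dictionary layout are assumptions, and simple lookups stand in for the machine learning analysis.

```python
def build_first_index(data_center_docs):
    """Step 202 stand-in: map each identifier token to its data center locations."""
    index = {}
    for location, tokens in data_center_docs.items():
        for token in tokens:
            index.setdefault(token, []).append(location)
    return index


def build_second_index(first_index, internet_pages):
    """Step 204 stand-in: keep only Internet hits for tokens already in the first index."""
    index = {}
    for url, tokens in internet_pages.items():
        for token in tokens:
            if token in first_index:
                index.setdefault(token, []).append(url)
    return index


def build_association_index(first_index, second_index):
    """Step 205 stand-in: link data center and Internet locations per token."""
    return {
        token: {"data_center": locations, "internet": second_index.get(token, [])}
        for token, locations in first_index.items()
    }


# Tokens are opaque identifiers, not original personal data.
data_center_docs = {"db/users/1": ["t1"], "files/a.docx": ["t1", "t2"]}
internet_pages = {"https://example.com/p": ["t1", "t9"]}

first_index = build_first_index(data_center_docs)
second_index = build_second_index(first_index, internet_pages)
association_index = build_association_index(first_index, second_index)
print(association_index)
```

Token "t9" appears only on the Internet and is ignored, which mirrors the point that the Internet analysis is driven by the index of the first personal information.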
Second, authentication of the data body request.
Before the data subject initiates a query or deletion request, the data subject submits information and identity authentication is performed through the data subject request interaction platform. As shown in Fig. 3, the working steps are as follows:
Step 301, the data subject initiates a request.
Step 302, the data subject submits personal identity information.
Step 303, identity authentication is started, and the data subject request interaction platform automatically authenticates the personal identity information.
Step 304, identity authentication is completed and the authentication result is confirmed.
Step 305, the authentication result is fed back to the data subject.
Third, the data body requests a query.
The data subject initiates a request to query personal information, and the query is executed after identity authentication passes. The processing flow for a data subject query request is shown in Fig. 4:
Step 401, the data subject initiates a query request.
Step 402, identity authentication and confirmation are performed on the request through the data subject request interaction platform; if the request initiated by the data subject is consistent with the identity of the data subject, the confirmation succeeds and step 404 is executed; otherwise, step 403 is executed.
Step 403, the request initiated by the data subject is inconsistent with the identity of the data subject, and the query of the personal information is not allowed.
Step 404, the query is confirmed to continue, and step 405 is executed.
Step 405, the data subject request execution subsystem obtains, according to the index of the first personal information of the multi-source data center, the data in the multi-source data center related to the information that the data subject requested to query.
Step 406, the data subject request execution subsystem obtains, according to the index of the first personal information of the data center and the associated index of the second personal information of the Internet, the data on the Internet related to the information that the data subject requested to query.
Step 407, the above information is integrated, and the query result is fed back to the data subject.
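The branching in steps 402 to 407 can be expressed as a single function, sketched here under assumptions: `handle_query`, the boolean `authenticated` flag, and the dictionary form of the association index are hypothetical and stand in for the platform, the authentication result, and the indexes described above.

```python
def handle_query(subject_token, authenticated, association_index):
    """Return the query result for one data subject request."""
    if not authenticated:
        # Step 403: identity mismatch, the query is not allowed.
        return {"status": "rejected", "data_center": [], "internet": []}
    entry = association_index.get(subject_token, {})
    return {
        "status": "ok",
        "data_center": list(entry.get("data_center", [])),  # step 405
        "internet": list(entry.get("internet", [])),         # step 406
    }                                                         # step 407: combined result


query_index = {
    "t1": {"data_center": ["db/users/1"], "internet": ["https://example.com/p"]},
}
accepted = handle_query("t1", True, query_index)
rejected = handle_query("t1", False, query_index)
print(accepted)
print(rejected)
```

A rejected request returns empty location lists, so nothing about the stored data is disclosed before authentication succeeds.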
Fourth, the data body requests deletion.
The data subject initiates a request to delete personal information, and the deletion is executed after identity authentication passes. The processing flow for a data subject deletion request is shown in Fig. 5:
Step 501, the data subject initiates a deletion request.
Step 502, identity authentication and confirmation are performed on the request through the data subject request interaction platform; if the request initiated by the data subject is consistent with the identity of the data subject, the confirmation succeeds and step 504 is executed; otherwise, step 503 is executed.
Step 503, the request initiated by the data subject is inconsistent with the identity of the data subject, and the deletion of the personal information is not allowed.
Step 504, the deletion is confirmed to continue, and step 505 is executed.
Step 505, the data subject request execution subsystem deletes, according to the index of the first personal information of the data center, the data in the data center related to the information that the data subject requested to delete.
Step 506, the data subject request execution subsystem obtains, according to the index of the first personal information of the data center and the associated index of the second personal information of the Internet, the data on the Internet related to the information that the data subject requested to delete.
Step 507, the above information is integrated, and the deletion result, together with the related information requested by the data subject that is still visible on the Internet, is fed back to the data subject.
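The deletion flow differs from the query flow in that data center records are removed while Internet locations are only reported. The sketch below illustrates that asymmetry; `handle_delete`, the `confirmed` flag, and the in-memory `store` dictionary are assumptions standing in for the execution subsystem, the data subject's confirmation, and the data center storage.

```python
def handle_delete(subject_token, authenticated, confirmed, association_index, data_center_store):
    """Delete data center records for one subject and report what remains online."""
    if not authenticated:
        # Step 503: identity mismatch, deletion is not allowed.
        return {"status": "rejected", "deleted": [], "still_on_internet": []}
    entry = association_index.get(subject_token, {})
    remaining = list(entry.get("internet", []))              # step 506
    if not confirmed:
        return {"status": "awaiting_confirmation", "deleted": [], "still_on_internet": remaining}
    deleted = []
    for location in entry.get("data_center", []):            # step 505
        if location in data_center_store:
            del data_center_store[location]
            deleted.append(location)
    # Step 507: report the deletions and what is still visible on the Internet.
    return {"status": "deleted", "deleted": deleted, "still_on_internet": remaining}


store = {"db/users/1": "record A", "files/a.docx": "record B", "db/users/2": "record C"}
delete_index = {
    "t1": {
        "data_center": ["db/users/1", "files/a.docx"],
        "internet": ["https://example.com/p"],
    },
}
result = handle_delete("t1", True, True, delete_index, store)
print(result)
print(store)
```

The sketch leaves the index itself untouched after deletion; in the scheme described earlier, the next iterative indexing pass would be the natural place to drop the stale entries.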
According to another aspect of the embodiments of the present application, an apparatus for implementing the method is also provided. The apparatus may include: a distributed machine learning index module, used for analyzing the data of the data center and establishing an index of first personal information; analyzing the data of the Internet and establishing an index of second personal information; and establishing an association index between the index of first personal information established for the data center data and the index of second personal information established for the Internet data; a data subject request execution module, which executes query and deletion requests initiated by the data subject through distributed machine learning; and a data subject request interaction module, through which the data subject initiates a request and the data subject's identity is authenticated, and which, when the data subject initiates a deletion request, displays the personal information on the Internet that still needs further processing.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program for instructing device-associated hardware, and the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (9)

1. A processing method of data body request based on distributed machine learning is characterized by comprising the following steps:
analyzing data of the multi-source data center through distributed machine learning, and establishing an index of first personal information in the multi-source data center; according to the index of the first personal information, analyzing the data of the Internet through distributed machine learning, and establishing an index of second personal information related to the first personal information in the Internet; establishing, through distributed machine learning, an associated index between the index of the first personal information and the index of the second personal information;
receiving a data processing request of a data main body, and identifying the authenticity of the data main body through the index of the first personal information; confirming the request triggered by the data main body, acquiring all related data requested by the data main body in real time through the index of the first personal information, the index of the second personal information and the associated index through distributed machine learning, and executing the data processing request of the data main body;
the distributed machine learning means: different machine learning components are started for different data centers, so that the efficiency of personal information learning and index establishment is accelerated; aiming at the data of the Internet, a machine learning component is independently started, so that the efficiency of personal information learning and index establishment is accelerated, and meanwhile, the machine learning component is isolated from the index of the first personal information of the data center, and the influence on each other in the index process of the second personal information is isolated; when the association index is established between the index of the first personal information of the data center and the index of the second personal information of the Internet, the independent machine learning component performs mixed calculation on the index of the first personal information and the index of the second personal information of the Internet to form the personal information association index.
2. The method of claim 1, wherein analyzing data of a multi-source data center through distributed machine learning to build an index of first personal information in the multi-source data center comprises:
the method comprises the steps of analyzing data of a multi-source data center through distributed machine learning, extracting data characteristics belonging to a personal information class from the data center containing various data, and establishing an index of first personal information for the data of the personal information class, wherein the index of the first personal information does not contain original data or attributes of the personal information and is only used for further analysis of the distributed machine learning, and any component cannot deduce the original data or attributes of the personal information class through the index of the first personal information.
3. The method of claim 1 or 2, wherein analyzing the data of the internet by distributed machine learning according to the index of the first personal information to establish an index of a second personal information related to the first personal information in the internet comprises:
according to the index of the first personal information of the multi-source data center, the data of the internet are analyzed through distributed machine learning, the data characteristics which are related to the personal information of the multi-source data center and belong to the personal information class are extracted from a network containing various data, and the index of second personal information is established for the data of the personal information class, wherein the index of the second personal information does not contain the original data or attribute of the personal information and is only used for further analysis of the distributed machine learning, and any component cannot deduce the original data or attribute of the personal information class through the data characteristics of the second personal information class.
4. The method of claim 1 or 2, wherein establishing the associated index between the index of the first personal information and the index of the second personal information by distributed machine learning comprises:
the index of the first personal information extracted by the multi-source data center and the index of the second personal information extracted in the Internet are analyzed in an artificial intelligence mode through distributed machine learning, and an associated index is established for the index of the first personal information extracted by the multi-source data center and the index of the second personal information extracted in the Internet, wherein the associated index does not contain original data or attributes of the personal information and is only used for further management of the distributed machine learning and processing of a data main body request, and any component cannot deduce the original data or attributes of the personal information through the index.
5. The method according to claim 1 or 2, wherein the data processing request of the data body comprises a query request, and the query request is processed by the following method:
receiving a request of inquiring personal information from a data main body, carrying out identity authentication on the data main body, starting real-time personal information inquiry through distributed machine learning if the data main body is consistent with the personal information associated with the index of the first personal information, and feeding back an inquiry result to the data main body, wherein the inquiry result comprises the personal information storage condition of the data main body and the personal information storage condition associated with the data main body in the internet.
6. The method according to claim 1 or 2, wherein the data processing request of the data body comprises a deletion request, and the deletion request is processed by the following method:
receiving a request for deleting personal information, which is provided by a data main body, identifying the data main body according to the grasped index, starting real-time personal information query through distributed machine learning if the data main body is consistent with the personal information associated with the index of the first personal information, and feeding back a query result to the data main body, wherein the query result comprises the personal information storage condition of the data main body and the personal information storage condition associated with the data main body in the internet;
receiving a confirmation execution deleting instruction submitted by the data main body according to the query result, and deleting personal information requested by the data main body and stored in the multi-source data center through distributed machine learning; and simultaneously feeding back the personal information of the data main body stored in the Internet to the data main body.
7. The method of claim 1 or 2, wherein all data relevant to the data subject request obtained in real time by distributed machine learning is obtained by distributed machine learning in an iterative manner:
performing machine learning on the changed data of the multi-source data center based on the established index of the first personal information of the data of the multi-source data center, establishing a new index of the first personal information, and replacing the old index of the first personal information by taking the new index of the first personal information as the index of the data of the multi-source data center; performing machine learning on the data changed in the internet based on the index of the second personal information established by the internet data, establishing a new index of the second personal information, and replacing the old index of the second personal information by using the new index as the index of the data of the internet; and analyzing the new index of the first personal information and the new index of the second personal information through distributed machine learning, and further establishing a new association index.
8. The method according to claim 1 or 2, wherein the authentication of the data subject is:
the personal information submitted by the data subject is used as input, artificial intelligence analysis and comparison are carried out, through distributed machine learning, between the input personal information and the index of first personal information established from the data of the multi-source data center, and a question to be answered by the data subject is returned; the data subject answers the question, the answer is used as input, artificial intelligence analysis and comparison are carried out again, through distributed machine learning, between the answer and the index of the first personal information established from the data of the data center, and after verification is passed, identity authentication of the data subject is completed.
9. A distributed machine learning-based data subject request system, comprising:
a data subject request interaction platform comprising: a personal information index display unit, for displaying the indexes of personal information, including the index of the first personal information of the data centers, the index of the second personal information of the Internet, and the association index of the first personal information and the second personal information; a data subject identity authentication unit, for verifying the identity information submitted by the data subject and authenticating the identity of the data subject; and a data subject request result unit, for auditing and confirming a request initiated by the data subject and, after the audit is complete, automatically obtaining the data requested by the data subject or the result of executing the deletion requested by the data subject;
a distributed machine learning subsystem comprising: a data center personal information index unit, for initiating a personal information analysis task on a data center, analyzing the data in the data center that belongs to the personal information class, and establishing the index of the first personal information; an Internet personal information index unit, for initiating, based on the index of the first personal information of the data center, a task of analyzing personal information on the Internet and obtaining the index of the second personal information of the Internet; and an associated personal information index unit, for performing association analysis on the index of the first personal information of the data center and the index of the second personal information of the Internet to establish the association index;
a data subject request execution subsystem comprising: a data subject request query unit, for, when the data subject initiates a query request and after the data subject has passed identity authentication on the data subject request interaction platform, automatically querying the data centers and the Internet for the personal information requested by the data subject and displaying the query result through the data subject request interaction platform; and a data subject request deletion unit, for, when the data subject initiates a deletion request and after the data subject has passed identity authentication on the data subject request interaction platform, automatically deleting from the data centers the personal information that the data subject requested be deleted, and at the same time obtaining the information related to the data subject that is still stored on the Internet and feeding it back to the data subject through the data subject request interaction platform;
wherein the distributed machine learning means: different machine learning components are started for different data centers, to speed up personal information learning and index establishment; for Internet data, a machine learning component is started separately, to speed up personal information learning and index establishment while remaining isolated from the index of the first personal information of the data centers, so that the indexing of the first personal information and the indexing of the second personal information do not affect each other; and, when the association index is established between the index of the first personal information of the data centers and the index of the second personal information of the Internet, an independent machine learning component performs a mixed calculation on the two indexes to form the personal information association index.
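The component layout that closes claim 9 (one learning component per data center, a separate one for Internet data, and an independent component that combines the results) can be sketched in Python as follows. This is an illustration under stated assumptions: threads from the standard `concurrent.futures` module stand in for separately deployed components, a grouping rule stands in for the learned model, and `learn_index`, `merge_indexes`, and `build_indexes` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def learn_index(records: list) -> dict:
    """Assumed stand-in for one machine learning component:
    groups the records that carry a 'subject' key by that subject."""
    index = {}
    for record in records:
        subject = record.get("subject")
        if subject is not None:
            index.setdefault(subject, []).append(record)
    return index


def merge_indexes(indexes) -> dict:
    """Combine the per-data-center indexes into one first-personal-information index."""
    merged = {}
    for index in indexes:
        for subject, items in index.items():
            merged.setdefault(subject, []).extend(items)
    return merged


def build_indexes(data_centers: dict, internet_records: list):
    """Run one component per data center plus a separate Internet component,
    then combine both indexes in an independent final step."""
    with ThreadPoolExecutor() as pool:
        # One task per data center; each sees only its own records.
        dc_futures = [pool.submit(learn_index, records)
                      for records in data_centers.values()]
        # The Internet component runs separately and shares no state with them.
        net_future = pool.submit(learn_index, internet_records)
        first = merge_indexes(f.result() for f in dc_futures)
        second = net_future.result()
    # Independent association step over the two finished indexes.
    shared = first.keys() & second.keys()
    association = {s: {"data_center": first[s], "internet": second[s]}
                   for s in shared}
    return first, second, association
```

Because each task receives only its own input and returns a fresh dictionary, the data center and Internet indexing runs cannot influence each other, which is the isolation property the claim describes.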
CN202110638898.5A 2021-06-08 2021-06-08 Data main body request processing method and system based on distributed machine learning Active CN113094376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110638898.5A CN113094376B (en) 2021-06-08 2021-06-08 Data main body request processing method and system based on distributed machine learning

Publications (2)

Publication Number Publication Date
CN113094376A CN113094376A (en) 2021-07-09
CN113094376B CN113094376B (en) 2021-09-03

Family

ID=76664525

Country Status (1)

Country Link
CN (1) CN113094376B (en)

Also Published As

Publication number Publication date
CN113094376A (en) 2021-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant