WO2022269504A1 - System and method for privacy risk assessment and mitigatory recommendation - Google Patents

System and method for privacy risk assessment and mitigatory recommendation

Info

Publication number
WO2022269504A1
Authority
WO
WIPO (PCT)
Prior art keywords
privacy
data
risk
module
datasets
Application number
PCT/IB2022/055779
Other languages
French (fr)
Inventor
Abilash Soundararajan
Original Assignee
Abilash Soundararajan
Application filed by Abilash Soundararajan filed Critical Abilash Soundararajan
Publication of WO2022269504A1 publication Critical patent/WO2022269504A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/60 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
    • G16H40/67 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation

Abstract

A system and method for privacy risk assessment and mitigatory recommendation is disclosed. The system includes a processing subsystem including an input module (40), which receives multiple datasets for risk assessment, data collaboration context(s), and privacy categorization preference(s), or an approval for auto-categorization, a data categorization module (50), which identifies multiple features of attribute(s) and auto-classifies them as per privacy requirements and preferences, a privacy risk assessment module (60), which identifies the proportion of the multiple datasets at risk of being attacked by performing multiple privacy attack simulations and generates a unified privacy risk score corresponding to an associated risk level, an impact assessment module (70), which compares the unified privacy risk score with an entity necessity for impact assessment and creates visualization(s), a recommendation module (80), which generates privacy regulation-aware recommendation(s) for technical safeguards, regulatory compliance, and cross-border transfer of data, and a report generation module (90), which automates reporting and API-based privacy risk information sharing.

Description

SYSTEM AND METHOD FOR PRIVACY RISK ASSESSMENT AND MITIGATORY RECOMMENDATION
EARLIEST PRIORITY DATE:
This Application claims priority from a Provisional patent application filed in India having Patent Application No. 202141028231, filed on June 23, 2021, and titled “SYSTEM AND METHOD TO IDENTIFY PRIVACY RISK IN SHARING, STORING OR PROCESSING OF DATA”.
FIELD OF INVENTION
Embodiments of the present disclosure relate to the field of privacy of data in collection, storage, computing, and data collaboration, and more particularly to a system and a method for privacy risk assessment and mitigatory recommendation.
BACKGROUND
With the rapid advancement of technology, practically every aspect of our daily lives is becoming digital. Certain data, such as personal data, professional data, and the like, must be transferred between one or more entities in order to permit the digitalization of activities. Many times, during this process, applications on which the data has been shared may use such content and information to manipulate thoughts, emotions, behaviors, day-to-day activities or practices, relationships, lifestyle, mobility, political views, or the like associated with the corresponding entities, using a set of algorithms embedded in such digital platform applications.
The gathered data are then analyzed and used to build a set of alternatives for entities to choose from, enabling these apps to comprehend and manage their behavior. Through such practices, these programs may be able to profile one or more users and digitally attack their privacy, making the system less dependable and user-friendly. Controlling the use of digital data has therefore become a crucial demand, as privacy has become a basic human right around the world.
Multiple methods of de-identification are in use today that neither effectively protect the privacy of the data subjects nor preserve the utility of the data. Such de-identified data can be re-identified, making the anonymization or the implemented privacy preservation process useless. Furthermore, these approaches do not ensure the privacy of the data given by the businesses. Because privacy technology is so new, few people are aware of the various levels of privacy concerns associated with certain data when it is shared, stored, or processed.
Hence, there is a need for an improved system and method for privacy risk assessment and mitigatory recommendation which addresses the aforementioned issues.
BRIEF DESCRIPTION
In accordance with one embodiment of the disclosure, a system for privacy risk assessment and mitigatory recommendation is provided. The system includes a processing subsystem hosted on a server. The processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes an input module. The input module is configured to receive a plurality of datasets from at least one of one or more entities and one or more data sources in real-time, upon receiving a privacy risk assessment request from the corresponding one or more entities upon registration. The input module is also configured to receive at least one of one or more data collaboration contexts of the plurality of datasets and one or more privacy categorization preferences of one or more attributes of one or more flows of data or an approval for auto-categorization of the one or more attributes from the one or more entities. The processing subsystem also includes a data categorization module operatively coupled to the input module. The data categorization module is configured to identify a plurality of features for each of the one or more attributes associated with the plurality of datasets using an artificial intelligence-based technique and one or more feature extraction techniques. The data categorization module is also configured to categorize the one or more attributes under one or more categories by creating and assigning one or more labels corresponding to the one or more attributes, based on at least one of the plurality of features, the one or more data collaboration contexts, the one or more privacy categorization preferences, and the approval for the auto-categorization. Further, the processing subsystem also includes a privacy risk assessment module operatively coupled to the data categorization module. The privacy risk assessment module is configured to identify a proportion of the plurality of datasets at risk of being attacked by one or more privacy attacks by performing a privacy attack simulation on the plurality of datasets, based on the one or more categories. The privacy risk assessment module is also configured to generate a unified privacy risk score corresponding to a risk level associated with the corresponding one or more categories based on the privacy attack simulation performed on the plurality of datasets, by implementing one or more risk assessment techniques onto the plurality of datasets for each of the one or more privacy attacks. Furthermore, the processing subsystem also includes an impact assessment module operatively coupled to the privacy risk assessment module. The impact assessment module is configured to compare the risk identified and the unified privacy risk score generated via the privacy risk assessment module with an entity necessity for data collection, storage, processing, and data collaboration using one or more statistical techniques along with an artificial intelligence-based technique. The impact assessment module is also configured to create one or more visualizations corresponding to a balance identified based on the comparison, wherein the one or more visualizations assist in Data protection impact assessment decision making. The processing subsystem further includes a recommendation module operatively coupled to the impact assessment module.
The recommendation module is configured to generate one or more privacy regulation-aware recommendations using a machine learning technique, based on at least one of the unified privacy risk score generated for the plurality of datasets and a plurality of regulatory requirements, thereby making one or more data pipelines and one or more data ecosystems privacy risk-aware. The one or more privacy regulation-aware recommendations correspond to at least one of technical safeguards, regulatory compliance, and cross-border transfer of the data. Moreover, the processing subsystem also includes a report generation module operatively coupled to the recommendation module. The report generation module is configured to generate a privacy risk assessment report upon receiving the one or more privacy regulation-aware recommendations, thereby enabling creation of one or more privacy-aware data ecosystems. The privacy risk assessment report corresponds to at least one of documentation, an Application Programming Interface-based risk information tagging, and an information flow along with the data.
In accordance with another embodiment, a method for privacy risk assessment and mitigatory recommendation is provided. The method includes receiving a plurality of datasets from at least one of one or more entities and one or more data sources in real-time, upon receiving a privacy risk assessment request from the corresponding one or more entities upon registration. The method also includes receiving at least one of one or more data collaboration contexts of the plurality of datasets and one or more privacy categorization preferences of one or more attributes of one or more flows of data or an approval for auto-categorization of the one or more attributes from the one or more entities. Further, the method also includes identifying a plurality of features for each of the one or more attributes associated with the plurality of datasets using an artificial intelligence-based technique and one or more feature extraction techniques. Furthermore, the method also includes categorizing the one or more attributes under one or more categories by creating and assigning one or more labels corresponding to the one or more attributes, based on at least one of the plurality of features, the one or more data collaboration contexts, the one or more privacy categorization preferences, and the approval for the auto-categorization. Moreover, the method also includes identifying a proportion of the plurality of datasets at risk of being attacked by one or more privacy attacks by performing a privacy attack simulation on the plurality of datasets, based on the one or more categories. The method further includes generating a unified privacy risk score corresponding to a risk level associated with the corresponding one or more categories based on the privacy attack simulation performed on the plurality of datasets, by implementing one or more risk assessment techniques onto the plurality of datasets for each of the one or more privacy attacks. The method further includes comparing the risk identified and the unified privacy risk score generated via the privacy risk assessment module with an entity necessity for data collection, storage, processing, and data collaboration using one or more statistical techniques along with an artificial intelligence-based technique. Additionally, the method also includes creating one or more visualizations corresponding to a balance identified based on the comparison, wherein the one or more visualizations assist in Data protection impact assessment decision making. The method also includes generating one or more privacy regulation-aware recommendations using a machine learning technique, based on at least one of the unified privacy risk score generated for the plurality of datasets and a plurality of regulatory requirements, thereby making one or more data pipelines and one or more data ecosystems privacy risk-aware. The method also includes generating a privacy risk assessment report upon receiving the one or more privacy regulation-aware recommendations, thereby enabling creation of one or more privacy-aware data ecosystems.
To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
FIG. 1 is a block diagram representation of a system for privacy risk assessment and mitigatory recommendation in accordance with an embodiment of the present disclosure;
FIG. 2 is a block diagram representation of an exemplary embodiment of a system for privacy risk assessment and mitigatory recommendation of FIG. 1 in accordance with an embodiment of the present disclosure;
FIG. 3 is a block diagram of a privacy risk assessment computer or a privacy risk assessment server in accordance with an embodiment of the present disclosure;
FIG. 4 (a) is a flow chart representing steps involved in a method for privacy risk assessment and mitigatory recommendation in accordance with an embodiment of the present disclosure; and
FIG. 4 (b) is a flow chart representing continued steps involved in a method of FIG. 4 (a) in accordance with an embodiment of the present disclosure.
Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not have necessarily been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION
For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Embodiments of the present disclosure relate to a system for privacy risk assessment and mitigatory recommendation. As used herein, the term “privacy risk” is defined as the likelihood that individuals will experience problems resulting from data processing, and the impact should they occur. Further, as used herein, the term “privacy risk assessment” refers to an assessment that is needed to ensure that you accurately measure and manage the risk to your customers and keep your organization compliant with global data protection regulations. Further, the system described hereafter in FIG. 1 is the system for privacy risk assessment and mitigatory recommendation.
FIG. 1 is a block diagram representation of a system (10) for privacy risk assessment and mitigatory recommendation in accordance with an embodiment of the present disclosure. The system (10) includes a processing subsystem (20) hosted on a server (30). In one embodiment, the server (30) may include a cloud server. In another embodiment, the server (30) may include a local server. The processing subsystem (20) is configured to execute on a network (not shown in FIG. 1) to control bidirectional communications among a plurality of modules. In one embodiment, the network may include a wired network such as a local area network (LAN). In another embodiment, the network may include a wireless network such as wireless fidelity (Wi-Fi), Bluetooth, Zigbee, near field communication (NFC), infrared communication, or the like.
Basically, every organization may have a group of people responsible for overseeing the organization’s data protection strategy and its implementation to ensure compliance with a plurality of privacy regulations across the globe like General Data Protection Regulation (GDPR) requirements. In one embodiment, the group of people may include a legal person, a data protection officer (DPO), a chief information security officer (CISO), a consultant, or the like. Further, such a group of people or the organization may use the system (10) to evaluate and visualize one or more risks involved in a dataset as part of a Data Protection Impact Assessment done before storing, processing, or sharing of personal data between one or more parties. Thus, the group of people may become a user of the system (10). Further, in an embodiment, the one or more parties may include healthcare, electronic commerce (e-commerce), telecommunication, automobile, and the like. Moreover, for the user to be able to use the system (10), the user may have to be registered with the system (10). Therefore, in an embodiment, the processing subsystem (20) may include a registration module (as shown in FIG. 2). The registration module may be configured to register the user with the system (10) upon receiving a plurality of user details via a user device. In one embodiment, the plurality of user details may include at least one of a name, contact details, an organization name, and the like corresponding to the user. Further, the plurality of user details may be stored in a database. In one embodiment, the database may include a local database or a cloud database. Also, in an embodiment, the user device may include a mobile phone, a tablet, a laptop, or the like.
Further, for the system (10) to be able to assist the user in evaluating and visualizing the one or more risks involved in storing, processing, or sharing of personal data, the system (10) may have to receive certain inputs. Therefore, the processing subsystem (20) includes an input module (40). The input module (40) may be operatively coupled to the registration module. The input module (40) is configured to receive a plurality of datasets from at least one of one or more entities and one or more data sources in real-time, upon receiving a privacy risk assessment request from the corresponding one or more entities upon registration. In one exemplary embodiment, the one or more entities may include the user, the one or more parties, the organization, an institute, or the like. In an embodiment, the plurality of datasets may include personal data, healthcare-related data, finance-related data, education-related data, or the like.
In an embodiment, when the plurality of datasets may be received from the one or more entities, the plurality of datasets may be received via a user interface (UI) associated with an entity device. In one embodiment, the entity device may be the user device. In another embodiment, when the plurality of datasets may be received from the one or more data sources, the system (10) may be linked with a data pipeline and receive the plurality of datasets via an Application Programming Interface (API). Therefore, in such an embodiment, as the plurality of datasets enters the data pipeline, the plurality of datasets may be received by the input module (40). In one exemplary embodiment, the one or more data sources may include at least one of an internal source, an external source, or a combination thereof.
Upon receiving the plurality of datasets to be transmitted between the one or more entities, certain conditions or system requirements may have to be taken care of. Therefore, the input module (40) is also configured to receive at least one of one or more data collaboration contexts of the plurality of datasets and one or more privacy categorization preferences of one or more attributes of one or more flows of data or an approval for auto-categorization of the one or more attributes from the one or more entities.
Upon receiving the plurality of datasets to be assessed for one or more privacy risks, one or more parts of the plurality of datasets may have to be analyzed and classified under one or more categories. Therefore, the processing subsystem (20) also includes a data categorization module (50) operatively coupled to the input module (40). The data categorization module (50) is configured to identify a plurality of features for each of the one or more attributes associated with the plurality of datasets using an artificial intelligence (AI)-based technique and one or more feature extraction techniques. The data categorization module (50) is also configured to categorize the one or more attributes under the one or more categories by creating and assigning one or more labels corresponding to the one or more attributes, based on at least one of the plurality of features, the one or more data collaboration contexts, the one or more privacy categorization preferences, and the approval for the auto-categorization.
In one exemplary embodiment, the one or more attributes may include name, address, contact details, a photo, a passport number, location data, and the like. Basically, the plurality of features associated with one or more attributes may be identified using the AI-based technique and the feature extraction technique, for the system (10) to be able to classify the one or more attributes under the one or more categories. Therefore, in one embodiment, the one or more categories may include Personal Identifiable Information (PII), Protected Health Information (PHI), quasi-identifiers, statistical data-text, statistical data-numeric, and the like. In one embodiment, the PII may include login identities, digital images, social media posts, and the like. Similarly, in an embodiment, the PHI may include full names, one or more dates, bank account numbers, social security numbers, and the like. Further, in an embodiment, the quasi-identifiers may include name, age, postcode, gender, location, and the like. In one exemplary embodiment, the statistical data may include statistical data-text, statistical data-numeric, sensitive statistical data, and the like. Also, in an embodiment, the statistical data may include weight, height, length, volume, and the like. As used herein, the term “artificial intelligence” is defined as a branch of computer sciences that emphasizes the development of intelligent machines, thinking, and working like humans. Also, as used herein, the term “feature extraction technique” is defined as a type of dimensionality reduction where a large number of pixels of the image are efficiently represented in such a way that interesting parts of the image are captured effectively. Further, in an embodiment, the AI-based technique may include a machine learning (ML)-based technique, an image processing technique, a natural language processing (NLP) technique, or the like. As used herein, the term “machine learning” is defined as a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy. Moreover, as used herein, the term “image processing” is defined as a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. Also, as used herein, the term “natural language processing” is defined as a branch of AI giving computers the ability to understand the text and spoken words in much the same way human beings can.
In one embodiment, the classification of the plurality of datasets under the one or more categories may be a dynamic classification. As used herein, the term “dynamic classification” refers to a type of classification in which the system (10) performs the classification of the data upon receiving certain insights about specific categories corresponding to the classification of the data based on a different scenario from the user dynamically. For example, in a specific scenario, the location may not be the quasi-identifier but the statistical data. Similarly, in a specific scenario, the age may not be the statistical data but the quasi-identifier, and the like. So, in such cases, the user may define the basis of classifying the one or more attributes under the one or more categories.
In one exemplary embodiment, the data categorization module (50) may also be configured to assign an appropriate attribute level privacy risk score for each of the one or more categories using an AI-based technique.
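By way of a hedged illustration only, the categorization and attribute-level scoring step may be sketched as follows. This is a minimal rule-based stand-in, not the AI-based technique of the disclosure; the keyword rules, the categorize helper, and the risk weights are invented for this example.

```python
# A minimal rule-based sketch of attribute categorization; the patent's module
# uses AI-based feature extraction, which this keyword matching only mimics.
import re

# Hypothetical category rules and attribute-level risk weights (assumptions).
CATEGORY_RULES = {
    "PII": re.compile(r"name|email|passport|photo|login", re.I),
    "PHI": re.compile(r"diagnosis|ssn|social_security|bank_account", re.I),
    "quasi-identifier": re.compile(r"age|gender|postcode|location", re.I),
}
ATTRIBUTE_RISK = {"PII": 90, "PHI": 95, "quasi-identifier": 60, "statistical": 20}

def categorize(attribute: str, preference: str = "") -> str:
    """Label an attribute, honoring an entity-supplied categorization preference."""
    if preference:  # a privacy categorization preference overrides auto-categorization
        return preference
    for category, pattern in CATEGORY_RULES.items():
        if pattern.search(attribute):
            return category
    return "statistical"  # default bucket for unmatched attributes

if __name__ == "__main__":
    for attr in ["full_name", "postcode", "height_cm"]:
        label = categorize(attr)
        print(attr, "->", label, "| attribute-level risk:", ATTRIBUTE_RISK[label])
```

The preference argument mirrors the dynamic classification described above: an entity-supplied category wins over the automatic rules.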
Upon categorizing, the one or more risks that may be associated with the plurality of datasets may become easy to identify. Therefore, the processing subsystem (20) also includes a privacy risk assessment module (60) operatively coupled to the data categorization module (50). The privacy risk assessment module (60) is configured to identify a proportion of the plurality of datasets at risk of being attacked by one or more privacy attacks by performing a privacy attack simulation on the plurality of datasets, based on the one or more categories. The privacy risk assessment module (60) is also configured to generate a unified privacy risk score corresponding to a risk level associated with the corresponding one or more categories based on the privacy attack simulation performed on the plurality of datasets, by implementing one or more risk assessment techniques onto the plurality of datasets for each of the one or more privacy attacks.
In one exemplary embodiment, the one or more privacy attacks may include a singling out attack, a linkage attack, an inference attack, a data breach, an unconsented monitoring, an unconsented processing, a collective privacy attack, a background knowledge attack, a re-identification attack, a de-anonymization attack, or the like. Basically, identifying the proportion of the plurality of datasets at risk of being attacked by the one or more privacy attacks may refer to identifying the one or more categories at risk of being attacked by the one or more privacy attacks.
Further, as used herein, the term “privacy attack simulation” refers to a model that mimics the operation of an existing or proposed system such as a privacy attack, providing evidence for decision-making by being able to test different scenarios or process changes. In an embodiment, the privacy attack simulation may include at least one of a re-identification attack simulation, a de-anonymization attack simulation, a singling out attack simulation, a linkage attack simulation, an inference attack simulation, a homogeneity attack simulation, a background knowledge attack simulation, a reconstruction attack simulation, a tracing attack simulation, a collective privacy attack simulation, and the like.
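As a rough, assumption-laden sketch of one such simulation, a singling out attack can be approximated by counting records that are unique on their quasi-identifier combination; the singling_out_risk helper below is illustrative and not the disclosed implementation.

```python
# Sketch of a singling-out simulation: the share of records whose quasi-identifier
# combination is unique is a common proxy for re-identification exposure.
from collections import Counter

def singling_out_risk(records: list, quasi_ids: list) -> float:
    """Fraction of records uniquely pinned down by their quasi-identifiers."""
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[q] for q in quasi_ids)] == 1)
    return unique / len(records) if records else 0.0

if __name__ == "__main__":
    data = [
        {"age": 34, "postcode": "560001", "gender": "F"},
        {"age": 34, "postcode": "560001", "gender": "F"},
        {"age": 51, "postcode": "560042", "gender": "M"},  # unique -> at risk
    ]
    share = singling_out_risk(data, ["age", "postcode", "gender"])
    print(f"{share:.0%} of records are at risk of singling out")
```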
Upon performing the privacy attack simulation, an individual risk score of each of the one or more privacy attacks is allocated to each of the one or more categories, thereby providing information about a percentage of risk of being attacked by the one or more privacy attacks that the one or more categories possess. Further, based on the attribute level privacy risk score for each of the one or more categories and the one or more categories of the plurality of datasets, the one or more risk assessment techniques may be implemented onto the plurality of datasets for each of the one or more privacy attacks. In one exemplary embodiment, the one or more risk assessment techniques may include one or more privacy-enhancing technology-based verification techniques including at least one of a k-Anonymity test to verify a level of aggregation, a t-Closeness test to verify a level of diversity within a class, an outlier analysis, a unique re-identifiable pattern identification, and the like. In one embodiment, the unified privacy risk score may be generated in a range of about 0 to about 100. Basically, the plurality of datasets may be at high risk when the unified privacy risk score may be greater than 50, and the plurality of datasets may be at a low risk when the unified privacy risk score may be less than 50.
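A hedged sketch of how a k-Anonymity check and a unified 0-100 score might be combined is given below; the disclosure does not specify its aggregation formula, so the 100/k mapping and the worst-attack weighting are assumptions for illustration.

```python
# Sketch of a k-Anonymity check and a naive unified 0-100 score; the 100/k
# mapping and the worst-attack weighting are illustrative assumptions only.
from collections import Counter

def k_anonymity(records, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier combination."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())

def unified_privacy_risk_score(per_attack_scores):
    """Blend per-attack scores (each 0-100), biased toward the worst attack."""
    worst = max(per_attack_scores.values())
    mean = sum(per_attack_scores.values()) / len(per_attack_scores)
    return round(0.6 * worst + 0.4 * mean, 1)

if __name__ == "__main__":
    data = [{"age": 34, "postcode": "560001"},
            {"age": 34, "postcode": "560001"},
            {"age": 51, "postcode": "560042"}]
    k = k_anonymity(data, ["age", "postcode"])          # k = 1 here
    scores = {"singling_out": 100 / k, "linkage": 45.0, "inference": 30.0}
    score = unified_privacy_risk_score(scores)
    print("k =", k, "| unified privacy risk score:", score)
    print("risk level:", "high" if score > 50 else "low")
```

A unified score above 50 would then flag the datasets as high risk, consistent with the threshold described above.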
Subsequently, in an embodiment, certain insights may be derived based on the unified privacy risk score obtained. Therefore, the processing subsystem (20) also includes an impact assessment module (70) operatively coupled to the privacy risk assessment module (60). The impact assessment module (70) is configured to compare the risk identified and the unified privacy risk score generated via the privacy risk assessment module (60) with an entity necessity for data collection, storage, processing, and data collaboration using one or more statistical techniques along with an AI-based technique. The impact assessment module (70) is also configured to create one or more visualizations corresponding to a balance identified based on the comparison. The one or more visualizations may assist in Data protection impact assessment decision-making.
In one exemplary embodiment, the one or more visualizations may include at least one of a risk visualization, a relationship visualization, and the like. The risk visualization may include a generation of a likelihood-severity matrix. The relationship visualization may include a graph-based visualization. In an embodiment, the likelihood-severity matrix may be generated by the impact assessment module (70) by implementing a plurality of steps. The plurality of steps may include generating a likelihood score in real-time, upon identifying a likelihood of re-identification of an individual for the corresponding plurality of datasets based on the risk score associated with the one or more categories. The plurality of steps may further include generating a severity score, upon identifying a severity of identification of the individual by correlating with the one or more privacy attacks identified to attack the plurality of datasets, upon generating the likelihood score. Further, the plurality of steps may also include generating a likelihood-severity matrix upon determining a correlation between the likelihood score and the severity score, for determining an impact of the risk on a privacy of the individual.
In an embodiment, the individual may be a person to whom the plurality of datasets may belong. For example, if the likelihood score and the severity score are high, then the one or more categories at risk of being attacked by the one or more privacy attacks may be PII, PHI, and the like, as the likelihood of identification of the individual is very high and the severity of the one or more privacy attacks is also very high when the personal information of the individual is known. Also, in one exemplary embodiment, the likelihood-severity matrix may be color-coded.
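The likelihood-severity construction described above can be sketched, under assumed bucket thresholds, as a small matrix lookup; the Low/Medium/High cut-offs and the matrix cells are illustrative assumptions, and a real system could additionally color-code the result as noted above.

```python
# Sketch of the likelihood-severity matrix: scores are bucketed and crossed
# into a small matrix; the thresholds and cell values are assumptions.
def bucket(score: float) -> str:
    return "Low" if score < 34 else "Medium" if score < 67 else "High"

RISK_MATRIX = {  # (likelihood bucket, severity bucket) -> overall impact
    ("High", "High"): "Critical", ("High", "Medium"): "High",
    ("Medium", "High"): "High",   ("Medium", "Medium"): "Moderate",
}

def impact(likelihood_score: float, severity_score: float) -> str:
    """Overall impact of the risk on an individual's privacy."""
    cell = (bucket(likelihood_score), bucket(severity_score))
    return RISK_MATRIX.get(cell, "Low")

if __name__ == "__main__":
    print(impact(80, 90))  # e.g. PII/PHI exposed -> Critical
    print(impact(20, 40))  # e.g. aggregated statistics -> Low
```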
Further, in an embodiment, the impact assessment module (70) may also be configured to combine the one or more data collaboration contexts, along with the unified privacy risk score, for breaking down the risk of being attacked by the one or more privacy attacks and assist in Data protection impact assessment decision making.
The processing subsystem (20) further includes a recommendation module (80) operatively coupled to the impact assessment module (70). The recommendation module (80) is configured to generate one or more privacy regulation-aware recommendations using an ML technique, based on at least one of the unified privacy risk score generated for the plurality of datasets and a plurality of regulatory requirements, thereby making one or more data pipelines and one or more data ecosystems privacy risk-aware. The one or more privacy regulation-aware recommendations correspond to at least one of technical safeguards, regulatory compliance, cross-border transfer of the data, and the like.
Basically, in an embodiment, the recommendation module (80) may receive the plurality of regulatory requirements for one or more geographical locations based on the risk score, upon creating the one or more visualizations. The recommendation module (80) may then generate a recommendation model by training the recommendation model with at least one of the corresponding plurality of regulatory requirements and a plurality of risk mitigation techniques for several risk scores using an ML technique. The recommendation module (80) may finally generate the one or more privacy regulation-aware recommendations at every stage of a data life cycle, by mapping the risk score with the plurality of regulatory requirements using the recommendation model.
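A table-driven stand-in for the trained recommendation model can illustrate the mapping from a risk score and a jurisdiction to a regulation-aware recommendation; the thresholds, region labels, and recommendation texts below are assumptions, not content of the disclosure.

```python
# Table-driven stand-in for the trained recommendation model; thresholds,
# regions, and texts are invented for illustration.
RULES = [
    # (minimum unified score, jurisdiction, recommendation)
    (75, "EU", "Apply strong anonymization with differential privacy "
               "before any cross-border transfer under GDPR."),
    (50, "EU", "Apply context-based anonymization and document the "
               "data protection impact assessment."),
    (50, "ANY", "Apply moderate anonymization, then re-run the privacy "
                "attack simulations."),
    (0, "ANY", "Risk acceptable; record the assessment report for audit."),
]

def recommend(unified_score: float, jurisdiction: str) -> str:
    """Return the first matching regulation-aware recommendation."""
    for min_score, region, text in RULES:
        if unified_score >= min_score and region in (jurisdiction, "ANY"):
            return text
    return "No recommendation."

if __name__ == "__main__":
    print(recommend(82, "EU"))  # high-risk EU collaboration
    print(recommend(35, "IN"))  # low-risk transfer elsewhere
```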
In one exemplary embodiment, the technical safeguards may include one or more preferred risk mitigation techniques for mitigation of the risk for data collaboration. In an embodiment, the one or more preferred risk mitigation techniques may include context-based anonymization, soft anonymization, moderate anonymization, strong anonymization, anonymization with differential privacy, or other input privacy/output privacy technologies.
Similarly, in an embodiment, the one or more privacy regulation-aware recommendations corresponding to the regulatory compliance may correspond to recommendations that may be given to the user, when the user may be willing to continue to perform the data collaboration even with the risk score being high, so that the privacy of the individual may still be protected.
In addition, in an embodiment, the one or more privacy regulation-aware recommendations corresponding to the cross-border transfer of the data may correspond to recommendations that may be given to the user, when the user may be willing to perform cross-border data collaboration, so that the privacy of the individual may be protected.
Additionally, the processing subsystem (20) also includes a report generation module (90) operatively coupled to the recommendation module (80). The report generation module (90) is configured to generate a privacy risk assessment report upon receiving the one or more privacy regulation-aware recommendations, thereby enabling creation of one or more privacy-aware data ecosystems. In one exemplary embodiment, the one or more privacy-aware data ecosystems may include at least one of one or more data pipelines, one or more data lakes, cloud data flows, and on-premises application data flows. The privacy risk assessment report corresponds to at least one of documentation, an API-based risk information tagging, and an information flow along with the data. The privacy risk assessment report may include information about at least one of the unified privacy risk score, one or more privacy attack probabilities, the one or more visualizations, the one or more privacy regulation-aware recommendations, and one or more additional insights. Upon generating the privacy risk assessment report, the corresponding privacy risk assessment report may be stored in the database, and may be available for the one or more entities or the user to go through and take appropriate actions to mitigate the risk to enable safe data collaboration between the one or more entities. Further, the user may also be alerted by sending one or more alerts to the user on the user device via a communication medium. In one embodiment, the one or more alerts may be in one or more forms such as, but not limited to, an alarm, a text message, an email, or the like. Also, in an embodiment, the communication medium may be a wired medium or a wireless medium.
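As an illustrative sketch of the API-based risk information tagging, the report could be serialized as a structure that travels with the data through a pipeline; the field names below are assumptions chosen for readability.

```python
# Sketch of API-based risk information tagging: assessment results serialized
# so they can travel with the data through a pipeline. Field names are assumed.
import json
from datetime import datetime, timezone

def build_report(unified_score, attack_probabilities, recommendations):
    """Assemble a machine-readable privacy risk assessment report."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "unified_privacy_risk_score": unified_score,
        "risk_level": "high" if unified_score > 50 else "low",
        "privacy_attack_probabilities": attack_probabilities,
        "recommendations": recommendations,
    }

if __name__ == "__main__":
    report = build_report(
        82.5,
        {"linkage": 0.7, "singling_out": 0.9},
        ["strong anonymization", "anonymization with differential privacy"],
    )
    print(json.dumps(report, indent=2))  # tag attached to the data flow via API
```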
FIG. 2 is a block diagram representation of an exemplary embodiment of the system (10) for performing the privacy risk assessment of the data of FIG. 1 in accordance with an embodiment of the present disclosure. Considering a non-limiting example in which the system (10) is used in a healthcare field. The system (10) includes the processing subsystem (20) hosted on a cloud server (92). Suppose hospital authorities (94) of a hospital ‘A’ (96) receive a request for patients’ medical data of one or more patients (98) from a pharma company ‘B’ (100). The hospital authorities (94) are willing to check the privacy risk of the corresponding patients’ medical data prior to sharing it with the pharma company ‘B’ (100). Further, for the hospital authorities (94) to be able to use the system (10) for checking the privacy risk, the hospital authorities (94) register with the system (10) via the registration module (110) upon providing a plurality of hospital details via a hospital laptop (120). The plurality of hospital details is stored in a cloud database (130).
Further, the patients’ medical data is received by the system (10) via the input module (40) along with the context-specific privacy risk assessment requirements. Later, the system (10) categorizes one or more attributes associated with the patients’ medical data into the one or more categories such as PII, PHI, quasi-identifier, and the like via the data categorization module (50). Upon categorization, an appropriate attribute level privacy risk score for each of the one or more categories is also assigned. Then, a risk level associated with the one or more categories for the one or more privacy attacks is identified by generating a unified privacy risk score via the privacy risk assessment module (60). Furthermore, the impact of the risk on the privacy of the one or more patients (98) when the patients’ medical data is shared with the pharma company ‘B’ (100) is evaluated based on the unified privacy risk score by generating the one or more visualizations such as the likelihood-severity matrix via the impact assessment module (70).
Moreover, based on the impact evaluated, the one or more privacy regulation-aware recommendations are generated for the hospital authorities (94) either for mitigating the risk, following certain compliance, or following cross-border transfer-related regulations via the recommendation module (80). Finally, the privacy risk assessment report having all the above-mentioned results is generated and shared with the hospital authorities (94) for future reference via the report generation module (90). Thus, this is how the privacy risk assessment of the patients’ medical data is performed.
FIG. 3 is a block diagram of a privacy risk assessment computer or a privacy risk assessment server (170) in accordance with an embodiment of the present disclosure. The privacy risk assessment server (170) includes processor(s) (180), and a memory (190) operatively coupled to a bus (200). The processor(s) (180), as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
Computer memory elements may include any suitable memory device(s) for storing data and executable program, such as read only memory, random access memory, erasable programmable read only memory, electrically erasable programmable read only memory, hard drive, removable media drive for handling memory cards and the like. Embodiments of the present subject matter may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks, or defining abstract data types or low-level hardware contexts. Executable program stored on any of the above-mentioned storage media may be executable by the processor(s) (180).
The memory (190) includes a plurality of subsystems stored in the form of executable program which instructs the processor(s) (180) to perform method steps illustrated in FIG. 1. The memory (190) includes a processing subsystem (20) of FIG 1. The processing subsystem (20) further has following modules: an input module (40), a data categorization module (50), a privacy risk assessment module (60), an impact assessment module (70), a recommendation module (80), and a report generation module (90).
The input module (40) is configured to collect a plurality of datasets from at least one of one or more entities and one or more data sources in real-time, upon receiving a privacy risk assessment request from the corresponding one or more entities upon registration. The input module (40) is also configured to receive at least one of one or more data collaboration contexts of the plurality of datasets and one or more privacy categorization preferences of one or more attributes of one or more flows of data or an approval for auto-categorization of the one or more attributes from the one or more entities.
The data categorization module (50) is configured to identify a plurality of features for each of the one or more attributes associated with the plurality of datasets using an artificial intelligence-based technique and one or more feature extraction techniques. The data categorization module (50) is also configured to categorize the one or more attributes under one or more categories by creating and assigning one or more labels corresponding to the one or more attributes, based on at least one of the plurality of features, the one or more data collaboration contexts, the one or more privacy categorization preferences, and the approval for the auto-categorization.
The privacy risk assessment module (60) is configured to identify a proportion of the plurality of datasets at risk of being attacked by one or more privacy attacks by performing a privacy attack simulation on the plurality of datasets, based on the one or more categories. The privacy risk assessment module (60) is also configured to generate a risk score corresponding to a risk level associated with the corresponding one or more categories based on the privacy attack simulation performed on the plurality of datasets, by implementing one or more risk assessment techniques onto the plurality of datasets for each of the one or more privacy attacks, thereby performing the privacy risk assessment of the data in the plurality of datasets.
The impact assessment module (70) is configured to compare the risk identified and the unified privacy risk score generated via the privacy risk assessment module (60) with an entity necessity for data collection, storage, processing, and data collaboration using one or more statistical techniques along with an artificial intelligence-based technique. The impact assessment module (70) is also configured to create one or more visualizations corresponding to a balance identified based on the comparison, wherein the one or more visualizations assist in Data protection impact assessment decision making.
The recommendation module (80) is configured to generate one or more privacy regulation-aware recommendations using a machine learning technique, based on at least one of the unified privacy risk score generated for the plurality of datasets and a plurality of regulatory requirements, thereby making one or more data pipelines and one or more data ecosystems privacy risk-aware.
The report generation module (90) is configured to generate a privacy risk assessment report upon receiving the one or more privacy regulation-aware recommendations, thereby enabling creation of one or more privacy-aware data ecosystems.
The bus (200) as used herein refers to internal memory channels or a computer network that is used to connect computer components and transfer data between them. The bus (200) includes a serial bus or a parallel bus, wherein the serial bus transmits data in a bit-serial format and the parallel bus transmits data across multiple wires. The bus (200) as used herein may include, but is not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus, and the like.
FIG. 4 (a) is a flow chart representing steps involved in a method (210) for privacy risk assessment and mitigatory recommendation in accordance with an embodiment of the present disclosure. FIG. 4 (b) is a flow chart representing continued steps involved in the method (210) of FIG. 4 (a) in accordance with an embodiment of the present disclosure. The method (210) includes receiving a plurality of datasets from at least one of one or more entities and one or more data sources in real-time, upon receiving a privacy risk assessment request from the corresponding one or more entities upon registration in step 220. In one embodiment, receiving the plurality of datasets may include receiving the plurality of datasets via an input module (40).
The method (210) also includes receiving at least one of one or more data collaboration contexts of the plurality of datasets and one or more privacy categorization preferences of one or more attributes of one or more flows of data or an approval for auto-categorization of the one or more attributes from the one or more entities in step 230. In one embodiment, receiving at least one of the one or more data collaboration contexts and the one or more privacy categorization preferences may include receiving at least one of the one or more data collaboration contexts and the one or more privacy categorization preferences via the input module (40).
Furthermore, the method (210) includes identifying a plurality of features for each of the one or more attributes associated with the plurality of datasets using an artificial intelligence-based technique and one or more feature extraction techniques in step 240. In one embodiment, identifying the plurality of features may include identifying the plurality of features via a data categorization module (50).
Furthermore, the method (210) also includes categorizing the one or more attributes under one or more categories by creating and assigning one or more labels corresponding to the one or more attributes, based on at least one of the plurality of features, the one or more data collaboration contexts, the one or more privacy categorization preferences, and the approval for the auto-categorization in step 250. In one embodiment, categorizing the one or more attributes may include categorizing the one or more attributes via the data categorization module (50).
In one exemplary embodiment, the method (210) may also include assigning an appropriate attribute level privacy risk score for each of the one or more categories using an artificial intelligence-based technique. In such embodiment, assigning the appropriate attribute level privacy risk score for each of the one or more categories may include assigning the appropriate attribute level privacy risk score for each of the one or more categories via the data categorization module (50).
Moreover, the method (210) also includes identifying a proportion of the plurality of datasets at risk of being attacked by one or more privacy attacks by performing a privacy attack simulation on the plurality of datasets, based on the one or more categories in step 260. In one embodiment, identifying the proportion of the plurality of datasets at risk of being attacked may include identifying the proportion of the plurality of datasets at risk of being attacked via a privacy risk assessment module (60). In one exemplary embodiment, performing the privacy attack simulation on the plurality of datasets may include performing at least one of a re-identification attack simulation, a de-anonymization attack simulation, a singling out attack simulation, a linkage attack simulation, an inference attack simulation, a homogeneity attack simulation, a background knowledge attack simulation, a reconstruction attack simulation, a tracing attack simulation, a collective privacy attack simulation on the plurality of datasets, and the like.
The method (210) further includes generating a unified privacy risk score corresponding to a risk level associated with the corresponding one or more categories based on the privacy attack simulation performed on the plurality of datasets, by implementing one or more risk assessment techniques onto the plurality of datasets for each of the one or more privacy attacks in step 270. In one embodiment, generating the unified privacy risk score may include generating the unified privacy risk score via the privacy risk assessment module (60).
In a specific embodiment, implementing the one or more risk assessment techniques may include implementing one or more privacy-enhancing technology-based verification techniques including a k-Anonymity test to verify a level of aggregation, a t-Closeness test to verify a level of diversity within a class, an outlier analysis, and unique re-identifiable pattern identification.
The method (210) also includes comparing the risk identified and the unified privacy risk score generated via the privacy risk assessment module (60) with an entity necessity for data collection, storage, processing, and collaboration using one or more statistical techniques along with an artificial intelligence-based technique in step 280. In one embodiment, comparing the risk identified and the unified privacy risk score generated via the privacy risk assessment module (60) with the entity necessity may include comparing the risk identified and the unified privacy risk score generated via the privacy risk assessment module (60) with the entity necessity via an impact assessment module (70).
Further, the method (210) also includes creating one or more visualizations corresponding to a balance identified based on the comparison, wherein the one or more visualizations assist in Data protection impact assessment decision-making in step 290. In one embodiment, creating the one or more visualizations may include creating the one or more visualizations via the impact assessment module (70).
In one exemplary embodiment, creating the one or more visualizations may include creating at least one of a risk visualization, a relationship visualization, and the like. The risk visualization may include generating a likelihood-severity matrix. The relationship visualization may include generating a graph-based visualization.
Further, in an embodiment, generating the likelihood-severity matrix may include generating the likelihood-severity matrix, by the impact assessment module (70), by implementing a plurality of steps. The plurality of steps may include generating a likelihood score in real-time, upon identifying a likelihood of re-identification of an individual for the corresponding plurality of datasets based on the risk score associated with the one or more categories. The plurality of steps may also include generating a severity score, upon identifying a severity of identification of the individual by correlating with the one or more privacy attacks identified to attack the plurality of datasets, upon generating the likelihood score. Further, the plurality of steps may include generating the likelihood-severity matrix upon determining a correlation between the likelihood score and the severity score, for determining an impact of the risk on a privacy of the individual.
In a specific embodiment, the method (210) may also include combining the one or more data collaboration contexts, along with the unified privacy risk score, for breaking down the risk of being attacked by the one or more privacy attacks and assist in Data protection impact assessment decision-making. In such embodiment, combining the one or more data collaboration contexts, along with the unified privacy risk score may include combining the one or more data collaboration contexts, along with the unified privacy risk score via the impact assessment module (70).
Furthermore, the method (210) also includes generating one or more privacy regulation-aware recommendations using a machine learning technique, based on at least one of the unified privacy risk score generated for the plurality of datasets and a plurality of regulatory requirements, thereby making one or more data pipelines and one or more data ecosystems privacy risk-aware, wherein the one or more privacy regulation-aware recommendations correspond to at least one of technical safeguards, regulatory compliance, and cross-border transfer of the data in step 300. In one embodiment, generating the one or more privacy regulation-aware recommendations may include generating the one or more privacy regulation-aware recommendations via a recommendation module (80).
In addition, the method (210) also includes generating a privacy risk assessment report upon receiving the one or more privacy regulation-aware recommendations, thereby enabling creation of one or more privacy-aware data ecosystems, wherein the privacy risk assessment report corresponds to at least one of documentation, an Application Programming Interface-based risk information tagging, and an information flow along with the data in step 310. In one embodiment, generating the privacy risk assessment report may include generating the privacy risk assessment report via the report generation module (90).
In one exemplary embodiment, enabling the creation of the one or more privacy-aware data ecosystems may include enabling the creation of at least one of one or more data pipelines, one or more data lakes, cloud data flows, and on-premises application data flows.
Also, in an embodiment, generating the privacy risk assessment report may include generating the privacy risk assessment report, wherein the privacy risk assessment report may include information about at least one of the unified privacy risk score, one or more privacy attack probabilities, the one or more visualizations, the one or more privacy regulation-aware recommendations, and one or more additional insights.
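A minimal sketch of such a report, rendered as JSON suitable for Application Programming Interface-based risk information tagging, may look as follows; all field names and values are assumed for illustration.

```python
# A minimal sketch of the privacy risk assessment report of step 310 rendered
# as JSON for API-based risk information tagging; all field names and values
# are assumptions for illustration.
import json

report = {
    "unified_privacy_risk_score": 0.72,
    "privacy_attack_probabilities": {"linkage": 0.61, "singling_out": 0.34},
    "visualizations": ["likelihood_severity_matrix.png", "attribute_graph.png"],
    "recommendations": ["apply k-anonymity before sharing"],
    "additional_insights": ["zip_code and birth_date form a quasi-identifier pair"],
}
print(json.dumps(report, indent=2))
```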
Various embodiments of the present disclosure enable decision-makers in data controller and data processor organizations to evaluate and visualize the one or more risks involved in a dataset as part of a Data Protection Impact Assessment performed before storing, processing, or sharing personal data. In essence, the system provides a technical solution to the problem of identifying the privacy attack vulnerability level involved in sharing certain data on any platform, thereby enabling the one or more entities to be aware of the kind and level of the one or more risks of sharing the data. Also, upon integrating the system with APIs, data pipelines can be made privacy-aware, thereby making data collaboration more efficient, reliable, and secure. The system also significantly reduces delays, unpredictability, and biases arising from human intervention.
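As a hypothetical example of such API integration, a dataset may be tagged with its risk metadata before a pipeline handoff; the wrapper name, metadata key, and 0.66 threshold below are assumptions made for the sketch.

```python
# A hypothetical illustration of API-based integration: attaching the risk
# report to a dataset so downstream pipeline stages can enforce safeguards.
# The wrapper, metadata key, and 0.66 threshold are assumptions.

def make_privacy_aware(dataset: dict, risk_report: dict) -> dict:
    """Tag a dataset with privacy metadata before a pipeline handoff."""
    score = risk_report["unified_privacy_risk_score"]
    dataset["_privacy"] = {
        "risk_score": score,
        "share_allowed": score < 0.66,  # downstream stages check this flag
    }
    return dataset

tagged = make_privacy_aware({"rows": []}, {"unified_privacy_risk_score": 0.72})
# tagged["_privacy"]["share_allowed"] is False: sharing is blocked upstream.
```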
While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and is not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.

Claims

CLAIM:
1. A system (10) for privacy risk assessment and mitigatory recommendation, comprising:
a processing subsystem (20) hosted on a server (30), and configured to execute on a network to control bidirectional communications among a plurality of modules comprising:
an input module (40) configured to:
receive a plurality of datasets from at least one of one or more entities and one or more data sources in real-time, upon receiving a privacy risk assessment request from the corresponding one or more entities upon registration; and
receive at least one of one or more data collaboration contexts of the plurality of datasets and one or more privacy categorization preferences of one or more attributes of one or more flows of data or an approval for auto-categorization of the one or more attributes from the one or more entities;
a data categorization module (50) operatively coupled to the input module (40), wherein the data categorization module (50) is configured to:
identify a plurality of features for each of the one or more attributes associated with the plurality of datasets using an artificial intelligence-based technique and one or more feature extraction techniques; and
categorize the one or more attributes under one or more categories by creating and assigning one or more labels corresponding to the one or more attributes, based on at least one of the plurality of features, the one or more data collaboration contexts, the one or more privacy categorization preferences, and the approval for the auto-categorization;
a privacy risk assessment module (60) operatively coupled to the data categorization module (50), wherein the privacy risk assessment module (60) is configured to:
identify a proportion of the plurality of datasets at risk of being attacked by one or more privacy attacks by performing a privacy attack simulation on the plurality of datasets, based on the one or more categories; and
generate a unified privacy risk score corresponding to a risk level associated with the corresponding one or more categories based on the privacy attack simulation performed on the plurality of datasets, by implementing one or more risk assessment techniques onto the plurality of datasets for each of the one or more privacy attacks;
an impact assessment module (70) operatively coupled to the privacy risk assessment module (60), wherein the impact assessment module (70) is configured to:
compare the risk identified and the unified privacy risk score generated via the privacy risk assessment module (60) with an entity necessity for data collection, storage, processing, and data collaboration using one or more statistical techniques along with an artificial intelligence-based technique; and
create one or more visualizations corresponding to a balance identified based on the comparison, wherein the one or more visualizations assist in Data protection impact assessment decision-making; and
a recommendation module (80) operatively coupled to the impact assessment module (70), wherein the recommendation module (80) is configured to generate one or more privacy regulation-aware recommendations using a machine learning technique, based on at least one of the unified privacy risk score generated for the plurality of datasets and a plurality of regulatory requirements, thereby making one or more data pipelines and one or more data ecosystems privacy risk-aware, wherein the one or more privacy regulation-aware recommendations correspond to at least one of technical safeguards, regulatory compliance, and cross-border transfer of the data; and
a report generation module (90) operatively coupled to the recommendation module (80), wherein the report generation module (90) is configured to generate a privacy risk assessment report upon receiving the one or more privacy regulation-aware recommendations, thereby enabling creation of one or more privacy-aware data ecosystems, wherein the privacy risk assessment report corresponds to at least one of documentation, an Application Programming Interface-based risk information tagging, and an information flow along with the data.
2. The system (10) as claimed in claim 1, wherein the data categorization module (50) is configured to assign an appropriate attribute level privacy risk score for each of the one or more categories using an artificial intelligence-based technique.
3. The system (10) as claimed in claim 1, wherein the privacy attack simulation comprises at least one of a re-identification attack simulation, a de-anonymization attack simulation, a singling out attack simulation, a linkage attack simulation, an inference attack simulation, a homogeneity attack simulation, a background knowledge attack simulation, a reconstruction attack simulation, a tracing attack simulation, and a collective privacy attack simulation.
4. The system (10) as claimed in claim 1, wherein the one or more risk assessment techniques comprise one or more privacy-enhancing technology-based verification techniques comprising a k-Anonymity test to verify a level of aggregation, a t-Closeness test to verify a level of diversity within a class, an outlier analysis, and a unique re-identifiable pattern identification.
5. The system (10) as claimed in claim 1, wherein the one or more visualizations comprise at least one of a risk visualization and a relationship visualization, wherein the risk visualization comprises a generation of a likelihood-severity matrix, wherein the relationship visualization comprises a graph-based visualization.
6. The system (10) as claimed in claim 5, wherein the likelihood-severity matrix is generated by the impact assessment module (70) by implementing a plurality of steps comprising: generating a likelihood score in real-time, upon identifying a likelihood of re-identification of an individual for the corresponding plurality of datasets based on the risk score associated with the one or more categories; generating a severity score, upon identifying a severity of identification of the individual by correlating with the one or more privacy attacks identified to attack the plurality of datasets, upon generating the likelihood score; and generating the likelihood-severity matrix upon determining a correlation between the likelihood score and the severity score, for determining an impact of the risk on a privacy of the individual.
7. The system (10) as claimed in claim 1, wherein the impact assessment module (70) is configured to combine the one or more data collaboration contexts, along with the unified privacy risk score, for breaking down the risk of being attacked by the one or more privacy attacks and to assist in Data protection impact assessment decision-making.
8. The system (10) as claimed in claim 1, wherein the one or more privacy- aware data ecosystems comprises at least one of one or more data pipelines, one or more data lakes, cloud data flows, and on-premises application data flows.
9. The system (10) as claimed in claim 1, wherein the privacy risk assessment report comprises information about at least one of the unified privacy risk score, one or more privacy attack probabilities, the one or more visualizations, the one or more privacy regulation-aware recommendations, and one or more additional insights.
10. A method (210) for privacy risk assessment and mitigatory recommendation, comprising:
receiving, via an input module (40), a plurality of datasets from at least one of one or more entities and one or more data sources in real-time, upon receiving a privacy risk assessment request from the corresponding one or more entities upon registration (220);
receiving, via the input module (40), at least one of one or more data collaboration contexts of the plurality of datasets and one or more privacy categorization preferences of one or more attributes of one or more flows of data or an approval for auto-categorization of the one or more attributes from the one or more entities (230);
identifying, via a data categorization module (50), a plurality of features for each of the one or more attributes associated with the plurality of datasets using an artificial intelligence-based technique and one or more feature extraction techniques (240);
categorizing, via the data categorization module (50), the one or more attributes under one or more categories by creating and assigning one or more labels corresponding to the one or more attributes, based on at least one of the plurality of features, the one or more data collaboration contexts, the one or more privacy categorization preferences, and the approval for the auto-categorization (250);
identifying, via a privacy risk assessment module (60), a proportion of the plurality of datasets at risk of being attacked by one or more privacy attacks by performing a privacy attack simulation on the plurality of datasets, based on the one or more categories (260);
generating, via the privacy risk assessment module (60), a unified privacy risk score corresponding to a risk level associated with the corresponding one or more categories based on the privacy attack simulation performed on the plurality of datasets, by implementing one or more risk assessment techniques onto the plurality of datasets for each of the one or more privacy attacks (270);
comparing, via an impact assessment module (70), the risk identified and the unified privacy risk score generated via the privacy risk assessment module (60) with an entity necessity for data collection, storage, processing, and data collaboration using one or more statistical techniques along with an artificial intelligence-based technique (280);
creating, via the impact assessment module (70), one or more visualizations corresponding to a balance identified based on the comparison, wherein the one or more visualizations assist in Data protection impact assessment decision-making (290);
generating, via a recommendation module (80), one or more privacy regulation-aware recommendations using a machine learning technique, based on at least one of the unified privacy risk score generated for the plurality of datasets and a plurality of regulatory requirements, thereby making one or more data pipelines and one or more data ecosystems privacy risk-aware, wherein the one or more privacy regulation-aware recommendations correspond to at least one of technical safeguards, regulatory compliance, and cross-border transfer of the data (300); and
generating, via a report generation module (90), a privacy risk assessment report upon receiving the one or more privacy regulation-aware recommendations, thereby enabling creation of one or more privacy-aware data ecosystems, wherein the privacy risk assessment report corresponds to at least one of documentation, an Application Programming Interface-based risk information tagging, and an information flow along with the data (310).
11. The method (210) as claimed in claim 10, comprises assigning, via the data categorization module (50), an appropriate attribute level privacy risk score for each of the one or more categories using an artificial intelligence-based technique.
12. The method (210) as claimed in claim 10, wherein performing the privacy attack simulation on the plurality of datasets comprises performing at least one of a re-identification attack simulation, a de-anonymization attack simulation, a singling out attack simulation, a linkage attack simulation, an inference attack simulation, a homogeneity attack simulation, a background knowledge attack simulation, a reconstruction attack simulation, a tracing attack simulation, and a collective privacy attack simulation on the plurality of datasets.
13. The method (210) as claimed in claim 10, wherein implementing the one or more risk assessment techniques comprises implementing one or more privacy-enhancing technology-based verification techniques comprising a k-Anonymity test to verify a level of aggregation, a t-Closeness test to verify a level of diversity within a class, an outlier analysis, and a unique re-identifiable pattern identification.
14. The method (210) as claimed in claim 10, wherein creating the one or more visualizations comprises creating at least one of a risk visualization and a relationship visualization, wherein the risk visualization comprises generating a likelihood-severity matrix, wherein the relationship visualization comprises generating a graph-based visualization.
15. The method (210) as claimed in claim 14, wherein generating the likelihood-severity matrix comprises generating the likelihood-severity matrix, by the impact assessment module (70), by implementing a plurality of steps comprising: generating a likelihood score in real-time, upon identifying a likelihood of re-identification of an individual for the corresponding plurality of datasets based on the risk score associated with the one or more categories; generating a severity score, upon identifying a severity of identification of the individual by correlating with the one or more privacy attacks identified to attack the plurality of datasets, upon generating the likelihood score; and generating the likelihood-severity matrix upon determining a correlation between the likelihood score and the severity score, for determining an impact of the risk on a privacy of the individual.
16. The method (210) as claimed in claim 10, comprises combining, via the impact assessment module (70), the one or more data collaboration contexts, along with the unified privacy risk score, for breaking down the risk of being attacked by the one or more privacy attacks and to assist in Data protection impact assessment decision-making.
17. The method (210) as claimed in claim 10, wherein enabling the creation of the one or more privacy-aware data ecosystems comprises enabling the creation of at least one of one or more data pipelines, one or more data lakes, cloud data flows, and on-premises application data flows.
18. The method (210) as claimed in claim 10, wherein generating the privacy risk assessment report comprises generating the privacy risk assessment report, wherein the privacy risk assessment report comprises information about at least one of the unified privacy risk score, one or more privacy attack probabilities, the one or more visualizations, the one or more privacy regulation-aware recommendations, and one or more additional insights.
PCT/IB2022/055779 2021-06-23 2022-06-22 System and method for privacy risk assessment and mitigatory recommendation WO2022269504A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202141028231 2021-06-23
IN202141028231 2021-06-23

Publications (1)

Publication Number Publication Date
WO2022269504A1 true WO2022269504A1 (en) 2022-12-29

Family

ID=84545523

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2022/055779 WO2022269504A1 (en) 2021-06-23 2022-06-22 System and method for privacy risk assessment and mitigatory recommendation

Country Status (1)

Country Link
WO (1) WO2022269504A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200327252A1 (en) * 2016-04-29 2020-10-15 Privitar Limited Computer-implemented privacy engineering system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHANKAR SUSHANT, LIN IRVING: "Applying Machine Learning to Product Categorization", 17 December 2011 (2011-12-17), XP093020159, Retrieved from the Internet <URL:https://cs229.stanford.edu/proj2011/LinShankar-Applying%20Machine%20Learning%20to%20Product%20Categorization.pdf> [retrieved on 20230202] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271781A (en) * 2023-11-22 2023-12-22 深圳市信飞合创科技有限公司 Data cross-border compliance evaluation system
CN117271781B (en) * 2023-11-22 2024-01-19 深圳市信飞合创科技有限公司 Data cross-border compliance evaluation system

Legal Events

Code: 121 (Ep: the epo has been informed by wipo that ep was designated in this application)
Ref document number: 22827805
Country of ref document: EP
Kind code of ref document: A1

Code: NENP (Non-entry into the national phase)
Ref country code: DE