WO2022192961A1 - Data management - Google Patents

Data management

Info

Publication number
WO2022192961A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
management system
network
processing devices
sources
Prior art date
Application number
PCT/AU2022/050240
Other languages
French (fr)
Inventor
David Christmas
Original Assignee
Portfolio4 Pty Ltd
Priority date
Filing date
Publication date
Priority claimed from AU2021900804A external-priority patent/AU2021900804A0/en
Application filed by Portfolio4 Pty Ltd filed Critical Portfolio4 Pty Ltd
Priority to AU2022236779A priority Critical patent/AU2022236779A1/en
Publication of WO2022192961A1 publication Critical patent/WO2022192961A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/252Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/256Integrating or interfacing systems involving database management systems in federated or virtual databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • G06F16/1824Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/1396Protocols specially adapted for monitoring users' activity
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/51Discovery or management thereof, e.g. service location protocol [SLP] or web services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention relates to a data management system and associated data management methods, and in one particular example, a data management system and associated data management methods for automatically providing personal data from a network environment in response to a data request.
  • GDPR General Data Protection Regulation
  • DSAR Data Subject Access Requests
  • an aspect of the present invention seeks to provide a data management system for automatically providing personal data from a network environment in response to a data request, the system including, one or more processing devices configured to: utilise a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; access a data source repository maintaining industry specific data constructs relating to the data sources; receive a data request relating to an individual; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; and, generate a search response including redacted personal data.
  • APIs Application Programming Interfaces
  • the one or more processing devices are configured to automatically perform redaction of the retrieved personal data.
  • the one or more processing devices are configured to use Robotic Process Automation (RPA) to at least one of: perform the discovery process; implement the API; perform the search; automatically redact personal data; and, validate search results.
  • RPA Robotic Process Automation
  • the one or more processing devices are configured to perform RPA in accordance with RPA logic, and wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; and, compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning.
  • the one or more processing devices are configured to perform the discovery process by: monitoring network traffic in the network environment; and, analysing the network traffic to identify the data sources.
  • the network traffic is analysed using at least one of: traffic analytics; and, Azure traffic analytics.
  • the one or more processing devices are configured to: connect to one or more network logs; and, scan the network logs to identify data sources within the network environment.
  • the one or more processing devices are configured to perform the discovery process using an identity and access management service to determine access levels and credentials associated with the data sources.
  • the one or more processing devices are configured to: generate a network visualisation indicative of the network topography; and, display the network visualisation to a user.
  • the network visualisation includes graphical elements indicative of: data sources; network hardware; and connections indicative of communication links between the data sources and network hardware.
  • each graphical element includes a status indicator indicative of a status of a respective data source, network hardware or connection.
  • the one or more processing devices are configured to: detect user selection of a graphical element; retrieve additional information regarding the respective data source, network hardware or connection; and, display the additional information.
  • the one or more processing devices are configured to determine data source details by interrogating at least one of: a data source; and, a configuration management database.
  • the one or more processing devices are configured to use the data source details to retrieve at least one of: an API; and configuration information.
  • the configuration information includes at least one of: an endpoint protocol; an authentication mechanism; and data source schema.
  • the data source details include at least one of: a server name; a database name; an IP address; a vendor name; and, a data source and/or software version.
  • the one or more processing devices are configured to select an API from an API repository, the API repository hosting APIs and configuration information for multiple different data sources.
  • the one or more processing devices are configured to select an API at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
  • the one or more processing devices are configured to: retrieve credentials; and, access the data source using the API and the credentials.
  • the one or more processing devices are configured to: select sample data; generate one or more test data requests relating to the sample data; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on the test data request; automatically perform redaction of the retrieved personal data; and generate a test result including redacted personal data, wherein the test result is manually audited to confirm the personal data is redacted as required.
  • the one or more processing devices are configured to: analyse data stored in different data sources within a network environment; and, update a data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources.
  • the data constructs include at least one of: terminology; entities; attributes; relationships; and, rulesets.
  • the data constructs are usable in at least one of: analysing data from different data sources; combining data from different data sources; and, redacting data from the data sources.
  • the one or more processing devices are configured to: select sample data; and, analyse sample data to generate the data source repository, wherein the data source repository links to data associated with an individual in different data sources.
  • the one or more processing devices are configured to associate unique identifiers with each individual in the data source repository.
  • the one or more processing devices are configured to: receive a data request relating to an individual; perform a search of a network environment to thereby retrieve personal data based on an identity of the individual; automatically perform redaction of the retrieved personal data; and, generate a search response including redacted personal data.
  • the one or more processing devices are configured to: retrieve information regarding available data sources; identify one or more available data sources relevant to the data request; and, perform one or more searches of the available data sources.
  • the one or more processing devices are configured to: receive search results; analyse the search results using a data source repository that maintains industry specific data constructs relating to the data in the different data sources; and, aggregate the search results using results of the analysis to create the personal data.
  • the one or more processing devices are configured to: perform pattern recognition to recognise patterns in the retrieved personal data; and, redact the personal data in accordance with the recognised patterns.
  • the one or more processing devices are configured to: perform natural language processing to understand a context of recognised patterns; and, redact the personal data in accordance with the context.
  • the one or more processing devices are configured to redact the data using de-identification logic.
  • the one or more processing devices are configured to: receive a data request relating to an individual; validate the identity of the individual; and, perform the search in response to a successful validation.
  • the one or more processing devices are configured to validate the identity of the individual using a Know Your Client (KYC) procedure.
  • KYC Know Your Client
  • the redaction process is performed at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
  • an aspect of the present invention seeks to provide a data management method for automatically providing personal data from a network environment in response to a data request, the method including, in one or more processing devices: utilising a discovery process to generate a network topography indicative of data sources within the network environment; implementing one or more Application Programming Interfaces (APIs) to access the data sources; generating a data source repository indicative of locations of personal data within the data sources; receiving a data request relating to an individual; performing a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; and, generating a search response including redacted personal data.
  • an aspect of the present invention seeks to provide a data management system including one or more processing devices configured to perform a discovery process to generate a network topography indicative of data sources within a network environment by: monitoring network traffic in the network environment; analysing the network traffic to identify data sources within the network environment; and, using an identity and access management service to determine access levels and credentials associated with the data sources.
  • the network traffic is analysed using at least one of: traffic analytics; and, Azure traffic analytics.
  • the one or more processing devices are configured to: connect to one or more network logs; and, scan the network logs to identify data sources within the network environment.
  • the Identity and Access Management Service is an Active Directory.
  • the one or more processing devices are configured to determine data source details by interrogating at least one of: a data source; and, a configuration management database.
  • the one or more processing devices are configured to use the data source details to retrieve at least one of: an API; and configuration information.
  • the configuration information includes at least one of: an endpoint protocol; an authentication mechanism; and data source schema.
  • the data source details include at least one of: a server name; a database name; an IP address; a vendor name; and, a data source and/or software version.
  • the one or more processing devices are configured to select an API from an API repository, the API repository hosting APIs and configuration information for multiple different data sources.
  • the one or more processing devices are configured to select an API at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
  • the one or more processing devices are configured to: retrieve credentials; and, access the data source using the API and the credentials.
  • an aspect of the present invention seeks to provide a data management method including, in one or more processing devices, performing a discovery process to generate a network topography indicative of data sources within a network environment by: monitoring network traffic in the network environment; analysing the network traffic to identify data sources within the network environment; and, using an identity and access management service to determine access levels and credentials associated with the data sources.
  • an aspect of the present invention seeks to provide a data management system for use in accessing multiple data sources, the system including one or more processing devices configured to: determine data source details by interrogating at least one of: a data source; and, a configuration management database; and, implement one or more Application Programming Interfaces (APIs) to access the data sources by selecting an API from an API repository using the data source details, wherein the API repository hosts APIs and configuration information for multiple different data sources.
  • the configuration information includes at least one of: an endpoint protocol; an authentication mechanism; and data source schema.
  • the data source details include at least one of: a server name; a database name; an IP address; a vendor name; and, a data source and/or software version.
  • the one or more processing devices are configured to select an API at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
  • the one or more processing devices are configured to: retrieve credentials; and, access the data source using the API and the credentials.
  • an aspect of the present invention seeks to provide a data management method for use in accessing multiple data sources, the method including, in one or more processing devices: determining data source details by interrogating at least one of: a data source; and, a configuration management database; and, implementing one or more Application Programming Interfaces (APIs) to access the data sources by selecting an API from an API repository using the data source details, wherein the API repository hosts APIs and configuration information for multiple different data sources.
  • an aspect of the present invention seeks to provide a data management system for generating a network visualisation, the system including one or more processing devices configured to: perform a discovery process to generate a network topography indicative of data sources within a network environment by monitoring network traffic in the network environment; generate a network visualisation indicative of the network topography, the network visualisation including: graphical elements indicative of: data sources; network hardware; and connections indicative of communication links between the data sources and network hardware; and, a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection; and, display the network visualisation to a user.
  • the one or more processing devices are configured to: detect user selection of a graphical element; retrieve additional information regarding the respective data source, network hardware or connection; and, display the additional information.
  • an aspect of the present invention seeks to provide a data management method for generating a network visualisation, the method including, in one or more processing devices: performing a discovery process to generate a network topography indicative of data sources within a network environment by monitoring network traffic in the network environment; generating a network visualisation indicative of the network topography, the network visualisation including: graphical elements indicative of: data sources; network hardware; and connections indicative of communication links between the data sources and network hardware; and, a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection; and, displaying the network visualisation to a user.
  • an aspect of the present invention seeks to provide a data management system for maintaining a data source repository relating to different data sources, the system including one or more processing devices configured to: analyse data stored in different data sources within a network environment; and, update the data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources.
  • the data constructs include at least one of: terminology; entities; attributes; relationships; and, rulesets.
  • the data constructs are usable in at least one of: analysing data from different data sources; combining data from different data sources; and, redacting data from the data sources.
  • an aspect of the present invention seeks to provide a data management method for maintaining a data source repository relating to different data sources, the method including, in one or more processing devices: analysing data stored in different data sources within a network environment; updating the data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources.
  • an aspect of the present invention seeks to provide a data management system for searching data relating to an individual, the system including one or more processing devices configured to: receive a data request relating to an individual; perform a search of a network environment to thereby retrieve personal data based on an identity of the individual; automatically perform redaction of the retrieved personal data; and, generate a search response including redacted personal data.
  • the one or more processing devices are configured to: retrieve information regarding available data sources; identify one or more available data sources relevant to the data request; and, perform one or more searches of the available data sources.
  • the one or more processing devices are configured to: receive search results; analyse the search results using a data source repository that maintains industry specific data constructs relating to the data in the different data sources; and, aggregate the search results using results of the analysis to create the personal data.
  • the one or more processing devices are configured to: perform pattern recognition to recognise patterns in the retrieved personal data; and, redact the personal data in accordance with the recognised patterns.
  • the one or more processing devices are configured to: perform natural language processing to understand a context of recognised patterns; and, redact the personal data in accordance with the context.
  • the one or more processing devices are configured to redact the data using de-identification logic.
  • the one or more processing devices are configured to: receive a data request relating to an individual; validate the identity of the individual; and, perform the search in response to a successful validation.
  • the one or more processing devices are configured to validate the identity of the individual using a Know Your Client (KYC) procedure.
  • the redaction process is performed at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
  • an aspect of the present invention seeks to provide a data management method for searching data relating to an individual, the method including, in one or more processing devices: receiving a data request relating to an individual; performing a search of a network environment to thereby retrieve personal data based on an identity of the individual; automatically performing redaction of the retrieved personal data; and, generating a search response including redacted personal data.
  • an aspect of the present invention seeks to provide a data management system including one or more processing devices configured to: determine Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; and, compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning; and, use the RPA logic to perform RPA to at least one of: perform a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; automatically redact personal data; and, validate the search results.
  • an aspect of the present invention seeks to provide a data management method including, in one or more processing devices: determining Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; and, compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning; and, using the RPA logic to perform RPA to at least one of: perform a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; automatically redact personal data; and, validate the search results.
  • Figure 1A is a flow chart of an example of a data management method
  • Figure 1B is a flow chart of an example of a discovery process
  • Figure 1C is a flow chart of an example of an API implementation process using a reference library
  • Figure 1D is a flow chart of an example of a network visualisation process
  • Figure 1E is a flow chart of an example of a process for updating a data schema
  • Figure 1F is a flow chart of an example of a data search process
  • Figure 2 is a schematic diagram of a specific example of a network architecture
  • Figure 3 is a schematic diagram of an example of a processing system
  • Figure 4 is a schematic diagram of an example of a client device
  • Figures 5A to 5C are a flow chart of an example of a method of configuring a data management system
  • Figure 6 is a schematic diagram of an example of a network visualisation
  • Figure 7 is a flow chart of an example of a data search process
  • Figure 8A is a schematic diagram of an example of a network schematic when performing a discovery process using a data trace module
  • Figure 8B is a schematic diagram of an example of a component schematic when performing a discovery process
  • Figure 8C is a flow chart of a specific example of a discovery process
  • Figure 8D is a flow chart of a specific example of a process for handling a new data source identified using the discovery process
  • Figure 9A is a schematic diagram of an example of a network schematic when performing an API implementation process using a reference library
  • Figure 9B is a schematic diagram of an example of a component schematic when performing the API implementation process
  • Figure 9C is a flow chart of a specific example of the API implementation process
  • Figure 10A is a schematic diagram of an example of a component schematic when performing a network visualisation process using a data landscape construction process
  • Figure 10B is a flow chart of a specific example of an automated network landscape construction process
  • Figure 10C is a flow chart of a specific example of a manual network landscape construction process
  • Figure 10D is a flow chart of a specific example of a network landscape resolution process
  • Figure 10E is a schematic diagram of an example of a network visualisation user interface
  • Figure 10F is a schematic diagram of an example of the network visualisation user interface of Figure 10E with details of different data sources displayed;
  • Figure 10G is a schematic diagram of an example of the network visualisation user interface of Figure 10E with details of known issues displayed;
  • Figure 10H is a schematic diagram of an example of the network visualisation user interface of Figure 10E with details of a selected issue displayed;
  • Figure 10I is a schematic diagram of an example of the network visualisation user interface of Figure 10E with data requests displayed;
  • Figure 11A is a schematic diagram of an example of a component schematic when updating a schema repository using a data construct module
  • Figure 11B is a flow chart of a specific example of a schema repository construction process
  • Figure 12A is a schematic diagram of an example of a component schematic when performing data screening using a data screen module
  • Figure 12B is a schematic diagram of an example of internal components of the data screen module
  • Figure 12C is a flow chart of a specific example of a data screening process
  • Figure 13A is a schematic diagram of an example of a component schematic when performing a search using a data request algorithm
  • Figure 13B is a flow chart of a specific example of a search process
  • Figure 13C is a flow chart of a specific example of data request algorithm training
  • Figure 13D is a flow chart of a specific example of a data access request process.
  • a system including one or more electronic processing devices forming part of one or more processing systems, connected to a network system including one or more data sources, such as databases or the like.
  • this could include a local or remote server, optionally including a cloud based architecture, which interfaces with a network system, such as a local area network (LAN) or wide area network (WAN) within an organisation.
  • LAN local area network
  • WAN wide area network
  • the process is broadly broken down into two stages, namely a configuration phase, including steps 100 to 102, in which the network is analysed to enable access to data sources, and an operational phase, including steps 103 to 105, in which search requests are processed.
  • the processing device utilises a discovery process to generate a network topography indicative of data sources within the network environment.
  • the nature of the discovery process will vary depending on the implementation, but typically this will involve monitoring traffic on the network, and analysing the traffic using machine learning to identify data sources.
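The specification describes this traffic-based discovery only in prose. The following minimal Python sketch illustrates one way flow records could be analysed to flag candidate data sources; the record fields, the port-to-vendor table and the thresholds are assumptions for illustration, not part of the disclosure.

```python
# Illustrative only: the patent discloses no code. Flow-record fields, the
# port table and the thresholds below are assumptions for this sketch.
from collections import defaultdict

DB_PORTS = {1433: "mssql", 3306: "mysql", 5432: "postgres", 1521: "oracle"}

def discover_data_sources(flow_records):
    """Analyse flow records and return candidate data sources."""
    candidates = defaultdict(lambda: {"hits": 0, "clients": set()})
    for rec in flow_records:  # e.g. one dict per traffic-analytics flow record
        if rec["dst_port"] in DB_PORTS:
            entry = candidates[(rec["dst_ip"], rec["dst_port"])]
            entry["hits"] += 1
            entry["clients"].add(rec["src_ip"])
    # A host repeatedly receiving traffic on a database port from several
    # clients is treated as a likely data source for the topography.
    return [
        {"ip": ip, "port": port, "type": DB_PORTS[port], "hits": e["hits"]}
        for (ip, port), e in candidates.items()
        if e["hits"] > 10 and len(e["clients"]) > 1
    ]
```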
  • the processing device implements one or more Application Programming Interfaces (APIs) to access the data sources. This can be achieved in any appropriate manner, but will typically involve retrieving APIs from a repository of pre-existing APIs based, for example, on an understanding of details of the data source, such as the type and version of the data source.
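Again as an illustrative sketch only (the repository layout and key fields are assumptions), selecting a pre-existing API definition based on a data source's vendor and version might look like:

```python
# Hypothetical repository keyed on (vendor, version); all names illustrative.
API_REPOSITORY = {
    ("postgres", "14"): {"endpoint_protocol": "tcp", "auth": "scram-sha-256"},
    ("mssql", "2019"): {"endpoint_protocol": "tds", "auth": "kerberos"},
}

def select_api(source_details):
    """Return API/configuration info for a discovered data source, if known."""
    key = (source_details["vendor"], source_details["version"])
    api = API_REPOSITORY.get(key)
    if api is None:
        # Fall back to another version from the same vendor, mirroring the
        # described matching on data source type and version.
        same_vendor = [k for k in API_REPOSITORY if k[0] == source_details["vendor"]]
        api = API_REPOSITORY[same_vendor[0]] if same_vendor else None
    return api  # None would trigger the manual service-request path
```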
  • a data source repository, which relates to the data in the different data sources, is accessed and optionally constructed, updated or generated by the processing device; in one example the repository is indicative of locations of personal data within the data sources, and optionally other information regarding the data sources, such as the data source formats and structures.
  • the repository can act as a mapping to the different data structures that might exist within different data sources, allowing data stored in different formats and/or data structures to be retrieved more easily.
  • a testing and optional certification process might be performed, as will be described in more detail below, thereby ensuring the identified data sources are accessible and that responses can be provided in accordance with industry/organisation business data and legal compliance rules.
  • a visual representation of the resulting network topography could also be displayed, allowing this to be reviewed by administrators or other operators. Additionally, the above process is typically performed repeatedly, for example on an ongoing basis, either continuously or periodically, allowing additional data sources to be added to the topography as they are discovered, and allowing a status of data sources and other network resources to be maintained.
  • a data request relating to an individual is received by the processing device.
  • the data request could be in any suitable form, and typically includes information regarding the identity of the individual making the request, which may require validation and/or authentication, to ensure resulting data is supplied in accordance with privacy requirements.
  • the processing device performs a search of different data sources in the network environment using the data source repository and the APIs, to obtain data from different data sources, typically based on the identity of an individual.
  • the search uses the data source repository and identity, to determine where data relating to the individual is stored in the data sources, with the APIs being used to construct queries, allowing relevant data to be retrieved.
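A minimal sketch of such a repository-driven search follows, assuming each repository entry maps a data source to the tables and identifier columns that hold personal data, and that each API object exposes a generic query() call (both assumptions for illustration):

```python
# Sketch only: apis[name] is assumed to expose a generic query() method, and
# the repository maps sources to (table, identifier column) pairs.
def search_personal_data(identity, source_repository, apis):
    """Query every data source known to hold personal data for one individual."""
    results = []
    for source_name, mapping in source_repository.items():
        api = apis[source_name]
        for table, id_column in mapping["personal_data"]:
            # Query shape is illustrative; real queries would be built from
            # the per-source schema held in the data source repository.
            rows = api.query(f"SELECT * FROM {table} WHERE {id_column} = ?",
                             (identity,))
            results.extend({"source": source_name, "table": table, "row": row}
                           for row in rows)
    return results
```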
  • the processing device optionally performs automatic redaction of the retrieved personal data, for example removing data that is subject to privacy or confidentiality limitations. It will be appreciated that such redaction may not be required for all requests, for example, in the event that the retrieved data already meets privacy and/or compliance requirements.
  • the redaction is typically performed, at least in part, using Robotic Process Automation (RPA), enabling the redaction process to be performed substantially automatically in accordance with relevant requirements, such as local laws and/or data holder requirements.
  • a search response including redacted personal data is generated and provided to the individual submitting the request, with the response typically being generated in accordance with business and/or local Customer Data Request (CDR) requirements.
  • CDR Customer Data Request
  • the above described process enables a processing device to retrieve and redact data relating to an individual substantially automatically, even where data is stored in disparate data sources, often in different forms, formats or structures, which would not otherwise be the case.
  • This can be achieved using the combination of the configuration phase, in which a discovery process is used to map the network topology to establish APIs and a data source repository, so that searching can be performed, and the operational phase, in which search is performed using the APIs and schema, with data being automatically redacted as needed.
  • This allows organisations to more rapidly respond to data requests, whilst ensuring privacy and/or compliance requirements are met.
  • the data management system could include one or more processing devices configured to implement a module, referred to hereinafter as a data trace module, which performs a discovery process to generate a network topography indicative of data sources within a network environment, and an example of this will now be described with reference to Figure IB.
  • the system monitors network traffic in the network environment, before analysing the network traffic to identify data sources within the network environment at step 111.
  • the system uses an Identity and Access Management Service (IDM), such as an Active Directory, to determine access levels and credentials associated with the data sources.
  • IDM Identity and Access Management Service
  • this provides a mechanism to perform a discovery process by continuously scanning an organisation’s infrastructure, and as data sources of the data environment are identified, the system will attempt to match and gain access to them via the credentials using an IDM (Identity and Access Management Service).
  • the data management system could include one or more processing devices configured to access a data store, referred to hereinafter as a reference library, to locate an API to facilitate access to the data source, as will now be described with reference to Figure 1C.
  • the data management system interrogates the data source(s) identified in the network environment, for example using the discovery process described above with respect to Figure 1B and/or a configuration management database. Results of the interrogation are used to determine data source details at step 121, the data source details including information such as one or more of a server name; an IP address; a vendor name; and, a version. Having determined this information, one or more Application Programming Interfaces (APIs) and/or configuration information needed to access the data source(s) are retrieved at step 122 by selecting an API from an API repository using results of the interrogation, wherein the API repository hosts APIs and configuration information for multiple different data sources.
  • this mechanism allows the system to interrogate data sources and automatically retrieve the APIs and/or configuration information required to access the data sources.
  • this process is performed at least in part using RPA (Robotic Process Automation) to automatically locate the matching API from the reference library, then attempt to configure and test the API, and, if necessary, create a ticket and assign it to a support resource.
  • the system can call an API library which contains all known APIs and configurations; using this, the system will attempt to configure a valid connection and update the data landscape with the result, and/or create a support ticket where it was unsuccessful.
  • This library is continuously enriched as a central repository as new data entities are encountered in other organisations.
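A hedged sketch of this configure-or-ticket flow is shown below; the connector, landscape and helpdesk objects stand in for real integrations and are purely illustrative.

```python
# Sketch of the try-configure-then-ticket flow; connector, landscape and
# helpdesk are hypothetical stand-ins for real integrations.
def configure_source(source, api_library, landscape, helpdesk):
    api = api_library.get((source["vendor"], source["version"]))
    try:
        if api is None:
            raise LookupError("no matching API in reference library")
        conn = api["connector"](source["ip"], source["credentials"])
        conn.ping()  # test that the configured connection actually works
        landscape.set_status(source["ip"], "accessible")
    except Exception as exc:
        # Unsuccessful attempts update the landscape and raise a ticket for
        # a support resource, as described above.
        landscape.set_status(source["ip"], "inaccessible")
        helpdesk.create_ticket(
            summary=f"Cannot auto-configure data source {source['ip']}",
            detail=str(exc))
```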
  • the discovery process can also be used to construct a data landscape using a module referred to hereinafter as a data landscape module, which can be used to present a visual representation of the network environment, and an example of this will now be described with reference to Figure 1D.
  • the system performs a discovery process to generate a network topography indicative of data sources within a network environment. This is typically performed using the approach described above with respect to Figure 1B, and hence involves monitoring network traffic in the network environment.
  • the system determines a status of the data sources, for example by monitoring traffic, using results of data source interrogation or the like.
  • the system generates a network visualisation indicative of the network topography, the network visualisation including graphical elements indicative of data sources, network hardware, and connections indicative of communication links between the data sources and network hardware.
  • the visualisation also includes a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection.
  • the network visualisation can then be displayed to a user at step 133, allowing the user to explore the network, and optionally interact with the network as required.
  • this process allows a visual representation of an organisation's data environment to be drawn dynamically and updated in real time, displaying information regarding data sources, such as a Data source Name, Access Status, Database Type (On Prem/Off Prem, Hybrid), linkages between the Data Sources and colour coded Consumer Data Request status (Green, Yellow, Red).
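As a sketch of the data model such a visualisation could be rendered from (field names are assumptions mirroring the attributes listed above):

```python
# Assumed data model behind the visualisation: nodes for data sources and
# network hardware, edges for communication links, each carrying a status.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                  # "data_source" or "network_hardware"
    db_type: str = ""          # e.g. "on-prem", "off-prem", "hybrid"
    access_status: str = "unknown"
    cdr_status: str = "red"    # colour-coded consumer data request status

@dataclass
class Topography:
    nodes: dict = field(default_factory=dict)     # name -> Node
    edges: list = field(default_factory=list)     # (name_a, name_b, status)

    def add_link(self, a: str, b: str, status: str = "up"):
        self.edges.append((a, b, status))
```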
  • the network topography can also contain a context-sensitive integration to a support task management module, so that as a user selects individual data source items, related tasks are displayed.
  • the system selects data from different databases, for example by performing searches using the APIs at step 140, with the data optionally relating to one or more individuals.
  • data is analysed, with results of the analysis being used to update, including to construct, generate and/or maintain, a data source repository at step 142.
  • the data source repository maintains industry specific data constructs relating to the different data sources, including for example, details of relationships between the data in the different data sources, information on data source formats, structures, schemas of each data source, a context associated with the data or data source, RPA logic for accessing, retrieving and/or screening data in the data sources, or other relevant information.
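By way of illustration only, a single repository entry holding these constructs might be shaped as follows; every name and value below is hypothetical, not taken from the specification:

```python
# Hypothetical shape of one data source repository entry.
repository_entry = {
    "source": "hr_database",
    "industry": "healthcare",
    "schema": {"patients": ["patient_id", "name", "dob", "diagnosis"]},
    "terminology": {"MRN": "medical record number"},
    "relationships": [
        # the same individual appears under patient_id here and under
        # customer_ref in a separate billing system (illustrative linkage)
        {"local": "patients.patient_id", "remote": "billing.customer_ref"},
    ],
    "rulesets": {"redact": ["diagnosis"], "retain": ["name"]},
    "rpa_logic": "healthcare_au_v2",  # logic used to access/screen this source
}
```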
  • the system also typically implements a module, hereinafter referred to as a data screen module, which is used to screen data retrieved from the data sources, allowing the data to be redacted as required, and used in fulfilling consumer data requests, and an example of this will now be described with reference to Figure 1F.
  • the data screen module is configured to receive a data request relating to an individual at step 150.
  • the system then performs a search of a network environment at step 151 to thereby retrieve personal data based on an identity of the individual.
  • this will typically use the APIs and credentials identified above to access the data sources, using the data source repository to then access information regarding the identity of the individual.
  • the system automatically performs redaction of the retrieved personal data at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms, at step 152, to ensure any sensitive or private information is protected.
  • the system generates a search response including redacted personal data.
  • this provides a mechanism to automatically redact data retrieved during a search.
  • the data screen module operates using an RPA supported process to learn from the organisation which data should be omitted, redacted or retained, and then automates this process so that redactions, removals and omissions are performed based on industry- and organisation-specific rules, reducing the need for human involvement and increasing accuracy and efficiency over time.
  • the data screen module can be pre-loaded with a standard set of pattern recognisers and can use a natural language processor to understand the context of the recognised patterns, and thereby improve understanding of the content, and hence the redaction process.
  • de-identification logic is enacted to handle the required action.
  • the action may vary based on industry and organisation specific rules, which can be customised, tested and verified.
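The following sketch shows a pre-loaded pattern recogniser driven by per-organisation rules; the patterns, rule format and placeholder text are assumptions, and a fuller system would add the NLP context check described above before acting on a match.

```python
# Sketch of a pre-loaded pattern recogniser driven by organisation rules;
# patterns, rule keys and the placeholder text are assumptions.
import re

PATTERNS = {
    "tax_file_number": re.compile(r"\b\d{3} ?\d{3} ?\d{3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text, rules):
    for name, pattern in PATTERNS.items():
        # Industry/organisation-specific rules decide the action per pattern;
        # a fuller system would use NLP to confirm the match's context first.
        if rules.get(name, "redact") == "redact":
            text = pattern.sub("[REDACTED]", text)
    return text

print(redact("Contact jane@example.com, TFN 123 456 789", {"email": "redact"}))
# -> Contact [REDACTED], TFN [REDACTED]
```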
  • the output of the data screen module process can be verified by organisational staff, comparing the original dataset with the redacted copy. Refinements are fed back into a rules engine and redaction logic to improve the redaction process. Once a sufficient amount of data has been processed, results improve, and the need for manual verification will reduce, eventually being replaced with a fully automated, unattended process.
  • the data management system includes one or more processing devices configured to: determine Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning; and, use the RPA logic to perform RPA to at least one of: perform a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; automatically redact personal data; and, validate the search results.
  • the processing device is configured to use Robotic Process Automation (RPA) to perform one or more tasks in the above described process.
  • This can include, for example, using RPA algorithms to perform the discovery process, implement the API, perform the search, automatically provide redacted personal data, and/or validate search results.
  • RPA occurs when basic tasks are automated through software or hardware systems that function across a variety of applications.
  • the software or bot can be instructed to follow a workflow with multiple steps and applications, such as retrieving, scanning, and collecting data from forms, sending a receipt message, checking the forms for completeness, filing the form in a folder, and updating a spreadsheet with the name of the form, the date filed, and so on.
  • RPA software is designed to reduce the burden for employees of completing repetitive, simple tasks.
  • the data retrieval and redaction processes are an ideal application of the technology, due to the volumes involved and the linear rules-based logic of the data retrieval and edit/redaction process; in addition, this method is highly and instantly scalable.
  • RPA is performed in accordance with RPA logic, which sets out the steps to be performed as part of the RPA process.
  • the RPA logic is typically specific to an industry, business requirements and/or compliance rules. For example, it will be appreciated that the requirements associated with redaction of medical data might be significantly different to redaction of legal data, and hence a different redaction process may be required. Similarly, different redaction processes might be required for different jurisdictions to comply with local law, and/or to comply with the requirements of different businesses.
  • RPA logic can be stored in, and hence retrieved from an RPA logic repository, such as the data source repository, as needed.
  • the relevant RPA logic can be retrieved and used in redacting medical data within Australia.
  • This obviates the need for new RPA logic to be established when the system is deployed in a scenario similar to existing scenarios.
  • suitable RPA logic might not be available, in which case new RPA logic might need to be generated by the processing device. This could be performed from scratch and/or could be based on existing similar RPA logic, for example using RPA logic from a similar scenario as a starting point.
  • the new RPA logic can be generated using machine learning and/or manual intervention, for example, by examining manually performed actions to train the processing device. Once generated, RPA logic can be stored in an RPA repository for subsequent reuse.
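A minimal sketch of this determine-or-generate flow, assuming the RPA logic repository is keyed on industry and jurisdiction and that a learner component can derive new logic from an existing template (both assumptions for illustration):

```python
# Sketch of the determine-or-generate flow; the repository keying and the
# learner component are assumptions for illustration.
def determine_rpa_logic(industry, jurisdiction, repo, learner):
    key = (industry, jurisdiction)           # e.g. ("medical", "AU")
    logic = repo.get(key)                    # reuse existing logic if held
    if logic is None:
        # Start from the closest existing scenario where one exists, then
        # derive new logic (e.g. via machine learning or manual traces).
        template = next((v for k, v in repo.items() if k[0] == industry), None)
        logic = learner.generate(template)
        repo[key] = logic                    # store for subsequent reuse
    return logic
```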
  • the processing device performs the discovery process by monitoring network traffic in the network environment and then analysing the network traffic to identify the data sources. Whilst monitoring can be performed in any appropriate manner, in one example this is achieved using Microsoft Azure traffic analytics, allowing the analysis to identify potential data sources on the network. Additionally, and/or alternatively, this can involve connecting to one or more network logs and scanning the network logs to identify data sources within the network environment.
  • the processing device can also perform the discovery process using an Identity and Access Management Service, such as an Active Directory, which is used to authenticate and authorise users, with the processing device using the Identity and Access Management Service to determine access levels and credentials associated with the data sources.
  • the processing device can optionally generate a network visualisation indicative of the network topography and display the network visualisation to a user.
  • the visualisation can be of any appropriate form, but typically includes graphical elements indicative of data sources, network hardware and connections indicative of communication links between the data sources and network hardware. Each graphical element can also include a status indicator indicative of a status of a respective data source, network hardware or connection.
  • the network visualisation is also typically interactive, allowing the user to view additional information associated with each network element.
  • the processing device can detect user selection of one of the graphical elements, retrieve additional information regarding the respective data source, network hardware or connection and display the additional information. This allows an operator to easily visualise the network, in turn allowing for a manual review of the discovery process, for example, to allow an operator to identify network elements that have been incorrectly identified.
  • the processing device interrogates the data sources and/or a configuration management database and uses results of the interrogation to determine data source details, which can in turn be used to select an API and/or configuration information.
  • This process is typically performed automatically, for example using RPA.
  • the processing device interrogates the data sources to determine information, such as a server name, an IP address, a database name, a vendor name and a data source and/or software version. This can be performed as part of the above described discovery process, or can be performed separately depending on the preferred implementation.
  • the processing device selects an API from an API repository that hosts pre-existing APIs and configuration information, such as an endpoint protocol, an authentication mechanism, and data source schema, for multiple different data sources.
  • an API may not be easily identifiable and hence the identification process could be assisted using machine learning and/or RPA approaches.
  • this allows the processing device to select an API that can be used to access the data source, thereby obviating the need for an API to be manually configured.
  • manual configuration may be required, and this typically involves having the processing device generate and issue a service request, which is forwarded to an operator to allow the issue to be resolved.
  • the processing device is configured to retrieve credentials, typically from the Identity and Access Management Service, and access the data source using the API and the credentials. This can be used to confirm that the API is configured correctly and that the credentials do indeed provide access to the data source. It will be appreciated that successfully accessing the data source allows the data source to be used in subsequent searching, and that as a result the status of the data source can be updated in the network visualisation mentioned above, to reflect the fact that the data source can now be queried automatically.
  • testing is typically performed in order to ensure data relating to an individual can be successfully retrieved.
  • the processing device selects sample data within the data sources, and then generates one or more test data requests relating to the sample data. This process can be automatic and/or performed with manual oversight by an operator.
  • the processing device then performs a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on the test data request, with automatic redaction of the retrieved personal data being performed, typically using RPA.
  • a test result is then generated including redacted personal data.
  • This test result is then manually audited, for example by comparison to a similar test result generated using an entirely manual process. This is performed to confirm the personal data is redacted as required.
  • sample data from different data sources is typically analysed to update, including to construct, generate or maintain, a data source repository, which maintains industry specific data constructs relating to the different data sources.
  • the data constructs can include any one or more of terminology, entities, attributes, relationships and rulesets, and can be used to analyse, combine and/or redact data from different data sources.
  • the data source repository links data associated with an individual in different data sources.
  • an individual for which data is stored in different data sources may be referred to in different ways, for example using different identifiers, or the like.
  • the data source repository is used to create associations between the data in different data sources, for example by associating unique identifiers with each individual in the data source repository, so that data relating to a particular individual is linked even if identifiers used to identify the individual are different in the different data sources.
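As an illustrative sketch, linking per-source identifiers to a single repository-wide unique identifier could be done by matching on a normalised key; the use of name and date of birth as the key is an assumption for illustration.

```python
# Sketch of associating one repository-wide unique identifier with each
# individual; matching on normalised name + date of birth is an assumption.
import uuid

def link_identities(records):
    """records: iterable of dicts with source, local_id, name and dob keys."""
    index, links = {}, {}
    for rec in records:
        key = (rec["name"].strip().lower(), rec["dob"])
        uid = index.setdefault(key, str(uuid.uuid4()))
        links[(rec["source"], rec["local_id"])] = uid
    return links  # maps each (source, local identifier) to one individual
```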
  • the processing device receives a data request relating to an individual, and then performs a search of a network environment to thereby retrieve personal data based on an identity of the individual.
  • the personal data is then automatically redacted, with an optional manual verification step to check the redaction process, with a search response being generated including redacted personal data.
  • the processing device typically retrieves information regarding available data sources, identifies one or more available data sources relevant to the data request, for example using information from the data source repository, before performing one or more searches of the available data sources.
  • the processing device receives search results, analyses the search results using the data source repository and aggregates the search results using results of the analysis to create the personal data.
  • the system can use the relationships defined in the data source repository, to resolve records relating to the same individuals that are contained in different data sources, allowing these records to be aggregated into a single set of search results.
  • the processing device is configured to perform pattern recognition to recognise patterns in the retrieved personal data and redact the personal data in accordance with the recognised patterns.
  • the pattern recognition can be performed based on pre-defined pattern recognition templates, as well as industry and/or organisational specific templates, which might, for example, be part of the data source repository. For example, this could be used to analyse results and identify particular alpha-numeric character sequences, representing identifiers associated with different individuals, or the like.
  • the processing devices can be configured to perform natural language processing to understand a context of recognised patterns and then redact the personal data in accordance with the context.
  • the system can also be configured to redact the data using de-identification logic, for example using de-identification algorithms, such as anonymization, pseudonymization or k-anonymization algorithms, as well as industry and/or organisational specific rules, optionally stored within the one or more schemas. Redaction can also be performed using machine learning and RPA processes.
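The following sketch illustrates template-driven redaction of the kind described above, using regular expression patterns. The two patterns shown (an email matcher and a spaced numeric identifier) are assumed examples; a real deployment would load industry and organisation specific templates from the data source repository.

    import re

    PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "numeric_id": re.compile(r"\b\d{4} \d{5} \d\b"),  # assumed identifier format
    }

    def redact(text, patterns=PATTERNS, mask="[REDACTED]"):
        """Replace character sequences matching recognised patterns with a mask."""
        for name, pattern in patterns.items():
            text = pattern.sub(mask, text)
        return text

    print(redact("Contact jane@example.com regarding record 1234 56789 1"))
    # Contact [REDACTED] regarding record [REDACTED]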
  • the processing device can receive a data request relating to an individual and then validate the identity of the individual, for example by authenticating the individual, and thereby ensure the individual has permission to seek the requested information. Assuming the validation is successful, a search can then be performed.
  • the validation could be achieved in any one of a number of ways, and this could include using a Know Your Client (KYC) procedure, biometric verification, or the like.
  • the network architecture includes a plurality of processing systems 310, such as servers, and data sources, such as databases 240, which in use are coupled to a communications network 220, such as a Local Area Network (LAN), or Wide Area Network (WAN), within an organisation.
  • a number of client devices 230 are provided, which may be used to access data stored in the databases 240.
  • the network 220 is further connected to an external network 250, such as the Internet, which may include further servers 210 and client devices 230.
  • the configuration of the networks 220, 250 is for the purpose of example only, and in practice the client devices 230 and the processing systems 210 can communicate via any appropriate mechanism, such as via wired or wireless connections, including, but not limited to mobile networks, private networks, such as 802.11 networks, the Internet, LANs, WANs, or the like, as well as via direct or point-to-point connections, such as Bluetooth, or the like.
  • whilst the processing systems 210 are shown as a single entity, it will be appreciated that in practice the processing systems 210 can be distributed over a number of geographically separate locations, for example as part of a cloud-based environment. However, the above described arrangement is not essential and other suitable configurations could be used.
  • an example of a suitable processing system 210 is shown in Figure 3.
  • the processing system 210 includes at least one microprocessor 311, a memory 312, an optional input/output device 313, such as a keyboard and/or display, and an external interface 314, interconnected via a bus 315 as shown.
  • the external interface 314 can be utilised for connecting the processing system 210 to peripheral devices, such as the communications networks 220, 250, databases 240, other storage devices, or the like.
  • whilst a single external interface 314 is shown, this is for the purpose of example only, and in practice multiple interfaces using various methods (e.g. Ethernet, serial, USB, wireless or the like) may be provided.
  • the microprocessor 311 executes instructions in the form of applications software stored in the memory 312 to allow the required processes to be performed.
  • the applications software may include one or more software modules, and may be executed in a suitable execution environment, such as an operating system environment, or the like.
  • the processing system 210 may be formed from any suitable processing system, such as a suitably programmed client device, PC, web server, network server, or the like.
  • the processing system 210 is a standard processing system such as an Intel Architecture based processing system, which executes software applications stored on non-volatile (e.g., hard disk) storage, although this is not essential.
  • the processing system could be any electronic processing device such as a microprocessor, microchip processor, logic gate configuration, firmware optionally associated with implementing logic such as an FPGA (Field Programmable Gate Array), or any other electronic device, system or arrangement.
  • the client device 230 includes at least one microprocessor 431, a memory 432, an input/output device 433, such as a keyboard and/or display, and an external interface 434, interconnected via a bus 435 as shown.
  • the external interface 434 can be utilised for connecting the client device 230 to the communications networks 220, 250, databases, other storage devices, or the like.
  • whilst a single external interface 434 is shown, this is for the purpose of example only, and in practice multiple interfaces using various methods (e.g. Ethernet, serial, USB, wireless or the like) may be provided.
  • the microprocessor 431 executes instructions in the form of applications software stored in the memory 432 to allow for communication with the processing systems 210, as well as to allow user interaction for example through a suitable user interface.
  • the client devices 230 may be formed from any suitable processing system, such as a suitably programmed PC, Internet terminal, lap-top, or hand-held PC, and in one preferred example is either a tablet, or smart phone, or the like.
  • the client device 230 is a standard processing system such as an Intel Architecture based processing system, which executes software applications stored on non-volatile (e.g., hard disk) storage, although this is not essential.
  • the client devices 230 can be any electronic processing device such as a microprocessor, microchip processor, logic gate configuration, firmware optionally associated with implementing logic such as an FPGA (Field Programmable Gate Array), or any other electronic device, system or arrangement.
  • one or more processing systems 210 are servers, which communicate with the client devices 230 via a communications network, or the like, depending on the particular network infrastructure available.
  • the servers 210 typically execute applications software for performing required tasks including storing, searching and processing of data, with actions performed by the servers 210 being performed by the processor 311 in accordance with instructions stored as applications software in the memory 312 and/or input commands received from a user via the I/O device 313, or commands received from the client device 230.
  • Actions performed by the client devices 230 are performed by the processor 431 in accordance with instructions stored as applications software in the memory 432 and/or input commands received from a user via the I/O device 433.
  • a server 210 monitors network traffic, analysing this at step 505 to identify data sources, with associated access permissions and credentials being determined at step 510.
  • this process is typically performed using a proprietary software module “Data Trace” which uses a network traffic analyser to generate a topography of the data environment and gain access via the credentials using an Identity and Access Management Service, such as Azure Active Directory, or the like.
  • Data Trace can be integrated with a support tool like ServiceNow to facilitate and automate the tasks required to prepare the network environment for Data Service Access Requests (DSAR) or Customer Data Request (CDR). This can also be used to give a clear picture of overall progress of search requests, as will be described in more detail below.
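The internal workings of the Data Trace analyser are not disclosed, but the following toy sketch conveys the idea of inferring candidate data sources from network traffic. The simplified "client -> host:port" flow record and the port-based heuristic are both assumptions for illustration.

    import re
    from collections import defaultdict

    DB_PORTS = {1433: "sql-server", 3306: "mysql", 5432: "postgresql"}  # assumed heuristic

    def discover_sources(flow_log_lines):
        """Infer candidate data sources from simplified network flow records."""
        sources = defaultdict(set)
        for line in flow_log_lines:
            match = re.match(r"(\S+) -> (\S+):(\d+)", line)  # "client -> host:port"
            if not match:
                continue
            host, port = match.group(2), int(match.group(3))
            if port in DB_PORTS:
                sources[host].add(DB_PORTS[port])  # host is likely a database server
        return {host: sorted(kinds) for host, kinds in sources.items()}

    print(discover_sources(["10.0.0.5 -> dbhost.local:5432"]))
    # {'dbhost.local': ['postgresql']}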
  • as data sources are identified, these are interrogated by the server 210 at step 515, allowing the server 210 to determine a data source configuration at step 520, including information such as a data source type, server/database name, IP address, version, or the like.
  • the server 210 accesses a repository of APIs and associated configuration files, and attempts to configure an API to thereby gain access to the data source. This is performed to test the API at step 530, and hence update a status of the data source at step 535, depending on whether or not access is successful. If this process fails, the issue is typically elevated for operator intervention.
  • the server 210 uses machine learning to automatically locate a matching API from a reference library, then attempts to configure and test the API and, depending on the result, updates the topography with a pass, or creates a ticket and assigns it to a support resource.
  • the reference API library is updated with a copy of the API and configuration so this can be reused in future.
  • the server 210 implements a machine learning enabled algorithm, which attempts to configure an API (drawing from a proprietary Industry Reference Library (IRL)) and depending on the result, the topography is updated in real time with a pass, or in case of failure a support ticket is created for a resource to resolve, the result of which is used to enrich the IRL machine learning logic as well as give an overview of the progress of the DSAR/CDR preparation.
  • steps 500 to 535 are designed to be run continuously; when breakages and errors are detected, the system will attempt to repair the issue and, if unable, create a task to be followed up by a support resource.
  • a network visualisation can be generated and displayed by the server 210 at steps 540 and 545. This is performed by the Data Trace toolset, and can be drawn and updated in real time, including information derived from the discovery process.
  • An example visualisation is shown in Figure 6.
  • the visualisation is presented in a window 610 of a user interface, and includes graphical elements in the form of icons 611 representing network hardware and/or data sources, with connections 612 showing communication links, and status indicators 613 showing a DSAR/CDR status of respective hardware, data sources and communication links.
  • the topography also contains a context sensitive integration to a support task management module, so that, as a user clicks on individual network items, the related tasks are displayed in a details window 614.
  • the framework is integrated with the ServiceNow application and the visualisation is context sensitive, meaning a user can click on a specific area of the data environment and see and interact with any related cases and resolve them. For example, in the wireframe shown, the X-ray data server has been highlighted, and with it, all cases related to the status of that node/server have been extrapolated from the support application and presented. Case updates can update the status of the data environment dynamically, as well as adjust the Data Readiness Status, giving stakeholders an understanding of overall progress.
  • sample data is gathered by the server 210, and used to generate a data source repository at step 555.
  • each organisation will have their own custom personal data construct in the form of a data source repository which builds a schema that correlates all known patient identifiers and updates the schema as new ones are identified, thereby reducing the chances that personal data will be missed due to dissimilar personal identifiers and improving search times and real time interoperability between systems.
  • These constructs can also be aggregated across organisations to create a unified industry template.
  • a test data request is then generated at step 560, with this being used by the server 210 to perform a search at step 565, and thereby retrieve data relating to an individual identified in the search request.
  • an Industry Business Rules (IBR) Engine automatically removes, redacts, and recalls information that is either not relevant to the customer or does not require disclosure at step 570.
  • This rules engine uses RPA techniques and operates on machine learning based on the outcomes of other queries performed across multiple organisations, to ensure the redaction process is performed in accordance with current legislation and regulation, taking changes into account in real time.
  • results are compared to equivalent results generated using a manual process at step 575. This is used to evaluate the results of the search and redaction process, with the RPA logic being revised at step 580 if required.
  • the testing process uses empirically significant randomised selection, so that a personal data request is automatically generated for each sample; the same data is then cross-validated by performing a manual audit, the results of both are compared, and adjustments are made based on the result, as in the comparison sketched below.
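A comparison of the automated and manual results could be as simple as the following sketch, which assumes each result is represented as the set of terms that were redacted; that representation, and the function name, are assumptions for illustration.

    def audit_redaction(bot_terms, manual_terms):
        """Compare automated redaction against a manual benchmark for one sample."""
        missed = sorted(set(manual_terms) - set(bot_terms))  # potential privacy breaches
        over = sorted(set(bot_terms) - set(manual_terms))    # over-redaction to review
        return {"missed": missed, "over_redacted": over, "pass": not missed}

    print(audit_redaction(bot_terms={"jane@example.com"},
                          manual_terms={"jane@example.com", "0412 345 678"}))
    # {'missed': ['0412 345 678'], 'over_redacted': [], 'pass': False}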
  • this approach uses RPA to effectively train bots how to emulate an initially mostly manual process, with the bots collecting data from known locations on the network, by submitting database queries, scraping screens, or the like.
  • this process is repeated over hundreds of requests and dozens of organisations, so that the algorithm becomes enriched and intelligent enough to become more efficient and reliable than a human user the more requests it handles.
  • DataBench awards the client a certificate and ID, which are added to the central registry.
  • a data request is received from a client, with this typically being submitted via a client device 230.
  • the data request generally includes or is submitted in conjunction with identifying information, such as KYC details, which are used by the server 210 to validate the identity of the client at step 705.
  • the server 210 can optionally confirm receipt of the request and successful validation at step 710, for example by returning a message to the client via the client device 230.
  • a search is performed, and the results redacted at step 720, in a manner similar to that described above.
  • the redacted data is then used to generate a search response at step 725, which may optionally undergo manual screening at step 730, before being returned to the client.
  • the above described process uses a discovery process to generate a network topography, and secure access to data sources within the network. Searches can then be performed using a rules engine, using RPA developed using machine learning, to automatically process search requests, and redact search results, allowing DSAR/CDR requests to be automatically processed without any human involvement. Furthermore, as the system can adapt dynamically, this leads to increases in accuracy over time, whilst ensuring the system can be maintained in compliance with necessary legal and business requirements.
  • the system uses a data trace module to continuously scan an organisation's infrastructure; as data sources are identified, it matches and gains access to them via credentials using an IDM (Identity and Access Management Service), and uses RPA (Robotic Process Automation) to automatically locate a matching API from a reference library, then attempts to configure and test the API, updates the Data Landscape with the result and, if necessary, creates a ticket and assigns it to a support resource.
  • an example of the network components for the data trace module is shown in Figure 8A, with a schematic of the operation of the data trace module being shown in Figure 8B.
  • the data trace module 801 is coupled to a data bank 802, having a knowledge base 802.1 and API configuration 802.2, and a data landscape module 803, described in more detail below. These components are implemented by the server 210.
  • the data trace module is connected to an IT service management (ITSM) service 804, an IDM service 805, and one or more data source APIs 806, that provide access to data stores 807.
  • the data trace module 801 also accesses network logs 808, generated by network devices 809.
  • the data trace module implements eight steps, including: (1) searching the data environment; (2) server/resource identification; (3) obtaining API and permissions; (4) API matching and configuration; (5) matching IDM credentials; (6) testing API(s); (7) creating a ticket if required; and (8) updating the knowledge base.
  • the data trace module connects to one or more network logs, and scans the network logs to identify data sources within the network environment of the organisation.
  • the data trace module connects to one or more identified data sources and/or a configuration management database (CMDB) infrastructure directory to extract key details regarding the data source, such as the IP address, server name, operating system, software versions, or the like.
  • the data trace module accesses the data bank 802 and attempts to find a matching API connector and configuration information for the respective data source.
  • the data bank acts as a repository of APIs, which is populated over time as access to data sources is required.
  • the first time a data source is accessed, if an API is not available, one can be created and added to the data bank, so that in future, access to similar data sources can be achieved by retrieving the API.
  • if a matching API is available, this is returned by the data bank, allowing the data trace module to prepare the API.
  • the data trace module uses the organisation's identity management system to find and retrieve credentials for the respective data source, based on the API connector details.
  • the data trace module sends a request to the data source using the API connector, to test connectivity and validate the credentials.
  • if the data trace module is unable to access the data source for any reason, for example if an API or configuration information is not present in the data bank, or if the credentials are invalid, the data trace module logs an incident with the organisation's IT service management (ITSM) service 804. This allows the ITSM to resolve the issue, for example by generating an API, which can then be stored in the data bank, and/or providing alternative credentials to access the data source.
  • the data trace module then updates the data landscape at step 858, storing details of the data sources and their current status, for example whether the data source is accessible.
  • An example of the process for acquiring data source details will now be described in more detail with reference to Figure 8D.
  • the data trace module initially uses one or more organisation directories, such as a CMDB to find details of the data source.
  • the data trace module uses one or more network modules to interrogate the data source to discover details at step 862.
  • the data trace module uses one or more platform specific APIs, such as Amazon Web Services (AWS) or Azure, to interrogate the data source to discover the details.
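As one concrete example of platform-specific interrogation, the sketch below queries AWS for managed database details using the boto3 SDK; the deployment region and the selection of fields are assumptions, and an equivalent flow on Azure would use the azure-mgmt SDKs instead.

    import boto3  # AWS SDK for Python; assumes AWS credentials are configured

    def describe_rds_sources(region="ap-southeast-2"):
        """Interrogate the AWS RDS API for managed database details."""
        rds = boto3.client("rds", region_name=region)
        details = []
        for db in rds.describe_db_instances()["DBInstances"]:
            details.append({
                "server_name": db["DBInstanceIdentifier"],
                "engine": db["Engine"],            # vendor / data source type
                "version": db["EngineVersion"],    # software version
                "endpoint": db.get("Endpoint", {}).get("Address"),
            })
        return details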
  • the data bank 902 is connected to the data trace module 901 within an organisation, as well as data trace modules 913 in other organisations.
  • Each data trace module 901, 913 is connected to a network 911, 914 within the respective organisation, whilst the data bank 902 is connected to a network 912 of an entity responsible for management of the data discovery and retrieval process.
  • the data bank includes an API 902.1 that acts as an interface to provide connectivity to the data trace module 901, and a reference library 902.2.
  • the data bank is involved in seven steps, including: (1) receiving details of data sources from a data trace module; (2) retrieving matching APIs and configuration information; (3) returning the API and configuration to the data trace module; (4) the data trace module raising an issue if an API or configuration is not available; (5) the ITSM resolving the issue and updating the reference library 902.2; (6) the API and configuration being provided to the data trace module; and (7) synchronisation being performed with external data trace modules.
  • the data trace module discovers one or more new data sources and makes requests to the data bank to retrieve matching API and configuration information, such as an endpoint protocol, an authentication mechanism, and data source schema.
  • the data bank 902 searches the reference library 902.2 for a matching API and configuration information, returning these to the data trace module at step 953, assuming these are available, thereby allowing the data trace module to update the data landscape, as described above.
  • at step 954, when there is no matching API and/or configuration information, an issue is raised with a supporting resource, such as ITSM, allowing an API and/or configuration information to be manually generated and/or sourced externally as required, with this being fed back into the data bank at step 955.
  • the data trace module can be updated with the API and configuration information, for example by having this pushed to the data trace module by the data bank and/or by having the data trace module submit a further request.
  • data trace modules 913 at other organisations can access the newly sourced API and/or configuration information, thereby allowing the API / configuration to be utilised across multiple organisations.
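The flow above could be modelled along the lines of the following sketch, in which the reference library is a simple keyed store and unmatched data sources are queued for ITSM resolution; the class shape and step mapping are assumptions for illustration.

    class DataBank:
        """Central reference library of API connectors shared across organisations."""
        def __init__(self):
            self.library = {}  # (vendor, version) -> API + configuration information
            self.pending = []  # unmatched data sources awaiting ITSM resolution

        def request(self, vendor, version):
            """Steps 1-4: look up a matching API, raising an issue if unavailable."""
            api = self.library.get((vendor, version))
            if api is None:
                self.pending.append((vendor, version))  # step 4: issue for the ITSM
            return api

        def resolve(self, vendor, version, config):
            """Steps 5-7: the ITSM supplies the API/configuration; once stored here
            it is available to this and all synchronised data trace modules."""
            self.library[(vendor, version)] = config
            self.pending = [p for p in self.pending if p != (vendor, version)]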
  • the data trace module can call an API library that contains all known APIs and configurations, which the module uses to attempt to configure a valid connection, updating the data landscape with the result and/or creating a support ticket where it was unsuccessful.
  • This library is continuously enriched as a central repository as new data entities are encountered in other organisations.
  • the data landscape module 1003 is connected via an API 1003.1, acting as an interface, to the data trace module 1001, which is in turn connected to one or more data sources 1007.
  • the data landscape module also communicates with the ITSM 1004.
  • the data landscape module also includes a data store 1003.2, such as a database, and stores a topography 1003.3 of the network.
  • as the data trace module 1001 discovers data sources and gathers connection details, it provides information to the data landscape module 1003.
  • the landscape module uses the information to draw an organisation data topography, adding data sources to the topography as they are discovered, at step 1052.
  • an authorised user is able to access the data landscape, via a visualisation presented on a user interface, as shown for example in Figures 10E to 10I.
  • the visualisation is typically presented in a window 1010 of a user interface, and includes graphical elements in the form of icons 1011 representing network hardware and/or data sources, with connections 1012 showing communication links, and status indicators 1013 showing a DSAR/CDR status of respective hardware, data sources and communication links.
  • the user interface also includes tabs 1015, allowing different additional information to be displayed.
  • a context tab is selected that displays information regarding the network environment as a whole, including a number of data sources, a number of issues and a general health indicator.
  • in Figure 10F, details of a number of data sources are shown in a data sources tab, allowing a user to easily view details of the data sources.
  • an issues tab is used to display any issues that need resolving, whilst a selected issue is shown in a pop-up window 1016 in Figure 10H.
  • data requests are shown in a data request tab.
  • the data landscape is dynamically updated as additional information is supplied from the data trace module, including adding, updating or removing data sources, adjusting the status of the data sources, and flagging issues, such as access problems or the like.
  • an authorized user accesses the visual representation of the landscape and reviews this to identify any data sources that are not present at step 1062.
  • the user can then manually add data sources at step 1063, adding in details, such as access credentials, or the like, as needed.
  • the landscape module then updates the visualization as needed, to reflect the updated landscape.
  • the data landscape module can also be used to perform resolution of issues, and an example of this will now be described with reference to Figure 10D.
  • a user accesses the landscape visualization to review the overall landscape.
  • alerts and/or notifications are displayed, directing the user to any issues, such as connection issues, missing data source details, ongoing consumer requests, or the like.
  • the user can explore each issue, viewing contextual information to assist the user in resolving the issue.
  • related information can be sourced from the ITSM to provide additional context and issue tracking capabilities.
  • Issues can then be resolved from the landscape view at step 1075, with information being propagated to the ITSM and/or other system modules as needed.
  • this allows a visual representation of the organizational data environment to be drawn dynamically and updated in real time, which includes data source name, access status, database type (on-prem, off-prem, hybrid), linkages between the data sources and colour-coded Consumer Data Request status (green, yellow, red). An example node representation is sketched below.
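A single node of such a landscape might be represented as in the following sketch; the status names and field layout are assumptions chosen to mirror the colour coding described above.

    STATUS_COLOURS = {"ready": "green", "attention": "yellow", "blocked": "red"}

    def topography_node(name, db_type, status, links):
        """Build one node of the data landscape visualisation."""
        return {
            "name": name,                                  # data source name
            "type": db_type,                               # on-prem / off-prem / hybrid
            "status": STATUS_COLOURS.get(status, "red"),   # consumer data request status
            "links": links,                                # linkages to other sources
        }

    print(topography_node("X-ray data server", "on-prem", "attention", ["EMR"]))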
  • the Topography also contains a context sensitive integration to a support task management module, so related tasks can be viewed and resolved as needed.
  • the data source repository module 1121 is connected via an API 1121.1 acting as an interface, to the data landscape module 1103, a data screen module 1122 and other modules 1123.
  • the data source repository module 1121 also includes a data store 1121.2, such as a database, which stores the data construct and an update module 1121.3 that updates the data construct.
  • the data construct is an industry specific repository which gleans and cross-references personal identifiers, relationships, and entity maps from other organisations to augment the data trace module's ability to identify personal data and correlate it across separate data sources even though the personal identifiers are dissimilar.
  • the data source repository is pre-configured with one or more schemas containing industry specific constructs of personal data.
  • the data constructs include terminology, entities, attributes, relationships and rule sets, such as redaction and archival rulesets, relevant to the respective industry.
  • each schema and the associated constructs are developed by analysing data within a variety of different data sources to identify and map different fields within the data sources.
  • entities are typically identified using identifiers, but these identifiers might be different within different data sources, or may be stored within different fields.
  • a person might be identified by their name, which might be stored in a single field, or across multiple fields, such as first, last and middle name fields.
  • an individual's identity might be associated with a separate identifier, such as a social security number.
  • the data source repository is developed by analysing sample records from multiple data sources, and comparing these to identify relationships between the different records.
  • the data source repository can also provide context to data within the data sources, for example, identifying types of information that are private or confidential.
  • the data source repository can then be used to assist with processing data from the different data sources, for example, allowing records from different data sources to be combined or consolidated, allowing data to be screened, or the like.
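Mapping dissimilar fields onto a shared construct might look like the following sketch; the two source layouts ("crm" and "emr") and their field names are hypothetical.

    # Illustrative field mappings from two hypothetical sources onto one construct.
    FIELD_MAP = {
        "crm": {"full_name": ["name"], "person_id": ["customer_ref"]},
        "emr": {"full_name": ["first_name", "middle_name", "last_name"],
                "person_id": ["ssn"]},
    }

    def normalise(source, record):
        """Project a raw record onto the shared schema used for correlation."""
        mapping = FIELD_MAP[source]
        return {attribute: " ".join(str(record[f]) for f in fields if record.get(f))
                for attribute, fields in mapping.items()}

    print(normalise("emr", {"first_name": "Jane", "last_name": "Doe", "ssn": "123"}))
    # {'full_name': 'Jane Doe', 'person_id': '123'}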
  • the one or more modules 1123 retrieve data source repository information from the repository 1121.2 via the API 1121.1, with the information being filtered by the API to ensure it is relevant to the modules.
  • the modules 1123 use the information when processing data, for example when identifying relationships between records, when redacting data, or the like.
  • the modules 1123 are also adapted to provide feedback to the data source repository 1121 via the API 1121.1 when gaps are identified in the data constructs. This may require manual intervention, via the ITSM or similar, for example to create a mapping, relationship or provide context or other information, allowing the feedback to be used to update the constructs at step 1155, and thereby ensure these are up to date.
  • the data screen module 1122 receives personal data 1224 as part of search results, and then operates to redact the data, typically based on rules stored in the data construct, providing redacted data 1225 as an output.
  • the data is also verified by an operator 1226, with results of the verification being fed back to the data screen module to allow the redaction process to be optimised.
  • This allows the data screen module to use an RPA-supported process to learn from the organisation which data should be omitted, redacted or retained, and then automate this process so that redactions, removals and omissions are done based on industry and organisation specific rules, reducing the necessity for human involvement and leading to increases in accuracy and efficiency over time.
  • the data screen module includes pattern recognition process 1222.1, which uses business rules 1222.2.
  • the data screen module comes pre-loaded with a standard set of pattern recognisers, which are developed over time, providing organisations with an enhanced base set.
  • Organisations can also provide their own custom patterns based on their industry and organisational knowledge, by providing regular expression syntax. Custom regular expression patterns can be tested and verified through the data screen testing and verification process.
  • the data screen module implements a language context process 1222.3, which employs a natural language processor 1222.4 to further understand the context of the recognised patterns.
  • the context is used to provide a greater degree of confidence in the recognition of the identified data item. For example, “my name is Mark” provides a greater degree of confidence in the identification of a person’s name of “Mark” compared to “the item was returned due to a red mark on the cover”.
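The specification does not name an NLP library, but named entity recognition of the kind available in spaCy illustrates how context can lift confidence, as in this sketch (which assumes the en_core_web_sm model is installed):

    import spacy  # assumed NLP stack; any comparable NER pipeline would do

    nlp = spacy.load("en_core_web_sm")

    def person_names(text):
        """Use entity context to distinguish a name from an ordinary word."""
        return [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]

    print(person_names("My name is Mark."))                       # likely ['Mark']
    print(person_names("The item had a red mark on the cover."))  # likely []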
  • de-identification logic 1222.5 is enacted to handle the required action, based on redaction definitions 1222.6.
  • the action may vary based on industry and organisation specific rules, which can be customised, tested and verified.
  • the output 1225 of the data screen process is verified by organisational staff 1226, comparing the original dataset with the redacted copy. Refinements are fed back into the Data Screen business rules engine and redaction logic. Once a sufficient amount of data has been processed by Data Screen and the results become more in line with expectations, the need for the manual verification process will reduce and be replaced with a fully automated, unattended process.
  • at step 1251, personal data is provided to the data screen module for processing.
  • the data can be in a wide range of formats, including for example, documents, JSON, XML, text and images.
  • the data screen module uses industry specific and organisational specific logic and business rules, including rules defined within the data construct, to identify personal data that should be redacted.
  • the output is validated by an operator, allowing the operator to manually review the redacted data and confirm there is no personal information that breaches industry specific privacy rules or laws.
  • if anomalies are encountered, then at step 1254 the issues are resolved, with feedback being used to update the relevant rules as needed using RPA. In this manner, the data screen module continues to improve its screening logic based on the outcome of previous processes at step 1255, thereby reducing the need for manual intervention and ultimately leading to an entirely automated approach.
  • the data request module 1331 is connected to the data landscape module 1303, the data source repository module 1321 and the data screen module 1322, as well as a search module 1334, which is in turn connected to data sources 1307 via API connectors 1306.
  • a search request received from a user 1332 can be processed, performed, and redacted, with optional manual review by an operator 1333.
  • the data request module 1331 receives a request containing verified consumer identification information.
  • the request module 1331 uses the data request to query the landscape module to identify one or more available data sources containing data relevant to the data request at step 1352.
  • the search module 1334 is then used to perform a series of searches of the data sources, to thereby retrieve search results at step 1353.
  • the search results are aggregated into a data package of personal data using the data construct, for example using the schemas to identify relationships between different search results, so that the results can be consolidated.
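Aggregation using the construct's identifier links could follow the shape of this sketch, where canonical_ids plays the role of the repository's identifier map (as sketched earlier); the tuple layout of the search results is an assumption.

    def aggregate_results(search_results, canonical_ids):
        """Consolidate per-source results that relate to the same person."""
        package = {}
        for source, local_id, record in search_results:
            person = canonical_ids[(source, local_id)]  # link from the repository
            package.setdefault(person, []).append({"source": source, **record})
        return package

    ids = {("billing_db", "CUST-0042"): "p-1", ("emr", "patient-7781"): "p-1"}
    results = [("billing_db", "CUST-0042", {"plan": "basic"}),
               ("emr", "patient-7781", {"allergy": "penicillin"})]
    print(aggregate_results(results, ids))  # both records grouped under 'p-1'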
  • the resulting personal data is sent to the data screen module for further processing, in particular to allow the data screen module to perform recognition and redaction of the personal data at step 1356, using the process described above.
  • results of the search are provided for verification by an operator 1333, who is typically a member of the organisation hosting the data.
  • the operator verifies the results, and/or provides feedback, which is then used to further train the data source repository, data screen or data request modules. This allows the algorithms used to be trained over time so that RPA can be used to perform the searches in a substantially automated fashion.
  • the data request module 1331 receives a data package containing test data covering one or more industry scenarios.
  • the data request module 1331 sends the data package to the data screen module 1322, allowing the data screen module 1322 to perform recognition and redaction of the data package at step 1363, using the process described above.
  • the redacted data is returned to the data request module 1331, allowing results to be presented to an operative 1333, such as a member of organisational staff, at step 1364.
  • the operative reviews the results and either verifies these are suitably redacted and/or provides feedback, such as redacting further data, providing annotations or similar at step 1365.
  • This feedback is then used to further train the data request module 1331 and/or data screen module 1322, thereby improving the process for performing a search and redacting search results.
  • a consumer 1332 submits a data access request, also referred to as a Data Subject Access Request (DSAR), through an appropriate channel, such as email, phone, direct message, or the like, at step 1371.
  • the data access request is added to a request queue at step 1372, together with any captured information, such as details of the requestor, for further processing.
  • the data request module 1331 monitors the request queue and triggers when a new request is added to the queue. This causes the data request module 1331 to validate the queue entry, and assess the completeness of the request at step 1374.
  • the data request module 1331 will review the data access request to ensure the request contains sufficient information to allow the search to be performed, including any necessary details of the requestor, and sufficient information to identify the subjects of the search.
  • the data request module 1331 will also ensure that the organisation's data landscape is ready for the search to be performed, for example ensuring the landscape module 1303 has identified relevant data sources, or the like. If issues are encountered with a request, a notification is sent to an operative, such as an administrator, allowing them to resolve the issue at step 1375.
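Completeness checking of a queued entry (step 1374 above) could be as simple as the following sketch; the required field names are assumptions, since the specification only requires sufficient requestor details and subject identifiers.

    REQUIRED_FIELDS = ("requestor_name", "proof_of_identity",
                       "subject_identifiers", "response_channel")

    def validate_queue_entry(entry):
        """Assess the completeness of a queued DSAR/CDR request."""
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        return {"complete": not missing, "missing": missing}

    print(validate_queue_entry({"requestor_name": "J. Doe"}))
    # {'complete': False, 'missing': ['proof_of_identity', 'subject_identifiers', 'response_channel']}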
  • the data request module invokes the request algorithm, to perform the search as described above with respect to Figure 13B.
  • results are returned for verification.
  • verification is performed to industry and organisation standards, with verification being attempted automatically using a model trained by machine learning. In the event that the verification cannot be performed automatically, this is manually performed at step 1378, for example, having results reviewed by an operative, with any feedback being used for further training as described above with respect to Figure 13C.
  • results are returned to the requestor via a nominated channel at step 1379.
  • the above approach, which facilitates the request via a series of independent enterprise searches, aided by the personal data construct, with the finalised data package validated utilising industry/organisation business data rules through to delivery, is unique in that it combines technological processes that are enhanced and improved continuously through human and machine learning intervention.
  • the above described system provides a DSAR/CDR process broken into 4 main stages, including data trace and test and certify processes performed during a configuration phase, and business as usual and outsource processes being performed as part of the operational phase.
  • Data Trace: the activities and resources required to clearly map the data sources where personal data is stored, the relationships between them, and current access levels to them. This is used to give an overview of the project and depict the data landscape status at any given time.
  • Test & Certify: immediately following Data Trace, a series of manual and automated tests are performed to simulate genuine personal data requests, where the results are audited and assessed, and adjustments made to give an organisation confidence in its ability to respond to these requests and fulfil data standard CDR obligations.
  • DSAR/CDR BAU: this process covers the BAU ("business as usual") activities related to receiving and responding to a DSAR/CDR request, from customer verification, through using RPA (Robotic Process Automation) enabled business rules to perform redaction and editing of the request, to automatically notifying the customer as to the status of their request.

Abstract

A data management system for automatically providing personal data from a network environment in response to a data request, the system including, one or more processing devices configured to utilise a discovery process to generate a network topography indicative of data sources within the network environment, implement one or more Application Programming Interfaces (APIs) to access the data sources, access a data source repository maintaining industry specific data constructs relating to the data sources, receive a data request relating to an individual, perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; and, generate a search response including redacted personal data.

Description

DATA MANAGEMENT
Background of the Invention
[0001] The present invention relates to a data management system and associated data management methods, and in one particular example, a data management system and associated data management methods for automatically providing personal data from a network environment in response to a data request.
Description of the Prior Art
[0002] The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgement or admission or any form of suggestion that the prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
[0003] The General Data Protection Regulation (GDPR) is a set of EU Laws on Data Protection and Privacy in the European Union and the European Economic Area. Since its introduction in May 2018, the GDPR has resulted in steep fines to several organisations for leaking personal data as well as not fulfilling Data Service Access Requests (DSAR) obligations. It is also of note that the rules impact companies who are not domiciled in the EU, if they provide services to the EU and collect data on EU residents. Globally the GDPR approach has been viewed as a best-in-class approach, resulting in other countries seeking to implement GDPR style legislation.
[0004] However, in many organisations, being able to provide access to data, whilst preventing leaking of personal details is highly problematic. For example, many organisations handle large volumes of data, whilst in many cases data is stored across multiple different databases, often inconsistently. For example, health care facilities may have data spread across internal records relating to patients and consultations, centralised medical records, accounts records and customer relationship databases. In each of these, the same individual may be associated with different identifying information, with data often being stored in different formats, for example, in accordance with different schemas. This makes it impossible for computer systems to automatically retrieve data consistently across all sources, meaning responding to data requests is currently a largely manual process.
Summary of the Present Invention
[0005] In one broad form, an aspect of the present invention seeks to provide a data management system for automatically providing personal data from a network environment in response to a data request, the system including, one or more processing devices configured to: utilise a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; access a data source repository maintaining industry specific data constructs relating to the data sources; receive a data request relating to an individual; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; and, generate a search response including redacted personal data.
[0006] In one embodiment the one or more processing devices are configured to automatically perform redaction of the retrieved personal data.
[0007] In one embodiment the one or more processing devices are configured to use Robotic Process Automation (RPA) to at least one of: perform the discovery process; implement the API; perform the search; automatically redact personal data; and, validate search results.
[0008] In one embodiment the one or more processing devices are configured to perform RPA in accordance with RPA logic, and wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; and, compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning.
[0009] In one embodiment the one or more processing devices are configured to perform the discovery process by: monitoring network traffic in the network environment; and, analysing the network traffic to identify the data sources.
[0010] In one embodiment the network traffic is analysed using at least one of: traffic analytics; and, Azure traffic analytics.
[0011] In one embodiment the one or more processing devices are configured to: connect to one or more network logs; and, scan the network logs to identify data sources within the network environment.
[0012] In one embodiment the one or more processing devices are configured to perform the discovery process using an identity and access management service to determine access levels and credentials associated with the data sources.
[0013] In one embodiment the one or more processing devices are configured to: generate a network visualisation indicative of the network topography; and, display the network visualisation to a user.
[0014] In one embodiment the network visualisation includes graphical elements indicative of: data sources; network hardware; and connections indicative of communication links between the data sources and network hardware.
[0015] In one embodiment each graphical element includes a status indicator indicative of a status of a respective data source, network hardware or connection.
[0016] In one embodiment the one or more processing devices are configured to: detect user selection of graphical element; retrieve additional information regarding the respective data source, network hardware or connection; and, display the additional information.
[0017] In one embodiment the one or more processing devices are configured to determine data source details by interrogating at least one of: a data source; and, a configuration management database.
[0018] In one embodiment the one or more processing devices are configured to use the data source details to retrieve at least one of: an API; and configuration information.
[0019] In one embodiment the configuration information includes at least one of: an endpoint protocol; an authentication mechanism; and data source schema.
[0020] In one embodiment the data source details include at least one of: a server name; a database name; an IP address; a vendor name; and, a data source and/or software version.
[0021] In one embodiment the one or more processing devices are configured to select an API from an API repository, the API repository hosting APIs and configuration information for multiple different data sources.
[0022] In one embodiment the one or more processing devices are configured to select an API at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
[0023] In one embodiment the one or more processing devices are configured to: retrieve credentials; and, access the data source using the API and the credentials.
[0024] In one embodiment the one or more processing devices are configured to: select sample data; generate one or more test data requests relating to the sample data; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on the test data request; automatically perform redaction of the retrieved personal data; and generate a test result including redacted personal data, wherein the test result is manually audited to confirm the personal data is redacted as required.
[0025] In one embodiment the one or more processing devices are configured to: analyse data stored in different data sources within a network environment; and, update a data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources.
[0026] In one embodiment the data constructs include at least one of: terminology; entities; attributes; relationships; and, rulesets.
[0027] In one embodiment the data constructs are usable in at least one of: analysing data from different data sources; combining data from different data sources; and, redacting data from the data sources.
[0028] In one embodiment the one or more processing devices are configured to: select sample data; and, analyse sample data to generate the data source repository, wherein the data source repository links to data associated with an individual in different data sources.
[0029] In one embodiment the one or more processing devices are configured to associate unique identifiers with each individual in the data source repository.
[0030] In one embodiment the one or more processing devices are configured to: receive a data request relating to an individual; perform a search of a network environment to thereby retrieve personal data based on an identity of the individual; automatically perform redaction of the retrieved personal data; and, generate a search response including redacted personal data.
[0031] In one embodiment the one or more processing devices are configured to: retrieve information regarding available data sources; identify one or more available data sources relevant to the data request; and, perform one or more searches of the available data sources.
[0032] In one embodiment the one or more processing devices are configured to: receive search results; analyse the search results using a data source repository that maintains industry specific data constructs relating to the data in the different data sources; and, aggregate the search results using results of the analysis to create the personal data.
[0033] In one embodiment the one or more processing devices are configured to: perform pattern recognition to recognise patterns in the retrieved personal data; and, redact the personal data in accordance with the recognised patterns.
[0034] In one embodiment the one or more processing devices are configured to: perform natural language processing to understand a context of recognised patterns; and, redact the personal data in accordance with the context.
[0035] In one embodiment the one or more processing devices are configured to redact the data using de-identification logic.
[0036] In one embodiment the one or more processing devices are configured to: receive a data request relating to an individual; validate the identity of the individual; and, perform the search in response to a successful validation.
[0037] In one embodiment the one or more processing devices are configured to validate the identity of the individual using a Know Your Client (KYC) procedure.
[0038] In one embodiment the redaction process is performed at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
[0039] In one broad form, an aspect of the present invention seeks to provide a data management method for automatically providing personal data from a network environment in response to a data request, the method including, in one or more processing devices: utilising a discovery process to generate a network topography indicative of data sources within the network environment; implementing one or more Application Programming Interfaces (APIs) to access the data sources; generating a data source repository indicative of locations of personal data within the data sources; receiving a data request relating to an individual; performing a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; and, generating a search response including redacted personal data.
[0040] In one broad form, an aspect of the present invention seeks to provide a data management system including one or more processing devices configured to perform a discovery process to generate a network topography indicative of data sources within a network environment by: monitoring network traffic in the network environment; analysing the network traffic to identify data sources within the network environment; and, using an identity and access management service to determine access levels and credentials associated with the data sources.
[0041] In one embodiment the network traffic is analysed using at least one of: traffic analytics; and, Azure traffic analytics.
[0042] In one embodiment the one or more processing devices are configured to: connect to one or more network logs; and, scan the network logs to identify data sources within the network environment.
[0043] In one embodiment the Identity and Access Management Service is an Active Directory.
[0044] In one embodiment the one or more processing devices are configured to determine data source details by interrogating at least one of: a data source; and, a configuration management database.
[0045] In one embodiment the one or more processing devices are configured to use the data source details to retrieve at least one of: an API; and configuration information.
[0046] In one embodiment the configuration information includes at least one of: an endpoint protocol; an authentication mechanism; and data source schema.
[0047] In one embodiment the data source details include at least one of: a server name; a database name; an IP address; a vendor name; and, a data source and/or software version.
[0048] In one embodiment the one or more processing devices are configured to select an API from an API repository, the API repository hosting APIs and configuration information for multiple different data sources.
[0049] In one embodiment the one or more processing devices are configured to select an API at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
[0050] In one embodiment the one or more processing devices are configured to: retrieve credentials; and, access the data source using the API and the credentials.
[0051] In one broad form, an aspect of the present invention seeks to provide a data management method including, in one or more processing devices, performing a discovery process to generate a network topography indicative of data sources within a network environment by: monitoring network traffic in the network environment; analysing the network traffic to identify data sources within the network environment; and, using an identity and access management service to determine access levels and credentials associated with the data sources.
[0052] In one broad form, an aspect of the present invention seeks to provide a data management system for use in accessing multiple data sources, the system including one or more processing devices configured to: determine data source details by interrogating at least one of: a data source; and, a configuration management database; and, implement one or more Application Programming Interfaces (APIs) to access the data sources by selecting an API from an API repository using the data source details, wherein the API repository hosts APIs and configuration information for multiple different data sources.
[0053] In one embodiment the configuration information includes at least one of: an endpoint protocol; an authentication mechanism; and data source schema.
[0054] In one embodiment the data source details include at least one of: a server name; a database name; an IP address; a vendor name; and, a data source and/or software version.
[0055] In one embodiment the one or more processing devices are configured to select an API at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
[0056] In one embodiment the one or more processing devices are configured to: retrieve credentials; and, access the data source using the API and the credentials.
[0057] In one broad form, an aspect of the present invention seeks to provide a data management method for use in accessing multiple data sources, the method including, in one or more processing devices: determining data source details by interrogating at least one of: a data source; and, a configuration management database; and, implementing one or more Application Programming Interfaces (APIs) to access the data sources by selecting an API from an API repository using the data source details, wherein the API repository hosts APIs and configuration information for multiple different data sources.
[0058] In one broad form, an aspect of the present invention seeks to provide a data management system for generating a network visualisation, the system including one or more processing devices configured to: perform a discovery process to generate a network topography indicative of data sources within a network environment by monitoring network traffic in the network environment; generate a network visualisation indicative of the network topography, the network visualisation including: graphical elements indicative of: data sources; network hardware; and connections indicative of communication links between the data sources and network hardware; and, a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection; and, display the network visualisation to a user.
[0059] In one embodiment the one or more processing devices are configured to: detect user selection of a graphical element; retrieve additional information regarding the respective data source, network hardware or connection; and, display the additional information.
[0060] In one broad form, an aspect of the present invention seeks to provide a data management method for generating a network visualisation, the method including, in one or more processing devices: performing a discovery process to generate a network topography indicative of data sources within a network environment by monitoring network traffic in the network environment; generating a network visualisation indicative of the network topography, the network visualisation including: graphical elements indicative of: data sources; network hardware; and connections indicative of communication links between the data sources and network hardware; and, a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection; and, displaying the network visualisation to a user.
[0061] In one broad form, an aspect of the present invention seeks to provide a data management system for maintaining a data source repository relating to different data sources, the system including one or more processing devices configured to: analyse data stored in different data sources within a network environment; and, update the data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources.
[0062] In one embodiment the data constructs include at least one of: terminology; entities; attributes; relationships; and, rulesets.
[0063] In one embodiment the data constructs are usable in at least one of: analysing data from different data sources; combining data from different data sources; and, redacting data from the data sources.
[0064] In one broad form, an aspect of the present invention seeks to provide a data management method for maintaining a data source repository relating to different data sources, the method including, in one or more processing devices: analysing data stored in different data sources within a network environment; updating the data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources.
[0065] In one broad form, an aspect of the present invention seeks to provide a data management system for searching data relating to an individual, the system including one or more processing devices configured to: receive a data request relating to an individual; perform a search of a network environment to thereby retrieve personal data based on an identity of the individual; automatically perform redaction of the retrieved personal data; and, generate a search response including redacted personal data.
[0066] In one embodiment the one or more processing devices are configured to: retrieve information regarding available data sources; identify one or more available data sources relevant to the data request; and, perform one or more searches of the available data sources.
[0067] In one embodiment the one or more processing devices are configured to: receive search results; analyse the search results using a data source repository that maintains industry specific data constructs relating to the data in the different data sources; and, aggregate the search results using results of the analysis to create the personal data.
[0068] In one embodiment the one or more processing devices are configured to: perform pattern recognition to recognise patterns in the retrieved personal data; and, redact the personal data in accordance with the recognised patterns.
[0069] In one embodiment the one or more processing devices are configured to: perform natural language processing to understand a context of recognised patterns; and, redact the personal data in accordance with the context.
[0070] In one embodiment the one or more processing devices are configured to redact the data using de-identification logic.
[0071] In one embodiment the one or more processing devices are configured to: receive a data request relating to an individual; validate the identity of the individual; and, perform the search in response to a successful validation.
[0072] In one embodiment the one or more processing devices are configured to validate the identity of the individual using a Know Your Client (KYC) procedure.
[0073] In one embodiment the redaction process is performed at least in part using at least one of: machine learning; and, Robotic Process Automation (RPA) algorithms.
[0074] In one broad form, an aspect of the present invention seeks to provide a data management method for searching data relating to an individual, the method including, in one or more processing devices: receiving a data request relating to an individual; performing a search of a network environment to thereby retrieve personal data based on an identity of the individual; automatically performing redaction of the retrieved personal data; and, generating a search response including redacted personal data.
[0075] In one broad form, an aspect of the present invention seeks to provide a data management system including one or more processing devices configured to: determine Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; and, compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning; and, use the RPA logic to perform RPA to at least one of: perform a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; automatically redact personal data; and, validate the search results.
[0076] In one broad form, an aspect of the present invention seeks to provide a data management method including, in one or more processing devices: determining Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; and, compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning; and, using the RPA logic to perform RPA to at least one of: perform a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; automatically redact personal data; and, validate the search results.
[0077] It will be appreciated that the broad forms of the invention and their respective features can be used in conjunction and/or independently, and reference to separate broad forms is not intended to be limiting. Furthermore, it will be appreciated that features of the method can be performed using the system or apparatus and that features of the system or apparatus can be implemented using the method.
Brief Description of the Drawings
[0078] Various examples and embodiments of the present invention will now be described with reference to the accompanying drawings, in which: -
[0079] Figure 1A is a flow chart of an example of a data management method;
[0080] Figure 1B is a flow chart of an example of a discovery process;
[0081] Figure 1C is a flow chart of an example of an API implementation process using a reference library;
[0082] Figure 1D is a flow chart of an example of a network visualisation process;
[0083] Figure 1E is a flow chart of an example of a process for updating a data schema;
[0084] Figure 1F is a flow chart of an example of a data search process;
[0085] Figure 2 is a schematic diagram of a specific example of a network architecture;
[0086] Figure 3 is a schematic diagram of an example of a processing system;
[0087] Figure 4 is a schematic diagram of an example of a client device;
[0088] Figures 5A to 5C are a flow chart of an example of a method of configuring a data management system;
[0089] Figure 6 is a schematic diagram of an example of a network visualisation;
[0090] Figure 7 is a flow chart of an example of a data search process;
[0091] Figure 8A is a schematic diagram of an example of a network schematic when performing a discovery process using a data trace module;
[0092] Figure 8B is a schematic diagram of an example of a component schematic when performing a discovery process;
[0093] Figure 8C is a flow chart of a specific example of a discovery process;
[0094] Figure 8D is a flow chart of a specific example of a process for handling a new data source identified using the discovery process;
[0095] Figure 9A is a schematic diagram of an example of a network schematic when performing an API implementation process using a reference library;
[0096] Figure 9B is a schematic diagram of an example of a component schematic when performing the API implementation process;
[0097] Figure 9C is a flow chart of a specific example of the API implementation process;
[0098] Figure 10A is a schematic diagram of an example of a component schematic when performing a network visualisation process using a data landscape construction process;
[0099] Figure 10B is a flow chart of a specific example of an automated network landscape construction process;
[0100] Figure 10C is a flow chart of a specific example of a manual network landscape construction process;
[0101] Figure 10D is a flow chart of a specific example of a network landscape resolution process;
[0102] Figure 10E is a schematic diagram of an example of a network visualisation user interface;
[0103] Figure 10F is a schematic diagram of an example of the network visualisation user interface of Figure 10E with details of different data sources displayed;
[0104] Figure 10G is a schematic diagram of an example of the network visualisation user interface of Figure 10E with details of known issues displayed;
[0105] Figure 10H is a schematic diagram of an example of the network visualisation user interface of Figure 10E with details of a selected issue displayed;
[0106] Figure 10I is a schematic diagram of an example of the network visualisation user interface of Figure 10E with data requests displayed;
[0107] Figure 11A is a schematic diagram of an example of a component schematic when updating a schema repository using a data construct module;
[0108] Figure 11B is a flow chart of a specific example of a schema repository construction process;
[0109] Figure 12A is a schematic diagram of an example of a component schematic when performing data screening using a data screen module;
[0110] Figure 12B is a schematic diagram of an example of internal components of the data screen module;
[0111] Figure 12C is a flow chart of a specific example of a data screening process;
[0112] Figure 13A is a schematic diagram of an example of a component schematic when performing a search using a data request algorithm;
[0113] Figure 13B is a flow chart of a specific example of a search process;
[0114] Figure 13C is a flow chart of a specific example of data request algorithm training; and,
[0115] Figure 13D is a flow chart of a specific example of a data access request process.
Detailed Description of the Preferred Embodiments
[0116] An example of a data management method for automatically providing personal data from a network environment in response to a data request will now be described with reference to Figure 1A.
[0117] For the purpose of illustration, it is assumed that the process is performed at least in part using a system including one or more electronic processing devices forming part of one or more processing systems, connected to a network system including one or more data sources, such as databases or the like. In one example, this could include a local or remote server, optionally including a cloud-based architecture, which interfaces with a network system, such as a local area network (LAN) or wide area network (WAN) within an organisation. Whilst the process can be performed using multiple processing devices, with processing performed by one or more of the devices, for ease of illustration the following examples will refer to a single device. It will nevertheless be appreciated that reference to a singular processing device should be understood to encompass multiple processing devices and vice versa, with processing being distributed between the devices as appropriate.
[0118] In this example, the process is broadly broken down into two stages, namely a configuration phase, including steps 100 to 102, in which the network is analysed to enable access to data sources, and an operational phase, including steps 103 to 106, in which search requests are processed. Although not explicitly described, it will be appreciated from the following description that the configuration and operational phases could be implemented using different processing devices.
[0119] At step 100, the processing device utilises a discovery process to generate a network topography indicative of data sources within the network environment. The nature of the discovery process will vary depending on the implementation, but typically this will involve monitoring traffic on the network, and analysing the traffic using machine learning to identify data sources.
[0120] At step 101, the processing device implements one or more Application Programming Interfaces (APIs) to access the data sources. This can be achieved in any appropriate manner, but will typically involve retrieving APIs from a repository of pre-existing APIs based, for example, on an understanding of details of the data source, such as the type and version of the data source.
[0121] At step 102, a data source repository is accessed and optionally constructed, updated or generated by the processing device, which relates to the data in the different data sources, and in one example is indicative of locations of personal data within the data sources, and optionally other information regarding the data sources, such as the data source formats and structures. The repository can act as a mapping to the different data structures that might exist within different data sources, allowing data stored in different formats and/or data structures to be retrieved more easily.
[0122] Once the configuration process is completed, a testing and optional certification process might be performed, as will be described in more detail below, thereby ensuring the data sources identified are accessible and that responses can be provided in accordance with industry/organisation business data and legal compliance rules. A visual representation of the resulting network topography could also be displayed, allowing this to be reviewed by administrators, or other operators. Additionally, the above process is typically performed repeatedly, for example on an ongoing basis, either continuously or periodically, allowing additional data sources to be added to the topography as they are discovered, and allowing a status of data sources and other network resources to be maintained.
[0123] During the operational phase, at step 103, a data request relating to an individual is received by the processing device. The data request could be in any suitable form, and typically includes information regarding the identity of the individual making the request, which may require validation and/or authentication, to ensure resulting data is supplied in accordance with privacy requirements.
[0124] At step 104 the processing device performs a search of different data sources in the network environment using the data source repository and the APIs, to obtain data from different data sources, typically based on the identity of an individual. Thus, the search uses the data source repository and the identity to determine where data relating to the individual is stored in the data sources, with the APIs being used to construct queries, allowing relevant data to be retrieved.
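By way of illustration only, the following Python sketch shows one way a search step of this kind might fan out over the data sources recorded in the repository. The repository layout, the field names and the fetch callable are hypothetical assumptions for the sketch, not part of the described system.

```python
# Illustrative sketch only: fan a personal-data search out over the data
# sources recorded in a (hypothetical) data source repository.

# Maps each data source to the table and field holding personal identifiers.
DATA_SOURCE_REPOSITORY = {
    "crm_db":  {"table": "customers", "id_field": "customer_ref"},
    "billing": {"table": "invoices",  "id_field": "account_no"},
}

def build_query(source_name: str) -> str:
    """Construct a per-source query from the repository entry.

    A production system would issue parameterised queries through the
    API selected for the source, rather than raw strings.
    """
    entry = DATA_SOURCE_REPOSITORY[source_name]
    return (f"SELECT * FROM {entry['table']} "
            f"WHERE {entry['id_field']} = :individual_id")

def search_all(individual_id: str, fetch) -> dict:
    """Run the query against every known source via its API wrapper."""
    return {source: fetch(source, build_query(source), individual_id)
            for source in DATA_SOURCE_REPOSITORY}

if __name__ == "__main__":
    # Stub fetch so the sketch runs end to end without a real network.
    stub = lambda source, query, ident: f"[{source}] {query} <- {ident}"
    for rows in search_all("ID-1234", stub).values():
        print(rows)
```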
[0125] At step 105, the processing device optionally performs automatic redaction of the retrieved personal data, for example removing data that is subject to privacy or confidentiality limitations. It will be appreciated that such redaction may not be required for all requests, for example, in the event that the retrieved data already meets privacy and/or compliance requirements. The redaction is typically performed, at least in part, using Robotic Process Automation (RPA), enabling the redaction process to be performed substantially automatically in accordance with relevant requirements, such as local laws and/or data holder requirements.
[0126] At step 106, a search response including redacted personal data is generated and provided to the individual submitting the request, with the response typically being generated in accordance with business and/or local Customer Data Request (CDR) requirements.
[0127] Accordingly, the above described process enables a processing device to retrieve and redact data relating to an individual substantially automatically, even where data is stored in disparate data sources, often in different forms, formats or structures, which would otherwise prevent this. This can be achieved using the combination of the configuration phase, in which a discovery process is used to map the network topography to establish APIs and a data source repository, so that searching can be performed, and the operational phase, in which the search is performed using the APIs and schema, with data being automatically redacted as needed. This in turn allows organisations to more rapidly respond to data requests, whilst ensuring privacy and/or compliance requirements are met.
[0128] In one example, the data management system could include one or more processing devices configured to implement a module, referred to hereinafter as a data trace module, which performs a discovery process to generate a network topography indicative of data sources within a network environment, and an example of this will now be described with reference to Figure 1B.
[0129] In this example, at step 110, the system monitors network traffic in the network environment, before analysing the network traffic to identify data sources within the network environment at step 111. At step 112, the system uses an Identity and Access Management Service (IDM), such as an Active Directory, to determine access levels and credentials associated with the data sources.
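Purely as a minimal sketch of these three steps, the Python below groups parsed traffic records by destination endpoint to flag candidate data sources, then attaches access details from an IDM lookup. The flow-record fields, the port heuristic and the idm_lookup stub are assumptions, not part of the disclosed system.

```python
# Illustrative sketch only: the three discovery steps of Figure 1B.
from collections import Counter

DB_PORTS = {1433: "mssql", 3306: "mysql", 5432: "postgres", 1521: "oracle"}

def identify_data_sources(flow_records, min_hits=3):
    """Flag endpoints that repeatedly receive traffic on database ports."""
    hits = Counter((r["dst_ip"], r["dst_port"]) for r in flow_records
                   if r["dst_port"] in DB_PORTS)
    return [{"ip": ip, "type": DB_PORTS[port]}
            for (ip, port), count in hits.items() if count >= min_hits]

def resolve_access(sources, idm_lookup):
    """Attach access levels and credentials from an IDM directory query."""
    for source in sources:
        source["access"] = idm_lookup(source["ip"]) or {"status": "unmatched"}
    return sources

if __name__ == "__main__":
    flows = [{"dst_ip": "10.0.0.5", "dst_port": 5432}] * 4   # steps 110-111
    found = identify_data_sources(flows)
    print(resolve_access(found,                              # step 112
                         lambda ip: {"role": "reader", "status": "ok"}))
```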
[0130] Accordingly, it will be appreciated that this provides a mechanism to perform a discovery process by continuously scanning an organisation’s infrastructure, and as data sources of the data environment are identified, the system will attempt to match and gain access to them via credentials obtained using the IDM.
[0131] In one example, the data management system could include one or more processing devices configured to access a data store, referred to hereinafter as a reference library, to locate an API to facilitate access to the data source, as will now be described with reference to Figure 1C.
[0132] In this example, at step 120 the data management system interrogates the data source(s) identified in the network environment, for example using the discovery process described above with respect to Figure 1B and/or a configuration management database. Results of the interrogation are used to determine data source details at step 121, the data source details including information such as one or more of a server name; an IP address; a vendor name; and, a version. Having determined this information, one or more Application Programming Interfaces (APIs) and/or configuration information needed to access the data source(s) are retrieved at step 122 by selecting an API from an API repository using results of the interrogation, wherein the API repository hosts APIs and configuration information for multiple different data sources.
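A minimal sketch of the selection step follows, assuming a repository keyed by (vendor, database, version); the keys, configuration fields and fallback rule are illustrative assumptions rather than the actual repository format.

```python
# Illustrative sketch only: selecting an API and its configuration from a
# repository using data source details gathered during interrogation.
API_REPOSITORY = {
    ("acme", "customerdb", 12): {
        "api": "acme_customerdb_v12_connector",
        "config": {"endpoint_protocol": "https",
                   "auth_mechanism": "oauth2",
                   "schema": "customer_v12"},
    },
}

def select_api(details: dict):
    """Match (vendor, database, version) details to a hosted API.

    Falls back to the nearest earlier version for the same vendor and
    database; returns None so the caller can raise a support ticket
    when no candidate exists.
    """
    key = (details["vendor"], details["database"], details["version"])
    if key in API_REPOSITORY:
        return API_REPOSITORY[key]
    candidates = [(k, v) for k, v in API_REPOSITORY.items()
                  if k[:2] == key[:2] and k[2] <= key[2]]
    if not candidates:
        return None
    return max(candidates, key=lambda kv: kv[0][2])[1]

print(select_api({"vendor": "acme", "database": "customerdb", "version": 13}))
```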
[0133] Accordingly, this mechanism allows the system to interrogate data sources and automatically retrieve the APIs and/or configuration information required to access the data sources. In one example, this process is performed at least in part using RPA (Robotic Process Automation) to automatically locate the matching API from the reference library, then attempt to configure and test the API and, if necessary, create a ticket and assign it to a support resource.
[0134] Thus, as new data sources are uncovered, the system can call an API library which contains all known APIs and configurations, using which the system will attempt to configure a valid connection, updating the data landscape with the result and/or creating a support ticket where it was unsuccessful. This library is continuously enriched as a central repository as new data entities are encountered in other organisations.
[0135] The discovery process can also be used to construct a data landscape using a module referred to hereinafter as a data landscape module, which can be used to present a visual representation of the network environment, and an example of this will now be described with reference to Figure 1D.
[0136] In this example, at step 130, the system performs a discovery process to generate a network topography indicative of data sources within a network environment. This is typically performed using the approach described above with respect to Figure 1B, and hence involves monitoring network traffic in the network environment.
[0137] At step 131, the system determines a status of the data sources, for example by monitoring traffic, using results of data source interrogation or the like. At step 132 the system generates a network visualisation indicative of the network topography, the network visualisation including graphical elements indicative of data sources, network hardware, and connections indicative of communication links between the data sources and network hardware. The visualisation also includes a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection. The network visualisation can then be displayed to a user at step 133, allowing the user to explore the network, and optionally interact with the network as required.
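The following Python sketch illustrates one possible data model for such a visualisation: elements for data sources and network hardware, connections between them, and a per-element status indicator, with a details lookup for user selection. The class and field names are hypothetical; rendering would be handled by a separate front end.

```python
# Illustrative sketch only: a model backing the network visualisation.
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    kind: str              # "data_source" or "network_hardware"
    status: str = "green"  # green / yellow / red status indicator

@dataclass
class Visualisation:
    elements: dict = field(default_factory=dict)
    connections: list = field(default_factory=list)  # (a, b, status) links

    def add(self, element: Element):
        self.elements[element.name] = element

    def connect(self, a: str, b: str, status: str = "green"):
        self.connections.append((a, b, status))

    def details(self, name: str) -> dict:
        """Extra information shown when a user selects an element."""
        e = self.elements[name]
        return {"name": e.name, "kind": e.kind, "status": e.status}

viz = Visualisation()
viz.add(Element("x-ray-db", "data_source", status="red"))
viz.add(Element("core-switch", "network_hardware"))
viz.connect("x-ray-db", "core-switch")
print(viz.details("x-ray-db"))
```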
[0138] Thus, this process allows a visual representation of an organisation's data environment to be drawn dynamically and updated in real time, displaying information regarding data sources, such as a data source name, access status, database type (on-prem/off-prem, hybrid), linkages between the data sources and a colour-coded Consumer Data Request status (green, yellow, red).
[0139] In one example, the network topography can also contain a context sensitive integration to a support task management module, so as a user selects individual data source items, related tasks are displayed.
[0140] In order to facilitate access to information from within different data sources, it is typical to construct a database schema, hereinafter referred to as a personal data construct, and an example of this will now be described with reference to Figure 1E.
[0141] In this example, the system selects data from different databases, for example by performing searches using the APIs at step 140, with the data optionally relating to one or more individuals. At step 141 the data is analysed, with results of the analysis being used to update, including to construct, generate and/or maintain, a data source repository at step 142. In this regard, the data source repository maintains industry specific data constructs relating to the different data sources, including for example, details of relationships between the data in the different data sources, information on data source formats, structures, schemas of each data source, a context associated with the data or data source, RPA logic for accessing, retrieving and/or screening data in the data sources, or other relevant information.
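As a minimal sketch, the Python below shows one way such industry specific data constructs (terminology, entities, attributes, relationships, rulesets) might be stored and updated from analysis results. The dictionary layout and the example healthcare entries are assumptions for illustration only.

```python
# Illustrative sketch only: an industry-specific data construct entry and
# the update step that folds analysis results into the repository.
import json

repository = {
    "healthcare": {
        "terminology": {"MRN": "medical record number"},
        "entities": ["patient"],
        "attributes": {"patient": ["MRN", "dob"]},
        "relationships": [],   # e.g. ("patient", "episode", "1:N")
        "rulesets": [],        # e.g. redaction rules
    }
}

def update_constructs(industry: str, analysis: dict):
    """Merge newly observed constructs from a data source analysis."""
    entry = repository.setdefault(industry, {
        "terminology": {}, "entities": [], "attributes": {},
        "relationships": [], "rulesets": []})
    entry["terminology"].update(analysis.get("terminology", {}))
    for ent in analysis.get("entities", []):
        if ent not in entry["entities"]:
            entry["entities"].append(ent)
    entry["relationships"].extend(analysis.get("relationships", []))

update_constructs("healthcare", {
    "terminology": {"UR": "unit record number"},
    "entities": ["episode"],
    "relationships": [("patient", "episode", "1:N")],
})
print(json.dumps(repository["healthcare"], indent=2))
```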
[0142] This allows the system to construct an industry specific data source repository which gleans and cross-references personal identifiers, relationships, and entity maps from other organisations to augment the tracing module's ability to identify personal data and correlate it across separate data sources even though the personal identifiers and data structures may be dissimilar.
[0143] The system also typically implements a module, hereinafter referred to as a data screen module, which is used to screen data retrieved from the data sources, allowing the data to be redacted as required, and used in fulfilling consumer data requests, and an example of this will now be described with reference to Figure IF.
[0144] In this example, the data screen module is configured to receive a data request relating to an individual at step 150. The system then performs a search of a network environment at step 151 to thereby retrieve personal data based on an identity of the individual. Thus, this will typically use the APIs and credentials identified above to access the data sources, using the data source repository to then access information regarding the identity of the individual. At step 152, the system automatically performs redaction of the retrieved personal data, at least in part using at least one of machine learning and Robotic Process Automation (RPA) algorithms, to ensure any sensitive or private information is protected. Following this, at step 153 the system generates a search response including redacted personal data. Thus, this provides a mechanism to automatically redact data retrieved during a search.
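A minimal end-to-end sketch of these four steps follows; the search and redact callables stand in for the modules described elsewhere in this document, and all names are hypothetical.

```python
# Illustrative sketch only: the request/search/redact/respond flow of
# Figure 1F, with the heavy lifting delegated to injected callables.
def handle_data_request(request: dict, search, redact) -> dict:
    """Steps 150-153: data request in, redacted search response out."""
    individual = request["individual_id"]                 # step 150
    personal_data = search(individual)                    # step 151
    redacted = [redact(item) for item in personal_data]   # step 152
    return {"individual_id": individual, "results": redacted}  # step 153

response = handle_data_request(
    {"individual_id": "ID-1234"},
    search=lambda ident: [f"record for {ident}"],
    redact=lambda item: item.replace("ID-1234", "[REDACTED]"),
)
print(response)
```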
[0145] In one example, the data screen module operates using an RPA-supported process to learn from the organisation which data should be omitted, redacted or retained, and then automates this process so that redactions, removals and omissions are done based on industry- and organisation-specific rules, reducing the necessity for human involvement and leading to increases in accuracy and efficiency over time.
[0146] The data screen module can be pre-loaded with a standard set of pattern recognisers and can use a natural language processor to understand the context of the recognised patterns, and thereby improve understanding of the content, and hence the redaction process. As patterns are recognised within the data item, de-identification logic is enacted to handle the required action. The action may vary based on industry and organisation specific rules, which can be customised, tested and verified.
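By way of illustration only, a minimal regex-based version of such pattern recognisers is sketched below, with each pattern mapped to an action that could vary per industry rules; the three patterns shown are illustrative assumptions, not a standard set.

```python
# Illustrative sketch only: pattern recognisers, each paired with the
# de-identification action to enact when the pattern is found.
import re

RECOGNISERS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "redact"),        # id number
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "redact"),  # email address
    (re.compile(r"\b\d{4} \d{4} \d{4} \d{4}\b"), "mask"),    # 16-digit card
]

def deidentify(text: str) -> str:
    """Apply the action attached to each recognised pattern."""
    for pattern, action in RECOGNISERS:
        if action == "redact":
            text = pattern.sub("[REDACTED]", text)
        elif action == "mask":
            # Keep only the last four digits of the matched sequence.
            text = pattern.sub(
                lambda m: "**** **** **** " + m.group()[-4:], text)
    return text

print(deidentify("Card 1111 2222 3333 4444, contact jo@example.com"))
```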
[0147] The output of the data screen module process can be verified by organisational staff, comparing the original dataset with the redacted copy. Refinements are fed back into a rules engine and redaction logic to improve the redaction process. Once a sufficient amount of data has been processed, results improve, and the need for the manual verification process will reduce until it is replaced with a fully automated, unattended process.
[0148] It will be appreciated that the above approaches utilise progressive RPA to automate a data request, fulfilling the request via a series of independent enterprise searches aided by the personal data construct, with the finalised data package validated using industry/organisation business data rules through to delivery. This is unique in that it combines technological processes that are enhanced and improved continuously through human intervention and machine learning.
[0149] Accordingly, in one example, the data management system includes one or more processing devices configured to: determine Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of: specific to at least one of: an industry; business requirements; compliance rules; retrieved from an RPA logic repository; generated by the processing device; generated by the processing device and stored in an RPA repository for subsequent reuse; and, generated by the processing device using machine learning; and, use the RPA logic to perform RPA to at least one of: perform a discovery process to generate a network topography indicative of data sources within the network environment; implement one or more Application Programming Interfaces (APIs) to access the data sources; perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; automatically redact personal data; and, validate the search results.
[0150] A number of further features will now be described.
[0151] In one example, the processing device is configured to use Robotic Process Automation (RPA) to perform one or more tasks in the above described process. This can include, for example, using RPA algorithms to perform the discovery process, implement the API, perform the search, automatically provide redacted personal data, and/or validate search results.
[0152] In this regard, RPA occurs when basic tasks are automated through software or hardware systems that function across a variety of applications. The software or bot can be instructed to follow a workflow with multiple steps and applications, such as retrieving, scanning, and collecting data from forms, sending a receipt message, checking the forms for completeness, filing the form in a folder, and updating a spreadsheet with the name of the form, the date filed, and so on. RPA software is designed to reduce the burden for employees of completing repetitive, simple tasks. The data retrieval and redaction processes are an ideal application of the technology, due to the volumes involved and the linear, rules-based logic of the retrieval and edit/redaction process; in addition, this method is highly and instantly scalable.
[0153] In general, RPA is performed in accordance with RPA logic, which sets out the steps to be performed as part of the RPA process. The RPA logic is typically specific to an industry, business requirements and/or compliance rules. For example, it will be appreciated that the requirements associated with redaction of medical data might be significantly different to redaction of legal data, and hence a different redaction process might be required. Similarly, different redaction processes might be required for different jurisdictions to comply with local law, and/or to comply with the requirements of different businesses.
[0154] In general, such RPA logic can be stored in, and hence retrieved from, an RPA logic repository, such as the data source repository, as needed. Thus, for example, if redaction rules have already been established for medical data in Australia, the relevant RPA logic can be retrieved and used in redacting medical data within Australia. This obviates the need for new RPA logic to be established when the system is deployed in a scenario similar to existing scenarios. However, in the event that the data management approach is deployed in a new scenario, suitable RPA logic might not be available, in which case new RPA logic might need to be generated by the processing device. This could be performed from scratch and/or could be based on existing similar RPA logic, for example using RPA logic from a similar scenario as a starting point. The new RPA logic can be generated using machine learning and/or manual intervention, for example, by examining manually performed actions to train the processing device. Once generated, RPA logic can be stored in an RPA repository for subsequent reuse.
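The lookup-with-fallback behaviour just described can be sketched as follows; representing RPA logic as a list of step names, and keying the repository by (industry, jurisdiction), are simplifying assumptions made only for this example.

```python
# Illustrative sketch only: reuse RPA logic for a known scenario, seed new
# logic from the closest existing scenario, and store it for reuse.
RPA_LOGIC_REPOSITORY = {
    ("medical", "AU"): ["strip record numbers", "redact clinician notes"],
}

def get_rpa_logic(industry: str, jurisdiction: str) -> list:
    key = (industry, jurisdiction)
    if key in RPA_LOGIC_REPOSITORY:
        return RPA_LOGIC_REPOSITORY[key]          # existing scenario
    # New scenario: seed from a similar one (same industry) if available.
    seed = next((logic for (ind, _), logic in RPA_LOGIC_REPOSITORY.items()
                 if ind == industry), [])
    new_logic = seed + [f"review against {jurisdiction} compliance rules"]
    RPA_LOGIC_REPOSITORY[key] = new_logic          # stored for reuse
    return new_logic

print(get_rpa_logic("medical", "AU"))   # reused as-is
print(get_rpa_logic("medical", "NZ"))   # seeded from AU, then stored
```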
[0155] Thus, it will be appreciated that as the above process is rolled out in new scenarios, associated RPA logic for that scenario will be generated, with this then being available for reuse in subsequent similar scenarios. Consequently implementation of the above approach becomes easier as it is implemented more widely.
[0156] In one example, the processing device performs the discovery process by monitoring network traffic in the network environment and then analysing the network traffic to identify the data sources. Whilst monitoring can be performed in any appropriate manner, in one example this is achieved using Microsoft Azure traffic analytics, allowing the analysis to identify potential data sources on the network. Additionally, and/or alternatively, this can involve connecting to one or more network logs and scanning the network logs to identify data sources within the network environment. The processing device can also perform the discovery process using an Identity and Access Management Service, such as an Active Directory, which is used to authenticate and authorise users, with the processing device using the Identity and Access Management Service to determine access levels and credentials associated with the data sources.
[0157] Following the discovery process, the processing device can optionally generate a network visualisation indicative of the network topography and display the network visualisation to a user. The visualisation can be of any appropriate form, but typically includes graphical elements indicative of data sources, network hardware and connections indicative of communication links between the data sources and network hardware. Each graphical element can also include a status indicator indicative of a status of a respective data source, network hardware or connection. The network visualisation is also typically interactive, allowing the user to view additional information associated with each network element. In this instance, the processing device can detect user selection of one of the graphical elements, retrieve additional information regarding the respective data source, network hardware or connection and display the additional information. This allows an operator to easily visualise the network, in turn allowing for a manual review of the discovery process, for example to identify network elements that have been incorrectly identified.
[0158] In one example, the processing device interrogates the data sources and/or a configuration management database and uses results of the interrogation to determine the data source details, which can in turn be used to select an API and/or configuration information. This process is typically performed automatically, for example using RPA. In this regard the processing device interrogates the data sources to determine information, such as a server name, an IP address, a database name, a vendor name and a data source and/or software version. This can be performed as part of the above described discovery process, or can be performed separately depending on the preferred implementation. Using this information, the processing device selects an API from an API repository that hosts pre-existing APIs and configuration information, such as an endpoint protocol, an authentication mechanism, and data source schema, for multiple different data sources. It will be appreciated that an API may not be easily identifiable and hence the identification process could be assisted using machine learning and/or RPA approaches. In any event, by interrogating and hence identifying the data sources, this allows the processing device to select an API that can be used to access the data source, thereby obviating the need for an API to be manually configured. However, it will be appreciated that where an API does not exist, then manual configuration may be required, and this typically involves having the processing device generate and issue a service request, which is forwarded to an operator to allow the issue to be resolved.
[0159] The above described process also leads to a self-healing capability. In this regard, in the event that operation of an API, or its configuration, fails, then the respective data source will be identified as no longer available as the above described discovery process is repeated. In this instance, the processing device will re-interrogate the data source and again select an API using results of the interrogation, with this being performed until access is restored or the process is elevated for action by an operator. In either case, access will ultimately be restored. Thus, for example, if the data source is updated and the previous API is no longer applicable, this will lead to an updated API being identified and implemented.
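The self-healing loop might be sketched as below: re-interrogate, re-select, re-test, and escalate after a bounded number of attempts. The callables and their signatures are placeholders assumed for the example.

```python
# Illustrative sketch only: restore access to a data source whose API or
# configuration has failed, escalating to an operator if unsuccessful.
def ensure_access(source, interrogate, select_api, test_access,
                  raise_ticket, max_attempts=3):
    for _ in range(max_attempts):
        details = interrogate(source)    # source may have been upgraded
        api = select_api(details)        # re-select against the repository
        if api is not None and test_access(source, api):
            return api                   # access restored
    raise_ticket(source)                 # elevate for operator action
    return None

api = ensure_access(
    "crm_db",
    interrogate=lambda s: {"vendor": "acme", "version": 13},
    select_api=lambda d: f"connector_v{d['version']}",
    test_access=lambda s, a: True,
    raise_ticket=lambda s: print(f"ticket raised for {s}"),
)
print(api)
```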
[0160] In one example, having determined the API, the processing device is configured to retrieve credentials, typically from the Identity and Access Management Service, and access the data source using the API and the credentials. This can be used to confirm that the API is configured correctly and that the credentials do indeed provide access to the data source. It will be appreciated that successfully accessing the data source allows the data source to be used in subsequent searching, and that as a result the status of the data source can be updated in the network visualisation mentioned above, to reflect the fact that the data source can now be queried automatically.
[0161] Once data sources are deemed accessible, testing is typically performed in order to ensure data relating to an individual can be successfully retrieved. To achieve this, the processing device selects sample data within the data sources, and then generates one or more test data requests relating to the sample data. This process can be automatic and/or performed with manual oversight by an operator. The processing device then performs a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on the test data request, with automatic redaction of the retrieved personal data being performed, typically using RPA. A test result is then generated including redacted personal data. This test result is then manually audited, for example by comparison to a similar test result generated using an entirely manual process. This is performed to confirm the personal data is redacted as required.
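As a small sketch of this comparison step, the harness below checks each automated result against its manually produced equivalent and reports divergences, so the underlying RPA logic can be revised; the sample identifiers and stub results are hypothetical.

```python
# Illustrative sketch only: compare automated search/redaction output
# against manually audited results for sampled test requests.
def run_certification_tests(samples, automated, manual_results):
    """Return the sample ids whose automated output diverges from audit."""
    failures = []
    for sample_id in samples:
        if automated(sample_id) != manual_results.get(sample_id):
            failures.append(sample_id)
    return failures

failures = run_certification_tests(
    samples=["S1", "S2"],
    automated=lambda s: f"redacted({s})",
    manual_results={"S1": "redacted(S1)", "S2": "redacted(S2-extra)"},
)
print(failures or "all tests passed")  # -> ['S2']: revise the RPA logic
```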
[0162] Assuming the test is successful, this allows subsequent requests to be processed. In the event the test is unsuccessful, the underlying RPA logic driving the search and/or redaction can be amended, either manually or using machine learning, with the process being repeated until the test request is processed successfully.
[0163] As part of this process, sample data from different data sources is typically analysed to update, including to construct, generate or maintain, a data source repository, which maintains industry specific data constructs relating to the different data sources. The data constructs can include any one or more of terminology, entities, attributes, relationships and rulesets, and can be used to analyse, combine and/or redact data from different data sources.
[0164] In one example, the data source repository links data associated with an individual in different data sources. For example, an individual for which data is stored in different data sources may be referred to in different ways, for example using different identifiers, or the like. Accordingly, the data source repository is used to create associations between the data in different data sources, for example by associating unique identifiers with each individual in the data source repository, so that data relating to a particular individual is linked even if identifiers used to identify the individual are different in the different data sources.
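A minimal sketch of this linking follows: one repository-level identifier per individual, with each data source's differing local identifiers mapped onto it. The class, source names and local identifier formats are assumptions for illustration.

```python
# Illustrative sketch only: associate a single master identifier with the
# differing local identifiers used for the same individual in each source.
import uuid

class IdentityMap:
    def __init__(self):
        self._master = {}   # (source, local_id) -> master id

    def link(self, master_id, source, local_id):
        self._master[(source, local_id)] = master_id

    def resolve(self, source, local_id):
        return self._master.get((source, local_id))

ids = IdentityMap()
master = str(uuid.uuid4())
ids.link(master, "crm_db", "CUST-42")     # same person, two local ids
ids.link(master, "billing", "ACC-0042")
assert ids.resolve("crm_db", "CUST-42") == ids.resolve("billing", "ACC-0042")
print("both records resolve to", master)
```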
[0165] In one example, the processing device receives a data request relating to an individual, and then performs a search of a network environment to thereby retrieve personal data based on an identity of the individual. The personal data is then automatically redacted, with an optional manual verification step to check the redaction process, and a search response is generated including the redacted personal data.
[0166] To perform the searches, the processing device typically retrieves information regarding available data sources, identifies one or more available data sources relevant to the data request, for example using information from the data source repository, before performing one or more searches of the available data sources.
[0167] In one example, the processing device receives search results, analyses the search results using the data source repository and aggregates the search results using results of the analysis to create the personal data. Thus, the system can use the relationships defined in the data source repository, to resolve records relating to the same individuals that are contained in different data sources, allowing these records to be aggregated into a single set of search results.
[0168] In one example, the processing device is configured to perform pattern recognition to recognise patterns in the retrieved personal data and redact the personal data in accordance with the recognised patterns. The pattern recognition can be performed based on pre-defined pattern recognition templates, as well as industry and/or organisational specific templates, which might, for example, be part of the data source repository. For example, this could be used to analyse results and identify particular alpha-numeric character sequences, representing identifiers associated with different individuals, or the like.
[0169] Similarly, the processing devices can be configured to perform natural language processing to understand a context of recognised patterns and then redact the personal data in accordance with the context.
[0170] The system can also be configured to redact the data using de-identification logic, for example using de-identification algorithms, such as anonymization, pseudonymization or k-anonymization algorithms, as well as industry and/or organisational specific rules, optionally stored within the one or more schemas. Redaction can also be performed using machine learning and RPA processes.
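By way of illustration, two of the named techniques are sketched below: keyed pseudonymization (a stable token that is not reversible without the key) and the kind of generalisation step used when working towards k-anonymity. The key and the banding scheme are illustrative assumptions.

```python
# Illustrative sketch only: two simple de-identification primitives.
import hashlib
import hmac

SECRET_KEY = b"example-key-held-by-the-organisation"  # illustrative only

def pseudonymize(identifier: str) -> str:
    """Replace an identifier with a stable keyed digest."""
    return hmac.new(SECRET_KEY, identifier.encode(),
                    hashlib.sha256).hexdigest()[:16]

def generalise_age(age: int, band: int = 10) -> str:
    """Coarsen a quasi-identifier into a band, as k-anonymisation does."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"

print(pseudonymize("CUST-42"))   # same input always yields the same token
print(generalise_age(37))        # -> "30-39"
```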
[0171] When a search is performed in the operational phase, the processing device can receive a data request relating to an individual and then validate the identity of the individual, for example by authenticating the individual, and thereby ensure the individual has permission to seek the requested information. Assuming the validation is successful, a search can then be performed. The validation could be achieved in any one of a number of ways, and this could include using a Know Your Client (KYC) procedure, biometric verification, or the like.
[0172] It will be appreciated that the different aspects of the above process can be used independently, for example allowing network discovery to be performed independently of the search, but typically they are performed in conjunction to enable a network to be analysed and then searched, with results being redacted automatically as needed.
[0173] A specific example of a network architecture will now be described in more detail with reference to Figures 2 to 4.
[0174] In this example, the network architecture includes a plurality of processing systems 210, such as servers, and data sources, such as databases 240, which in use are coupled to a communications network 220, such as a Local Area Network (LAN), or Wide Area Network (WAN), within an organisation. A number of client devices 230 are provided, which may be used to access data stored in the databases 240. In this example, the network 220 is further connected to an external network 250, such as the Internet, which may include further servers 210 and client devices 230.
[0175] It will be appreciated that the configuration of the networks 220, 250 is for the purpose of example only, and in practice the client devices 230 and the processing systems 210 can communicate via any appropriate mechanism, such as via wired or wireless connections, including, but not limited to, mobile networks, private networks, such as 802.11 networks, the Internet, LANs, WANs, or the like, as well as via direct or point-to-point connections, such as Bluetooth, or the like.
[0176] Whilst the processing systems 210 are shown as a single entity, it will be appreciated that in practice the processing systems 210 can be distributed over a number of geographically separate locations, for example as part of a cloud-based environment. However, the above described arrangement is not essential and other suitable configurations could be used.
[0177] An example of a suitable processing system 210 is shown in Figure 3.
[0178] In this example, the processing system 210 includes at least one microprocessor 311, a memory 312, an optional input/output device 313, such as a keyboard and/or display, and an external interface 314, interconnected via a bus 315 as shown. In this example the external interface 314 can be utilised for connecting the processing system 210 to peripheral devices, such as the communications networks 220, 250, databases 240, other storage devices, or the like. Although a single external interface 314 is shown, this is for the purpose of example only, and in practice multiple interfaces using various methods (e.g. Ethernet, serial, USB, wireless or the like) may be provided.
[0179] In use, the microprocessor 311 executes instructions in the form of applications software stored in the memory 312 to allow the required processes to be performed. The applications software may include one or more software modules, and may be executed in a suitable execution environment, such as an operating system environment, or the like.
[0180] Accordingly, it will be appreciated that the processing system 210 may be formed from any suitable processing system, such as a suitably programmed client device, PC, web server, network server, or the like. In one particular example, the processing system 210 is a standard processing system such as an Intel Architecture based processing system, which executes software applications stored on non-volatile (e.g., hard disk) storage, although this is not essential. However, it will also be understood that the processing system could be any electronic processing device such as a microprocessor, microchip processor, logic gate configuration, firmware optionally associated with implementing logic such as an FPGA (Field Programmable Gate Array), or any other electronic device, system or arrangement.
[0181] As shown in Figure 4, in one example, the client device 230 includes at least one microprocessor 431, a memory 432, an input/output device 433, such as a keyboard and/or display, and an external interface 434, interconnected via a bus 435 as shown. In this example the external interface 434 can be utilised for connecting the client device 230 to the communications networks 220, 250, databases, other storage devices, or the like. Although a single external interface 434 is shown, this is for the purpose of example only, and in practice multiple interfaces using various methods (e.g. Ethernet, serial, USB, wireless or the like) may be provided.
[0182] In use, the microprocessor 431 executes instructions in the form of applications software stored in the memory 432 to allow for communication with the processing systems 210, as well as to allow user interaction for example through a suitable user interface.
[0183] Accordingly, it will be appreciated that the client devices 230 may be formed from any suitable processing system, such as a suitably programmed PC, Internet terminal, lap-top, or hand-held PC, and in one preferred example is either a tablet, or smart phone, or the like. Thus, in one example, the client device 230 is a standard processing system such as an Intel Architecture based processing system, which executes software applications stored on non-volatile (e.g., hard disk) storage, although this is not essential. However, it will also be understood that the client devices 230 can be any electronic processing device such as a microprocessor, microchip processor, logic gate configuration, firmware optionally associated with implementing logic such as an FPGA (Field Programmable Gate Array), or any other electronic device, system or arrangement.
[0184] For the purpose of the following examples, it is assumed that one or more processing systems 210 are servers, which communicate with the client devices 230 via a communications network, or the like, depending on the particular network infrastructure available. The servers 210 typically execute applications software for performing required tasks including storing, searching and processing of data, with actions performed by the servers 210 being performed by the processor 311 in accordance with instructions stored as applications software in the memory 312 and/or input commands received from a user via the I/O device 313, or commands received from the client device 230.
[0185] It will also be assumed that the user interacts with the client device 230 via a GUI (Graphical User Interface), or the like, presented on a display of the client device 230, and in one particular example via a browser application that displays webpages, or an App that displays relevant information. Actions performed by the client devices 230 are performed by the processor 431 in accordance with instructions stored as applications software in the memory 432 and/or input commands received from a user via the I/O device 433.
[0186] However, it will be appreciated that the above described configuration assumed for the purpose of the following examples is not essential, and numerous other configurations may be used. It will also be appreciated that the partitioning of functionality between the client devices 230, and the servers 210 may vary, depending on the particular implementation.
[0187] An example of the configuration phase will now be described in more detail with reference to Figures 5A to 5C.
[0188] In this example, at step 500, a server 210 monitors network traffic, analysing this at step 505 to identify data sources, with associated access permissions and credentials being determined at step 510. In practice, this process is typically performed using a proprietary software module “Data Trace”, which uses a network traffic analyser to generate a topography of the data environment and gain access via credentials using an Identity and Access Management Service, such as Azure Active Directory, or the like. Data Trace can be integrated with a support tool like ServiceNow to facilitate and automate the tasks required to prepare the network environment for Data Service Access Requests (DSAR) or Customer Data Requests (CDR). This can also be used to give a clear picture of overall progress of search requests, as will be described in more detail below.
[0189] As data sources are identified, these are interrogated by the server 210 at step 515, allowing the server 210 to determine a data source configuration at step 520, including information such as a data source type, server/database name, IP address, version, or the like. At step 525, the server 210 accesses a repository of APIs and associated configuration files, and attempts to configure the API, to thereby access the data source. This is performed to test the API at step 530, and hence update a status of the data source at step 535, depending on whether or not access is successful. If this process fails, the process is typically elevated for operator intervention.
[0190] Accordingly, the server 210 uses machine learning to automatically locate a matching API from a reference library, then attempts to configure and test the API and, depending on the result, updates the topography with a pass, or creates a ticket and assigns it to a support resource. When the issue is resolved, the reference API library is updated with a copy of the API and configuration so this can be reused in future.
[0191] Thus, as connections are identified, the server 210 implements a machine learning enabled algorithm, which attempts to configure an API (drawing from a proprietary Industry Reference Library (IRL)) and, depending on the result, the topography is updated in real time with a pass or, in case of failure, a support ticket is created for a resource to resolve; the result is used to enrich the IRL machine learning logic, as well as giving an overview of the progress of the DSAR/CDR preparation.
[0192] As the topography and Data Trace processes of steps 500 to 535 are designed to be run continuously, when breakages and errors are detected, the system will attempt to repair the issue and, if unable, create a task to be followed up by a support resource.
[0193] Concurrently with this process, a network visualisation can be generated and displayed by the server 210 at steps 540 and 545. This is performed by the Data Trace toolset, and can be drawn and updated in real time, including information derived from the discovery process. An example visualisation is shown in Figure 6.
[0194] In this example, the visualisation is presented in a window 610 of a user interface, and includes graphical elements in the form of icons 611 representing network hardware and/or data sources, with connections 612 showing communication links, and status indicators 613 showing a DSAR/CDR status of respective hardware, data sources and communication links. The topography also contains a context sensitive integration to a support task management module, so that as a user clicks on individual network items, the related tasks are displayed in a details window 614.
[0195] Thus, in one example, the framework is integrated with the ServiceNow application and the visualisation is context sensitive, meaning a user can click on a specific area of the data environment and see and interact with any related cases and resolve them. For example, in the wireframe above the X-ray data server has been highlighted, and with it, all cases related to the status of that node/server have been extrapolated from the support application and presented. Case updates can update the status of the data environment dynamically, as well as adjust the Data Readiness Status, giving stakeholders an understanding of overall progress.
[0196] Once the discovery process is complete, testing is performed to ensure personal data can be retrieved and redacted as required, so that DSAR/CDR requests can be processed. Accordingly, at step 550 sample data is gathered by the server 210, and used to generate a data source repository at step 555. In this regard, as personal data is expected to be found across various data sources, each organisation will have its own custom personal data construct in the form of a data source repository, which builds a schema that correlates all known patient identifiers and updates the schema as new ones are identified. This reduces the chances that personal data will be missed due to dissimilar personal identifiers, and improves search times and real time interoperability between systems. These constructs can also be aggregated across organisations to create a unified industry template.
[0197] A test data request is then generated at step 560, with this being used by the server 210 to perform a search at step 565, and thereby retrieve data relating to an individual identified in the search request. When the process has compiled the results, an Industry Business Rules (IBR) Engine automatically removes, redacts, and recalls information that is either not relevant to the customer or does not require disclosure at step 570. This rules engine uses RPA techniques and operates on machine learning based on the outcomes of other queries performed across multiple organisations, to ensure the redaction process is performed in accordance with current legislation and regulation, taking changes into account in real time.
[0198] The results are compared to equivalent results generated using a manual process at step 575. This is used to evaluate the results of the search and redaction process, with the RPA logic being revised at step 580 if required.
[0199] Thus, the testing process uses empirically significant randomised selection, so that a personal data request is automatically generated for each sample; the same data is then cross validated by performing a manual audit, the results of both are compared, and oversights and adjustments are made based on the result. Thus, this approach uses RPA to effectively train bots to emulate an initially mostly manual process, with the bots collecting data from known locations on the network, by submitting database queries, scraping screens, or the like. Furthermore, this process is repeated over hundreds of requests and dozens of organisations, so that the algorithm becomes enriched and intelligent enough to be more efficient and reliable than a human user the more requests it handles. Once the process is sufficiently configured, DataBench awards the client a certificate and ID, and the client is added to the central registry.
[0200] Once the configuration phase is complete, DSAR/CDR requests can be processed, and an example of this process will now be described with reference to Figure 7.
[0201] In this example, at step 700 a data request is received from a client, with this typically being submitted via a client device 230. The data request generally includes or is submitted in conjunction with identifying information, such as KYC details, which are used by the server 210 to validate the identity of the client at step 705. The server 210 can optionally confirm receipt of the request and successful validation at step 710, for example by returning a message to the client via the client device 230.
[0202] At step 715, a search is performed, and the results redacted at step 720, in a manner similar to that described above. The redacted data is then used to generate a search response at step 725, which may optionally undergo manual screening at step 730, before being returned to the client.

[0203] Accordingly, the above described process uses a discovery process to generate a network topography, and to secure access to data sources within the network. Searches can then be performed using a rules engine, using RPA developed using machine learning, to automatically process search requests and redact search results, allowing DSAR/CDR requests to be processed without any human involvement. Furthermore, as the system can adapt dynamically, this leads to increases in accuracy over time, whilst ensuring the system can be maintained in compliance with necessary legal and business requirements.
[0204] Further details of each of the above processes will now be described.
[0205] In one example, the system uses a data trace module to continuously scan an organisation's infrastructure and, as data sources are identified, match and gain access to them via the credentials using an IDM (Identity and Access Management Service), and use RPA (Robotic Process Automation) to automatically locate a matching API from a reference library, then attempt to configure and test the API, update the data landscape with the result, and, if necessary, create a ticket and assign it to a support resource.
[0206] An example of the network components for the data trace module is shown in Figure 8A, with a schematic of the operation of the data trace module being shown in Figure 8B.
[0207] In this example, the data trace module 801 is coupled to a data bank 802, having a knowledge base 802.1 and API configuration 802.2, and a data landscape module 803, described in more detail below. These components are implemented by the server 210. The data trace module is connected to an IT service management (ITSM) service 804, an IDM service 805, and one or more data source APIs 806, that provide access to data stores 807. The data trace module 801 also accesses network logs 808, generated by network devices 809.
[0208] In use, the data trace module implements eight steps, including: (1) Searching the data environment; (2) Server/resource identification; (3) Obtaining API & permissions; (4) API matching and configuration; (5) Matching IDM credentials; (6) Testing API(s); (7) Creating a Ticket if required; and (8) Updating the knowledge base.
[0209] Operation of the data trace module will now be described in more detail below with reference to Figure 8C.

[0210] In this example, at step 851, the data trace module connects to one or more network logs, and scans the network logs to identify data sources within the network environment of the organisation.
[0211] At step 852, the data trace module connects to one or more identified data sources and/or a configuration management database (CMDB) infrastructure directory to extract key details regarding the data source, such as the IP address, server name, operating system, software versions, or the like. An example of this process is described in more detail below with reference to Figure 8D.
[0212] At step 853, the data trace module accesses the data bank 802 and attempts to find a matching API connector and configuration information for the respective data source. In this regard, the data bank acts as a repository of APIs, which is populated over time as access to data sources is required. Thus, the first time a data source is accessed, if an API is not available, this can be created and added to the data bank, so that in future access to similar data sources can be achieved by retrieving the API. Assuming such a connector and configuration information are available, then at step 854, these are returned by the data bank, allowing the data trace module to prepare the API.
[0213] At step 855, the data trace module uses the organisation's identity management system to find and retrieve credentials for the respective data source, based on the API connector details. At step 856, the data trace module sends a request to the data source using the API connector, to test connectivity and validate the credentials.
[0214] If the data trace module is unable to access the data source for any reason, for example if an API or configuration information is not present in the data bank, or if the credentials are invalid, the data trace module logs an incident with the organisation's IT service management (ITSM) service 804. This allows the ITSM to resolve the issue, for example by generating an API, which can then be stored in the data bank, and/or providing alternative credentials to access the data source.
[0215] The data trace module then updates the data landscape at step 858, storing details of each data source and its current status, for example whether the data source is accessible.
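By way of illustration only, one pass of this loop might be sketched in Python as below, using simple in-memory stand-ins for the data bank, IDM, ITSM and data landscape; all names and structures are assumptions for the example rather than the disclosed implementation:

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str
    type: str
    address: str
    status: str = "unknown"

# Hypothetical in-memory stand-ins for the data bank, IDM and ITSM.
API_LIBRARY = {"postgresql": {"port": 5432, "auth": "password"}}
CREDENTIALS = {"patient-db": ("svc_trace", "s3cret")}
tickets = []

def test_connection(source, api, creds):
    # Placeholder for a real connectivity test over the matched API (step 856).
    return creds is not None

def trace(source):
    """One pass of the data trace loop (steps 853 to 858, simplified)."""
    api = API_LIBRARY.get(source.type)     # steps 853-854: match API + config
    creds = CREDENTIALS.get(source.name)   # step 855: match IDM credentials
    if api is None or not test_connection(source, api, creds):
        tickets.append(f"Data trace failed for {source.name}")  # step 857
        source.status = "needs-support"
    else:
        source.status = "accessible"       # step 858: update data landscape

trace(DataSource("patient-db", "postgresql", "10.0.0.5"))
```

[0216] An example of the process for acquiring data source details will now be described in more detail with reference to Figure 8D.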
[0217] In this example, at step 861, the data trace module initially uses one or more organisation directories, such as a CMDB, to find details of the data source. Following this, at step 862, the data trace module uses one or more network modules to interrogate the data source to discover details. Then, at step 863, the data trace module uses one or more platform specific APIs, such as Amazon Web Services (AWS) or Azure, to interrogate the data source to discover the details. Thus, three different approaches are used in combination in order to attempt to discover all the necessary data source details. In the event that the data source details cannot be discovered, this can be logged with the ITSM at step 864, allowing the issue to be resolved.
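As a minimal sketch only, the three approaches might be tried as a fallback chain along the following lines; the probe functions are placeholders for the CMDB, network and platform queries, and the example data is invented:

```python
def from_cmdb(name):      # step 861: consult an organisation directory
    return {"patient-db": {"ip": "10.0.0.5", "os": "linux"}}.get(name)

def from_network(name):   # step 862: interrogate the source directly
    return None           # placeholder for e.g. a port scan or banner grab

def from_platform(name):  # step 863: query a platform API (AWS, Azure, ...)
    return None           # placeholder for e.g. a cloud inventory call

def discover_details(name):
    """Try the three approaches in turn; log with the ITSM if all fail."""
    for probe in (from_cmdb, from_network, from_platform):
        details = probe(name)
        if details:
            return details
    print(f"ITSM: could not discover details for {name}")  # step 864
    return None

print(discover_details("patient-db"))  # -> {'ip': '10.0.0.5', 'os': 'linux'}
```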
[0218] An example of the data bank will now be described in further detail, with reference to Figures 9A and 9B.
[0219] In this example, in Figure 9A, the data bank 902 is connected to the data trace module 901 within an organisation, as well as data trace modules 913 in other organisations. Each data trace module 901, 913 is connected to a network 911, 914 within the respective organisation, whilst the data bank 902 is connected to a network 912 of an entity responsible for management of the data discovery and retrieval process.
[0220] As shown in Figure 9B, the data bank includes an API 902.1 that acts as an interface to provide connectivity to the data trace module 901, and a reference library 902.2.
[0221] In use, the data bank is involved in seven steps, including: (1) receiving details of data sources from a data trace module; (2) retrieving matching APIs and configuration information; (3) returning the API and configuration to the data trace module; (4) the data trace module raising an issue if an API or configuration is not available; (5) the ITSM resolving the issue and updating the reference library 902.2; (6) the API and configuration being provided to the data trace module; and (7) synchronisation being performed with external data trace modules.
[0222] An example of this will now be described in more detail with reference to Figure 9C.

[0223] In this example, at step 951, the data trace module discovers one or more new data sources and makes requests to the data bank to retrieve matching API and configuration information, such as an endpoint protocol, an authentication mechanism, and a data source schema.
[0224] At step 952, the data bank 902 searches the reference library 902.2 for a matching API and configuration information, returning these to the data trace module at step 953, assuming these are available, thereby allowing the data trace module to update the data landscape, as described above.
[0225] At step 954, when there is no matching API and/or configuration information, an issue is raised with a supporting resource, such as the ITSM, allowing an API and/or configuration information to be manually generated and/or sourced externally as required, with this being fed back into the data bank at step 955.
[0226] After the issue has been resolved, at step 956 the data trace module can be updated with the API and configuration information, for example by having this pushed to the data trace module by the data bank and/or by having the data trace module submit a further request.
[0227] At this point, data trace modules 913 at other organisations can access the newly sourced API and/or configuration information, thereby allowing the API / configuration to be utilised across multiple organisations.
[0228] Thus, as new data sources are uncovered, the data trace module can call an API library that contains all known APIs and configurations, using these to attempt to configure a valid connection, updating the data landscape with the result and/or creating a support ticket where this is unsuccessful. This library is continuously enriched as a central repository as new data entities are encountered in other organisations.
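For illustration, the request/resolve cycle of Figures 9A to 9C might be approximated as below; the DataBank class, its keys and the example entries are assumptions made for the sketch:

```python
class DataBank:
    """Hypothetical central reference library of APIs and configurations,
    shared across the data trace modules of multiple organisations."""
    def __init__(self):
        self.reference_library = {}  # (source_type, version) -> configuration
        self.pending_issues = []

    def request(self, source_type, version):
        key = (source_type, version)
        if key in self.reference_library:    # steps 952-953: hit, return it
            return self.reference_library[key]
        self.pending_issues.append(key)      # step 954: raise an issue
        return None

    def resolve(self, source_type, version, config):
        # Step 955: a support resource supplies the missing API/configuration,
        # which is then available to every connected data trace module.
        self.reference_library[(source_type, version)] = config

bank = DataBank()
assert bank.request("fhir", "r4") is None       # miss raises an issue
bank.resolve("fhir", "r4", {"endpoint": "https://example.org/fhir"})
assert bank.request("fhir", "r4") is not None   # now shared across modules
```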
[0229] An example of the data landscape module, which can be used to generate a visual representation of the network environment, will now be described with reference to Figure 10A.

[0230] In this example, the data landscape module 1003 is connected via an API 1003.1, acting as an interface, to the data trace module 1001, which is in turn connected to one or more data sources 1007. The data landscape module also communicates with the ITSM 1004. The data landscape module also includes a data store 1003.2, such as a database, and stores a topography 1003.3 of the network.
[0231] An example of an automated landscape generation process will now be described with reference to Figure 10B.
[0232] In this example, as the data trace module 1001 discovers data sources and gathers connection details, it provides information to the data landscape module 1003 at step 1051. The landscape module uses the information to draw an organisation data topography, adding data sources to the topography as they are discovered, at step 1052.
[0233] At step 1053, an authorised user is able to access the data landscape, via a visualisation presented on a user interface, as shown for example in Figures 10E to 10I.
[0234] Similar to the arrangement of Figure 6, the visualisation is typically presented in a window 1010 of a user interface, and includes graphical elements in the form of icons 1011 representing network hardware and/or data sources, with connections 1012 showing communication links, and status indicators 1013 showing a DSAR/CDR status of respective hardware, data sources and communication links. The user interface also includes tabs 1015, allowing different additional information to be displayed.
[0235] For example, in Figure 10E a context tab is selected that displays information regarding the network environment as a whole, including a number of data sources, a number of issues and a general health indicator. In Figure 10F, details of a number of data sources are shown in a data sources tab, allowing a user to easily view details of the data sources. In Figure 10G, an issues tab is used to display any issues that need resolving, whilst a selected issue is shown in a pop-up window 1016 in Figure 10H. In Figure 10I, data requests are shown in a data request tab.

[0236] At step 1054, the data landscape is dynamically updated as additional information is supplied from the data trace module, including adding, updating or removing data sources, adjusting the status of the data sources, and flagging issues, such as access problems or the like.
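A minimal sketch of a store backing such a visualisation is given below; the Node and Topography structures are illustrative assumptions, not the disclosed data model:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    kind: str              # e.g. "data-source" or "hardware"
    status: str = "green"  # colour coded DSAR/CDR status

class Topography:
    """In-memory stand-in for the topography 1003.3: nodes, links and
    status flags that a visualisation layer could render."""
    def __init__(self):
        self.nodes, self.links, self.issues = {}, set(), []

    def upsert(self, name, kind, status="green"):
        self.nodes[name] = Node(name, kind, status)  # add or update a node

    def link(self, a, b):
        self.links.add(frozenset((a, b)))            # communication link

    def flag(self, name, issue):
        self.nodes[name].status = "red"              # shown on the icon
        self.issues.append((name, issue))            # surfaced in issues tab

topo = Topography()
topo.upsert("x-ray-server", "data-source")
topo.upsert("gateway", "hardware")
topo.link("x-ray-server", "gateway")
topo.flag("x-ray-server", "credentials expired")
```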
[0237] In addition to automatically updating the landscape, manual intervention can also be used, as will now be described with reference to Figure 10C.
[0238] In this example, at step 1061, an authorised user accesses the visual representation of the landscape and reviews this to identify any data sources that are not present at step 1062. The user can then manually add data sources at step 1063, adding in details, such as access credentials, or the like, as needed. The landscape module then updates the visualisation as needed, to reflect the updated landscape.
[0239] In addition to adding in data sources, the data landscape module can also be used to resolve issues, and an example of this will now be described with reference to Figure 10D.
[0240] In this example, at step 1071, a user accesses the landscape visualisation to review the overall landscape. At step 1072, alerts and/or notifications are displayed, directing the user to any issues, such as connection issues, missing data source details, ongoing consumer requests, or the like.
[0241] At step 1073, the user can explore each issue, viewing contextual information to assist the user in resolving the issue. At step 1074, related information can be sourced from the ITSM to provide additional context and issue tracking capabilities.
[0242] Issues can then be resolved from the landscape view at step 1075, with information being propagated to the ITSM and/or other system modules as needed.
[0243] Accordingly, this allows a visual representation of the organisational data environment to be drawn dynamically and updated in real time, which includes the Data Source Name, Access Status, Database Type (On Prem/Off Prem, Hybrid), linkages between the Data Sources and colour coded Consumer Data Request status (Green, Yellow, Red). The topography also contains a context sensitive integration to a support task management module, so related tasks can be viewed and resolved as needed.
[0244] An example of the data source repository module will now be described with reference to Figure 11A.
[0245] In this example, the data source repository module 1121 is connected via an API 1121.1, acting as an interface, to the data landscape module 1103, a data screen module 1122 and other modules 1123. The data source repository module 1121 also includes a data store 1121.2, such as a database, which stores the data construct, and an update module 1121.3 that updates the data construct.
[0246] The data construct is an industry specific repository which gleans and cross-references personal identifiers, relationships, and entity maps from other organisations to augment the data trace module's ability to identify personal data and correlate it across separate data sources even where the personal identifiers are dissimilar.
[0247] An example of operation of the data source repository module will now be described with reference to Figure 11B.
[0248] In this example, at step 1151 the data source repository is pre-configured with one or more schemas containing industry specific constructs of personal data. The data constructs include terminology, entities, attributes, relationships and rule sets, such as redaction and archival rulesets, relevant to the respective industry.
[0249] It will be appreciated that in practice, each schema and the associated constructs are developed by analysing data within a variety of different data sources to identify and map different fields within the data sources. For example, entities are typically identified using identifiers, but these identifiers might be different within different data sources, or may be stored within different fields. For example, in some data sources, a person might be identified by their name, which might be stored in a single field, or across multiple fields, such as first, last and middle name fields. Conversely, in other data sources, an individual's identity might be associated with a separate identifier, such as a social security number. Accordingly, the data source repository is developed by analysing sample records from multiple data sources, and comparing these to identify relationships between the different records. The data source repository can also provide context to data within the data sources, for example, identifying types of information that are private or confidential.
[0250] It will be appreciated from this that the data source repository can then be used to assist with processing data from the different data sources, for example, allowing records from different data sources to be combined or consolidated, allowing data to be screened, or the like.
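By way of a hedged example only, such a construct might be represented as a mapping from canonical attributes to the dissimilar fields used by each data source, as sketched below; the attribute, field and source names are invented for the illustration:

```python
# Hypothetical industry construct: each canonical attribute maps to the
# dissimilar fields under which it appears in different data sources.
PATIENT_CONSTRUCT = {
    "full_name": {
        "billing_db": ["name"],
        "clinical_db": ["first_name", "middle_name", "last_name"],
    },
    "national_id": {
        "billing_db": ["ssn"],
        "clinical_db": ["medicare_no"],
    },
}

def canonicalise(source, record):
    """Rewrite a raw record into canonical attributes so that records from
    dissimilar sources can be correlated to the same individual."""
    out = {}
    for attr, fields_by_source in PATIENT_CONSTRUCT.items():
        fields = fields_by_source.get(source, [])
        values = [record[f] for f in fields if f in record]
        if values:
            out[attr] = " ".join(str(v) for v in values)
    return out

print(canonicalise("clinical_db", {"first_name": "Ann", "last_name": "Lee"}))
# -> {'full_name': 'Ann Lee'}
```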
[0251] At step 1152, the one or more modules 1123 retrieve data source repository information from the repository 1121.2 via the API 1121.1, with the information being filtered by the API to ensure it is relevant to the modules.
[0252] At step 1153, the modules 1123 use the information when processing data, for example when identifying relationships between records, when redacting data, or the like.
[0253] The modules 1123 are also adapted to provide feedback to the data source repository 1121 via the API 1121.1 at step 1154, when gaps are identified in the data constructs. This may require manual intervention, via the ITSM or similar, for example to create a mapping or relationship, or to provide context or other information, allowing the feedback to be used to update the constructs at step 1155, and thereby ensure these are up to date.
[0254] An example of the data screen module will now be described with reference to Figure 12A.
[0255] In this example, the data screen module 1122 receives personal data 1224 as part of search results, and then operates to redact the data, typically based on rules stored in the data construct, providing redacted data 1225 as an output. In one example, the data is also verified by an operator 1226, with results of the verification being fed back to the data screen module to allow the redaction process to be optimised. This allows the data screen module to use an RPA supported process to learn from the organisation which data should be omitted, redacted or retained, and then to automate this process so that redactions, removals, and omissions are performed based on industry and organisation specific rules, reducing the necessity for human involvement and leading to increases in accuracy and efficiency over time.

[0256] To achieve this, as shown in Figure 12B, the data screen module includes a pattern recognition process 1222.1, which uses business rules 1222.2. In this regard, the data screen module comes pre-loaded with a standard set of pattern recognisers, which are developed over time, providing organisations with an enhanced base set. Organisations can also provide their own custom patterns based on their industry and organisational knowledge, by providing regular expression syntax. Custom regular expression patterns can be tested and verified through the data screen testing and verification process.
[0257] After using pattern recognition, the data screen module implements a language context process 1222.3, which employs a natural language processor 1222.4 to further understand the context of the recognised patterns. The context is used to provide a greater degree of confidence in the recognition of the identified data item. For example, “my name is Mark” provides a greater degree of confidence in the identification of a person’s name of “Mark” compared to “the item was returned due to a red mark on the cover”.
[0258] As patterns are recognised within the data, de-identification logic 1222.5 is enacted to handle the required action, based on redaction definitions 1222.6. The action may vary based on industry and organisation specific rules, which can be customised, tested and verified.
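A deliberately simplified sketch of this three-stage flow is given below; where the described system employs a natural language processor for context, the example merely approximates context with a cue phrase, and the patterns and redaction actions are illustrative assumptions:

```python
import re

# Assumed base pattern recognisers; deployments would add custom,
# industry specific regular expressions as described in [0256].
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    # The cue phrase stands in for NLP-derived context: "my name is Mark"
    # gives higher confidence than an isolated "mark".
    "name": re.compile(r"\bmy name is (\w+)", re.IGNORECASE),
}
REDACTIONS = {"email": "[EMAIL REDACTED]", "name": "[NAME REDACTED]"}

def screen(text):
    """Recognise patterns, then apply the configured de-identification
    action for each recognised item."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(REDACTIONS[label], text)
    return text

print(screen("Hi, my name is Mark, reach me at mark@example.com"))
# -> 'Hi, [NAME REDACTED], reach me at [EMAIL REDACTED]'
```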
[0259] The output 1225 of the data screen process is verified by organisational staff 1226, comparing the original dataset with the redacted copy. Refinements are fed back into the data screen business rules engine and redaction logic. Once a sufficient amount of data has been processed by the data screen module and the results are more in line with expectations, the need for the manual verification process reduces, allowing it to be replaced with a fully automated, unattended process.
[0260] An example of the process will now be described in further detail with reference to Figure 12C.
[0261] In this example, at step 1251, personal data is provided to the data screen module for processing. The data can be in a wide range of formats, including, for example, documents, JSON, XML, text and images.

[0262] At step 1252, the data screen module uses industry specific and organisation specific logic and business rules, including rules defined within the data construct, to identify personal data that should be redacted.
[0263] At step 1253, the output is validated by an operator, allowing the operator to manually review the redacted data and confirm there is no personal information that breaches industry specific privacy rules or laws.
[0264] If anomalies are encountered, then at step 1254, the issues are resolved, with feedback being used to update the relevant rules as needed using RPA. In this manner, the data screen module continues to improve its screening logic based on the outcome of previous processes at step 1255, thereby reducing the need for manual intervention and ultimately leading to an entirely automated approach.
[0265] An example of the data request module, which uses progressive RPA to automate a data retrieval request, will now be described with reference to Figure 13A.
[0266] In this example, the data request module 1331 is connected to the data landscape module 1303, the data source repository module 1321 and the data screen module 1322, as well as a search module 1334, which is in turn connected to data sources 1307 via API connectors 1306. In use, a search request received from a user 1332 can be processed, performed, and redacted, with optional manual review by an operator 1333.
[0267] The search process will now be described in more detail with reference to Figure 13B.
[0268] In this example, at step 1351 the data request module 1331 receives a request containing verified consumer identification information. The request module 1331 uses the data request to query the landscape module to identify one or more available data sources containing data relevant to the data request at step 1352. The search module 1334 is then used to perform a series of searches of the data sources, to thereby retrieve search results at step 1353. At step 1354, the search results are aggregated into a data package of personal data using the data construct, for example using the schemas to identify relationships between different search results, so that the results can be consolidated.

[0269] At step 1355, the resulting personal data is sent to the data screen module for further processing, in particular to allow the data screen module to perform recognition and redaction of the personal data at step 1356, using the process described above.
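As an illustrative sketch of the aggregation step, per-source results might be consolidated on a canonical identifier drawn from the data construct, roughly as follows; the identifier and field names are assumptions for the example:

```python
def aggregate(results, construct_key="national_id"):
    """Consolidate per-source search results into one data package,
    keyed on a canonical identifier from the data construct."""
    package = {}
    for source, records in results.items():
        for record in records:
            key = record.get(construct_key)
            if key is None:
                continue  # unmatched records would be flagged for review
            entry = package.setdefault(key, {"sources": []})
            entry.update({k: v for k, v in record.items() if k != construct_key})
            entry["sources"].append(source)
    return package

results = {
    "billing_db": [{"national_id": "123", "invoice": "INV-9"}],
    "clinical_db": [{"national_id": "123", "allergy": "penicillin"}],
}
print(aggregate(results))
# -> {'123': {'sources': ['billing_db', 'clinical_db'],
#             'invoice': 'INV-9', 'allergy': 'penicillin'}}
```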
[0270] At step 1357, results of the search are provided for verification by an operator 1333, who is typically a member of the organisation hosting the data. At step 1358, the operator verifies the results and/or provides feedback, which is then used to further train the data source repository, data screen or data request modules. This allows the algorithms used to be trained over time, so that RPA can be used to perform the searches in a substantially automated fashion.
[0271] An example of the training process will now be described with reference to Figure 13C.
[0272] In this example, at step 1361 the data request module 1331 receives a data package containing test data covering one or more industry scenarios. At step 1362 the data request module 1331 sends the data package to the data screen module 1322, allowing the data screen module 1322 to perform recognition and redaction of the data package at step 1363, using the process described above.
[0273] The redacted data is returned to the data request module 1331, allowing results to be presented to an operative 1333, such as a member of organisational staff, at step 1364. The operative reviews the results and either verifies these are suitably redacted and/or provides feedback, such as redacting further data, providing annotations or similar, at step 1365. This feedback is then used to further train the data request module 1331 and/or data screen module 1322, thereby improving the process for performing a search and redacting search results.
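A minimal sketch of how such operator feedback might be folded back into the redaction rules is given below; the feedback format and rule store are assumptions made for the illustration:

```python
def apply_feedback(redaction_rules, operator_feedback):
    """Fold operator corrections back into the redaction rules so that
    the automated pass improves with each reviewed request."""
    for item in operator_feedback:
        if item["action"] == "should_redact":    # a missed identifier
            redaction_rules.setdefault(item["pattern"], "[REDACTED]")
        elif item["action"] == "over_redacted":  # a false positive
            redaction_rules.pop(item["pattern"], None)
    return redaction_rules

rules = {"ssn": "[REDACTED]"}
feedback = [{"action": "should_redact", "pattern": "medicare_no"}]
print(apply_feedback(rules, feedback))
# -> {'ssn': '[REDACTED]', 'medicare_no': '[REDACTED]'}
```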
[0274] An example of a data access request process will now be described in further detail with reference to Figure 13D.
[0275] In this example, a consumer 1332 submits a data access request, also referred to as a Data Subject Access Request (DSAR), through an appropriate channel, such as email, phone, direct message, or the like, at step 1371. The data access request is added to a request queue at step 1372, together with any captured information, such as details of the requestor, for further processing.

[0276] At step 1373, the data request module 1331 monitors the request queue and triggers when a new request is added to the queue. This causes the data request module 1331 to validate the queue entry and assess the completeness of the request at step 1374. Specifically, the data request module 1331 will review the data access request to ensure the request contains sufficient information to allow the search to be performed, including any necessary details of the requestor, and sufficient information to identify the subjects of the search. The data request module 1331 will also ensure that the organisation's data landscape is ready for the search to be performed, for example ensuring the landscape module 1303 has identified relevant data sources, or the like. If issues are encountered with a request, a notification is sent to an operative, such as an administrator, allowing them to resolve the issue at step 1375.
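For illustration only, the completeness check of steps 1373 to 1375 might be sketched as follows; the required fields and notification messages are assumptions for the example:

```python
REQUIRED_FIELDS = ("requestor", "subject_name", "channel")  # assumed minimum

def validate_request(entry, landscape_ready=True):
    """Steps 1373 to 1375, simplified: check a queued request for
    completeness and confirm the data landscape is ready before the
    request algorithm is invoked."""
    missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
    if missing:
        return False, "notify administrator: missing " + ", ".join(missing)
    if not landscape_ready:
        return False, "notify administrator: data landscape not ready"
    return True, "invoke request algorithm"

print(validate_request({"requestor": "a@b.example", "subject_name": "Ann Lee",
                        "channel": "email"}))
# -> (True, 'invoke request algorithm')
```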
[0277] Otherwise, at step 1376, the data request module invokes the request algorithm, to perform the search as described above with respect to Figure 13B. At step 1377, results are returned for verification. In this regard, verification is performed to industry and organisation standards, with verification being attempted automatically using a model trained by machine learning. In the event that the verification cannot be performed automatically, it is performed manually at step 1378, for example by having results reviewed by an operative, with any feedback being used for further training as described above with respect to Figure 13C. Once the verification is completed, results are returned to the requestor via a nominated channel at step 1379.
[0278] Accordingly, the above approach facilitates the request via a series of independent enterprise searches, aided by the personal data construct, with the finalised data package validated using industry/organisation business data rules through to delivery. This approach is unique in that it combines technological processes that are enhanced and improved continuously through human intervention and machine learning.
[0279] Accordingly, the above described system provides a DSAR/CDR process broken into four main stages, including data trace and test and certify processes performed during a configuration phase, and business as usual and outsource processes performed as part of the operational phase.

[0280] 1. Data Trace: The activities and resources required to clearly map the data sources where personal data is stored, and the relationships between and current access levels to them. This is used to give an overview of the project and depict the data landscape status at any given time.
[0281] 2. Test & Certify: Immediately following Data Trace, a series of manual and automated tests are performed to simulate genuine personal data requests, where the results are audited and assessed, and adjustments are made to give an organisation confidence in its ability to respond to these requests and fulfil data standard CDR obligations.
[0282] 3. DSAR/CDR BAU: This process covers the BAU ("business as usual") activities related to receiving and responding to a DSAR/CDR request, from customer verification, through using RPA (Robotic Process Automation) enabled business rules to perform redaction and editing of the request, to automatically notifying the customer as to the status of their request.
[0283] 4. Outsource: As the DSAR/CDR request process and supporting technology was designed with portability in mind, this activity can be easily assigned or outsourced to a managed service provider to maintain and facilitate these requests for the organisation while still maintaining full transparency.
[0284] Throughout this specification and claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated integer or group of integers or steps but not the exclusion of any other integer or group of integers. As used herein and unless otherwise stated, the term "approximately" means ±20%.
[0285] Persons skilled in the art will appreciate that numerous variations and modifications will become apparent. All such variations and modifications which become apparent to persons skilled in the art should be considered to fall within the spirit and scope of the invention as broadly described above.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS: 1) A data management system for automatically providing personal data from a network environment in response to a data request, the system including, one or more processing devices configured to: a) utilise a discovery process to generate a network topography indicative of data sources within the network environment; b) implement one or more Application Programming Interfaces (APIs) to access the data sources; c) access a data source repository maintaining industry specific data constructs relating to the data sources; d) receive a data request relating to an individual; e) perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; and, f) generate a search response including redacted personal data. 2) A data management system according to claim 1, wherein the one or more processing devices are configured to automatically perform redaction of the retrieved personal data. 3) A data management system according to claim 1 or claim 2, wherein the one or more processing devices are configured to use Robotic Process Automation (RPA) to at least one of: a) perform the discovery process; b) implement the API; c) perform the search; d) automatically redact personal data; and, e) validate search results. 4) A data management system according to claim 3, wherein the one or more processing devices are configured to perform RPA in accordance with RPA logic, and wherein the RPA logic is at least one of: a) specific to at least one of: i) an industry; ii) business requirements; and, iii) compliance rules; b) retrieved from an RPA logic repository; c) generated by the processing device; d) generated by the processing device and stored in an RPA repository for subsequent reuse; and, e) generated by the processing device using machine learning. 5) A data management system according to any one of the claims 1 to 4, wherein the one or more processing devices are configured to perform the discovery process by: a) monitoring network traffic in the network environment; and, b) analysing the network traffic to identify the data sources. 6) A data management system according to claim 5, wherein the network traffic is analysed using at least one of: a) traffic analytics; and, b) Azure traffic analytics. 7) A data management system according to claim 5 or claim 6, wherein the one or more processing devices are configured to: a) connect to one or more network logs; and, b) scan the network logs to identify data sources within the network environment. 8) A data management system according to any one of the claims 1 to 7, wherein the one or more processing devices are configured to perform the discovery process using an identity and access management service to determine access levels and credentials associated with the data sources. 9) A data management system according to any one of the claims 1 to 8, wherein the one or more processing devices are configured to: a) generate a network visualisation indicative of the network topography; and, b) display the network visualisation to a user. 10) A data management system according to claim 9, wherein the network visualisation includes graphical elements indicative of: a) data sources; b) network hardware; and c) connections indicative of communication links between the data sources and network hardware. 
11) A data management system according to claim 10, wherein each graphical element includes a status indicator indicative of a status of a respective data source, network hardware or connection. 12) A data management system according to claim 10 or claim 11, wherein the one or more processing devices are configured to: a) detect user selection of graphical element; b) retrieve additional information regarding the respective data source, network hardware or connection; and, c) display the additional information. 13) A data management system according to any of the claims 1 to 12, wherein the one or more processing devices are configured to determine data source details by interrogating at least one of: a) a data sources; and, b) a configuration management database. 14)A data management system according to claim 13, wherein the one or more processing devices are configured to use the data source details to retrieve at least one of: a) an API; and b) configuration information. 15) A data management system according to claim 14, wherein the configuration information includes at least one of: a) an endpoint protocol; b) an authentication mechanism; and c) data source schema. 16) A data management system according to any one of the claims 13 to 15, wherein the data source details include at least one of: a) a server name; b) a database name; c) an IP address; d) a vendor name; and, e) a data source and/or software version. 17)A data management system according to claim 14, wherein the one or more processing devices are configured to select an API from an API repository, the API repository hosting APIs and configuration information for multiple different data sources. 18) A data management system according to claim 17, wherein the one or more processing devices are configured to select an API at least in part using at least one of: a) machine learning; and, b) Robotic Process Automation (RPA) algorithms. 19) A data management system according to any one of the claims 1 to 18, wherein the one or more processing devices are configured to: a) retrieve credentials; and, b) access the data source using the API and the credentials. 20) A data management system according to any one of the claims 1 to 19, wherein the one or more processing devices are configured to: a) select sample data; b) generate one or more test data requests relating to the sample data; c) perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on the test data request; d) automatically perform redaction of the retrieved personal data; and e) generate a test result including redacted personal data, wherein the test result is manually audited to confirm the personal data is redacted as required. 21) A data management system according to any one of the claims 1 to 20, wherein the one or more processing devices are configured to: a) analyse data stored in different data sources within a network environment; and, b) update a data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources. 22) A data management system according to claim 21, wherein the data constructs include at least one of: a) terminology; b) entities; c) attributes; d) relationships; and, e) rulesets. 
23) A data management system according to claim 21 or claim 22, wherein the data constructs are usable in at least one of: a) analysing data from different data sources; b) combining data from different data sources; and, c) redacting data from the data sources. 24) A data management system according to any one of the claims 1 to 23, wherein the one or more processing devices are configured to: a) select sample data; and, b) analyse sample data to generate the data source repository, wherein the data source repository links to data associated with an individual in different data sources. 25) A data management system according to claim 24, wherein the one or more processing devices are configured to associate unique identifiers with each individual in the data source repository. 26) A data management system according to any one of the claims 1 to 25, wherein the one or more processing devices are configured to: a) receive a data request relating to an individual; b) perform a search of a network environment to thereby retrieve personal data based on an identity of the individual; c) automatically perform redaction of the retrieved personal data; and, d) generate a search response including redacted personal data. 27) A data management system according to claim 26, wherein the one or more processing devices are configured to: a) retrieve information regarding available data sources; b) identify one or more available data sources relevant to the data request; and, c) perform one or more searches of the available data sources. 28) A data management system according to claim 26 or claim 27, wherein the one or more processing devices are configured to: a) receive search results; b) analyse the search results using a data source repository that maintains industry specific data constructs relating to the data in the different data sources; and, c) aggregate the search results using results of the analysis to create the personal data. 29) A data management system according to any one of the claims 26 to 28, wherein the one or more processing devices are configured to: a) perform pattern recognition to recognise patterns in the retrieved personal data; and, b) redact the personal data in accordance with the recognised patterns. 30)A data management system according to claim 29, wherein the one or more processing devices are configured to: a) perform natural language processing to understand a context of recognised patterns; and, b) redact the personal data in accordance with the context. 31) A data management system according to any one of the claims 26 to 30, wherein the one or more processing devices are configured to redact the data using de -identification logic. 32) A data management system according to any one of the claims 1 to 31, wherein the one or more processing devices are configured to: a) receive a data request relating to an individual; b) validate the identity of the individual; and, c) perform the search in response to a successful validation. 33)A data management system according to claim 32, wherein the one or more processing devices are configured to validate the identity of the individual using a Know Your Client (KYC) procedure. 34)A data management system according to any one of the claims 1 to 33, wherein the redaction process is performed at least in part using at least one of: a) machine learning; and, b) Robotic Process Automation (RPA) algorithms. 
35) A data management method for automatically providing personal data from a network environment in response to a data request, the method including, in one or more processing devices: a) utilising a discovery process to generate a network topography indicative of data sources within the network environment; b) implementing one or more Application Programming Interfaces (APIs) to access the data sources; c) generating a data source repository indicative of locations of personal data within the data sources; d) receiving a data request relating to an individual; e) performing a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; and, f) generating a search response including redacted personal data. 36)A data management system including one or more processing devices configured to perform a discovery process to generate a network topography indicative of data sources within a network environment by: a) monitoring network traffic in the network environment; b) analysing the network traffic to identify data sources within the network environment; and, c) using an identity and access management service to determine access levels and credentials associated with the data sources. 37) A data management system according to claim 36, wherein the network traffic is analysed using at least one of: a) traffic analytics; and, b) Azure traffic analytics. 38)A data management system according to claim 36 or claim 37, wherein the one or more processing devices are configured to: a) connect to one or more network logs; and, b) scan the network logs to identify data sources within the network environment. 39) A data management system according to any one of the claims 36 to 38, wherein the Identity and Access Management Service is an Active Directory. 40) A data management system according to any one of the claims 36 to 39, wherein the one or more processing devices are configured to determine data source details by interrogating at least one of: a) a data sources; and, b) a configuration management database. 41)A data management system according to claim 40, wherein the one or more processing devices are configured to use the data source details to retrieve at least one of: a) an API; and b) configuration information. 42) A data management system according to claim 41, wherein the configuration information includes at least one of: a) an endpoint protocol; b) an authentication mechanism; and c) data source schema. 43) A data management system according to any one of the claims 40 to 42, wherein the data source details include at least one of: a) a server name; b) a database name; c) an IP address; d) a vendor name; and, e) a data source and/or software version. 44)A data management system according to any one of the claims 36 to 43, wherein the one or more processing devices are configured to select an API from an API repository, the API repository hosting APIs and configuration information for multiple different data sources. 45) A data management system according to claim 44, wherein the one or more processing devices are configured to select an API at least in part using at least one of: a) machine learning; and, b) Robotic Process Automation (RPA) algorithms. 46)A data management system according to any one of the claims 36 to 45, wherein the one or more processing devices are configured to: a) retrieve credentials; and, b) access the data source using the API and the credentials. 
47) A data management method including, in one or more processing devices, performing a discovery process to generate a network topography indicative of data sources within a network environment by: a) monitoring network traffic in the network environment; b) analysing the network traffic to identify data sources within the network environment; and, c) using an identity and access management service to determine access levels and credentials associated with the data sources. 48) A data management system for use in accessing multiple data sources, the system including one or more processing devices configured to: a) determine data source details by interrogating at least one of: i) a data sources; and, ii) a configuration management database; and, b) implement one or more Application Programming Interfaces (APIs) to access the data sources by selecting an API from an API repository using the data source details, wherein the API repository hosts APIs and configuration information for multiple different data sources. 49) A data management system according to claim 48, wherein the configuration information includes at least one of: a) an endpoint protocol; b) an authentication mechanism; and c) data source schema. 50)A data management system according to claim 48 or claim 49, wherein the data source details include at least one of: a) a server name; b) a database name; c) an IP address; d) a vendor name; and, e) a data source and/or software version. 51) A data management system according to any one of the claims 48 to 50, wherein the one or more processing devices are configured to select an API at least in part using at least one of: a) machine learning; and, b) Robotic Process Automation (RPA) algorithms. 52) A data management system according to any one of the claims 48 to 51, wherein the one or more processing devices are configured to: a) retrieve credentials; and, b) access the data source using the API and the credentials. 53)A data management method for use in accessing multiple data sources, the method including, in one or more processing devices: a) determining data source details by interrogating at least one of: i) a data sources; and, ii) a configuration management database; and, b) implementing one or more Application Programming Interfaces (APIs) to access the data sources by selecting an API from an API repository using the data source details, wherein the API repository hosts APIs and configuration information for multiple different data sources. 54) A data management system for generating a network visualisation, the system including one or more processing devices configured to: a) perform a discovery process to generate a network topography indicative of data sources within a network environment by monitoring network traffic in the network environment; b) generate a network visualisation indicative of the network topography, the network visualisation including: i) graphical elements indicative of: (1) data sources; (2) network hardware; and (3) connections indicative of communication links between the data sources and network hardware; and, ii) a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection; and, c) display the network visualisation to a user. 
55)A data management system according to claim 54, wherein the one or more processing devices are configured to: a) detect user selection of graphical element; b) retrieve additional information regarding the respective data source, network hardware or connection; and, c) display the additional information. 56)A data management system according to claim 54 or claim 55, wherein the one or more processing devices perform the discovery process using the method of any one of the claims 35 to 45. 57) A data management method for generating a network visualisation, the method including, in one or more processing devices: a) performing a discovery process to generate a network topography indicative of data sources within a network environment by monitoring network traffic in the network environment; b) generating a network visualisation indicative of the network topography, the network visualisation including: i) graphical elements indicative of: (1) data sources; (2) network hardware; and (3) connections indicative of communication links between the data sources and network hardware; and, ii) a status indicator for each graphical element, the status indicator being indicative of a status of a respective data source, network hardware or connection; and, c) displaying the network visualisation to a user. 58) A data management system for maintaining a data source repository relating to different data sources, the system including one or more processing devices configured to: a) analyse data stored in different data sources within a network environment; and, b) update the data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources. 59)A data management system according to claim 58, wherein the data constructs include at least one of: a) terminology; b) entities; c) attributes; d) relationships; and, e) rulesets. 60) A data management system according to claim 58 or claim 59, wherein the data constructs are usable in at least one of: a) analysing data from different data sources; b) combining data from different data sources; and, c) redacting data from the data sources. 61) A data management method for maintaining a data source repository relating to different data sources, the method including, in one or more processing devices: a) analysing data stored in different data sources within a network environment; and, b) updating the data source repository using results of the analysis, the data source repository maintaining industry specific data constructs relating to the data in the different data sources. 62) A data management system for searching data relating to an individual, the system including one or more processing devices configured to: a) receive a data request relating to an individual; b) perform a search of a network environment to thereby retrieve personal data based on an identity of the individual; c) automatically perform redaction of the retrieved personal data; and, d) generate a search response including redacted personal data. 63) A data management system according to claim 62, wherein the one or more processing devices are configured to: a) retrieve information regarding available data sources; b) identify one or more available data sources relevant to the data request; and, c) perform one or more searches of the available data sources. 
64) A data management system according to claim 62 or claim 63, wherein the one or more processing devices are configured to: a) receive search results; b) analyse the search results using a data source repository that maintains industry specific data constructs relating to the data in the different data sources; and, c) aggregate the search results using results of the analysis to create the personal data. 65) A data management system according to any one of the claims 62 to 64, wherein the one or more processing devices are configured to: a) perform pattern recognition to recognise patterns in the retrieved personal data; and, b) redact the personal data in accordance with the recognised patterns. 66)A data management system according to claim 65, wherein the one or more processing devices are configured to: a) perform natural language processing to understand a context of recognised patterns; and, b) redact the personal data in accordance with the context. 67) A data management system according to any one of the claims 62 to 66, wherein the one or more processing devices are configured to redact the data using de -identification logic. 68) A data management system according to any one of the claims 62 to 67, wherein the one or more processing devices are configured to: a) receive a data request relating to an individual; b) validate the identity of the individual; and, c) perform the search in response to a successful validation. 69) A data management system according to claim 68, wherein the one or more processing devices are configured to validate the identity of the individual using a Know Your Client (KYC) procedure. 70) A data management system according to any one of the claims 62 to 69, wherein the redaction process is performed at least in part using at least one of: a) machine learning; and, b) Robotic Process Automation (RPA) algorithms. 71) A data management method for searching data relating to an individual, the method including, in one or more processing devices: a) receiving a data request relating to an individual; b) performing a search of a network environment to thereby retrieve personal data based on an identity of the individual; c) automatically performing redaction of the retrieved personal data; and, d) generating a search response including redacted personal data. 72) A data management system including one or more processing devices configured to: a) determine Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of: i) specific to at least one of: (1) an industry; (2) business requirements; and, (3) compliance rules; ii) retrieved from an RPA logic repository; iii) generated by the processing device; iv) generated by the processing device and stored in an RPA repository for subsequent reuse; and, v) generated by the processing device using machine learning; and, b) use the RPA logic to perform RPA to at least one of: i) perform a discovery process to generate a network topography indicative of data sources within the network environment; ii) implement one or more Application Programming Interfaces (APIs) to access the data sources; iii) perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual; iv) automatically redact personal data; and, v) validate the search results. 
73) A data management method including, in one or more processing devices:
a) determining Robotic Process Automation (RPA) logic wherein the RPA logic is at least one of:
i) specific to at least one of:
(1) an industry;
(2) business requirements; and,
(3) compliance rules;
ii) retrieved from an RPA logic repository;
iii) generated by the processing device;
iv) generated by the processing device and stored in an RPA repository for subsequent reuse; and,
v) generated by the processing device using machine learning; and,
b) using the RPA logic to perform RPA to at least one of:
i) perform a discovery process to generate a network topography indicative of data sources within the network environment;
ii) implement one or more Application Programming Interfaces (APIs) to access the data sources;
iii) perform a search of the network environment using the data source repository and the APIs to thereby retrieve personal data based on an identity of the individual;
iv) automatically redact personal data; and,
v) validate the search results.
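As a final hedged sketch, the RPA-logic determination of claims 72 and 73 can be read as a repository-first lookup with a generate-and-store fallback. The repository, determine_rpa_logic() and the example keys below are hypothetical stand-ins, not the claimed implementation.

# Illustrative sketch only; the repository and all names are hypothetical.
_RPA_REPOSITORY = {}   # stand-in for a persistent RPA logic repository

def determine_rpa_logic(industry: str, task: str):
    """Repository-first lookup with generate-and-store fallback,
    mirroring options (ii) to (iv) of claims 72(a) and 73(a)."""
    key = (industry, task)
    if key in _RPA_REPOSITORY:                   # (ii) retrieved from the repository
        return _RPA_REPOSITORY[key]

    def generated_logic(payload: str) -> str:    # (iii) generated on demand
        # Placeholder for industry- and compliance-specific automation steps.
        return f"{task} applied to {payload} under {industry} rules"

    _RPA_REPOSITORY[key] = generated_logic       # (iv) stored for subsequent reuse
    return generated_logic

logic = determine_rpa_logic("banking", "redact_personal_data")
print(logic("customer record 42"))
# Prints: redact_personal_data applied to customer record 42 under banking rules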
PCT/AU2022/050240 2021-03-19 2022-03-17 Data management WO2022192961A1 (en)

Priority Applications (1)

AU2022236779A (published as AU2022236779A1, en); priority date: 2021-03-19; filing date: 2022-03-17; title: Data management

Applications Claiming Priority (2)

AU2021900804; priority date: 2021-03-19
AU2021900804A (published as AU2021900804A0, en); priority date: 2021-03-19; title: Data management

Publications (1)

Publication Number | Publication Date
WO2022192961A1 (en) | 2022-09-22

Family

ID=83321853

Family Applications (1)

PCT/AU2022/050240 (published as WO2022192961A1, en); priority date: 2021-03-19; filing date: 2022-03-17; title: Data management

Country Status (2)

AU (1): AU2022236779A1 (en)
WO (1): WO2022192961A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20030233365A1 * | 2002-04-12 | 2003-12-18 | Metainformatics | System and method for semantics driven data processing
US20050278307A1 * | 2004-06-01 | 2005-12-15 | Microsoft Corporation | Method, system, and apparatus for discovering and connecting to data sources
US7890626B1 * | 2008-09-11 | 2011-02-15 | Gadir Omar M A | High availability cluster server for enterprise data management
US20170093645A1 * | 2015-09-21 | 2017-03-30 | Splunk Inc. | Displaying Interactive Topology Maps Of Cloud Computing Resources
US20170220657A1 * | 2016-01-29 | 2017-08-03 | M-Files Oy | Method, an apparatus and a computer program for providing mobile access to a data repository
US20180219751A1 * | 2017-01-31 | 2018-08-02 | Splunk Inc. | Visualizing network activity involving networked computing devices distributed across network address spaces
US20190156958A1 * | 2017-11-23 | 2019-05-23 | Siemens Healthcare Gmbh | Healthcare network
US20190163679A1 * | 2017-11-29 | 2019-05-30 | Omics Data Automation, Inc. | System and method for integrating data for precision medicine

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Maintain Privacy Protection Compliance with Intellidact AI™ Redaction!", CSI COMPUTING SYSTEM INNOVATIONS, 20 November 2017 (2017-11-20), XP055971108, Retrieved from the Internet <URL:http://csisoft.com/process-automation/redaction> [retrieved on 20221013] *
ANONYMOUS: "RPA Demystified", IDM IMAGE & DATA MANAGER, 8 March 2020 (2020-03-08), XP055971105, Retrieved from the Internet <URL:https://idm.net.au/files/IDM-PDF-archive/2018-1201.pdf> [retrieved on 20221013] *
MADAKAM SOMAYYA, HOLMUKHE RAJESH M., KUMAR JAISWAL DURGESH: "The Future Digital Work Force: Robotic Process Automation (RPA)", JOURNAL OF INFORMATION SYSTEMS AND TECHNOLOGY MANAGEMENT, vol. 16, pages 1 - 17, XP055971109, DOI: 10.4301/S1807-1775201916001 *
TEIXEIRA SILVA, CATARINA RAQUEL: "Implementation of a data virtualization layer applied to insurance data", DISSERTATION, FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO, 13 October 2016 (2016-10-13), XP055971102, Retrieved from the Internet <URL:https://core.ac.uk/download/pdf/302912394.pdf> [retrieved on 20221013] *
VAN DER LANS RICK F.: "Creating an Agile Data Integration Platform using Data Virtualization", R20 CONSULTANCY, 1 January 2014 (2014-01-01), XP055971103, Retrieved from the Internet <URL:https://stonebond.com/wp-content/uploads/2014/02/Rick-Van-Der-Lans-Whitepaper-May-2013.pdf> [retrieved on 20221013] *

Also Published As

Publication number | Publication date
AU2022236779A1 (en) | 2023-11-02

Similar Documents

Publication Publication Date Title
US11138336B2 (en) Data processing systems for generating and populating a data inventory
US10564936B2 (en) Data processing systems for identity validation of data subject access requests and related methods
US11036771B2 (en) Data processing systems for generating and populating a data inventory
US11240273B2 (en) Data processing and scanning systems for generating and populating a data inventory
US10438016B2 (en) Data processing systems for generating and populating a data inventory
US10437860B2 (en) Data processing systems for generating and populating a data inventory
AU2015267387B2 (en) Method and apparatus for automating the building of threat models for the public cloud
US10970675B2 (en) Data processing systems for generating and populating a data inventory
US11222309B2 (en) Data processing systems for generating and populating a data inventory
US20220129837A1 (en) Data processing systems for generating and populating a data inventory
AU2022236779A1 (en) Data management

Legal Events

Date Code Title Description

121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 22770089; Country of ref document: EP; Kind code of ref document: A1

WWE  Wipo information: entry into national phase
     Ref document number: AU2022236779; Country of ref document: AU
     Ref document number: 804747; Country of ref document: NZ
     Ref document number: 2022236779; Country of ref document: AU

NENP Non-entry into the national phase
     Ref country code: DE

ENP  Entry into the national phase
     Ref document number: 2022236779; Country of ref document: AU; Date of ref document: 20220317; Kind code of ref document: A

122  Ep: pct application non-entry in european phase
     Ref document number: 22770089; Country of ref document: EP; Kind code of ref document: A1