GB2583363A - Data anonymization - Google Patents

Info

Publication number
GB2583363A
Authority
GB
United Kingdom
Prior art keywords
data
accordance
level
information
bins
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1905778.5A
Other versions
GB201905778D0 (en
Inventor
Basil Harrold William
James Nickalls John
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telensa Holdings Ltd
Original Assignee
Telensa Holdings Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telensa Holdings Ltd
Priority to GB1905778.5A
Publication of GB201905778D0
Publication of GB2583363A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Traffic Control Systems (AREA)

Abstract

A method for achieving a desired level of anonymity of data, wherein a plurality of data items are obtained, each data item comprising values for a plurality of parameters, at least one of which is recorded at a respective time of occurrence. The plurality of data items are anonymized to form an anonymized dataset, by defining bins having respective ranges for a plurality of said parameters, and specifically by defining bins having variable sized ranges for at least one of the parameters, wherein the sizes of said bins are based on a rate at which data items are recorded at the respective times of occurrence. Also disclosed is a method wherein it is determined whether the data will be visible to a user only in a secure environment intended to prevent the processing of said data with other data, or whether the data will be made available to the user without such limitation, and a first or second level of anonymization is applied accordingly, wherein the second level of anonymization provides a higher level of anonymity. The plurality of data items may relate to vehicular or pedestrian traffic movements and may be origin-destination pairs.

Description

DATA ANONYMIZATION
This relates to data anonymization, and in particular to techniques for ensuring that any dataset that is released from one party to another has an appropriate level of anonymity.
A dataset is typically made up of many individual pieces of data. Even when a piece of data relating to a particular person appears to be anonymous, there is a danger that that piece of data can be associated with the particular person by combining it with other information.
For example, information might be gathered about the movement of vehicles through a town or city. The information might make it possible to say that a specific vehicle, having a licence plate that has been recorded by Automatic Number Plate Recognition (ANPR) cameras, has left a first specific point at a first specific time, and arrived at a second specific point at a second specific time. This data point can be rendered anonymous by, for example, removing the record of the vehicle licence plate. However, this anonymity can be impacted if datasets are combined. For example, if the destination of a regular morning journey is a school, it might be inferred that a passenger is a pupil, and incorporation of other datasets (e.g. school-related or from social media) might lead to the re-identification of the vehicle and/or its occupants.
It is known that an appropriate level of anonymity can be established by reporting the data only with appropriately coarse granularity. Thus, rather than report that a vehicle left a first specific point at a first specific time, and arrived at a second specific point at a second specific time, it would instead be reported that a vehicle left a first region during a first time window and arrived in a second region during a second time window. The sizes of the first and second regions and the durations of the first and second time windows are set in order to ensure the appropriate level of anonymity. The level of anonymity is measured by a value 'k'. Specifically, k-anonymity is a property of a dataset, and a dataset is said to be k-anonymous if every combination of identity-revealing characteristics occurs in at least k different records of the data set. For example, privacy may be considered to be adequately protected if a k value of 5 is maintained, that is, if a given record relates to a certain individual, then there are at least 4 other identical records relating to different individuals.
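The k-anonymity property described above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the field names (`origin_region`, `destination_region`, `time_window`) and the sample journeys are hypothetical, and it assumes that each record relates to a different individual, so that counting records is equivalent to counting individuals.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[name] for name in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination that occurs is shared by at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


# Hypothetical binned journeys: regions and time windows, not exact points and times.
FIELDS = ("origin_region", "destination_region", "time_window")
journeys = (
    [{"origin_region": "A", "destination_region": "B", "time_window": "08:00-08:15"}] * 5
    + [{"origin_region": "A", "destination_region": "C", "time_window": "08:00-08:15"}] * 2
)

print(is_k_anonymous(journeys, FIELDS, 2))  # the smallest group holds 2 records
print(is_k_anonymous(journeys, FIELDS, 5))  # the A-to-C group is too small for k = 5
```

In this sketch, a dataset that fails the check would need coarser regions or longer time windows before release.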
However, by guaranteeing a certain level of anonymity in all circumstances, the usefulness of the reported dataset is reduced.
According to an aspect of the invention, there is provided a method for achieving a desired level of anonymity of data, the method comprising: obtaining a plurality of data items, wherein each data item comprises values for a plurality of parameters, at least one of which is recorded at a respective time of occurrence; anonymizing the plurality of data items to form an anonymized dataset, by defining bins having respective ranges for a plurality of said parameters; and further comprising: defining bins having variable sized ranges for at least one of the parameters, wherein the sizes of said bins are based on a rate at which data items are recorded at the respective times of occurrence.
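One way to picture bins whose sizes depend on the rate at which data items are recorded is to close each time bin as soon as it holds a target number of items. This is only an illustrative sketch under that assumption: the text above says that bin sizes are based on the recording rate, but does not prescribe this equal-count rule, and the function name `adaptive_time_bins` and the sample timestamps (minutes after midnight) are hypothetical.

```python
def adaptive_time_bins(timestamps, k):
    """Split timestamps into consecutive bins that each hold at least k items.

    Returns a list of (first_timestamp, last_timestamp, count) tuples.
    Bins are narrow where items arrive quickly and wide where they are sparse.
    If fewer than k items exist in total, the single bin returned is undersized
    and the caller would need to suppress it.
    """
    bins = []
    current = []
    for t in sorted(timestamps):
        current.append(t)
        if len(current) >= k:
            bins.append(current)
            current = []
    if current:
        if bins:
            bins[-1].extend(current)  # fold a short tail into the previous bin
        else:
            bins.append(current)
    return [(b[0], b[-1], len(b)) for b in bins]


# Busy morning period followed by a quiet late morning (hypothetical data).
recorded = [480, 481, 482, 483, 484, 485, 600, 660, 720, 780]
print(adaptive_time_bins(recorded, k=3))
# [(480, 482, 3), (483, 485, 3), (600, 780, 4)]
```

The busy period yields bins about two minutes wide, while the quiet period needs a three-hour bin to hold enough items. Identical timestamps that straddle a bin boundary are not handled specially in this sketch.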
According to an aspect of the invention, there is provided a method for achieving a desired level of anonymity of data, the method comprising: determining whether the data will be visible to a user only in a secure environment intended to prevent the processing of the data with other data, or whether the data will be made available to the user without such limitation; and applying a first level of anonymization if the data will be visible to the user only in a secure environment; or applying a second level of anonymization if the data will be made available to the user without such limitation, wherein the second level of anonymization provides a higher level of anonymity than the first level of anonymization.
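The choice between the two levels of anonymization can be sketched as a simple lookup. Expressing each level as a k value is an assumption made for illustration, as are the names `Access`, `K_BY_ACCESS` and `required_k` and the specific numbers 5 and 20.

```python
from enum import Enum


class Access(Enum):
    SECURE_ENVIRONMENT = "secure"   # data visible only where it cannot be combined with other data
    UNRESTRICTED = "unrestricted"   # data made available to the user without such limitation


# Hypothetical k values: the unrestricted case gets the stronger guarantee.
K_BY_ACCESS = {
    Access.SECURE_ENVIRONMENT: 5,
    Access.UNRESTRICTED: 20,
}


def required_k(access):
    """Return the k value to enforce for the given form of access."""
    return K_BY_ACCESS[access]


print(required_k(Access.SECURE_ENVIRONMENT))
print(required_k(Access.UNRESTRICTED))
```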
According to an aspect of the invention, there is provided a method for achieving a desired level of anonymity of data, the method comprising: obtaining a plurality of data items, wherein each data item comprises values for a plurality of first parameters; anonymizing the plurality of data items to form a first anonymized dataset, by defining bins having respective ranges for a plurality of said first parameters; determining that data relating to a second parameter should be added to the first anonymized dataset; and adding aggregated values for the second parameter to the first anonymized dataset to form a second anonymized dataset.
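Adding aggregated values for a second parameter can be sketched as follows. The sketch assumes that the aggregate is a per-bin mean and that the underlying data items already carry the bin labels of the first anonymized dataset; the field names and the `speed_kmh` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def bin_key(record):
    """Identify the bin that a data item or an anonymized row belongs to."""
    return (record["origin_region"], record["destination_region"], record["time_window"])


def add_mean(first_dataset, items, parameter):
    """Return a second dataset: each row of the first, plus the per-bin mean of `parameter`."""
    values = defaultdict(list)
    for item in items:
        values[bin_key(item)].append(item[parameter])
    second_dataset = []
    for row in first_dataset:
        samples = values.get(bin_key(row), [])
        enriched = dict(row)  # leave the first anonymized dataset unchanged
        enriched["mean_" + parameter] = mean(samples) if samples else None
        second_dataset.append(enriched)
    return second_dataset


items = [
    {"origin_region": "A", "destination_region": "B", "time_window": "08:00-08:15", "speed_kmh": 30},
    {"origin_region": "A", "destination_region": "B", "time_window": "08:00-08:15", "speed_kmh": 50},
]
first = [{"origin_region": "A", "destination_region": "B", "time_window": "08:00-08:15", "count": 2}]
second = add_mean(first, items, "speed_kmh")
print(second[0]["mean_speed_kmh"])  # 40
```

Only the aggregate reaches the second dataset; the individual values of the second parameter are not copied into it.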
According to an aspect of the invention, there is provided a system, comprising: an input for receiving data; and a processor, wherein the processor is configured to perform a method in accordance with any of the previous aspects.
According to an aspect of the invention, there is provided a computer program product, comprising computer-readable code, configured for causing a suitably programmed processor to perform a method in accordance with any of the previous aspects.
This has the advantage that the usefulness of the reported dataset can be increased, even while the appropriate level of anonymity is maintained.
BRIEF DESCRIPTION OF DRAWINGS
For a better understanding of the invention, and to show how it may be put into effect, reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 shows a use of a part of a system in accordance with the disclosure;
Figure 2 shows a part of the system;
Figure 3 illustrates a form of an edge device in accordance with one aspect;
Figure 4 illustrates a form of a central device in accordance with one aspect;
Figure 5 illustrates communication between two edge devices;
Figure 6 is a schematic diagram of a system in accordance with the disclosure;
Figure 7 illustrates the operation of a method in accordance with the disclosure;
Figure 8 is a flow chart illustrating a method in accordance with the disclosure;
Figure 9 is a flow chart illustrating a further method in accordance with the disclosure; and
Figure 10 is a flow chart illustrating a further method in accordance with the disclosure.
DETAILED DESCRIPTION
Figure 1 shows a possible deployment of a system in accordance with the present disclosure. It will be appreciated that any suitable system may be used to generate technical data, in circumstances where it may be useful to release that data to a recipient, provided that a desired level of anonymity of the data can be maintained.
Specifically, Figure 1 shows an arrangement of streets 10, 12, 14, 16, 18 in an urban environment. For monitoring the environment, edge devices 20, 22, 24, 26, 28, 30 are provided in the illustrated area. The edge devices are provided at respective locations in an overall coverage area of the system. In some embodiments, the edge devices are provided at respective fixed locations. In other embodiments, some or all of the edge devices may be mounted on mobile platforms. For example, one or more edge device may be mounted on a vehicle.
The number of edge devices can be any convenient number for providing the intended coverage. In a typical urban environment, the number of edge devices may be in the tens, hundreds, or thousands.
In this illustrated example, some or all of the edge devices 20, 22, 24, 26, 28, 30 may be mounted on existing street furniture, for example street lamps or associated poles, brackets or similar, in order to avoid the need for providing additional separate mounting points.
Although the system will be described with reference to its use in monitoring an exterior urban environment, it may be used in interior environments, or in exterior environments away from streets etc.
Figure 2 shows the functional relationships in the system, in which the edge devices, namely the edge devices 20, 22, 24, 26, 28 in Figure 2, have a functional connection to a central device 40.
As described in more detail below, each edge device has one or more sensor, which generates sensor data. The sensor data is processed, in a way which compresses it, to form information about entities within the environment. This compressed information can then be sent to the central device 40.
The edge devices and the central device are described in more detail below.
Thus, Figure 3 shows one of the edge devices 20 by way of illustration, though the other edge devices in the system may be similar.
The edge device 20 includes one or more sensor 50. For example, the sensor(s) 50 may include but are not limited to: Cameras; Radar or Lidar detectors; Environmental sensors; Pollution sensors; Occupancy or presence detectors; RFID detectors; Radio detectors; Acoustic sensors.
Each sensor 50 is configured to generate sensor data. For example:
A camera may provide image data and/or video data of an overall scene in the area local to the camera, or may provide image data of a limited area or of specific features or objects in that area; its field of view may be fixed or it may be steerable and/or zoomable under control of the edge device 20, the central device 40 or another device or user;
Radar or Lidar detectors may provide data relating to the presence and/or movement of individual objects or assemblages of objects within their coverage area, or to the general level of the occupancy of, or movement within, part or all of their coverage area;
Environmental sensors may provide data descriptive of weather conditions (for example temperature, wind, cloud cover, rainfall), light level, frost risk (for example by sensing (road or other) surface temperatures in the vicinity of the sensor), humidity or barometric pressure;
Pollution sensors may provide data descriptive of air quality (for example particulate levels or levels of gases such as CO, CO2 or volatile organic compounds);
Occupancy or presence sensors may provide data related to the presence of people or other objects in the vicinity of the sensor or relating to movement of the same, and technologies employed may for example include Passive Infra-Red (PIR) or microwave-based sensors;
RFID detectors may provide data relating to the presence and/or movement of objects carrying RFID tags in the vicinity of the detectors;
Radio detectors may provide data relating to radio transmissions from nearby wireless transmitter devices;
Acoustic sensors may for example record data representative of sound in the area of the sensor (with a frequency range similar to or greater or less than a typical human hearing range) or may be designed to detect specific acoustic artefacts or events such as a gunshot. Where multiple acoustic sensors are used, a vector to the source of the sound could also be included.
Each sensor 50 is connected to a memory 52 in the edge device. The sensor data generated by the or each sensor can therefore be stored in the memory 52. For example, the memory 52 may be a first-in, first-out type memory, storing the data generated during a rolling time window, for example of a few days. In some embodiments, all of the data generated by the sensor(s) 50 is stored by the memory 52. In other embodiments, only a fraction of the sensor data is stored. For example, in the case of image data, it may be sufficient to store only one image frame per second. The image data may be stored together with an associated time stamp indicating a time at which the data was generated. In the case of video image data, this may be stored in a compressed format, for example according to MPEG H.264 or H.265 format.
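A first-in, first-out memory covering a rolling time window can be sketched with a double-ended queue. This is a minimal illustration rather than the device's actual storage design: the class name `RollingStore` is hypothetical, time stamps are plain numbers in seconds, and items are assumed to arrive in non-decreasing time order.

```python
from collections import deque


class RollingStore:
    """Keep time-stamped items only while they fall inside a rolling time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._items = deque()

    def append(self, timestamp, payload):
        """Store a new item and discard items older than the window (oldest first)."""
        self._items.append((timestamp, payload))
        cutoff = timestamp - self.window_seconds
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def since(self, timestamp):
        """Return stored (timestamp, payload) pairs recorded at or after `timestamp`."""
        return [(t, p) for t, p in self._items if t >= timestamp]

    def __len__(self):
        return len(self._items)


store = RollingStore(window_seconds=10)
store.append(0, "frame-a")
store.append(5, "frame-b")
store.append(12, "frame-c")  # "frame-a" is now older than the window and is dropped
print(len(store))            # 2
```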
In addition, the or each sensor 50 is connected to a processor 54 in the edge device.
The processor 54 is configured to manipulate the sensor data from the or each sensor to obtain information about one or more entities within the environment in the vicinity of the edge device. For example, when one of the sensors is an imaging device, the processor 54 is configured to manipulate image data from the imaging device to obtain information about objects that are visible in the image. Specifically, the processor 54 may be configured to manipulate image data from the imaging device to obtain information about the movement of objects that are visible in the image.
The processor 54 is connected to the memory 52, so that the obtained information about the one or more entity in the environment can be stored. The obtained information may be stored together with an associated time stamp indicating a time at which the information was generated and/or the time or times at which the sensor data was generated.
In particular, the memory 52 may be configured for storing information (for example the sensor data and the information about one or more entity in the environment) securely, for example in an encrypted form. Any suitable form of encryption may be used.
In addition, as shown in Figure 3, each edge device 20 includes a transceiver 56, for communication with one or more other devices. In particular, the transceiver 56 is configured for communicating with the central device 40. The transceiver 56 is connected to the sensor(s) 50, and/or the memory 52, and/or the processor 54. If the transceiver 56 is connected to the sensor(s) 50, it is able to transmit raw sensor data to the central device 40 immediately. If the transceiver 56 is connected to the memory 52, it is able to transmit stored data to the central device 40. As mentioned above, the data that is stored in the memory 52 may comprise raw sensor data, and information obtained by the processor 54 about one or more entity in the environment. If the transceiver 56 is connected to the processor 54, it is able to transmit the information about the entity or entities in the environment to the central device 40 immediately.
For example, the transceiver 56 may be configured for communicating with the central device 40 by a public mobile cellular network, a fixed broadband data network (e.g. ADSL or fibre) or a radio network deployed for this application or for other smart city applications.
In particular, the edge device 20 may be configured for transmitting information to the central device 40 in an encrypted form. Any suitable form of encryption may be used.
In addition, the edge device 20 may be configured for transmitting information (that is, for example, the sensor data and/or the information about one or more entity in the environment) to the central device together with an associated time stamp indicating a time at which the information was generated.
The edge device 20 is also configured for receiving information from the central device 40. For example, the edge device 20 may be configured for receiving from the central device 40 information regarding the processing to be performed by the processor 54.
Thus, the central device 40 may send instructions to the edge device, indicating what form of processing should be applied to the sensor data, in order to identify entities in the environment. Moreover, the instructions may indicate what form of processing the processor 54 should perform in order to detect specified attributes of the identified entities. In addition, the central device 40 may send instructions to the edge device 20, controlling which information about the one or more entity in the environment, including information about one or more specified attribute of the entity, should be transmitted to the central device.
In other embodiments, as described in more detail below, the transceiver 56 is additionally configured for communicating with other edge devices. Specifically, two edge devices may be configured for communicating with each other, for example, by means of a public mobile cellular network, a fixed broadband data network (e.g. ADSL or fibre) or a radio network deployed for this application or for other smart city applications or by a short-range radio technology such as Bluetooth or Wi-Fi.
Figure 4 shows a central device 40 by way of illustration. Although Figure 2 shows only one central device 40, any suitable number of central devices may be provided in the system depending on the required system capacity, and other central devices in the system may be similar to that shown in Figure 4. The or each central device 40 may for example be provided in a cloud computing environment.
The central device 40 includes a memory 70. Information received from the or each edge device can therefore be stored in the memory 70. For example, the memory 70 may be considerably larger than that provided in the respective edge devices, with the capacity to store several days, weeks, or months of data from each edge device that is connected to it.
The central device 40 also includes a processor 72. The processor 72 is configured to manipulate the data received from the or each edge device to obtain information about one or more entity in the environment in the overall coverage area.
For example, when one of the sensors in each of the edge devices is a respective imaging device, and hence the entities being considered by that edge device are objects that are visible in the image, the processor 54 in the edge device 20 is configured to manipulate image data from the imaging device to obtain information about attributes of those objects. The processor 72 may then be configured to manipulate the information received from multiple edge devices to obtain information about attributes of objects in the coverage area of multiple edge devices.
More specifically, when the processor 54 in each edge device 20 is configured to manipulate image data from the respective imaging device to obtain information about movement of objects that are visible in the image generated by that imaging device, the processor 72 in the central device 40 may be configured to obtain information about the movement of objects between locations where they are visible by different imaging devices.
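The way a central device might derive movements between locations from reports sent by different edge devices can be sketched as below. The sketch assumes that each report carries some identifier allowing the same object to be matched across devices; that identifier, the tuple layout and the function name `movements` are hypothetical and are not taken from the description.

```python
from collections import defaultdict


def movements(sightings):
    """Turn (entity_id, device_id, timestamp) sightings into device-to-device movements.

    Returns a list of (entity_id, from_device, to_device, departed, arrived) tuples,
    one for each pair of consecutive sightings made by different devices.
    """
    by_entity = defaultdict(list)
    for entity_id, device_id, timestamp in sightings:
        by_entity[entity_id].append((timestamp, device_id))
    result = []
    for entity_id, seen in by_entity.items():
        seen.sort()  # order each entity's sightings by time
        for (t0, d0), (t1, d1) in zip(seen, seen[1:]):
            if d0 != d1:
                result.append((entity_id, d0, d1, t0, t1))
    return result


reports = [
    ("v1", 20, 100),  # entity v1 seen by edge device 20 at time 100
    ("v1", 22, 160),
    ("v2", 24, 120),
    ("v1", 22, 165),  # repeat sighting by the same device, not a movement
]
print(movements(reports))  # [('v1', 20, 22, 100, 160)]
```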
The processor 72 is connected to the memory 70, so that the obtained information about the one or more entity in the environment can be stored.
In addition, as shown in Figure 4, the central device 40 includes a transceiver 74, for communication with other devices, and in particular for communicating with the edge devices 20. The transceiver 74 is connected to the memory 70, and/or the processor 72.
As mentioned above, the transceiver 74 may be configured for communicating with the edge device 20 for example by a public mobile cellular network, a fixed broadband data network (e.g. ADSL or fibre) or a radio network deployed for this application or for other smart city applications.
As mentioned above, edge devices may also communicate with each other. Figure 5 illustrates a situation in which two edge devices 20, 22 are able to communicate with each other directly, over a communications link 80. For example, in some embodiments, two or more edge devices may be configured (for example by control information transmitted from the central device) to communicate information about entities that are detected, or about attributes of those entities.
A system as described above may be deployed in any environment that is to be monitored, for example an exterior urban environment, an interior environment, or an exterior environment away from streets etc. For example, a system may be deployed in a warehouse or similar industrial facility.
The sensors 50 may include imaging devices, but may additionally or alternatively include other sensors that are triggered by locating devices (such as RFID tags) on autonomous robotic or other devices within the warehouse.
The relevant entities in the environment may then relate to the location of the robotic or other devices within the warehouse, or other aspects of the operational state of such devices. In that case, the processing devices 54 may be configured to process the raw data from the sensor(s) 50 to obtain relevant information about the locations of the devices, or about the other aspects of the operational state.
As another example, a system may be deployed in a wind farm, where the sensors 50 may include devices for monitoring the state of the wind turbines, and devices for monitoring the environment (for example monitoring the wind speed and direction). In that case, the processing devices 54 may be configured to process the raw data from the sensor(s) 50 to obtain relevant information about attributes of the wind turbines and/or about their interaction with the wind conditions. For example, each turbine may be provided with sensors 50 which continuously report such conditions as temperature, vibration, and noise. The entity being considered may then be the turbine, and the respective processing device 54 may be configured to process the raw data from the sensors to obtain relevant information about attributes of the turbine, such as the state of wear of the bearings, the condition of the blades, etc.
As another example, a system may be deployed with an edge device on each of multiple vehicles (for example autonomous vehicles), where the sensors 50 may include imaging devices or other sensors, and the processing devices 54 may be configured to process the raw data from the sensors to obtain information about attributes of the vehicle and its environment, such as the location and velocity of the vehicle on which the respective edge device is mounted, and information about the location, velocity and classification of other objects (such as vehicles, pedestrians, etc) in the vicinity of the vehicle, for the purposes of navigation and safety.
As a further example, as mentioned above, Figure 1 shows a system deployed in an urban environment, with streets 10, 12, 14, 16, 18. Edge devices 20, 22, 24, 26, 28, 30 are provided at respective locations (which in this example are fixed locations). Specifically, the edge devices 20, 22, 24, 26, 28, 30 may be mounted on street lights, and the exact locations of the edge devices may be known from data provided by a Global Navigation Satellite System (GNSS), for example the Global Positioning System (GPS). Providing each edge device with a GPS receiver, and requiring it to report its location to the central device, means that installation is simplified, because the edge device can report its exact location, without requiring any manual intervention.
The system can be scaled as desired, from a system with just a single edge device, to a system with a large number of edge devices, where Figure 1 shows a small part of a very much larger system. For example, the edge devices of the system may have contiguous or non-contiguous coverage areas, with an overall coverage area that covers a part of one street, a whole street, a neighbourhood, a borough, a city, or a country, or is world-wide.
A single central device may be connected to every edge device, or separate central devices may be connected to respective groups of edge devices. Moreover, where there are multiple central devices, these may also be connected in a hierarchical structure, with the data available in the lowest layer of central device being organised and merged into larger federated data sets at a higher level, for example on a citywide or national level.
In this illustrative example, the sensors 50 on each edge device include a video image sensor, which may for example be a standard video camera, or which may additionally be provided with Automatic Number Plate Recognition (ANPR) functionality. The sensors on each device also include a radar based object detection sensor, for example using millimetre wave spectrum (i.e. in the frequency range between about 30GHz and 300GHz). Thus, the data generated by the sensors 50 includes raw image data and raw radar data about objects that are in the field of view of those sensors.
The sensor package is such that it can accurately detect objects in 2D or 3D space at as high a sampling rate as possible.
The processor 54 in each of the edge devices may then identify objects that are within the field of view. For example, returning to Figure 1, a vehicle 100 (for example, a car, truck, motorcycle, etc) can be identified in the field of view of the edge device 20. The identification may for example be performed by an artificial neural network (ANN), for example a convolutional neural network (CNN) or a deep neural network (DNN). The processor 54 is able to identify the vehicle 100 in images taken at different times, and is therefore able to track the vehicle as it moves over time. Further, additional data properties can be added over time. For example, when a moving object is first identified by a radar sensor, it may be possible to determine from its speed that it is a vehicle, but impossible to determine what type of vehicle it is. As it comes closer to the edge device, a camera may determine that it is a car, for example. Then, as the car comes closer still to the edge device, the Automatic Number Plate Recognition feature of the sensor may be able to read the number plate of the car.
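The gradual addition of data properties to a tracked object can be sketched as a merge of successive observations. The attribute names, the sample values and the function name `update_track` are hypothetical; the sketch only illustrates that later observations fill in attributes that earlier ones could not determine.

```python
def update_track(track, observation):
    """Return a copy of the track with newly determined attributes merged in.

    Attributes that the new observation could not determine (None) are ignored,
    so values established earlier are kept.
    """
    merged = dict(track)
    merged.update({name: value for name, value in observation.items() if value is not None})
    return merged


track = {"kind": None, "vehicle_type": None, "plate": None}
track = update_track(track, {"kind": "vehicle", "speed_kmh": 48})       # radar, at long range
track = update_track(track, {"vehicle_type": "car", "plate": None})     # camera, closer in
track = update_track(track, {"plate": "HYPOTHETICAL"})                  # ANPR, closer still
print(track)
```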
Thus, the vehicle 100 is an entity in the environment around the edge device 20 and its position and movement are, amongst others, its attributes. The information about the entity and these attributes is therefore stored in the memory 52.
The processor 54 may similarly obtain information about other attributes of entities in the environment. As one example, the processor 54 may obtain information about the position and movement of pedestrians or other people, in the field of view of the sensors. As another example, the processor 54 may obtain information about the position of fixed objects such as benches and litter bins. The positions may be expressed in terms of co-ordinates (for example in the GPS system or in World Geodetic System 1984 (WGS84) projections).
The information that is stored in each edge device can be regarded as a "digital twin" of the environment itself. That is, the stored information acts as a digital representation of the physical street environment with the various tracked objects in it.
As described above, some or all of the raw data generated by the sensors 50, and the synthesized information about the entities in the environment, such as the movement of the objects that are detected by the sensors, is stored securely in the memory 52. However, the information is also made available under specified conditions to the central device 40. Restrictions on what data can be transmitted from an edge device to the central device may be imposed by policies around trust and privacy and relevant laws, as well as by any backhaul bandwidth limitations that may exist.
One advantage of deriving the information about the entities in the environment in the edge devices, and only transmitting that derived information to the central device 40, is that this requires very much less bandwidth than transmitting the entirety of the sensor data. This means that the cost of transmission is reduced, and the latency and quality of service are improved. Moreover, there is less reliance on the cloud computing environment when this is used, and in particular there is less reliance placed in the security of the cloud computing environment, because the raw sensor data will in general not be stored in the cloud. Moreover, the system can still operate when the backhaul data link is interrupted, because the data can be transmitted when the link is restored.
If the information that is stored in each edge device is regarded as a "digital twin" of the environment around the respective edge device, acting as a digital representation of the physical street environment with the various tracked objects in it, then the information that is stored in the central device can be regarded as a higher layer, or federated, "digital twin" of the wider environment, acting as a digital representation of that wider environment and the various tracked objects.
As mentioned above, the edge device may be controlled from the central device 40. For example, the central device 40 may instruct each edge device to transmit information identifying each vehicle that it identifies, and information about the speed and direction of travel of that vehicle. Alternatively, the central device 40 may instruct each edge device to transmit information identifying each truck or van that it identifies, and information about the speed and direction of travel of each such vehicle.
These instructions may be varied or updated as required. For example, after the system has been operating for a period of time, transmitting information identifying each truck or van that it identifies, the central device 40 may instruct the edge device to start transmitting information identifying every vehicle that it identifies thereafter.
Moreover, because each edge device stores some or all of its raw sensor data, as well as the information that it derives about the entities in the environment, the central device may instruct one or more edge device to transmit to it either some or all of that stored data or information. For example, in a typical scenario, the central device may instruct one or more edge device to transmit to it the derived information about attributes of the entities as the information is generated. The central device may also or alternatively instruct one or more edge device to transmit to it some part of the stored sensor data, either immediately or when some criterion is met.
Further, the central device may instruct one or more edge device to perform some additional processing on the stored sensor data, in order to extract information about additional entities in the environment that were not originally considered when the data was generated.
For example, in a case where each edge device is configured to generate and transmit information about the movement of vehicles, a central device may send a request to one or more edge device to perform additional processing on stored data, which relates to a previous time period. For example, the edge device may be instructed to analyse the stored raw image data in order to identify the presence of people in the field of view generally, or to identify the presence of people with more specific characteristics, such as a person wearing red clothing, or a lone child, or person carrying a weapon, for example. When the central device instructs the edge device to repeat a search that it has performed previously, the edge device may search the data only starting from the last time the search was executed. This reduces the processing load on the edge device, and potentially also reduces the backhaul costs from the edge devices to the central device.
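The incremental search described above can be sketched as follows. This is a minimal illustration only: the names `StoredFrame` and `EdgeSearchIndex`, the use of numeric timestamps, and the idea that each stored frame carries a set of detector labels are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFrame:
    timestamp: float      # capture time, in arbitrary units (assumed)
    labels: frozenset     # labels a detector assigned to this frame (assumed)


@dataclass
class EdgeSearchIndex:
    frames: list                                   # stored frames, oldest first
    last_run: dict = field(default_factory=dict)   # query -> last timestamp searched

    def search(self, query):
        """Return timestamps of matching frames not covered by an earlier run of `query`."""
        start = self.last_run.get(query, float("-inf"))
        new_frames = [f for f in self.frames if f.timestamp > start]
        if new_frames:
            # Remember how far this query has been searched.
            self.last_run[query] = new_frames[-1].timestamp
        return [f.timestamp for f in new_frames if query in f.labels]
```

A repeated request for the same query then only touches frames stored since the previous request, which is the source of the reduced processing load.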
Thus, the storage of the detailed raw image data on each edge device means that it is possible to extract information about specific entities in the environment of the edge device, even when the specific entities are only identified at a time that is later than the time at which the data was obtained.
When a central device 40 requests one or more edge device to analyse the sensor data to obtain information about one or more specific entity in the environment, the central device can also specify the way in which it is notified about that entity.
For example, when the attribute being monitored is the movement of an entity such as a vehicle, the edge device may send the relevant information continuously, or only when the attribute meets some specified criterion, for example movement above a specified speed limit, or movement into a particular area. Similarly, when the central device 40 instructs the edge device to analyse the previously stored raw image data in order to identify the presence of, say, a person wearing red clothing, it may also instruct the edge device to notify it as soon as such a person is identified. Alternatively, the central device 40 may instruct the edge device to execute a trigger to inform an external authorised application (via a subscribe/publish interface) of that event, and possibly to transmit the relevant data to it.
The central device may also perform a search on the information that has been transmitted to it from the edge devices.
Thus, the process of collecting information about the environment is separated from the applications using the data, which are typically carried out in the central device, or in a further device having access to the data stored in the central device.
Thus, storing the raw sensor data at the edge device means that all of the data is available, and can be searched subsequently, without requiring expensive backhaul of all of the data to the central device. Abstracting from the raw data the attributes of the entities in the environment, and transmitting that information to the central device, means that the backhaul requirement is significantly reduced.
As mentioned above, storing the raw sensor data at the edge device means that new applications can be developed at the central device, and can access the raw sensor data from an earlier time period.
This architecture also means that edge devices can be upgraded piecemeal. For example, more accurate sensor hardware can be introduced at just some of the edge devices, without affecting the overall architecture. For example, some edge devices may be provided with image sensors having increased camera resolution or improved shutter speed. The same edge devices, or other edge devices, may be provided with radar sensors having increased radar object resolution and ranging. Similarly, improved detection and tracking algorithms can be introduced at some or all edge devices while they are in operation.
In addition, the upgrading of the edge devices may include adding new sensor types after initial deployment of the edge device, thereby improving performance, and potentially allowing the possibility of collecting different data, and generating information about additional attributes of entities in the environment.
In the illustrative example above, the operation of the system has been described with reference to image sensors, for tracking physical objects. The environment in the locality of an edge device may also be considered to have aspects which are less physical such as, but not limited to, 'environmental sound state' or 'environmental air quality'. These too may be considered to be entities and may be described by reference to their attributes.
In the context of the less physical entities described above, additionally, or alternatively, some or all edge devices may be provided with audio microphones, for detecting environmental sounds. Then, the processor in the respective edge device may analyse the raw data provided by the microphone(s) to generate information about attributes of the environment. That is, the processor may attempt to detect attributes of the audio environment, for example specific noises, such as human speech, or gunshots, or excessively loud vehicles, or sounds that are typical of specific vehicles.
Additionally, or alternatively, some or all edge devices may be provided with chemical sensors, for detecting pollutants or other chemicals in the environment. Then, the processor in the respective edge device may analyse the raw sensor data to generate information about attributes of, for example, the 'air quality' entity in the environment. For example, the processor may notify the central device when a level of a particular pollutant exceeds a threshold level. Meanwhile, the raw sensor data remains stored in the edge device. The central device may receive the information about the attributes of the 'air quality' entity from multiple edge devices, and may combine this received information to form a representation of the 'air quality' entity in the wider coverage area.
Additionally, or alternatively, some or all edge devices may be provided with radio detectors, for detecting radio transmissions from nearby devices, for example detecting unpaired Bluetooth discovery data. Then, the processor may analyse the raw data to obtain information about elements in the environment, for example in the form of MAC addresses of the transmitting devices.
Returning to Figure 5, this shows a further aspect of the system as it may be used to track vehicles through an urban environment.
As described above, in the system shown in Figure 1, the edge device 20 may be able to detect the presence of the vehicle 100 in the street 10, and can also determine that it is heading in the direction of the junction with the streets 12, 14, 16.
In order to provide low latency object tracking and faster acquisition of fast-moving objects between edge device installations, edge devices may be able to communicate directly, for example over Wi-Fi or some other high-speed technology, with other nearby edge devices. This provides a much more efficient and less error-prone hand-over of detected objects as they leave the range of one edge device and enter into the range of another. The edge devices may be deployed so that their coverage areas are overlapping or contiguous, as this may result in better quality and a higher confidence in the accuracy of the object's path, but this is not essential.
This direct communication may be controlled by the central device 40. For example, if one edge device detects movement of a vehicle of interest, and reports this to the central device, the central device may instruct that one edge device to cooperate with nearby edge devices to track that vehicle through their respective coverage areas.
When this direct communication is configured, the originating and receiving edge devices are configured to transmit and receive such notifications. Any device acting as an originating device can establish a communication path with selected adjacent devices, if this has not already been done.
Thus, in Figure 1, when the edge device 20 detects the presence of the vehicle 100, heading in the direction of the junction with the streets 12, 14, 16, it can send a handover to the edge devices 22, 24, 28, which cover the streets along which the vehicle 100 may travel next (as shown at 100a, 100b, 100c). The handover may be sent by broadcasting or multicasting a message to the relevant edge devices, or sending separate messages to each relevant edge device, describing the attributes of the entity or object. For example, in this illustrated example of a vehicle, the information sent in the handover message may include some or all of: a unique tracking ID, a last known GPS location of the vehicle, further classification of the vehicle (whether it is a car, truck, etc), the colour of the vehicle, the number plate details of the vehicle, and any other available attributes that might help the receiving edge device to identify the vehicle in its image data. This same data can be passed on to subsequent edge devices as the vehicle passes from one coverage area to another, and subsequent edge devices can also include additional information or modified information, for example if a first edge device identifies a specific vehicle but is unable to read its number plate, and a second edge device is able to read the number plate, the number plate details may be included in the information passed from the second edge device to a third edge device, etc. Thus, as shown in Figure 5, information about the vehicle 100 is sent from one edge device to one or more other edge device over the relevant communications link 80.
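One possible shape for such a handover message is sketched below. The class name `HandoverMessage`, the field names, the function `pass_on` and the sample values (including the number plate) are hypothetical and chosen only for illustration. For simplicity this sketch only fills in attributes that were previously missing; the description also allows attributes to be modified.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class HandoverMessage:
    tracking_id: str                       # unique tracking ID
    last_location: Tuple[float, float]     # last known GPS position (lat, lon)
    vehicle_class: Optional[str] = None    # e.g. "car", "truck"
    colour: Optional[str] = None
    number_plate: Optional[str] = None


def pass_on(message, new_location, **observed):
    """Copy the message for the next edge device, updating the location
    and filling in any attributes that earlier devices could not determine."""
    fill = {
        name: value
        for name, value in observed.items()
        if value is not None and getattr(message, name) is None
    }
    return replace(message, last_location=new_location, **fill)
```

For example, a second edge device that reads the number plate would call `pass_on(message, location, number_plate=...)` before forwarding the message to a third device.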
Typically only one of those edge devices will detect the vehicle 100 and process it in a way similar to that in which it would process an entity established within its own range; the others will not detect the vehicle and will therefore not process it as an entity in the environment in their range.
If the edge device 28 detects the presence of the vehicle on the street 14, heading away from the junction with the streets 10, 12, 16, as shown at 100b, it can determine that the vehicle may continue along the street 14 as shown at 100d, and remain within its own coverage area, or may turn along the street 18, as shown at 100e, and so it may send a handover to the edge device 26. If the edge device 28 detects the continued presence of the vehicle on the street 14, as shown at 100d, it can send a handover to the edge device 30, as this covers the street 14 along which the vehicle is expected to continue travelling, as shown at 100f.
To carry out this handover functionality, each edge device must be provided with or itself establish, based on communication between edge devices, information about a nearby street layout, and/or information about the locations of other edge devices. The relevant information may be specifically downloaded to the respective edge devices from the central device, or the edge devices may be provided with access to map information, which each edge device can then access to determine its own location, with each edge device being further provided with a means to discover other nearby devices and their locations.
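As a rough sketch, the layout information needed for handover could be held as a lookup from the detecting device and the observed movement to the devices that should receive the handover. The table below encodes only the cases described with reference to Figure 1; the key format, the movement strings and the function name `handover_targets` are assumptions for illustration.

```python
# Keys: (detecting edge device, observed movement); values: devices to notify.
# The device and street numbers follow the reference numerals of Figure 1.
NEXT_DEVICES = {
    (20, "street 10, towards junction"): [22, 24, 28],
    (28, "street 14, away from junction"): [26],
    (28, "street 14, continuing"): [30],
}


def handover_targets(device_id, movement):
    """Return the edge devices that should receive a handover, or an empty list."""
    return NEXT_DEVICES.get((device_id, movement), [])
```

In a deployed system this table would presumably be derived from map data or from discovery between edge devices rather than written by hand.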
Thus, while this is only one example of a system that can generate useful technical data, it can be seen that the system described with reference to Figures 1-5 can generate data relating to the movement of specific vehicles or pedestrians through a coverage area.
It would be useful to be able to release such data to a third party, for example for the purposes of planning traffic systems.
Figure 6 illustrates a system 200 for managing the release of data. This is described with reference to a traffic monitoring system of the type described above, but the source of the data may produce any technical data. The system may for example be operated by a suitable trusted authority, gathering data from its own sources and/or other sources, and releasing data in a useable form to suitable data recipients.
As described above, the system 200 has edge-processing multi-sensor platforms 20, of which only one is shown in Figure 6, around a geographical area (typically a city) generating many observations, which are produced by analysis of video, radar, or other sensor data to form elements such as "a blue car parked here", which in some cases is associated with ANPR data. These are stored in the central system 40. In particular these data points may comprise observations of certain vehicles passing certain points at certain times.
The observations are then enriched in a data enrichment block 202 to form data packages, which comprise multiple events (which might be referred to as 'event streams' if they are ongoing, or 'event packages' if they are static) in which someone might be interested, for any one of many applications. Events are somewhat more abstracted than observations, with more information contained in each and probably with a greater relevance to a target user or application. For example these may include Origin-Destination (O-D) pairs for vehicle journeys, as these O-D pairs have value in a range of applications, for example the planning of traffic systems.
The data enrichment block may also receive other data 204, for example from different sources. For example, when the data relates to vehicle locations or movements, and the data includes ANPR data, the data enrichment block 202 may receive additional data from a vehicle licensing authority, which may be able to provide detailed information about any vehicle on the basis of identification by its licence plate. For example, the vehicle licensing authority may be able to provide information about the category, make, model, and colour of any vehicle on the basis of the licence plate information.
In some examples a trust engine 206 can determine what information can be released to any potential recipient, typically with reference to information about that recipient, as well as the conditions under which such information can be released.
The data packages generated by the data enrichment block 202 are then passed to a data anonymization block 208, where they are diced, rearranged and filtered, particularly with regard to privacy and trust issues, but also including filters for example based on time, geographical area, and immediacy, to generate data products that can be released as shown at 210. This process may be governed or informed by the determination of the trust engine 206.
The data enrichment block 202 and the data anonymization block 208 may be implemented in a central device, such as the central device 40 shown in more detail in Figure 4, or may be implemented in any other device having the same general structure as the device 40, with suitable processing, memory and transceiver functions. The processes of data enrichment and data anonymization, as discussed in more detail below, may be performed by one or more suitable computer program products, for example stored in any suitable tangible device, for example a memory device, and capable of running on a processing device, on the basis of data that may be stored in the same memory or a different memory.
As described in more detail below, the anonymization is a technical process, which ensures that an appropriate level of anonymization is applied to the data, depending on the type of data, and depending on the intended recipient of the data product.
These data sets can be offered to data recipient users or applications with confidence that all needs related to trust have been addressed, with the trust issues having been delegated in a transparent and traceable way to a suitable party. This allows for the trust to be regulated, for example allowing privileged access to the data for law enforcement authorities, and allowing data providers to prevent their data from being released to certain recipients, with an audit trail of what data has been released to each recipient and the reasons for its release. For example, if the system 200 is operated by a city authority, that authority can combine information from its own sources and from private sources, but can ensure that its own information is only released to suitable recipients, with transparent access to the reasons for such release and the benefits to its citizens of such release.
The release of the information as shown at 210 may be achieved by sending the data to the intended recipient. Alternatively, the release of the information as shown at 210 may be achieved by allowing the intended recipient to access the data in a secure environment, for example on a computer system managed by the operator of the system 200.
One important aspect is the use of mechanisms to maintain an appropriate level of anonymity in the data that is released.
It is known that an appropriate level of anonymity can be established by reporting the data only with an appropriately coarse level of granularity.
Examples are described below with reference to data obtained from sensors that can detect the movement of vehicles. However, the data can be obtained from other sensors, or other sources, and may relate to any physical aspect or characteristic of any physical object or objects. In the examples given below, the physical aspects include the locations of the objects and the times at which the objects are detected at the locations. However, the data may relate to any other physical characteristics such as the speed, temperature, colour, or chemical composition of the physical objects.
The anonymized data can then be released so that they can be used in real-life applications, such as the monitoring and control of the physical objects.
In one specific example, the data is in the form of vehicle Origin-Destination (O-D) data, in which the journey start and end points may be recorded with fine spatial and temporal granularity. The result is that O-D records are likely to be unique, and thus fail to meet the requirement for a minimum level of k-anonymity, as described in more detail below.
This is illustrated with reference to Figure 7. Specifically, Figure 7 shows a partial map of a city, indicating roads. Sensors around the city may for example report that a first vehicle identified by its number plate left a first specific point 302 at a first specific time, say 07:32 for the sake of example, and arrived at a second specific point 304 at a second specific time, say 07:51, again for the sake of example. Similarly, sensors may for example report that a second vehicle identified by its number plate left a third specific point 306 at a third specific time, say 07:35 for the sake of example, and arrived at a fourth specific point 308 at a fourth specific time, say 07:59, again for the sake of example.
This information is potentially useful, but it would be contrary to the requirements of privacy for this information to be released, since the vehicles can be directly identified.
Moreover, even if the number plate information is removed to provide a degree of anonymity, the release of this information may still be unacceptable, because the detailed information may allow the vehicles and/or their occupants to be identified, which again would be a breach of privacy.
However, it may be acceptable to report that two vehicles left a first region, namely the grid square F6, during a first time window from 07:30 - 07:40, and arrived in a second region, namely the grid square D2, during a second time window from 07:50 - 08:00.
The sizes of the first and second regions and the durations of the first and second time windows are set in order to ensure the appropriate level of anonymity. Thus, the parameter values are placed in respective bins. The level of anonymity is measured by a value 'k'. Specifically, k-anonymity is a property of a dataset, and a dataset is said to be k-anonymous if every combination of identity-revealing characteristics occurs in at least k different rows of the data set. For example, privacy may be considered to be adequately protected if a k value of 5 is maintained, that is, if a given record relates to a certain individual, then there are at least 4 other identical records relating to different individuals.
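The k-anonymity property defined above can be checked with a short routine such as the following sketch. The function name `is_k_anonymous`, the column names and the dictionary row format are assumptions for illustration; the two sample rows correspond to the two binned journeys from grid square F6 to grid square D2 described above.

```python
from collections import Counter


def is_k_anonymous(rows, identifying_columns, k):
    """True if every combination of values in `identifying_columns`
    occurs in at least k rows."""
    combos = Counter(tuple(row[c] for c in identifying_columns) for row in rows)
    return all(count >= k for count in combos.values())


columns = ["origin", "departure_window", "destination", "arrival_window"]
rows = [
    {"origin": "F6", "departure_window": "07:30-07:40",
     "destination": "D2", "arrival_window": "07:50-08:00"},
    {"origin": "F6", "departure_window": "07:30-07:40",
     "destination": "D2", "arrival_window": "07:50-08:00"},
]
```

With only these two identical rows, the data set satisfies k = 2 but not k = 5, so the bins would need to be widened (or more journeys recorded) before a k value of 5 is met.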
Figure 8 is a flow chart, illustrating a first method according to this disclosure.
Specifically, at step 402, a plurality of data items are obtained. For example, each item may be obtained directly from the relevant sensors, or may be obtained indirectly, for example after combining data from multiple sources. Each data item comprises values for a plurality of parameters.
Each data item is also associated with at least one respective time of occurrence. The time of occurrence of a data item may be a time at which one of the parameter values making up the data item is recorded. For example, in the case of vehicle Origin-Destination (O-D) data, there are at least two possible times of occurrence, namely a departure time and an arrival time, while an average of the departure time and the arrival time may also be an appropriate time of occurrence in some situations.
In step 404, the data items are anonymized, and an anonymized data set is formed.
Specifically, the method of Figure 8 takes advantage of the fact that, for many types of technical data, more data is available in some periods than in others. In the case of traffic data, traffic flows vary according to the time of day and other factors, and so more data items are available during peak periods.
For example, a data set relating to traffic movements in an area such as that shown in Figure 7 may contain the following data items, amongst others:

Origin location | Destination location | Arrival times
D5 | B1 | 05:02, 05:12, 05:16, ... 05:23, ... 08:02, 08:02, 08:06, 08:09, 08:11, 08:13,
D6 | B1 | 05:10, 05:22, 05:28, ... 08:01, 08:02, 08:07, 08:08, 08:14,
D5 | B2 | 05:03, 08:13, 05:07, 08:11, 05:11, ... 05:14, 08:12, 05:21, ... 08:02, 08:03, 08:05, 08:09, 08:09,
D6 | B2 | 05:18, 05:27, ... 08:01, ... 08:04, 08:06, 08:07, 08:13,

In this data set, the location information has already been binned, so that the departure locations and the arrival locations are expressed in terms of the grid squares in which they are found, rather than more precisely.
Thus, one vehicle leaves a location in grid square D5 and arrives at a location in grid square B1 at a time between 05:00 and 05:10; two vehicles leave locations in grid square D5 and arrive at locations in grid square B1 at times between 05:10 and 05:20; one vehicle leaves a location in grid square D5 and arrives at a location in grid square B1 at a time between 05:20 and 05:30; and four vehicles leave locations in grid square D5 and arrive at locations in grid square B1 at times between 08:00 and 08:10. Similarly, one vehicle leaves a location in grid square D6 and arrives at a location in grid square B1 at a time between 05:00 and 05:10; no vehicles leave locations in grid square D6 and arrive at locations in grid square B1 at times between 05:10 and 05:20; two vehicles leave locations in grid square D6 and arrive at locations in grid square B1 at times between 05:20 and 05:30; and four vehicles leave locations in grid square D6 and arrive at locations in grid square B1 at times between 08:00 and 08:10, and so on.
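The counting of arrivals per time bin can be sketched as below, using the listed arrival times for journeys from grid square D5 to grid square B1 that fall in the windows discussed above (elided entries are omitted). The helper names and the convention that a bin includes its start time but not its end time are assumptions; the description does not state how times falling exactly on a bin boundary are treated.

```python
from collections import Counter


def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def bin_label(hhmm, width):
    """Label a time by the start of its bin, for bins `width` minutes wide."""
    start = to_minutes(hhmm) // width * width
    return f"{start // 60:02d}:{start % 60:02d}"


def count_per_bin(times, width):
    return Counter(bin_label(t, width) for t in times)


d5_to_b1 = ["05:02", "05:12", "05:16", "05:23",
            "08:02", "08:02", "08:06", "08:09"]
counts = count_per_bin(d5_to_b1, 10)
# counts: 05:00 -> 1, 05:10 -> 2, 05:20 -> 1, 08:00 -> 4
```

These counts reproduce the one, two, one and four vehicles stated for the D5 to B1 journeys.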
In general, in order to maintain a desired level of k-anonymity, the sizes of the bins must be set with a granularity that provides the desired level of k-anonymity, even at times when there is relatively little data being generated.
Thus, in this example, to maintain a desired level of k-anonymity, where k = 3, for example, if the location bins are kept equal to the grid squares, the sizes of the time bins must be set such that there are enough vehicle movements during the off-peak period. Thus, the bin for the time of arrival parameter must be set to a width of 30 minutes, so that all of the vehicles arriving between 05:00 and 05:30 are placed in the same time bin.
This method recognizes that the desired level of k-anonymity can be achieved with finer granularity (i.e. finer spatio-temporal resolution) at peak traffic times, and moreover that these may be the most interesting times, for example for traffic planning.
Thus, in step 406 of the method, the plurality of data items are anonymized to form an anonymized dataset, by defining bins having respective ranges for a plurality of said parameters, and specifically by defining bins having variable sized ranges for at least one of the parameters, wherein the sizes of said bins are based on a rate at which data items are recorded at the respective times of occurrence. In the case of data items relating to traffic movements, the sizes of said bins may be based on traffic flow rates at the respective times of occurrence.
Thus, wider bins are defined for time periods where the rate at which data items are recorded is lower. In this example, the space and/or time bin sizes can be defined in accordance with the traffic level at the time of recording the data, in order to meet the required value of k.
In the example given above, the desired level of k-anonymity can be achieved by setting a 30 minute wide time bin between 05:00 and 06:00, while setting a 10 minute wide time bin between 08:00 and 09:00.
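One way to arrive at such period-dependent bin widths is to try candidate widths in increasing order and keep the first one for which every occupied bin holds at least k items. The sketch below does this for the D5 to B1 arrival times discussed above; the function names, the set of candidate widths and the rule that empty bins are ignored are assumptions, and the disclosure does not prescribe this particular procedure.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def meets_k(times, period_start, width, k):
    """True if every occupied bin of `width` minutes holds at least k times."""
    start = to_minutes(period_start)
    counts = {}
    for t in times:
        index = (to_minutes(t) - start) // width
        counts[index] = counts.get(index, 0) + 1
    return all(count >= k for count in counts.values())


def smallest_width(times, period_start, widths, k):
    """Return the narrowest candidate width that meets k, or None."""
    for width in sorted(widths):
        if meets_k(times, period_start, width, k):
            return width
    return None


off_peak = ["05:02", "05:12", "05:16", "05:23"]
peak = ["08:02", "08:02", "08:06", "08:09"]
```

With k = 3 and candidate widths of 10, 15 and 30 minutes, the off-peak arrivals require the 30 minute bin while the peak arrivals already satisfy k with the 10 minute bin.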
Alternatively, if it is considered more desirable to maintain a high level of granularity in the time measurements, even during off-peak periods, the sizes of the location bins can be varied. Thus, while maintaining a constant 10 minute wide time bin at all times, the sizes of one or both of the location bins can be increased. In the illustrated example, by way of illustration, the location bins for the departure location and the arrival location can be made equal to two grid squares, so that all vehicles leaving from locations in the grid squares D5 and D6, and arriving at locations in the grid squares B1 and B2 are considered together.
Thus, measures are taken to adjust the reporting of traffic data to reflect traffic flow rates, in order that privacy or anonymity is not compromised.
Figure 9 is a flow chart, illustrating a second method according to this disclosure, which may be used in appropriate circumstances.
Specifically, a plurality of data items are obtained. For example, each item may be obtained directly from the relevant sensors, or may be obtained indirectly, for example after combining data from multiple sources. Each data item comprises values for a plurality of parameters.
As described above, the intention is to form a dataset that can be made available to a data user. Therefore, the plurality of data items are anonymized to form an anonymized dataset, by defining bins having respective ranges for a plurality of said parameters.
In many situations, fine-grained data (for example in temporal and/or spatial dimensions) has considerably greater utility, for example permitting short-term maxima to be identified or allowing more accurate linkage between different data points or records.
However, a balance must be found between granularity and the risk of re-identification.
Re-identification of a record in a released data set is a threat. That is, if data is released into the public domain, it may be subject to intensive attempts to re-identify the source of the records, thus threatening the privacy of the data. For example, re-identification may be accomplished by combining the data in a data set with data from a different source.
Therefore, in step 502 of the method shown in Figure 9, it is determined whether the data will be visible to a user in a secure environment, for example an environment intended to prevent the processing of the data with other data, or whether the data will be made available to the user without such limitation. For example the data may be made available only for viewing and not for storage or onward transmission, based on physical access constraints and the limitation of equipment which may be used in that environment. As an alternative, by contrast, the data may be transmitted to the user, for them to use as they wish, with no realistic way for the system operator to put any constraints on such use.
If it is determined in step 502 that the data will be visible to the user in a secure environment, the method passes to step 504, and a first level of anonymization is applied.
However, if it is determined in step 502 that the data will be transmitted to the user, the method passes to step 506, and a second level of anonymization is applied, where the second level of anonymization provides a higher level of anonymity than the first level of anonymization.
Thus, in order to maintain an acceptable overall risk of re-identification, an anonymized dataset that is to be released to a user so that they have access to all of the data may need to be very coarse-grained (i.e. with relatively large bin sizes) so that such re-identification is difficult, even when the data is combined with other possible sources of data. On the other hand, if data is to be visible to the user only in a secure environment, for example with strictly controlled access to the servers of the authority operating the system 200, then it may in practice be impossible to combine the data with other sources of data, and hence attempts at re-identification may be very unlikely to occur and/or may prove impossible, permitting a degree of relaxation of the protection of a given record, that is, using more fine-grained data, i.e. with smaller bin sizes.
Thus to achieve the required level of privacy, i.e. the required degree of certainty that re-identification of the data will not occur, the level of k-anonymity in the dataset can be set to a first level or to a second level, according to the way in which the user will be able to access the data.
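The decision of steps 502 to 506 can be sketched as a simple selection between two parameter sets. The class name `AnonymizationLevel`, the function `select_level` and the specific k values and bin widths are hypothetical and are not taken from the disclosure; the only property carried over from the description is that the second level provides more anonymity than the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnonymizationLevel:
    k: int                  # required k-anonymity value
    time_bin_minutes: int   # width of the time bins


# Illustrative values only.
FIRST_LEVEL = AnonymizationLevel(k=3, time_bin_minutes=10)    # secure viewing
SECOND_LEVEL = AnonymizationLevel(k=5, time_bin_minutes=30)   # data transmitted


def select_level(secure_environment):
    """Step 502: choose the level according to how the user will access the data."""
    return FIRST_LEVEL if secure_environment else SECOND_LEVEL
```

In a fuller implementation the chosen level could be passed to the binning step, possibly informed by the determination of the trust engine 206.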
Figure 10 is a flow chart, illustrating a third method according to this disclosure.
Specifically, in step 442, a plurality of data items are obtained. For example, each item may be obtained directly from the relevant sensors, or may be obtained indirectly, for example after combining data from multiple sources. Each data item comprises values for a plurality of parameters.
In step 444, the plurality of data items are anonymized to form a first anonymized dataset. As described previously, this involves defining bins having respective ranges for a plurality of said first parameters.
At step 446, it is determined that data relating to a second parameter should be added to the first anonymized dataset. For example, after release of the first anonymized data set, a recipient may request that additional data be added, relating to a different parameter.
The combining of multiple data sets can result in an increased risk of re-identification of individual records (for example by adding a further data set comprising spatial or temporal points relevant to each record or a subset of records). As described above, one solution to this is to make the data sets more coarsely-grained.
At step 448, an alternative solution is used. Specifically, aggregated values for the second parameter are added to the first anonymized dataset to form a second anonymized dataset. The aggregated values may be aggregated over a plurality of bins for at least one of the first parameters. The second parameter may be an indication of a category that each object falls into. In that case, the aggregated values may be values that indicate the number of objects in particular categories. Alternatively, the second parameter may be a numerical value. In that case, the aggregated values may be parameter values averaged over one or more bins.
To illustrate this, firstly, an example of a case where the second parameter is an indication of a category will be described.
In the following example, each data item comprises a starting point and an end point for a vehicle journey (i.e. an origin-destination pair) and an associated time (for example a journey start time). In this illustration, the starting point and the end point have been allocated to bins that are referred to as regions, and the start times have also been placed in bins, which are of 15 minutes duration. This provides an appropriate level of anonymity, as shown below:

Time bin | Spatial bin | Number of vehicles
09:00 - 09:15 | Region 1 -> Region 2 | 60
09:15 - 09:30 | Region 1 -> Region 2 | 50
09:30 - 09:45 | Region 1 -> Region 2 | 40

In this illustration, the data user wants additional information about a category relating to the vehicles, for example the colours of the vehicles. Providing full information about the colours within each of these three records could easily mean that the level of anonymity is reduced to an unacceptable degree.
Therefore, in order to provide information about the colours of the vehicles, without reducing the level of anonymity to an unacceptable degree, the information about the colours of the vehicles can be aggregated across all three records (each colour count is a single total spanning all three rows), for example:

Time bin | Spatial bin | Number of vehicles | Number of white vehicles | Number of black vehicles | Number of blue vehicles | etc
09:00 - 09:15 | Region 1 -> Region 2 | 60 | 40 | 20 | 15 |
09:15 - 09:30 | Region 1 -> Region 2 | 50 | | | |
09:30 - 09:45 | Region 1 -> Region 2 | 40 | | | |

Thus, it is indicated that the number of blue vehicles leaving Region 1 and arriving at Region 2, with start times between 09:00 and 09:45 was 15, and this maintains an appropriate level of anonymity. However, if this number was broken down between the three original time bins, there would be a danger that the level of anonymity would be reduced to an unacceptable degree.
This example illustrates the aggregation over multiple time bins. However, it will be appreciated that, alternatively or additionally, the aggregation can equally well take place over multiple spatial bins (again, each colour count is a single total spanning all three rows), for example:

Time bin | Spatial bin | Number of vehicles | Number of white vehicles | Number of black vehicles | Number of blue vehicles | etc
09:00 - 09:15 | Region 1 -> Region 2 | 60 | 45 | 25 | 20 |
09:00 - 09:15 | Region 1 -> Region 3 | 30 | | | |
09:00 - 09:15 | Region 1 -> Region 4 | 70 | | | |

Thus, it is indicated that the number of blue vehicles leaving Region 1 and arriving at either Region 2, Region 3, or Region 4, with start times between 09:00 and 09:15 was 20, and this again maintains an appropriate level of anonymity.
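A minimal sketch of this kind of category aggregation is given here, assuming each journey has already been assigned to a time bin and a spatial bin. The `Journey` type, the function name and the sample data are hypothetical and serve only to show per-record vehicle counts being kept while the colour counts are pooled over all of the records.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Journey(NamedTuple):
    time_bin: str      # e.g. "09:00-09:15"
    spatial_bin: str   # e.g. "Region 1 -> Region 2"
    colour: str        # the second (category) parameter


def aggregate_colours(journeys: Iterable[Journey]) -> Tuple[Counter, Counter]:
    """Return per-record vehicle counts and colour counts pooled over all records."""
    journeys = list(journeys)
    per_record = Counter((j.time_bin, j.spatial_bin) for j in journeys)
    pooled_colours = Counter(j.colour for j in journeys)
    return per_record, pooled_colours


# Tiny made-up sample; real records would hold many more journeys.
sample = [
    Journey("09:00-09:15", "Region 1 -> Region 2", "white"),
    Journey("09:00-09:15", "Region 1 -> Region 2", "blue"),
    Journey("09:15-09:30", "Region 1 -> Region 2", "white"),
    Journey("09:30-09:45", "Region 1 -> Region 2", "black"),
]
per_record, pooled_colours = aggregate_colours(sample)
```

Because the colour counts are never split by record, a recipient cannot tell which time bin or spatial bin a given colour belongs to.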
The process of aggregation will now be illustrated with reference to an example of a case where the second parameter is a numerical value.
In this example, as before, the bin sizes have been chosen to ensure that a certain dataset meets anonymity requirements, and one example of a record is then:

Time bin | Spatial bin | Number of vehicles
09:00 - 09:15 | Region 1 -> Region 2 | 60

In this illustration, the data user wants additional information about a category which has a numerical value, for example the CO2 emissions of the vehicles. This information may for example be available from a vehicle licensing authority, which is able to associate every licence plate recognized by an ANPR camera with a corresponding vehicle model, and has data about the CO2 emissions of each vehicle model (at least under standard test conditions).
The dataset could be expanded to include this information, but this would typically fracture each record such that anonymity is reduced to an unacceptable degree. For example, the single record shown above may result in the following:

Time bin | Spatial bin | CO2 (g/km) | Number of vehicles
09:00 - 09:15 | Region 1 -> Region 2 | 119 | 4
09:00 - 09:15 | Region 1 -> Region 2 | 142 | 3
09:00 - 09:15 | Region 1 -> Region 2 | 107 | 14
09:00 - 09:15 | Region 1 -> Region 2 | 90 | 16
09:00 - 09:15 | Region 1 -> Region 2 | ... | ...
These bins may not provide the required level of anonymity (for example, they do not offer k = 5 anonymity), and so it may not be appropriate to release the data in this form.
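The k = 5 check can be illustrated with the four fractured rows whose counts are given above. The helper name `records_below_k` is hypothetical; the sketch simply flags every record containing fewer than k vehicles.

```python
from typing import Dict, List


def records_below_k(counts: Dict[int, int], k: int) -> List[int]:
    """Return the keys of records whose item count is below k."""
    return [key for key, n in counts.items() if n < k]


# CO2 value (g/km) -> number of vehicles, taken from the fractured rows above.
fractured = {119: 4, 142: 3, 107: 14, 90: 16}
too_small = records_below_k(fractured, k=5)
```

Here the records for 119 g/km and 142 g/km contain only 4 and 3 vehicles respectively, so the fractured dataset as a whole fails the k = 5 requirement.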
The CO2 emissions data is therefore aggregated by providing an average value, as shown below:

Time bin | Spatial bin | Number of vehicles | Avg CO2 (g/km)
09:00 - 09:15 | Region 1 -> Region 2 | 60 | 98.5
09:15 - 09:30 | Region 1 -> Region 2 | 50 | 96.5

Thus, an average value (i.e. the average CO2 emission value for each vehicle making up that record) is provided for each pair consisting of one time bin and one spatial bin.
Alternatively, as shown below, the CO2 emission value can be averaged across multiple records (the average is a single value spanning both rows):

Time bin | Spatial bin | Number of vehicles | Avg CO2 (g/km)
09:00 - 09:15 | Region 1 -> Region 2 | 60 | 97.6
09:15 - 09:30 | Region 1 -> Region 2 | 50 |

Thus, the numbers of vehicles are indicated for each record consisting of one time bin and one spatial bin, but the CO2 emission value is averaged across multiple time bins.
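The pooled figure of 97.6 g/km is consistent with a mean weighted by the number of vehicles in each record. The disclosure does not state how the pooled average is formed, so the weighting used in this sketch is an assumption, and `pooled_mean` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def pooled_mean(records: Iterable[Tuple[int, float]]) -> float:
    """Combine (count, mean) pairs into one mean, weighting each mean by its count."""
    records = list(records)
    total = sum(count for count, _ in records)
    return sum(count * mean for count, mean in records) / total


# (number of vehicles, average CO2 in g/km) for the two time bins above.
per_bin = [(60, 98.5), (50, 96.5)]
pooled = round(pooled_mean(per_bin), 1)
```

With these inputs the weighted mean is 10735 / 110, which rounds to 97.6 and matches the pooled value in the table.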
Although these illustrative examples form the average as the mean value, numerous other possibilities exist. For example, the median value might be used for parameters where anonymity might be impacted by the presence of outliers (e.g. the presence of a single very heavy vehicle might be easy to recognise in datasets even if the average vehicle weight were to be added to a record rather than the individual weights).
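The effect of an outlier on the two kinds of average can be seen in a short sketch using Python's standard `statistics` module. The vehicle weights are made-up values chosen for illustration, with one very heavy vehicle among ordinary cars.

```python
import statistics

# Hypothetical vehicle weights in kg; the last entry is a single heavy vehicle.
weights_kg = [1200, 1300, 1250, 1400, 44000]

mean_weight = statistics.mean(weights_kg)      # pulled far upwards by the outlier
median_weight = statistics.median(weights_kg)  # stays at a typical car weight
```

The mean is 9830 kg, which itself reveals that an unusually heavy vehicle is present in the record, whereas the median of 1300 kg does not.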
Thus, steps are taken which can ensure that adding additional data points to a data set can be done without compromising privacy or anonymity.
In general, therefore, methods for achieving an appropriate level of anonymization of the data can be implemented.

Claims (31)

  1. A method for achieving a desired level of anonymity of data, the method comprising: obtaining a plurality of data items, wherein each data item comprises values for a plurality of parameters, at least one of which is recorded at a respective time of occurrence; anonymizing the plurality of data items to form an anonymized dataset, by defining bins having respective ranges for a plurality of said parameters; and further comprising: defining bins having variable sized ranges for at least one of the parameters, wherein the sizes of said bins are based on a rate at which data items are recorded at the respective times of occurrence.
  2. A method in accordance with claim 1, wherein the plurality of data items correspond to traffic movements, the method comprising: defining the sizes of the bins for the at least one of the parameters, based on traffic flow rates at the respective times of occurrence.
  3. A method in accordance with claim 2, wherein the plurality of data items correspond to vehicular traffic movements.
  4. A method in accordance with claim 2, wherein the plurality of data items correspond to pedestrian traffic movements.
  5. A method in accordance with any of claims 2 to 4, wherein the plurality of data items are origin-destination pairs.
  6. A method in accordance with any preceding claim, comprising defining bins having variable sized ranges for a location parameter.
  7. A method in accordance with any preceding claim, comprising defining bins having variable sized ranges for a time parameter.
  8. A method for achieving a desired level of anonymity of data, the method comprising: determining whether the data will be visible to a user only in a secure environment intended to prevent the processing of the data with other data, or whether the data will be made available to the user without such limitation; and applying a first level of anonymization if the data will be visible to the user only in a secure environment; or applying a second level of anonymization if the data will be made available to the user without such limitation, wherein the second level of anonymization provides a higher level of anonymity than the first level of anonymization.
  9. A method in accordance with claim 8, wherein the data represents values of parameters relating to physical objects.
  10. A method in accordance with claim 8 or 9, wherein applying the first level of anonymization and applying the second level of anonymization comprise setting bin sizes for the data, and wherein applying the first level of anonymization comprises setting a smaller bin size for at least one parameter of the data than applying the second level of anonymization.
  11. A method in accordance with claim 10, comprising setting bin sizes for a location parameter.
  12. A method in accordance with claim 10 or 11, comprising setting bin sizes for a time parameter.
  13. A method in accordance with any of claims 8 to 12, wherein the data relates to traffic movements.
  14. A method in accordance with claim 13, wherein the data relates to vehicular traffic movements.
  15. A method in accordance with claim 13, wherein the data relates to pedestrian traffic movements.
  16. A method in accordance with any of claims 13 to 15, wherein the data are origin-destination pairs for traffic.
  17. A method for achieving a desired level of anonymity of data, the method comprising: obtaining a plurality of data items, wherein each data item comprises values for a plurality of first parameters; anonymizing the plurality of data items to form a first anonymized dataset, by defining bins having respective ranges for a plurality of said first parameters; determining that data relating to a second parameter should be added to the first anonymized dataset; and adding aggregated values for the second parameter to the first anonymized dataset to form a second anonymized dataset.
  18. A method in accordance with claim 17, wherein the aggregated values are aggregated over a plurality of bins for at least one of the first parameters.
  19. A method in accordance with claim 17 or 18, wherein the aggregated values are average values.
  20. A method in accordance with claim 19, wherein the average values are mean values.
  21. A method in accordance with claim 19, wherein the average values are median values.
  22. A method in accordance with claim 19, wherein the average values are modal values.
  23. A method in accordance with any of claims 17 to 22, wherein each data item represents values of parameters relating to physical objects.
  24. A method in accordance with claim 23, wherein each data item includes a location parameter.
  25. A method in accordance with claim 23 or 24, wherein each data item includes a time parameter.
  26. A method in accordance with any of claims 17 to 25, wherein the data items relate to traffic movements.
  27. A method in accordance with claim 26, wherein the data items relate to vehicular traffic movements.
  28. A method in accordance with claim 26, wherein the data items relate to pedestrian traffic movements.
  29. A method in accordance with any of claims 26 to 28, wherein the data items are origin-destination pairs for traffic.
  30. A system, comprising: an input for receiving data; and a processor, wherein the processor is configured to perform a method in accordance with any of claims 1 to 29.
  31. A computer program product, comprising computer-readable code, configured for causing a suitably programmed processor to perform a method in accordance with any of claims 1 to 29.
GB1905778.5A 2019-04-25 2019-04-25 Data anonymization Withdrawn GB2583363A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1905778.5A GB2583363A (en) 2019-04-25 2019-04-25 Data anonymization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1905778.5A GB2583363A (en) 2019-04-25 2019-04-25 Data anonymization

Publications (2)

Publication Number Publication Date
GB201905778D0 GB201905778D0 (en) 2019-06-05
GB2583363A true GB2583363A (en) 2020-10-28

Family

ID=66810172

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1905778.5A Withdrawn GB2583363A (en) 2019-04-25 2019-04-25 Data anonymization

Country Status (1)

Country Link
GB (1) GB2583363A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4209951A1 (en) * 2022-01-06 2023-07-12 Volkswagen Ag A method, a computer program product and a device for dynamic spatial anonymization of vehicle data in a cloud environment
WO2023131684A1 (en) * 2022-01-06 2023-07-13 Volkswagen Aktiengesellschaft A method, a computer program product and a device for dynamic spatial anonymization of vehicle data in a cloud environment
DE102022103049A1 (en) 2022-02-09 2023-08-10 Cariad Se Method for checking a data provision order, verification device and data provision system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
GB201905778D0 (en) 2019-06-05

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)