WO2016180460A1

WO2016180460A1 - In-device privacy control mechanism for wearable smart devices

Info

Publication number: WO2016180460A1
Application number: PCT/EP2015/060310
Authority: WO
Inventors: Pan Hui; Ji Yang; Muhammad Haris; Christoph Peylo
Original assignee: Deutsche Telekom Ag
Priority date: 2015-05-11
Filing date: 2015-05-11
Publication date: 2016-11-17
Also published as: EP3295696A1

Abstract

The present invention relates to a wearable smart device such as smart glass. The device comprises a camera for taking a video stream; a buffer for temporarily storing the video stream; a face detection module for detecting a face of a person in the video stream in the buffer by face recognition; a gesture detection module for tracking the person in the stored video stream after the face of the person has been detected by the face detection module, and determining whether the person has made a predefined gesture by detecting the predefined gesture in the stored video stream; and a de-identification module for de-identifying, in the stored video stream, the face of the person who has made the predefined gesture by removing facial identification information from a video segment which will be taken by the camera after the predefined gesture has been detected by the gesture detection module.

Description

IN-DEVICE PRIVACY CONTROL MECHANISM FOR WEARABLE SMART DEVICES

Technical Field

The present invention generally relates to a control mechanism for privacy in wearable computing devices (wearable smart devices) equipped with a camera, in particular in smart glasses. More specifically, the present invention relates to a method of providing a framework to ensure the privacy of people who are detected by a camera, preferably a digital camera of a wearable computing device. The framework of the present invention can be provided by a system and/or a method which guarantees that the privacy of people in photos or videos taken by wearable smart devices is preserved, preferably by using in-device techniques.

Background of the invention

Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are augmented by computer-generated sensory input such as sound, video, graphics or GPS data. It is related to a more general concept called mediated reality, in which a view of reality is modified by a computer. As a result, the technology functions by enhancing one's current perception of reality. Augmentation is conventionally in real-time and in semantic context with environmental elements, such as sports scores on TV during a match. With the help of advanced AR technology, the information about the surrounding real world of the user becomes interactive. Artificial information about the environment and its objects can be overlaid on the real world.

Smart glasses are wearable computing devices in the form of computerized eyeglasses, which typically comprise an optical head-mounted display (OHMD). Owing to the latest developments in wearable technology, modern smart glasses typically possess enhanced data processing functionality similar to a smartphone or tablet and are able to run sophisticated applications. These devices can also include special features such as augmented reality overlay, GPS and mapping capability. Despite all these advantages, the devices also give rise to new privacy challenges. A main feature in this respect is the camera used in these devices. Since the smart glass is controlled by the wearer, the wearer can control when photos or videos are taken, i.e., a wearer would not necessarily ask for permission from those around the wearer (see Fig. 1). Even though every user of a mobile phone or smartphone can also take photos and videos, in most cases it is easy to recognize when the user is taking a photo or a video, because the user typically has to hold up the smartphone. Due to the size and nature of devices like smart glasses, however, it is much harder (if not impossible) to recognize whether a wearer is taking a photo or a video with a smart glass.

It is to be noted that the present invention is preferably applicable to, but not limited to, smart glasses, as long as the devices are wearable (e.g. a smart watch which is worn around the wrist). Taking a photo or a video with a camera of such devices may be easier to recognize than with smart glasses, but can still be done without people in the surroundings being aware of it. The smart device to which the present invention is applicable may not necessarily be a single integrated device. For example, the present invention covers a device construction in which a camera module with a communication interface is separately provided as a wearable unit, and other elements (which are not necessarily wearable) are integrated as a separate unit for communication with the camera module.

Moreover, face recognition capabilities can be easily integrated into AR applications. Thus, people in photos and videos from the on-device camera can be identified with facial recognition software. Once a person is identified, the holder of the device could be presented with the person's social networking service profiles (e.g. Facebook profile, Twitter feed) or Internet search results linked to his/her profile. Individuals typically do not expect such an automated link with their Internet data when they move in public; they have an expectation of anonymity. It is to be noted that the present invention is preferably applicable to, but not limited to, AR applications (requiring a display) such as smart glasses, as long as the wearable computing devices are equipped with a camera.

These privacy issues are serious and have gained attention on official and personal levels. For example, Google Glass™ is known in the art. This smart glass is capable of recording audio, video, and photos. This device can also use GPS (Global Positioning System) for location-tracking and directions. The device is further capable of handling computational tasks, as it is also equipped with a processor chip and a GPU. For instance, there exists an application to take pictures surreptitiously by winking. For an outside individual it is difficult or impossible to recognize whether the user of the smart glass is recording audio or video with the smart glass. Furthermore, all data recorded by the smart glass, including photos, videos, audio, location data, and user data, can be stored in a cloud server, e.g., on Google's cloud servers. Present smart glasses can connect to the Internet via Wi-Fi, or tether to the user's smartphone. Moreover, even when temporarily offline, the smart glass can record audio and/or video.

Recently, members of Congress asked the chief executive of Google Inc. to give assurances about privacy safeguards for smart glass devices. Moreover, a new survey of around 4,000 UK residents conducted by Rackspace and Goldsmiths at the University of London has found that 20 percent of respondents believe that smart glasses should be banned outright, while 61 percent think smart glasses and other wearable camera devices should at least be regulated. Another survey conducted by Bite Interactive shows that only 10% of US residents trust smart glasses. Moreover, many private entertainment places have banned smart glasses.

There are already some efforts to change the privacy policy for smart glasses. For instance, the sale of any applications which use face recognition or which record video without turning on an indication light on the smart glass is restricted or forbidden in official market places or application stores.

Although there are already efforts to reduce the above-mentioned privacy risk posed by smart glasses, the known methods are not sufficient to guarantee privacy. For example, blocking face recognition applications in an official application market cannot prevent developers from developing face recognition applications.

Moreover, since the privacy concern relates to people around a glass wearer, privacy cannot rest on trust in glass owners. A majority of people believe that they are almost anonymous in public. In this situation, however, once a glass wearer takes pictures or videos of people around him, he can perform face recognition and identify the persons around him.

Privacy systems for video surveillance systems and video sharing systems are known, wherein privacy is preserved by altering videos by adding noise (or blurring) or by removing or replacing private objects from the videos. A major drawback of such systems is the fact that important information in pictures or videos is destroyed, which is undesirable in a smart glass. The article "A Scanner Darkly: Protecting user privacy from perceptual applications" by S. Jana, A. Narayanan and V. Shmatikov, published in 2013, proposes a privacy layer for visual systems.

Accordingly, there is a need for a control mechanism which ensures the privacy of people around wearable smart devices. In addition, it is further preferred that device owners can still take photos or videos of those who are not disturbed (e.g. their friends) without violation of privacy.

Summary of the invention

The above objects are achieved by the present invention defined by the features of the independent claims. Dependent claims represent preferred embodiments.

The present invention proposes an in-device automated privacy framework for a wearable smart device, such as smart glasses or smart watches. The preferable goal of this invention is to protect the privacy of individuals while preserving sufficient information. The framework comprises human face detection in the images from the on-device camera. After the face detection, tracking of the person is performed in order to recognize a certain gesture of the person. In the tracking, a robust tracking algorithm is preferably used, as any error in tracking will typically decrease the chances of recognition. In addition, after the tracking, the framework further comprises an intelligent de-identification.

Preferably, the framework of the present invention will provide a balance between privacy and utility. The framework of the present invention is preferably intelligent enough to preserve privacy while keeping the functionality of the camera and sufficient information in the images/videos.

According to a first aspect of the present invention, a method for preserving privacy of a person visible in a camera module of a wearable smart device (e.g. smart glass) is performed within the smart device, wherein the method comprises the steps of:

taking a video stream by the camera module;

storing the video stream in a buffer within the smart device;

detecting a face of the person in the video stream stored in the buffer by face recognition;

tracking, after the face-detecting step, the person in the video stream stored in the buffer, and determining whether the person has made a predefined gesture by detecting the predefined gesture in the video stream stored in the buffer; and

de-identifying, in the video stream stored in the buffer, the face of the person who has made the predefined gesture by removing facial identification information from a video segment which will be taken by the camera module after the predefined gesture has been detected at the determining step.

With this method, from the time when a person around the smart device has made a predefined gesture (e.g. hand gesture, facial expression), the smart device de-identifies, in a video segment stored in the buffer, the face of the person.
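The steps above can be sketched as a small simulation. This is a minimal illustration only, not the claimed implementation: real face detection, tracking and gesture recognition are replaced by pre-labelled observations, and the names Observation, Frame, person_id and process_stream are hypothetical. It is further assumed here that de-identification starts with the very frame in which the gesture is detected.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One face found in a frame (stand-in for real detection output)."""
    person_id: str       # hypothetical identity assigned by the tracker
    made_gesture: bool   # True if the predefined gesture is seen in this frame


@dataclass
class Frame:
    index: int
    observations: list


def process_stream(frames):
    """Return, per frame, the persons whose faces must be de-identified."""
    opted_out = set()    # persons who have made the predefined gesture so far
    result = []
    for frame in frames:
        # Gesture detection: remember everyone who makes the gesture.
        for obs in frame.observations:
            if obs.made_gesture:
                opted_out.add(obs.person_id)
        # De-identification applies from the gesture onwards (assumption:
        # the gesture frame itself is included).
        visible = {obs.person_id for obs in frame.observations}
        result.append((frame.index, sorted(visible & opted_out)))
    return result


frames = [
    Frame(0, [Observation("A", False), Observation("B", False)]),
    Frame(1, [Observation("A", True), Observation("B", False)]),
    Frame(2, [Observation("A", False), Observation("B", False)]),
]
masked = process_stream(frames)
```

In this sketch person A is de-identified in frames 1 and 2, whereas person B, who made no gesture, remains unaltered in all frames.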

In the context of the present invention, such a future video segment may be a single image (photo) or more (rather than a seamless video stream) taken by the camera module after the predefined gesture has been made. The user of the smart device may stop taking a video stream for a second and then take a photo, or may take a video stream and at the same time take a photo. In this case, the person's face in the photo can be de-identified.

Currently, privacy controls in smart devices are only available to the owner of the device. These devices involve, due to their nature (always on, always recording), a risk of harming the privacy of people around the device who are not its owner. In contrast, the method of the present invention creates controls that allow people in the surroundings to control their privacy by making a predefined gesture.

De-identification is an essential part of the present framework of the privacy control mechanism. De-identification preferably maintains a balance between utility and privacy. Utility is a function of the amount of features or information in the image. That is, it is preferable that no more information of each image (constituting the video stream) than is necessary to de-identify the face of the person is image-processed upon de-identification, in order to keep as much information as possible.

Preferably, the video stream in which the face of the person has been de-identified is stored in a storage within the smart device. The video stream stored in this storage is ready for access by the user of the smart device, for access by an application running in the smart device and/or for supply to an external device which is connected with the smart device. This guarantees that the video stream stored in the buffer before the de-identification has been performed is not accessible for any purpose other than the de-identification (by a de-identification module).

Preferably, face features obtained by the face recognition at the face detecting step are preserved in a cache for a predetermined period of time after it has been determined that the face of the person has disappeared from the video stream. The de-identifying step restarts when it has been determined that the face of the person reappears in the video stream before the period of time has elapsed. The face of the person may temporarily go undetected, for example if he turns back, e.g. for a few seconds, or moves out of the view range of the camera. By providing a cache for temporarily storing face features, it is possible to keep de-identifying without requiring the person to make a predefined gesture again when the face is redetected.
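The caching behaviour can be illustrated by the following minimal sketch. It rests on several assumptions: the class name FaceFeatureCache and its methods are hypothetical, a person_id string stands in for matched face features, time is passed in explicitly as seconds, and the retention period of 30 seconds is chosen purely for illustration.

```python
class FaceFeatureCache:
    """Remembers opted-out persons for a limited time after they vanish."""

    def __init__(self, retention_seconds=30.0):
        self.retention_seconds = retention_seconds
        self._last_seen = {}   # person_id -> time the face was last detected

    def mark_opted_out(self, person_id, now):
        """Record that the person made the predefined gesture at time now."""
        self._last_seen[person_id] = now

    def should_deidentify(self, person_id, now):
        """Call whenever the face is (re)detected at time now."""
        last = self._last_seen.get(person_id)
        if last is None:
            return False                     # never opted out, or expired
        if now - last > self.retention_seconds:
            del self._last_seen[person_id]   # expired: gesture needed again
            return False
        self._last_seen[person_id] = now     # still tracked: refresh timer
        return True


cache = FaceFeatureCache(retention_seconds=30.0)
cache.mark_opted_out("A", now=0.0)
still_masked = cache.should_deidentify("A", now=10.0)   # reappears in time
```

A face that reappears within the retention period continues to be de-identified; after a longer absence the entry is dropped and the person would have to make the gesture again.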

This embodiment is also applicable to a case where the user of the smart device stops taking a video stream for a while and then restarts taking a video stream and/or a photo.

Preferably, the de-identifying step is retroactively performed on a video segment stored in the buffer between a time when the face has been detected and a time when the predefined gesture has been detected. Otherwise, since the de-identification starts from the time when the gesture has been detected, the face of the person would remain identifiable before the person has made the gesture. In this preferred embodiment, a video segment in the past can be subjected to the de-identification process, which further improves the privacy of people in the surroundings.
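The selection of the past video segment can be sketched as follows. This is an illustration under stated assumptions: the function name retroactive_indices is hypothetical, frames are represented by (timestamp, payload) pairs, and the buffer is modelled as a bounded deque so that only frames still held in the buffer can be processed retroactively.

```python
from collections import deque


def retroactive_indices(buffer, face_detected_at, gesture_detected_at):
    """Timestamps of buffered frames between face and gesture detection.

    The gesture frame itself is excluded because it is assumed to be covered
    by the forward de-identification that starts at the gesture.
    """
    return [t for t, _ in buffer
            if face_detected_at <= t < gesture_detected_at]


# Bounded buffer holding the five most recent frames (size is an assumption).
buffer = deque(maxlen=5)
for t in range(8):
    buffer.append((t, f"frame-{t}"))

# Face first detected at t=2, gesture detected at t=6.
to_fix = retroactive_indices(buffer, face_detected_at=2, gesture_detected_at=6)
```

Here the frame at t=2 has already left the bounded buffer, so only the frames at t=3, 4 and 5 are selected for retroactive de-identification.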

In the context of the present invention, such a past video segment may be a single image (photo) or more, if any, (rather than a seamless video stream) which were taken by the camera module. Again, the smart device may allow the user to take a video stream and a photo at the same time.

In order to protect the framework against manipulation, it is preferred that the method is implemented as a program which is located at a layer that is directly above an operating system kernel layer, wherein said layer adjacent the kernel layer is not accessible by applications running in the smart device, which are programmed by application developers. The purpose of this preferred requirement is to protect the framework from hacking attacks by developers. Here, developers refer to those who can write code to access services (camera, speaker, call log) for their applications that are intended to run in the smart device.

According to another aspect of the present invention, a wearable smart device (e.g. smart glass) comprises:

a camera module for taking a video stream;

a buffer for temporarily storing the video stream;

a face detection module for detecting a face of a person in the video stream in the buffer by face recognition;

a gesture detection module for tracking the person in the video stream stored in the buffer after the face of the person has been detected by the face detection module, and determining whether the person has made a predefined gesture by detecting the predefined gesture in the video stream stored in the buffer; and

a de-identification module for de-identifying, in the video stream stored in the buffer, the face of the person who has made the predefined gesture by removing facial identification information from a video segment which will be taken by the camera module after the predefined gesture has been detected by the gesture detection module.

Further, the present invention relates to a computer program comprising computer executable program code adapted to be executed to implement the method of the present invention when being executed.

In the context of the present invention, the term "mechanism" or "framework" can relate to a set of methods and a system. Moreover, the terms "video stream" and "video" in the specification are interchangeable; they consist of multiple images (frames). Each image is subjected to the de-identification process.

Brief description of the drawings

The present invention is described in more detail herein below by way of exemplary embodiments and with reference to the attached drawings, in which:

Fig. 1 illustrates a situation with a user wearing a smart glass at a place with a plurality of persons;

Fig. 2 shows the basic components of a smart glass according to an embodiment of the present invention;

Fig. 3a shows the basic software components of Android™ OS, e.g., as used in a smart glass;

Fig. 3b shows the architecture of iOS™, which is currently used in Apple smart devices;

Fig. 4 shows some components of a camera module as used in a smart glass;

Fig. 5 shows a software stack of a framework according to an embodiment of the present invention;

Fig. 6 shows an embodiment of the framework of the present invention;

Fig. 7 shows a flowchart illustrating preferred method steps according to the present invention; and

Fig. 8 shows a further flowchart illustrating preferred method steps of a further preferred embodiment according to the present invention.

Detailed description of the invention

Preferred embodiments consistent with this invention are now described with reference to the drawings. For the purpose of clarity, specific details are set forth without departing from the scope of this invention as claimed.

It should be appreciated that for the ease of explanation, the embodiments are preferably explained for videos in which each frame of a video can be treated as a separate image, but the present invention is also capable of working with images. Current image processing applications treat videos in the same way. It should further be appreciated that for the ease of explanation, the majority of embodiments are preferably explained for smart glasses, but the present invention is also capable of working with other wearable smart devices such as smart watches.

Figure 2 shows a smart glass (device) according to an embodiment of the present invention. In this embodiment, the device comprises: a memory (not shown), a memory controller 202, a processor (CPU) 203, a peripheral interface 204, RF circuitry 205, audio circuitry 207, a speaker 213, a microphone 210, an input output subsystem 208, a projection display 211, a camera 212, software components 201, and other input devices or control devices (e.g. motion module 209). These components can communicate with each other over one or more communication buses or signal lines. The device shown is only one example of a smart glass; it can be any smart glasses. Therefore, it may have more or fewer components than shown in Fig. 2. The various components shown in Fig. 2 may be implemented in hardware and/or software.

The framework of the control mechanism according to the present invention preferably resides inside the device and will preferably be automated so as to act as a firewall against any kind of privacy-breaching attempt. The most advanced currently available smart glasses have slimmed-down operating systems, but these operating systems are still adequate to handle software stacks running on mature kernels.

The preferable exact location of the framework of the present invention will depend on the architecture of the operating system. For example, Google Glass has an Android operating system running at its core. In order to protect the framework from hacking attempts by developers of applications for the wearable smart device, the framework preferably resides in the libraries layer of the Android software stack.

As will be appreciated by those skilled in the art, almost all advanced operating systems of available smart glasses or devices can be divided into abstraction layers. These layers typically separate different functional units of the operating system. Although the fine-grained details of these abstraction layers may differ between operating systems, at a higher level these operating systems can typically be divided into:

iv) application layer,

iii) service layer,

ii) library layer and

i) kernel/ hardware layer.

One such example of an Android OS, which is used for example in Google Glass, is shown in Fig. 3a.

Moreover, as a further example, reference is made to Fig. 3b, which shows the architecture of iOS™, i.e., the operating system which is currently used in Apple smart devices. Although some names in Android OS and iOS are different, it can be seen that the abstraction layers of the operating system comprise an iv) application layer, e.g., applications 501 in Fig. 3a and Cocoa Touch™ in Fig. 3b. In particular, Cocoa Touch is a UI (user interface) framework for building software programs to run on the iOS operating system (for the iPhone™, iPod Touch, and iPad) from Apple Inc. Cocoa Touch provides an abstraction layer of iOS.

Below said layer, the service layer (iii) is located, e.g., application framework 502 in Fig. 3a and Media Services (iii) in Fig. 3b.

Below said layer, the layer (ii) of core services or core libraries is provided (see again Figs. 3a and 3b).

Finally at the lowest layer, which directly communicates with the hardware (see e.g. Fig. 3b) is a core (kernel), e.g., the operating system kernel 505 in Fig. 3a and the Core OS in Fig. 3b.

Most operating systems of sophisticated smart glasses comprise an operating system kernel (e.g. core OS) and a hardware abstraction layer (see e.g. "Hardware" in Fig. 3). This hardware abstraction layer manages hardware resources and provides an interface for hardware components such as the camera, microphone and speaker. These layers are the lowest layers in the abstraction layer model. Libraries and services exist in combined or separate layers just after this hardware abstraction layer and use the hardware abstraction interfaces to perform their dedicated tasks.

As shown, for example, in Fig. 5, usually the layers down to the services layer (see layers (iv) and (iii); 501 and 502) are accessible to developers. On the other hand, the layers after services, e.g., layers (i) and (ii), are not prone to manipulation/hacking. According to the present invention, it is preferable that the framework of the present invention resides after the services layer, e.g., in a layer (ii) below layer (iii). Moreover, according to a further embodiment, it would be further preferred that the framework is inside the kernel layer (e.g. layer (i)). However, it can be located anywhere between the services layer and the kernel layer.

Moreover, according to a further preferred embodiment, the framework according to the present invention is preferably not an "application" and therefore does not require the SDK (software development kit) of the operating system. In other words, the framework does not require the SDK because it is not regarded as an application, but is treated as a system level service. The framework can preferably be implemented directly in the kernel using languages like C and C++. Moreover, if the framework is located just after services, then it can also use functions of libraries such as OpenGL.

Moreover, according to a further preferred embodiment, the framework of the present invention need not necessarily be a separate (i.e. single) abstraction layer. Since it preferably relates to only one hardware feature, preferably the camera, it is possible that the framework resides within the existing abstraction layers of the operating system.

Referring again to Figure 3a, the drawing presents a detailed overview of software components according to an embodiment of the present invention. In the present embodiment, the architecture includes the following components: an operating system kernel 505, core libraries 504, a virtual machine (run time libraries) 503, an application framework 502 and one or more applications 501. As indicated above, according to the present invention the device is not restricted to the shown components. It is possible that more or fewer components are used in the invention.

The operating system kernel 505 includes components and drivers to control general system tasks as well as to manage communication between software and hardware components. For instance, the operating system kernel 505 may have: a display driver, a Wi-Fi driver, a camera driver, a power management, a memory driver and/or other drivers.

There are core libraries 504 on top of the kernel 505. These libraries comprise instructions to instruct the device to handle data. The core libraries may comprise a number of modules, such as an open-source Web browser engine and an SQLite database. The modules will be useful for storage and sharing of application data, libraries to play and record audio and/or video, SSL libraries responsible for Internet security etc. Furthermore, the core libraries include other support libraries to run the algorithms involved in the modules of the framework. Specific algorithms are implemented for face detection, gesture detection and de-identification, which will be described below in greater detail.

On the same layer, there may exist a virtual machine and/or runtime libraries 503, which are designed to ensure the independence of individual applications. Such a virtual machine construction is advantageous in case of application crashes: it can be easily ensured that the remaining applications are not affected by any other application running on the device. In other words, a crashed application preferably does not influence the other running applications.

The virtual machine may also provide a time window to enhance the de-identification functionality. The time window corresponds to a certain amount of virtual memory, which serves as a cache for input video streams. The virtual memory can temporarily store the input video for a short period of time (e.g. 30 seconds). In case the person's face disappears from the input stream and reappears shortly afterwards, for example because he/she turns back or moves out of the range of the camera, the de-identification process should not be interrupted if he/she has already made a gesture for de-identification purposes. However, if the person disappears for longer than the duration of the time window, he/she needs to make the gesture for de-identification again upon reappearance.

The application framework 502 is on the next layer. It contains the programs that manage the basic functions of the device, for example, resource allocation, process switching and physical location tracking. In most cases, application developers should have full control of the application framework 502 so that they can take advantage of processing capabilities and support features when building an application. In other words, the application framework can be seen as a set of basic tools used by a developer for building more complex tools or applications.

The application layer 501 is shown in Fig. 3a. This layer consists of applications like camera applications, calculators, image galleries, etc. The user of the device should only interact with applications on this layer.

Figure 4 illustrates the main components of the camera module 212 (Fig. 2) in the device according to an embodiment of the present invention. According to the present embodiment, the camera module includes an optical lens 401, an image sensor 402, an image signal processor 403 and a driver 404. The lens 401 is used for taking high resolution photos and/or recording high definition videos. An optical image can be converted into an electronic signal with the image and/or video sensor 402. CCD image sensors or CMOS sensors, as used in most digital devices, may be employed; they perform the task of capturing light and converting it into electrical signals. The scenes (i.e. optical images taken by the sensor) can be better interpreted by communications between image sensors and image signal processors. The image signal processor 403 is a specialized digital signal processor used for processing images, implemented as a system on a chip with a multi-processor or multi-core architecture. The driver 404 provides an interface between software libraries and hardware chips.

Referring back to Fig. 2, the I/O subsystem 208 provides an interface between inputs and outputs on the device. According to the embodiment of the present invention, the I/O subsystem includes a voice module 210 which comprises a microphone and/or a voice controller. The voice module 210 provides an input interface between the user and the device, which receives acoustic signals from the user and converts them into electrical signals. The device may be controlled with the voice signal commands. For example, the user of the smart glass can say commands like "okay glass take picture" to take pictures with the device's camera.

In some embodiments, the device may contain a motion module 209 for activating and deactivating different functions. It may comprise a motion detection sensor, for example, a gyro sensor or an accelerometer sensor. The user's motion can be translated by this module into commands to control the device. Some embodiments may also use the camera module 212 as an input interface, which judges the user's "virtual touching" and converts it into an input signal.

The present invention provides an automated in-device privacy framework for wearable smart devices. In other words, according to the present invention, a device is provided with a plurality of modules constituting the framework, which resides inside the device to ensure the privacy of people detected by the camera of the device. The term "in-device" should be interpreted as being already built into the device and not alterable by software installed on the device. In consequence, the privacy of people recognized by the camera of the device can be ensured because of this "in-device" implementation.

As mentioned above, the effectiveness of the framework of the present invention depends on its location inside the device. Instead of the physical location, the location here preferably refers to the logical location or arrangement in terms of software components or software layers as illustrated in Figs. 3, 5 and 7. In other words, location means the level at which the framework is located as well as the corresponding interfaces. Furthermore, the location of the framework also depends on the architecture of the operating system used inside the device. However, most of the operating systems in such smart devices are software stacks, which share some degree of similarity in their architecture.

For example, Figure 5 shows one preferred location of the framework in the case of a smart device that runs the Android operating system. Developers should be able to access the application framework layer of the device. Furthermore, Fig. 5 shows that the location of the framework (except for the camera module) should preferably be adjacent to the operating system kernel. This location will allow the framework to work in an automated way and preferably hide itself, preventing hacking attempts by application developers. In addition, the framework should run automatically at some layer which is not alterable by the user of the smart device.

Hereinafter the modules of the framework of the present invention, which are preferably provided at the layer adjacent to the kernel layer, will be described in detail with reference to Figure 6.

The smart device comprises a buffer 601, a face-detection module 602, a gesture detection module 603, and a de-identification module 604.

The buffer 601 is adapted to temporarily store the video stream taken by the camera module 212. The face detection module 602 is configured to detect a face of a person in the video stream in the buffer 601 by face recognition. The gesture detection module 603 is configured to track the person in the video stream stored in the buffer 601 after the face of the person has been detected by the face detection module 602. The gesture detection module 603 is further adapted to determine whether the person has made a predefined gesture by detecting the predefined gesture in the video stream stored in the buffer 601. The de-identification module 604 is configured to de-identify, in the video stream stored in the buffer 601, the face of the person who has made the predefined gesture by removing facial identification information from a video segment which will be taken by the camera module 212 (i.e. a future video segment) after the predefined gesture has been detected by the gesture detection module 603. As mentioned above, the "video segment" covers a single photo in the context of the present invention. In this connection, it is to be noted that the claimed "camera" by definition allows a photo to be taken.

In order to perform facial de-identification functionalities, the de-identification module 604 is preferably associated with a library 605. This library 605 comprises specific functions to fulfill the de-identification purpose, for instance, to apply a mosaic or blur to human faces in the output video stream.

The smart device comprises a storage (not shown) for storing the output video stream that has been subjected to the de-identification. The storage is accessible by the user of the smart device and/or by an application running in the smart device. Alternatively or additionally, the video stream stored in the storage may be supplied to an external device (not shown) which is connected with the smart device. In contrast, the buffer 601 is preferably provided within an area of the smart device which is not accessible by the user of the smart device or by an application running in the smart device.

Preferably, the smart device may further comprise an object generation module 606 for generating human objects and non-human objects on the input images/videos based on image processing and related techniques. In this embodiment, the face detection module 602 also serves as a human detection module for checking all the human objects and detecting any human face appearing in the objects.

Preferably, the gesture detection module 603 may classify the detected gesture as negative or positive. In this case, the smart device may further comprise a control module 607 for determining whether or not the face of the person should be de-identified according to whether the gesture is negative or positive.

The privacy control operation of the framework of the present invention according to an embodiment will be explained with reference to Figure 7.

In this exemplary embodiment, the user of the device uses any input method explained before to turn on the camera module 212 for taking images or a video. The framework receives direct input from the camera module 212 at step 301, wherein this input can be images and/or a video. Then at step 302, the object generation module 606 generates objects on the input images/video stored in the buffer 601 based on image processing and related techniques. Human objects and non-human objects are generated in this module 606.

The generated objects are passed into the face (human) detection module 602 at step 303. This module 602 checks, in the images/video stored in the buffer 601, all the human objects and detects any human face appearing in the objects. If no human face is detected (305 in Fig. 7), the process goes to step 312 and the images/video stored in the buffer 601 are directly sent to the output (i.e. to the storage accessible by the user or application). In other words, no privacy control mechanism is applied here if no human is involved. Because the energy available in smart devices such as smart glasses or smart watches is limited by small batteries, the method preferably works in an idle state if no face is detected in the image, such that a subsequent tracking step is not performed (i.e. only camera and human face detection methods are working while all subsequent methods remain idle). This reduces energy consumption.

When a human face is detected (304 in Fig. 7), at step 306 the gesture detection module 603 tracks the persons in the video stored in the buffer 601 and makes a determination as to whether the persons have made predefined gestures by detecting the predefined gestures in the video. The gesture detection module 603 classifies them as positive gestures and negative gestures. A positive gesture (308 in Fig. 7) is a gesture (e.g. nodding) indicating that it is not necessary to remove facial identifying information from the images/video. In contrast, a negative gesture (309 in Fig. 7) is a gesture (e.g. waving a hand, shaking the head) indicating that identifying information must be removed from the images/video. The gesture may be a facial expression or a pose. Figure 8 illustrates an exemplary embodiment of the gesture detection module 603 in detail.

After gestures have been detected and classified, the control module 607 determines at step 307 whether the images/video should be de-identified according to the type of gestures. If any negative gesture appears, de-identification process should be applied at step 311 for any faces concerned, otherwise no changes are needed (step 310). At step 312 the images/video is finally outputted to the storage accessible by the user or an application.
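By way of illustration only, the control flow of steps 301 to 312 may be sketched as follows. The sketch assumes that face detection and gesture classification results are already attached to each buffered frame; the names `Frame`, `Gesture` and `privacy_control` are hypothetical and do not appear in the figures. It further assumes that de-identification starts with the frame in which the negative gesture is detected and continues for later frames.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Gesture(Enum):
    POSITIVE = "positive"   # e.g. nodding: the face may stay identifiable
    NEGATIVE = "negative"   # e.g. waving a hand: the face must be de-identified


@dataclass
class Frame:
    # Hypothetical frame record: person id -> gesture classified in this frame.
    index: int
    faces: Dict[str, Optional[Gesture]] = field(default_factory=dict)
    masked: List[str] = field(default_factory=list)


def privacy_control(frames: List[Frame]) -> List[Frame]:
    """Process buffered frames in order and return the frames sent to output."""
    opted_out = set()   # persons for whom a negative gesture has been detected
    output = []
    for frame in frames:                    # step 301: input from the camera
        if not frame.faces:                 # 305: no face, later stages stay idle
            output.append(frame)            # step 312: output unchanged
            continue
        for person, gesture in frame.faces.items():   # step 306
            if gesture is Gesture.NEGATIVE:            # 309
                opted_out.add(person)
        # Step 307 decides; step 311 de-identifies only the faces concerned.
        frame.masked = sorted(p for p in frame.faces if p in opted_out)
        output.append(frame)                # step 312
    return output
```

In this sketch a positive gesture leaves the frame unchanged (step 310), and only the faces of persons who made a negative gesture are marked for de-identification.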

De-identification means removal of facial identifying information from the images or videos, prior to access/sharing of data. The preferable goal of the de-identification module 604 is to protect identity and meanwhile preserve utility, e.g. the ability to recognize surroundings of the person from the de-identified images without recognizing his/her identity. It is to be noted that the present invention is applicable to a situation where more than one person needs to be de-identified. Therefore, it is preferable that while one person is being de-identified, a gesture of any other person in his/her surroundings can be detected. For this purpose, it is preferable to minimize the removal area of facial identification information of each person.

Preferably, face de-identification will factorize the face parts into identity and non-identity factors using a generative multi-factor model. De-identification is applied on combined factorized data, and then de-identified images are reconstructed from this data. De-identification is preferably performed on identity factors by taking an average of k inputs. In connection with these techniques, reference is made, for example, to Andrew Senior; "Protecting Privacy in Video Surveillance"; Springer Science & Business Media, 2009. There are many algorithms that provide provable performance and privacy. For example, de-identification is discussed in further detail in the following articles: E. Newton, L. Sweeney, and B. Malin; "Preserving privacy by de-identifying facial images"; IEEE Transactions on Knowledge and Data Engineering, 2005. R. Gross, E. Airoldi, B. Malin, and L. Sweeney; "Integrating utility into face de-identification"; in Workshop on Privacy Enhancing Technologies (PET), June 2005. R. Gross, L. Sweeney, F. de la Torre, and S. Baker; "Model-based face de-identification"; Workshop on Privacy Research in Vision, IEEE, 2006.
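The averaging of k inputs mentioned above may be illustrated by the following simplified sketch. It assumes that the identity factors have already been extracted as plain numeric vectors; the factorization and the reconstruction of images are not shown. The function name `k_average` and the greedy grouping strategy are assumptions made for illustration and are not taken from the cited articles.

```python
from typing import List, Optional, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average(vectors: List[Sequence[float]]) -> List[float]:
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_average(identity_factors: List[Sequence[float]], k: int) -> List[List[float]]:
    """Replace each identity-factor vector by the mean of a group of inputs.

    Groups contain k inputs where possible (assumption: greedy nearest-
    neighbour grouping); the final group absorbs any leftovers so that no
    group is smaller than k when at least k inputs are available.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    result: List[Optional[List[float]]] = [None] * len(identity_factors)
    remaining = list(range(len(identity_factors)))
    while remaining:
        seed = remaining[0]
        if len(remaining) < 2 * k:
            group = list(remaining)          # last group takes everything left
        else:
            by_distance = sorted(
                remaining,
                key=lambda i: squared_distance(identity_factors[seed],
                                               identity_factors[i]),
            )
            group = by_distance[:k]          # the seed and its k-1 nearest inputs
        mean = average([identity_factors[i] for i in group])
        for i in group:
            result[i] = mean
        remaining = [i for i in remaining if i not in group]
    return [vector for vector in result if vector is not None]
```

Because every output vector is shared by all members of its group, an output cannot be traced back to a single input within that group.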

The present invention covers any de-identification method that removes facial identification information. For example, various techniques such as blurring, noise addition or blacking out may be used, although they are less sophisticated. A facial mask which only covers pixels representing a face or an identifiable part of a face may be used. Such a facial mask will change its shape as the person's face moves.
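As a minimal sketch of one of the simpler techniques, a mosaic restricted to the facial region may look as follows. The image is assumed to be a grayscale frame given as a list of pixel rows, and the bounding box is assumed to be supplied by the face detection module 602; the function name `mosaic` and the block size are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def mosaic(image: Image, box: Box, block: int = 2) -> Image:
    """Return a copy of the image with the boxed region replaced by a mosaic.

    Each block inside the box is filled with its own mean value, so pixels
    outside the facial region stay untouched.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]          # leave the input frame unmodified
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

Keeping the box tight around the face limits the removal area, so that gestures of other persons nearby remain detectable.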

In an embodiment, the wearable smart device may be equipped with a display for displaying a video stream taken by the camera module 212. The user may want to view a video after taking it. In this embodiment, the library 605 may include some AR (augmented reality) functionality for de-identification in the sense that a virtual image is superposed on a real image (which is stored in the buffer 601, but not yet de-identified) so that only a superposed image is allowed to appear on the display. For example, as such a virtual image, a mosaic pattern may be overlaid on a facial region, or a ripple may be added for blurring.

One method for an exemplary embodiment of the gesture detection by the gesture detection module is explained in Figure 8 for reference. At step 802 a gesture start detection module detects a (possible) starting point of a gesture (e.g. a hand is located close to a face within a predetermined distance) appearing in an image (i.e. single shot) received from the face detection module 602 at step 801. At step 803 a track module then keeps track of the gesture in subsequent shots (e.g. covering a face with a hand) and passes it to a match module at step 807. The match module compares the tracked gesture with a gesture database and determines whether the gesture is positive or negative. Afterwards, it sends a control message indicating whether the gesture is positive or negative to the control module 607.
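The start, track and match stages of Figure 8 may be sketched, under simplifying assumptions, as follows. Each shot is assumed to have been reduced to a symbolic pose label, and the gesture database is assumed to map pose sequences to a classification; the labels, the database contents and the function name `detect_gesture` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical gesture database: tracked pose sequence -> classification.
GESTURE_DB: Dict[Tuple[str, ...], str] = {
    ("hand_near_face", "hand_covers_face"): "negative",
    ("head_down", "head_up"): "positive",
}
START_POSES = {sequence[0] for sequence in GESTURE_DB}


def detect_gesture(shots: List[str]) -> Optional[str]:
    """Return "positive" or "negative" once a known gesture is matched."""
    tracked: List[str] = []
    for pose in shots:                           # step 801: shot with a face
        if not tracked:
            if pose in START_POSES:              # step 802: possible gesture start
                tracked.append(pose)
            continue
        tracked.append(pose)                     # step 803: keep tracking
        label = GESTURE_DB.get(tuple(tracked))   # step 807: match against database
        if label is not None:
            return label                         # message for control module 607
        still_possible = any(
            sequence[:len(tracked)] == tuple(tracked) for sequence in GESTURE_DB
        )
        if not still_possible:
            # Tracking failed; the current pose may itself start a new gesture.
            tracked = [pose] if pose in START_POSES else []
    return None
```

For example, `detect_gesture(["neutral", "hand_near_face", "hand_covers_face"])` yields `"negative"` in this sketch, while a sequence that never completes a database entry yields `None`.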

In a preferred embodiment, face features obtained by the face recognition at the face detection module 602 may be preserved in a cache for a predetermined period of time (e.g. in the range from a few seconds to one minute) after it has been determined that the face of the person disappears in the video stream. The de-identifying step restarts when it has been determined that the face of the person reappears in the video stream before the period of time has elapsed.

Such a cache 608 (Fig. 6) is preferably provided at a layer adjacent to the kernel layer and not accessible by the user of the smart device and an application running in the smart device.
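A minimal sketch of such a time-limited cache is given below. It assumes that face features are available as a hashable tuple and that an exact match stands in for real face matching; the class name `FaceFeatureCache`, its methods and the explicit `now` timestamps are hypothetical and chosen to keep the example deterministic.

```python
from typing import Dict, Tuple

Features = Tuple[float, ...]


class FaceFeatureCache:
    """Keeps features of de-identified faces for a limited retention period."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._left_at: Dict[Features, float] = {}

    def face_disappeared(self, features: Features, now: float) -> None:
        # Remember when the de-identified face left the video stream.
        self._left_at[features] = now

    def face_reappeared(self, features: Features, now: float) -> bool:
        """Return True if de-identification should restart for this face."""
        left_at = self._left_at.pop(features, None)
        if left_at is None:
            return False                     # unknown face: nothing to restart
        # Restart only if the face came back before the period elapsed.
        return now - left_at <= self.retention_seconds
```

With a retention period of 30 seconds, a face that returns after 20 seconds is de-identified again without a new gesture, whereas a face that returns after 40 seconds is treated as new.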

In a preferred embodiment, the de-identifying step may be retroactively performed on a video segment stored in the buffer 601 between a time when the face has been detected and a time when the predefined gesture has been detected. This is because the de-identification starts from a time when the gesture has been detected, and the face of the person will remain identifiable before the person has made a gesture.
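The difference between forward-only and retroactive de-identification may be illustrated as follows. The sketch assumes that the buffer 601 is represented by a per-frame flag stating whether the face is visible and that the index of the frame in which the gesture was detected is known; the function name `frames_to_deidentify` is hypothetical.

```python
from typing import List, Optional


def frames_to_deidentify(face_visible: List[bool],
                         gesture_frame: Optional[int],
                         retroactive: bool) -> List[int]:
    """Return the indices of buffered frames in which the face is removed."""
    if gesture_frame is None:
        return []                            # no gesture: nothing is changed
    start = gesture_frame                    # default: from the gesture onwards
    if retroactive:
        # Go back to the first buffered frame in which the face was detected.
        start = next(
            (i for i, visible in enumerate(face_visible) if visible),
            gesture_frame,
        )
    return [i for i in range(start, len(face_visible)) if face_visible[i]]
```

For a buffer in which the face appears from frame 1 and the gesture is detected in frame 3, the retroactive variant also covers frames 1 and 2, which would otherwise leave the person identifiable.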

While the present invention has been described in connection with certain preferred embodiments, it is to be understood that the subject-matter encompassed by the present invention is not limited to those specific embodiments. On the contrary, it is intended to include any alternatives and modifications within the scope of the appended claims.

Claims

1. A method for preserving privacy of a person visible in a camera module of a wearable smart device, such as smart glass, the method being performed within the smart device, the method comprising the steps of:

taking a video stream by the camera module;

storing the video stream in a buffer within the smart device;

detecting a face of the person in the video stream stored in the buffer by face recognition;

tracking, after the face-detecting step, the person in the video stream stored in the buffer, and determining whether the person has made a predefined gesture by detecting the predefined gesture in the video stream stored in the buffer; and

de-identifying, in the video stream stored in the buffer, the face of the person who has made the predefined gesture by removing facial identification information from a video segment which will be taken by the camera module after the predefined gesture has been detected at the determining step.

2. The method of claim 1, further comprising the step of:

storing the video stream in which the face of the person has been de-identified at the de-identifying step in a storage within the smart device, for access by a user of the smart device, for access by an application running in the smart device and/or for supply to an external device which is connected with the smart device.

3. The method of claim 1 or 2, wherein face features obtained by the face recognition at the face detecting step are preserved in a cache for a predetermined period of time after it has been determined that the face of the person disappears in the video stream, and

the de-identifying step restarts when it has been determined that the face of the person reappears in the video stream before the period of time has elapsed.

4. The method of any one of claims 1 to 3, wherein the de-identifying step is retroactively performed on a video segment stored in the buffer between a time when the face has been detected and a time when the predefined gesture has been detected.

5. The method of any one of claims 1 to 4, wherein the smart device comprises an operating system kernel layer and a software stack sitting on the kernel layer, and the method is implemented in a layer that is directly above the kernel layer and is not accessible by applications running in the smart device.

6. A wearable smart device, such as smart glass, comprising:

a camera module for taking a video stream;

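a buffer for temporarily storing the video stream;

a face detection module for detecting a face of a person in the video stream stored in the buffer by face recognition;

a gesture detection module for tracking the person in the video stream stored in the buffer after the face of the person has been detected by the face detection module, and determining whether the person has made a predefined gesture by detecting the predefined gesture in the video stream stored in the buffer; and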

a de-identification module for de-identifying, in the video stream stored in the buffer, the face of the person who has made the predefined gesture by removing facial identification information from a video segment which will be taken by the camera module after the predefined gesture has been detected by the gesture detection module.

7. The smart device of claim 6, further comprising a storage for storing the video stream in which the face of the person has been de-identified by the de-identification module, for access by a user of the smart device, for access by an application running in the smart device and/or for supply to an external device which is connected with the smart device.

8. The smart device of claim 6 or 7, further comprising a cache for preserving face features obtained by the face recognition by the face detection module for a predetermined period of time after the face detection module determines that the face of the person disappears in the video stream, and

the de-identification module is adapted to restart the de-identifying operation when the face detection module determines that the face of the person reappears in the video stream before the period of time has elapsed.

9. The smart device of any one of claims 6 to 8, wherein the de-identification module is adapted to retroactively perform the de-identifying operation on a video segment stored in the buffer between a time when the face has been detected and a time when the predefined gesture has been detected.

10. The smart device of any one of claims 6 to 9, wherein the smart device comprises an operating system kernel layer and a software stack sitting on the kernel layer, and the buffer, the face detection module, the gesture detection module, and the de-identification module are located on a layer directly above the kernel layer and are not accessible by applications running in the smart device.

11. A computer program that causes, when run on a computer, the computer to execute the method of any one of claims 1 to 5.