WO2024041714A1 - Sensor data capturing arrangement and a method for capturing sensor data - Google Patents

Sensor data capturing arrangement and a method for capturing sensor data

Info

Publication number
WO2024041714A1
Authority
WO
WIPO (PCT)
Prior art keywords
sensor data
stream
circuitry
image
person
Prior art date
Application number
PCT/EP2022/073287
Other languages
French (fr)
Inventor
Fredrik Dahlgren
Alexander Hunt
Håkan ENGLUND
Saeed BASTANI
Michael Björn
Andreas Kristensson
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2022/073287 priority Critical patent/WO2024041714A1/en
Publication of WO2024041714A1 publication Critical patent/WO2024041714A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Definitions

  • the present invention relates to a sensor data capturing arrangement, a method for capturing sensor data, and a computer-readable medium carrying computer instructions that, when loaded into and executed by a controller of the capturing arrangement, enable the capturing arrangement to implement the method. Moreover, it relates to a software component arrangement for use in the sensor data capturing arrangement. All these aspects relate to a system-supported user selection of persons to include in captured images.
  • the sensor data capturing arrangement comprises: receiving circuitry configured to receive a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; object detection circuitry configured to detect an object in the stream of sensor data based on contextual information; capturing circuitry configured to capture an instance of the stream of sensor data; processing circuitry configured to replace the object with a replacement object in the captured instance of the stream and then to store the captured instance.
  • the sensor data capturing arrangement further comprises a controller, wherein the controller comprises at least one of the receiving circuitry, the object detection circuitry, the capturing circuitry and the processing circuitry.
  • the sensor data capturing arrangement further comprises a memory, wherein the processing circuitry is further configured to store the captured data in the memory.
  • the stream of sensor data is a continuous stream of sensor data.
  • the at least one sensor is an image sensor
  • the stream of sensor data comprises a preview image stream and a full-resolution image stream
  • the preview image stream having an image resolution/frame rate lower than the maximum resolution/frame rate of the sensor
  • the full-resolution image stream having an image resolution/frame rate equal to the maximum resolution of the sensor (or lower, but higher than the preview resolution)
  • the object comprises a person
  • the sensor data capturing arrangement comprises an image capturing arrangement
  • the object detection circuitry is further configured to detect the person based on contextual information by: performing object detection on the low-resolution preview image stream; determining a context (FE, P, d1) of the person to be detected; identifying one or more other persons; determining a context of the one or more other persons; and determining that at least one aspect of the context of the person is different from the context of the one or more other persons, and wherein the processing circuitry is further configured to replace the object in the full-resolution image stream and to store the instance of the full-resolution image stream
  • the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining a pose of the person and determining if the pose is different from poses of the one or more other persons.
  • the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining a facial expression of the person and determining if the facial expression is different from other facial expressions of the one or more other persons.
  • the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining an identity of the person and determining that the identity is unassociated with identities of the one or more other persons and/or of the owner of the image capturing device. In one embodiment the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by: determining a first distance between the person and the one or more other persons; determining a second distance between the one or more other persons; and determining that the first distance exceeds the second distance.
  • processing circuitry is further configured to generate a replacement person and wherein the replacement object comprises the replacement person.
  • processing circuitry is further configured to generate the replacement person by retrieving a person from a stored person image.
  • processing circuitry is further configured to generate a replacement face and wherein the replacement object comprises the replacement face.
  • processing circuitry is further configured to generate the replacement face by utilizing a Generative Adversarial Network, GAN.
  • processing circuitry is further configured to generate the replacement face by retrieving a face from a stored face image.
  • processing circuitry is further configured to generate an image of a physical object and wherein the replacement object comprises the image of a physical object wherein the physical object has a geographic location that corresponds to a geographic location of the image capturing device.
  • processing circuitry is further configured to generate an image of a background and wherein the replacement object comprises the image of the background wherein the background is an estimate of the background behind the face to be replaced.
  • processing circuitry is further configured to provide a marking of the detected object, provide a candidate for a replacement object, and to receive user input indicating an acceptance of the candidate as the replacement object.
  • processing circuitry is further configured to receive user input indicating a request for a further candidate, and in response thereto provide a further candidate.
  • the image capturing arrangement further comprises a user interface and the object detection circuitry is further configured to receive user input via the user interface indicating an area and to perform object detection in the indicated area in order to detect further objects.
  • the image capturing arrangement comprises a telecommunications User Equipment (UE).
  • the image capturing arrangement comprises a smart phone or a tablet computer.
  • a method for capturing sensor data comprises: receiving a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; detecting an object in the stream of sensor data; capturing an instance of the stream of sensor data; replacing the object with a replacement object in the captured instance of the stream; and then storing the captured instance, wherein the method further comprises detecting the object based on contextual information.
  • a computer-readable medium carrying computer instructions that when loaded into and executed by a controller of a capturing arrangement enables the capturing arrangement to implement the method according to the teachings herein.
  • a software component arrangement for use in a sensor data capturing arrangement, wherein the software component arrangement comprises: software code for receiving a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; software code for detecting an object in the stream of sensor data; software code for capturing an instance of the stream of sensor data; software code for replacing the object with a replacement object in the captured instance of the stream; and software code for storing the captured instance after the object has been replaced, and software code for detecting the object based on contextual information.
  • a sensor data capturing arrangement comprising: circuitry for receiving a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; circuitry for detecting an object in the stream of sensor data; circuitry for capturing an instance of the stream of sensor data; circuitry for replacing the object with a replacement object in the captured instance of the stream; and circuitry for storing the captured instance after the object has been replaced, and circuitry for detecting the object based on contextual information.
  • the proposed solution thus enables supporting a user - for example when taking a photo - in quickly selecting which people to include in said photo, using a combination of contextual information (possibly including location and/or historical data) and user input selection on a device screen, by providing a preprocessing arrangement and method for enhancing privacy aspects of the collection of information, such as photos or video recordings, before it is composed into storable or transmittable data objects such as photo or video files.
  • the prior art discloses different techniques to identify faces or people, and also techniques to segment and remove, blur, or grey out selected people; such techniques are considered known and will not be discussed in detail herein.
  • the prior art does not disclose how to select the objects based on multi-contextual data and to do so already before encoding, storing or communicating such data beyond the local subsystem.
  • One example of the subsystem may be the signal processor processing the received sensor data or the combination of sensor and signal processor processing the received sensor data.
  • the local subsystem may be the chipset controlling image acquisition.
  • the proposed solution may utilize a combination of contextual information such as location, historical data, friend data, consent information, etc., to propose to a user - for example when taking a photo - which objects (people) to include and which objects (people) to remove in said photo before said photo or video is encoded, compressed and stored, possibly to be shared with other users or systems outside the processing subsystem of the apparatus, for example a mobile device.
  • Figure 1A shows a schematic view of a sensor data capturing arrangement according to some embodiments of the teachings herein;
  • Figure 1B shows a schematic view of a sensor data capturing arrangement according to some embodiments of the teachings herein;
  • Figure 2 shows a schematic view of components and modules of a sensor data capturing arrangement according to some embodiments of the teachings herein;
  • Figures 3A to 3K each show a schematic view of a sensor data capturing arrangement where objects are detected or replaced according to some embodiments of the teachings herein;
  • Figure 4 shows a flowchart of a general method according to some embodiments of the teachings herein;
  • Figure 5 shows a schematic view of a computer-readable medium carrying computer instructions that when loaded into and executed by a controller of an arrangement enables the arrangement to implement some embodiments of the teachings herein;
  • Figure 6 shows a component view for a software component arrangement according to some embodiments of the teachings herein.
  • Figure 7 shows a component view for an arrangement comprising circuits according to some embodiments of the teachings herein.
  • Figure 1A shows a schematic view of a sensor data capturing arrangement 100 according to some embodiments of the teachings herein. It should be noted that the sensor data capturing arrangement 100 may comprise a single device or may be distributed across several devices and apparatuses. Also, the following embodiments described herein are non-limiting and for illustration purposes only.
  • the sensor data capturing arrangement 100 comprises or is operably connected to a controller 101 and a memory 102.
  • the controller 101 is configured to control the overall operation of the sensor data capturing arrangement 100.
  • the controller 101 comprises a combination of circuits that enables the controller 101 to control the operation of the sensor data capturing arrangement 100.
  • the sensor data capturing arrangement 100 comprises a circuit 101A for receiving sensor data 103A.
  • Such a circuit 101A may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101.
  • the sensor data capturing arrangement 100 comprises a circuit for detecting objects 101B. Such a circuit may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101.
  • the sensor data capturing arrangement 100 comprises a circuit for capturing data 101C.
  • Such a circuit may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101.
  • the sensor data capturing arrangement 100 comprises a circuit for processing data 101D.
  • Such a circuit may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101.
  • the memory 102 is configured to store data, such as sensor data, and computer-readable instructions that when loaded into the controller 101 indicate how the sensor data capturing arrangement 100 is to be controlled.
  • the memory 102 may comprise several memory units or devices, but they will be perceived as being part of the same overall memory 102.
  • a general memory 102 for the sensor data capturing arrangement 100 is therefore seen to comprise any and all such memory units for the purpose of this application.
  • there are many alternatives of how to implement a memory for example using non-volatile memory circuits, such as EEPROM memory circuits, or using volatile memory circuits, such as RAM memory circuits.
  • the memory 102 may also be external to the sensor data capturing arrangement 100, such as an external physical memory in the form of an external hard drive (NVM, SSD, or disk-based) or in the form of a cloud storage solution. For the purpose of this application all such alternatives will be referred to simply as the memory 102.
  • the sensor data capturing arrangement 100 also comprises one or more sensors 103.
  • at least one sensor 103 is an image sensor (possibly comprised in a camera module).
  • at least one sensor 103 is an audio sensor for recording sounds, such as voice input.
  • at least one sensor 103 is a biometric sensor for capturing biometric data such as for example retina scans, fingerprint scans or other biometric data.
  • at least one sensor 103 is a tactile sensor for capturing tactile or haptic input.
  • the one or more sensors 103 are configured to receive at least one stream of sensor data.
  • the sensor data capturing arrangement 100 comprises circuitry for receiving such data stream(s).
  • the stream of sensor data is a continuous stream of data received (directly) from the corresponding sensor, such as raw sensor data.
  • Raw sensor data is the unprocessed data received directly from the sensor.
  • the stream of sensor data is a regular stream of processed data from the corresponding sensor.
  • the sensor data capturing arrangement is an image data capturing arrangement
  • the stream of sensor data is the image sensor data provided (at lower resolution) to the viewfinder. Upon capturing, a full resolution image is captured and stored.
  • the stream of sensor data comprises a first stream being of a lower quality or resolution and a second stream being of a higher quality or resolution up to the maximum resolution of the sensor 103.
  • the stream of lower quality or resolution may simply offer lower resolution than the maximum resolution of the sensor 103.
  • the first stream may be utilized for preprocessing or previewing of the data and the second stream may be utilized for capturing and final processing of the data. This enables a preprocessing of the data that requires less computational resources (than processing the data at full quality/resolution) while still capturing the data at a high quality/resolution.
  • the data stream received is from an image sensor 103, in which example the first stream is a preview stream and the second stream is the full resolution stream.
  • the preview stream is the stream of a viewfinder pipeline.
  • full resolution does not necessarily refer to the maximum resolution that an image sensor is capable of, but rather to the highest resolution set to be used by the system, i.e. the resolution at which a capture should be made.
  • a low or high frame rate could be used instead of or in addition to the low or high resolution.
  • a low frame rate thus being lower than the high frame rate, and the high frame rate being the frame rate that a capture should be made at.
  • the resolution is defined as dots per inch (dpi), as the number of pixels in the vertical and horizontal directions, or simply as the total number of pixels.
  • a low resolution would generally be 230,000 to 920,000 dots (less than a megapixel) and a high resolution would be several megapixels, for example 8, 10, 12 or more.
  • a high resolution may also refer to the maximum resolution of the sensor. The maximum resolution can be the highest resolution that the sensor is designed to operate at or the highest resolution that the sensor is set to operate at through (user) settings.
  • the sensor data capturing arrangement 100 also comprises circuitry for detecting objects in the data stream. As is also discussed in the above, the sensor data capturing arrangement 100 also comprises circuitry for capturing an instance of the data stream, whereby the data at a given time is captured and stored. In embodiments where the data stream comprises a first and a second stream, the circuitry for detecting objects is configured to detect the object(s) in the first stream before a selection of which object is to be replaced has been made, and then to detect the object in the second stream for replacing the object prior to capturing an instance of the data stream as per the second stream, where the object to be replaced has been replaced. As is also discussed above, the sensor data capturing arrangement 100 also comprises circuitry for processing data in the data stream, and more specifically for processing the detected objects to select which object should be replaced and to generate a replacement object to replace the object in the captured data stream 103C.
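  • As a non-limiting illustration of this two-stream flow, the following Python sketch shows how detection could run on the low-resolution preview while the replacement is applied to the full-resolution capture before storage; the detect_fn and replace_fn callbacks are hypothetical placeholders for the object detection and processing circuitry, and the frames are assumed to be numpy-like arrays.

```python
def process_capture(preview_frame, full_res_frame, detect_fn, replace_fn):
    """Detect objects on the cheap preview stream, then replace them in the
    full-resolution frame before it is handed on for encoding and storage."""
    boxes = detect_fn(preview_frame)  # e.g. a list of (x, y, w, h) in preview coordinates
    # Scale the bounding boxes up to full-resolution coordinates.
    sy = full_res_frame.shape[0] / preview_frame.shape[0]
    sx = full_res_frame.shape[1] / preview_frame.shape[1]
    scaled = [(int(x * sx), int(y * sy), int(w * sx), int(h * sy)) for (x, y, w, h) in boxes]
    # Only the altered full-resolution frame leaves this function;
    # the unaltered original is never stored.
    return replace_fn(full_res_frame.copy(), scaled)
```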
  • the sensor data capturing arrangement 100 also comprises a user interface 110, for receiving user input and for providing information to the user. In some embodiments, as is indicated in figure 1A through the dashed lines, the sensor data capturing arrangement 100 is operably connected to such a user interface 110.
  • the sensor data capturing arrangement 100 also comprises a communications interface 104 for communicating with other arrangements 100 and/or sensors 103.
  • the sensor data capturing arrangement 100 is operably connected to such a communications interface 104.
  • the communications interface 104 comprises a radio frequency (RF) communications interface.
  • the communication interface 104 comprises a Bluetooth™ interface, a WiFi™ interface, a ZigBee™ interface, an RFID™ (Radio Frequency IDentifier) interface, a Wireless Display (WiDi) interface, a Miracast interface, and/or other RF interface commonly used for short range RF communication.
  • the communication interface 104 comprises a cellular communications interface such as a fifth generation (5G) cellular communication interface, an LTE (Long Term Evolution) interface, a GSM (Global System for Mobile Communications) interface and/or other interface commonly used for cellular communication.
  • the communications interface 104 is configured to communicate using the UPnP (Universal Plug and Play) protocol.
  • the communications interface 104 is configured to communicate using the DLNA (Digital Living Network Alliance) protocol.
  • the communications interface 104 comprises a wired interface.
  • the communication interface 104 comprises a USB (Universal Serial Bus) interface.
  • the communication interface 104 comprises an HDMI (High Definition Multimedia Interface) interface, a DisplayPort interface, an Ethernet interface, a MIPI (Mobile Industry Processor Interface) interface, an analog interface, a CAN (Controller Area Network) bus interface, an I2C (Inter-Integrated Circuit) interface, or other interface.
  • Figure 1B shows a schematic view of a sensor data capturing arrangement 100 as in figure 1A according to some embodiments of the teachings herein.
  • the sensor data capturing arrangement 100 is a telecommunications User Equipment.
  • the sensor data capturing arrangement 100 is a smartphone.
  • the sensor data capturing arrangement 100 is a tablet computer or simply a tablet.
  • the sensor data capturing arrangement 100 also comprises a user interface 110 comprising a display 110 and one or more buttons 110A.
  • the display 110 is a touch display
  • the at least one, some or all of the one or more buttons 110A are virtual buttons implemented through the touch display 110.
  • the one or more sensors 103 includes an image sensor 103 possibly as part of a camera module for receiving and capturing image data.
  • the one or more sensors 103 also include (at least in some embodiments) a microphone for receiving and recording audio data (such as voice data).
  • the sensor data capturing arrangement 100 is also referred to as an image capturing arrangement 100.
  • Figure 2 shows a schematic view of how a sensor data stream 103A is received and generally processed by modules in a sensor data capturing arrangement.
  • a sensor provides a sensor data stream 103A to a processing module 210.
  • the processing module 210 may be implemented by or as part of the controller 101 or the specific circuitry discussed in the above for a sensor data capturing arrangement 100.
  • the sensor data stream 103A comprises a first stream 103A-1 and a second stream 103A-2.
  • the processing module 210 is configured to pre-process 210-1 the data stream 103A, wherein - in some embodiments - the first stream 103A-1 is rendered to be displayed as a preview to a user, for example on the display 110.
  • the processing is usually performed already on the chipset or other component of the circuitry receiving the sensor data stream 103A; i.e. on the signal processor for general sensor data and in the image processor for image sensor data.
  • At least one object to be replaced is detected during the preprocessing by the circuitry for object detection as discussed in the above.
  • the object(s) to be replaced are indicated in the data stream 103A, whereby an altered data stream 103B is provided.
  • the data stream 103A comprises a first and a second stream
  • the first stream 103A-1 is altered as is indicated in figure 2. This allows for a preview to show the alteration of the stream prior to capturing the data.
  • the processing module 210 is also configured to capture 210-2 an instance of the data stream 103A, whereby the sensor data 103A at that time instance is captured as a stored capture 103C.
  • the capture is of the second (full resolution) data stream 103A-2 as is indicated in figure 2. Neither the original data in the first stream nor the altered data stream 103B is thus captured or stored.
  • the processing module 210 may also be configured to post process 210-3 the captured data, such as being configured to apply various filters or compressing the image.
  • the processing module 210 thus receives a sensor data stream 103A from a sensor 103 and provides a capture of the sensor data 103D.
  • the sensor data stream 103A is first stored after the capture is executed. And, as the inventors have realized, by detecting objects to be replaced already as part of or in connection with the pre-processing of the sensor data stream 103A, and by also replacing them as part of or in connection with the pre-processing, a copy of the original data stream will not be captured or otherwise stored and can thus not be misused or abused.
  • the inventors are therefore proposing to provide a sensor data capturing arrangement 100 configured to receive a sensor data stream 103A, to detect objects in the stream, capture the stream and replace the object(s) in the capture before the capture is stored. Specifically, the detection of objects and the determination of replacement objects are made or performed already as part of the pre-processing and, in embodiments with a first and second data stream, based on the first stream, whereby the capture is of the second stream.
  • a sensor data capturing arrangement 100 comprising receiving circuitry 101A configured to receive a stream of sensor data 103A from at least one sensor 103 arranged to provide the stream of sensor data 103A.
  • the sensor data capturing arrangement 100 also comprises object detection circuitry 101B configured to detect an object in the stream of sensor data 103A based on contextual information and capturing circuitry 101C configured to capture an instance of the stream of sensor data.
  • the sensor data capturing arrangement 100 also comprises processing circuitry 101D configured to replace the object with a replacement object in the captured instance of the stream and then to store the captured instance.
  • the teachings herein can also apply to other (biometric) information such as sound recordings before they are composed into storable or transmittable audio recordings. Further embodiments also include composite biometric information such as photo or video including smell or tactile information, where all or selected parts of the composite information about objects can be anonymized or removed.
  • Figure 3A shows a schematic view of an example of an image capturing arrangement of figure 1B, such as a smartphone, where graphical information 300 representing an image stream 103A received from a camera 103 is displayed on the display 110.
  • the image stream 103A comprises a preview stream (first stream) and a full resolution stream (second stream), wherein the graphical representation is of the preview stream.
  • the image stream comprises three objects 310, which in the example of figure 3A are three persons, represented by the three faces being displayed on the display 110 in the example of figure 3A.
  • Each person 310 exhibits one or more visible traits, for example a pose P and/or a posture, such as a body posture, a gesture or a facial expression FE.
  • these traits are exemplified as a pose P and a facial expression FE.
  • the pose P, which includes a position and a general direction of interest, is indicated by an arrow indicating a line of sight for the faces.
  • a facial expression FE is also shown for each face 310.
  • the pose P may be determined based on the direction of the line of sight or the direction of the eyes.
  • the pose may alternatively or additionally be determined based on (the direction of) the nose or other facial feature.
  • a facial expression may be determined through smile detection.
  • a body posture may be detected signaling an emotion. For example a closed fist would signal anger.
  • contextual information related to the geographic location where the photo is taken is used to adjust the probability of people in said photo belonging to the same group and thus help the system deduce which people/faces to keep and which to replace.
  • Photos available publicly and/or from friends/Facebook/etc. from the same geographic location could be used as input to said system.
  • Existing scene classification algorithms based on vision and acoustics can be used to determine the context, which can then be used in a rule-based system for additional behavior, such as replacing.
  • the image capturing arrangement 100 is configured through the circuitry for object detection 101B to detect such objects 310 in the (preview) image stream.
  • the circuitry for object detection 101B is specifically configured to detect at least one object to be replaced 310-1 based on the contextual information.
  • Figure 3B shows an example where an object to be replaced 310-1 has been detected.
  • the object is a person represented by a face.
  • the object may also be only a face.
  • the object to be replaced has been detected based on the contextual information related to the identity of the object 310-1.
  • an object may be detected to be an object to be replaced based on an identity of the object, wherein the identity is a blocked identity.
  • Examples of such identities are identities that have been blocked in a contact application or in a social media application.
  • an object may be detected to be an object to be replaced based on the identity of the object, wherein the identity is an identity that is not associated with identities of other objects 310-2 in the image stream.
  • One example is where the person to be replaced 310-1 is not associated (friends) with the other persons 310-2 in the preview, for example based on a social media app.
  • Another example is where the identity of the person to be replaced 310-1 is not present in a contact list stored in the memory 102 of the smartphone 100.
  • the circuitry for detecting objects 101B is thus also, in some embodiments, configured to determine an identity associated with the detected object. Alternatively, such determination of identity may be performed by another circuitry, such as the processing circuitry 101D or the controller 101. In some embodiments the identity may be determined based on facial recognition.
  • the image capturing device is further configured to indicate which detected object 310-1 is to be replaced, for example through a graphic indication.
  • the person to be replaced 310-1 is indicated by a graphical object (in this example a frame) 315 being displayed around or on top of the person to be replaced 310-1. This allows for a user to quickly see and ascertain which person is to be replaced.
  • the other detected objects are indicated by a graphical indication to indicate to a user that these objects have been detected but are not proposed to be replaced.
  • the graphical indication for indicating an object to be replaced is displayed differently (for example in one color) from the graphical indication for the objects detected, but not proposed to be replaced (for example in a different color).
  • undetected objects are indicated by a graphical indication to indicate to a user that these objects have not been successfully detected or categorized and are unknown.
  • the graphical indication for indicating an unknown object is displayed differently (for example in one color) to the graphical indication for the objects detected.
  • unknown objects are automatically determined to be objects to be replaced 310-1.
  • unknown objects are automatically determined to be other objects 310-2 (i.e. to base the context upon). In some embodiments, this automatic determination is based on user settings.
  • the automatic determination may be changed by a user, and in such embodiments the image capturing arrangement 100 is further configured to receive user input indicating such an object.
  • Figure 3C shows an example where a person (being an example of an object) to be replaced 310-1 has been detected.
  • the person to be replaced has been detected based on the contextual information that the pose of the person to be replaced is different from the other persons 310-2 in the preview.
  • the person to be replaced 310-1 is looking to the side, while the other persons 310-2 are looking straight into the camera (indicated by the arrows pointing downwards).
  • the image capturing arrangement 100 is configured to determine a pose of each detected object, to determine if the majority (two or more) of the objects have a similar pose and if at least one object has a different pose, and if that is the case, detect the object(s) with the different pose as the object to be replaced 310-1.
  • the circuitry for detecting objects 101B is thus also, in some embodiments, configured to determine a pose associated with an object. Alternatively, such determination of pose may be performed by another circuitry, such as the processing circuitry 101D or the controller 101.
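  • A minimal sketch of such a pose-based context comparison is given below; it assumes head yaw angles (in degrees) have already been estimated for each detected person by some pose estimator, and simply flags persons deviating from the group median by more than an illustrative tolerance.

```python
import numpy as np

def pose_outliers(yaw_angles_deg, tolerance_deg=25.0):
    """Flag persons whose head yaw deviates from the group median by more
    than the tolerance, as candidates for replacement."""
    yaws = np.asarray(yaw_angles_deg, dtype=float)
    median = np.median(yaws)
    return [i for i, yaw in enumerate(yaws) if abs(yaw - median) > tolerance_deg]

# Two persons face the camera (yaw close to 0 degrees), one looks to the side:
print(pose_outliers([2.0, -3.0, 70.0]))  # -> [2]
```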
  • Figure 3D shows an example where a person (being an example of an object) to be replaced 310-1 has been detected.
  • the person to be replaced has been detected based on the contextual information that the posture of the person to be replaced is different from the other persons 310-2 in the preview.
  • the posture of a person is in some embodiments a body posture, a gesture and/or a facial expression.
  • the person to be replaced 310-1 is presenting an unhappy facial expression, while the other persons 310-2 are presenting happy facial expressions (they are smiling).
  • the circuitry for object detection is thus further configured to perform facial recognition to determine facial expressions.
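  • One conventional way to approximate such facial-expression detection is smile detection with OpenCV's pretrained Haar cascades, as sketched below; this is only one possible realization, and the scale factors and neighbor thresholds are illustrative values, not taken from the description.

```python
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def smiling_faces(gray_frame):
    """Return a list of (face_box, is_smiling) tuples for every detected face."""
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray_frame, scaleFactor=1.3, minNeighbors=5):
        roi = gray_frame[y:y + h, x:x + w]
        smiles = smile_cascade.detectMultiScale(roi, scaleFactor=1.7, minNeighbors=20)
        results.append(((x, y, w, h), len(smiles) > 0))
    return results
```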
  • the person to be replaced 310-1 is presenting a gesture.
  • the object presenting a rude or otherwise unallowed gesture is detected to be replaced.
  • the circuitry for object detection 101B is further configured to detect gestures, possibly based on image analysis by comparing them to an image library of known gestures, wherein some gestures are indicated to be unallowed.
  • the image library is stored locally in the memory 102. In some embodiments the image library is stored on a portion of the memory being dedicated to or even comprised in the circuitry for object detection. And, in some embodiments the image library is stored remotely, for example on a server, and accessed through the communication interface 104.
  • the circuitry for object detection 101B is further configured to determine that the other persons 310-2 are not making similar gestures and, if that is the case, detect the person making the gesture as the person to be replaced 310-1. This allows a group photo where the whole group presents rude gestures to remain unaltered, while situations where only a passer-by presents the gesture are altered.
  • the person to be replaced 310-1 is exhibiting a body posture (for example standing), while the other persons 310-2 are exhibiting a different body posture (for example laying down).
  • the circuitry for object detection is thus further configured to perform body posture detection.
  • the image capturing arrangement 100 is configured to determine a posture of each detected object, to determine if the other objects 310-2 have a similar posture and if at least one object has a different posture, and if so, detect the object(s) with the different posture as the object to be replaced 310-1.
  • Figure 3E shows an example where a person (being an example of an object) to be replaced 310-1 has been detected.
  • the person to be replaced has been detected based on the contextual information that the geographic location of the person to be replaced is different from the other persons 310-2 in the preview. That the geographic location of the person to be replaced is different from the others may in some embodiments be determined based on a difference in distance between objects/persons.
  • the person to be replaced 310-1 is at a distance dl from the other persons 310-2, while the other persons 310-2 are at a distance d2 from one another.
  • the distance d2 between the other persons 310-2 is determined based on an average of the distances between the other persons 310-2.
  • the distance d2 between the other persons 310-2 is determined based on a median of the distances between the other persons 310-2.
  • the person to be replaced 310-1 is at a different geographic location if the distance dl exceeds a threshold distance.
  • the threshold distance is based on the distance d2 between the other persons, wherein the threshold distance is a factor (for example 1.5, 2, 3, 4, 5) of the distance d2 between the other persons 310-2.
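  • The comparison of the distances d1 and d2 could, for example, be implemented as in the following sketch, where centres are the image (or estimated world) coordinates of the detected persons and the factor 1.5 is one of the example thresholds mentioned above.

```python
import numpy as np

def distance_outliers(centres, factor=1.5):
    """Flag persons whose distance to the rest of the group (d1) exceeds
    `factor` times the average in-group distance (d2)."""
    pts = np.asarray(centres, dtype=float)
    n = len(pts)
    outliers = []
    for i in range(n):
        others = np.delete(pts, i, axis=0)
        d1 = np.linalg.norm(others - pts[i], axis=1).min()  # distance to nearest other person
        # average pairwise distance among the remaining persons (d2)
        pairs = [np.linalg.norm(others[a] - others[b])
                 for a in range(n - 1) for b in range(a + 1, n - 1)]
        d2 = float(np.mean(pairs)) if pairs else d1
        if d1 > factor * d2:
            outliers.append(i)
    return outliers

# Two persons standing together, one far away:
print(distance_outliers([(0, 0), (1, 0), (10, 0)]))  # -> [2]
```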
  • the circuitry for object detection is further configured to determine geographic locations of objects.
  • the geographic location may be determined by detecting objects corresponding to landmarks and identifying these landmarks.
  • the geographic location may be determined by retrieving a location from a location sensor such as a GPS (Global Positioning System) device.
  • Figure 3F shows an example where a person (being an example of an object) to be replaced 310-1 has been detected.
  • the person to be replaced has been detected based on the contextual information that the movement (speed and/or direction) of the person to be replaced is different from movement of the other persons 310-2 in the preview.
  • the person to be replaced is moving fast to the left of the image, whereas the other persons 310-2 are moving slowly to the right. The movements are thus different.
  • the circuitry for object detection 101B is further configured to determine movements of objects, such as by tracking an object and determining a difference in location in the image at different (subsequent) times.
  • the other persons 310-2 are in some embodiments defined as the majority of detected persons.
  • the other persons 310-2 are in some embodiments defined as detected persons having associated identities.
  • the other persons 310-2 are in some embodiments defined as persons exhibiting similar poses, or postures.
  • the other persons 310-2 are in some embodiments defined as persons being at a location close to one another, for example less than 1.5 times the distance d2.
  • the other persons 310-2 are in some embodiments defined as persons exhibiting a similar movement.
  • the contextual information is related to the geographic location of the sensor 103.
  • the circuitry for object detection 101B is further configured to determine that if the geographic location of the sensor (i.e. where the image stream is received) is a specific geographic location, then specific determinations for the context are to be applied. For example, if the geographic location is a geographic location marked as sensitive, all persons with clearly recognizable faces are to be replaced.
  • the contextual information is related to the time of the data stream.
  • the circuitry for object detection 101B is further configured to determine that if the time is a specific time, then specific determinations should be applied.
  • the context of a geographic location and a time may infer specific contextual determinations such as that any stream received at a specific geographic location at a specific time is subjected to specific contextual determinations.
  • One example is where it is forbidden to take photos of license plates of cars at night, whereby all license plates are removed or blurred in images taken during night hours on such streets.
  • Another example is where it is not allowed to take photos identifying children at schools during school hours, whereby all faces are blurred or replaced by anonymous faces from images taken during such hours.
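  • Such location- and time-dependent determinations can be expressed as a simple rule table, as in the sketch below; the location tags, time windows and object classes are hypothetical placeholders used only to illustrate the two examples above.

```python
from datetime import time

# Hypothetical rule table: (location tag, time window) -> object classes to replace.
CONTEXT_RULES = [
    ("street", (time(22, 0), time(6, 0)),  {"license_plate"}),  # no plates at night
    ("school", (time(8, 0),  time(17, 0)), {"face"}),           # no faces during school hours
]

def classes_to_replace(location_tag, capture_time):
    """Return the set of object classes to anonymise for a location tag and capture time."""
    replace = set()
    for loc, (start, end), classes in CONTEXT_RULES:
        if loc != location_tag:
            continue
        if start <= end:
            in_window = start <= capture_time <= end
        else:  # time window crosses midnight
            in_window = capture_time >= start or capture_time <= end
        if in_window:
            replace |= classes
    return replace

print(classes_to_replace("street", time(23, 30)))  # -> {'license_plate'}
```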
  • objects with identities may be logos and/or textual identities.
  • a logo, such as a trademark, may easily be detected and identified by the circuitry for object detection.
  • any textual information, such as a name, may also be easily detected and identified.
  • a logo or textual information may also easily be replaced.
  • the image capturing arrangement 100 may also learn and use information such as geographic location, social circles etc. to adapt the contextual determinations that determine which objects to be replaced.
  • Figure 3G shows an example situation where an object, in this example a person represented by a face, is marked as being an object to be replaced 310-1.
  • the marking is achieved by displaying a graphical indication 315 marking the object 310-1.
  • the graphical indication 315 is a frame encompassing the face 310-1.
  • the graphical indication 315 is in a color that contrasts with the background.
  • the graphical indication 315 is in a color that indicates the context of the object, indicating to a user why the object is to be replaced.
  • a first color can be used to indicate a first context (for example a blocked identity) and a second color (for example yellow) can be used to indicate a second context (for example an unknown identity).
  • the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating an acceptance of an object to be replaced.
  • the input may be received as a double tap on the object 310-1 or anywhere else inside the marking 315.
  • the input may be received as a press on a softkey or other key indicated to be for acceptance.
  • a softkey is a key whose functions differ depending on the current execution state and whose function may be indicated by being displayed on the display 110.
  • the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating a detected object 310-1 to be an object to be replaced.
  • the input may be received as a long press or double tap on the object 310-2 or anywhere else inside its marking (if a marking is displayed).
  • the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating an undetected object to be an object to be replaced or to be an object to be kept (i.e. another object).
  • the input may be received as a double tap or a long press on the object 310 or anywhere else inside its marking (if a marking is displayed).
  • all proposals are accepted by receiving a command to execute the capture, i.e. to take the picture.
  • the graphical marking 315 is displayed differently in response to receiving an acceptance.
  • a graphical frame 315 being displayed as dashed lines for a proposed object may be changed to solid lines for an accepted object.
  • the color of the marking may be changed as the object is accepted.
  • the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating a rejection or cancellation of an object to be replaced.
  • the input may be received as a long tap on the object 310-1 or anywhere else inside the marking 315.
  • the input may be received as a press on a cancellation or clear key.
  • the input may be received as a press on a softkey 110A being indicated to be for cancellation.
  • the graphical marking 315 is no longer displayed or removed in response to receiving a cancellation.
  • the object to be replaced 310-1 is to be replaced by a replacement object (referenced 310-R in figure 3H), and in some embodiments, the object to be replaced 310-1 or rather a proposed object to be replaced 310-1 is replaced already in the preview to indicate to the user what the final image will look like. Alternatively, the proposed object to be replaced 310-1 may be replaced first upon receiving an acceptance of the replacement object 310-R.
  • the image capture arrangement 100 is further configured to receive user input through the user interface 110 requesting a (next or further) proposed replacement object (referenced 310-R in figure 3H) to be displayed.
  • the input may be received as a single tap on the object 310-1 or anywhere else inside the marking 315, whereby a (next) proposed replacement object (referenced 310-R in figure 3H) is displayed for later acceptance.
  • the input may be received as a press on a navigation (arrow) key.
  • the input may be received as a press on a softkey 110A being indicated to be for proposing a next replacement object (referenced 310-R in figure 3H).
  • a replacement object (referenced 310-R in figure 3H) is displayed instead of the original object, in case the original object is displayed, and a further or second replacement object is displayed instead of the first replacement object (referenced 310-R in figure 3H).
  • Figure 3H shows an example of a replacement object 310-R to be used to replace the object to be replaced 310-1.
  • an alternative or modified object, in this example a modified face 310-R, is shown. This is indicated by the replacement face 310-R being (slightly) different from the original face 310-1 of figure 3G. In figure 3H this is indicated by the face having a different nose, which indicates a different pose.
  • the replacement face is an autogenerated face that in some embodiments is generated utilizing a generative adversarial network, GAN, for example StyleGAN.
  • the replacement face is a replacement face retrieved from an image library stored either locally or remotely.
  • Figure 3I shows an example of an alternative where the replacement object - in this example the face - is a blurred object - in this case a blurring of the face.
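  • Blurring the detected face region, as in figure 3I, can be realized with a standard Gaussian blur over the face bounding box, for example as sketched below with OpenCV; the kernel size is an illustrative assumption.

```python
import cv2

def blur_region(image_bgr, box, kernel=51):
    """Replace the region given by box = (x, y, w, h) with a heavy Gaussian blur."""
    x, y, w, h = box
    roi = image_bgr[y:y + h, x:x + w]
    # Kernel size must be odd; larger values give stronger anonymisation.
    image_bgr[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (kernel, kernel), 0)
    return image_bgr
```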
  • Figure 3J shows an example of an alternative where the replacement object - in this example the face - is a deletion of the object - in this case a deletion or blocking of the face.
  • the deletion may be achieved by overlaying the face with a different object, such as an estimation or generation of a continuation of the background.
  • the deletion may alternatively be achieved by overlaying the face (the object) with an empty space.
  • Figure 3K shows an example of an alternative where the replacement object - in this example the face - is an alternative object.
  • the alternative object is in some embodiments selected from an image library to be an object common to the context of the photograph.
  • the context of the photo may be based on the geographic location, such as being in a forest, wherein an object commonly found in a forest is inserted to replace the face; in the example of figure 3K, the replacement object is a plant.
  • the context of the photo may be based on an analysis of the background of the image, wherein an object similar to other objects in the background of the image is selected from an image library. Alternatively a copy of an object present in the image may be used as the replacement object, whereby a copy of for example a plant may be used instead of the face to be replaced.
  • the marking may display numbers, letters or any character, where the character is associated with a command (scroll, accept, decline, change) and the user inputs the command by simply pressing the key for the corresponding character.
  • a different object detection algorithm is used in the indicated area to allow for a different object detection to be performed.
  • a clustering algorithm (such as K-means) can be used to create groups of objects to be kept or replaced.
  • the image, or rather a representation of the image such as a bag of visual words, is matched against the cluster centers to find if the image matches a group.
  • a neural network is utilized to detect objects to be kept or replaced. The neural network is trained on a database of photos that have been annotated to reflect what is judged to be relevant groups, main targets, etc.
  • the neural network accepts an image (e.g., the face of a person) as an input vector, processes the image through a plurality of hidden layers, and has an output layer that classifies the image as relevant or non-relevant to the main group (i.e., a binary classification neural network). If there are multiple main or target groups, the neural network performs a multi-class classification.
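  • A minimal sketch of such a binary relevance classifier is shown below using PyTorch; the input size and layer widths are illustrative assumptions, not taken from the description.

```python
import torch
import torch.nn as nn

# Minimal binary classifier over a flattened face crop (e.g. 64x64 grayscale).
relevance_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),   # probability of belonging to the main group
)

face_crop = torch.rand(1, 1, 64, 64)          # dummy input standing in for a face crop
p_relevant = relevance_net(face_crop).item()  # > 0.5 -> keep, else candidate for replacement
```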
  • a generative adversarial network is a class of machine learning frameworks where two neural networks contest with each other. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics.
  • a GAN is based on "indirect" training through a so-called discriminator (another neural network that can tell how realistic its input is, and which is itself also updated dynamically). This means that the GAN is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.
  • a GAN thus comprises a generator algorithm and a discriminator algorithm.
  • the generator algorithm in one form, is a neural network, receiving a sample data from a random distribution as input, processing the input through one or more hidden layers and producing output (such as an image, face etc.) in the output layer.
  • the discriminator algorithm in one form, is a neural network, receiving the output generated by the generator algorithm as its input, processing the input through one or more hidden layers, and producing a classification of its input as real or fake.
  • the two algorithms are trained jointly using the known labels of real samples (e.g. a dataset of real images with known labels).
  • After training, the discriminator is discarded and only the generator network is used to produce output (e.g. human faces, or other types of data).
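  • The joint generator/discriminator training described above can be sketched as follows in PyTorch; this is a toy example on flattened images with illustrative dimensions, not the StyleGAN-class model that would be used for realistic face generation.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 32, 28 * 28

# Generator: random latent vector -> flattened image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, img_dim), nn.Tanh(),
)
# Discriminator: flattened image -> probability of being real.
discriminator = nn.Sequential(
    nn.Linear(img_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    b = real_batch.size(0)
    real_lbl, fake_lbl = torch.ones(b, 1), torch.zeros(b, 1)

    # Train the discriminator: real samples vs. generator output.
    fake = generator(torch.randn(b, latent_dim)).detach()
    loss_d = bce(discriminator(real_batch), real_lbl) + bce(discriminator(fake), fake_lbl)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Train the generator to fool the discriminator.
    fake = generator(torch.randn(b, latent_dim))
    loss_g = bce(discriminator(fake), real_lbl)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# After training, only the generator is kept to produce replacement faces:
# replacement = generator(torch.randn(1, latent_dim)).view(28, 28)
```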
  • some embodiments do an assessment of whether detected persons can be recognized and whether they are known to the user. Furthermore, some embodiments also include algorithms to determine whether those persons have previously been marked as persons who should be included as original or should be replaced. Furthermore, some embodiments include algorithms to determine whether there is profile information from recognized people indicating their willingness to be included in photos or whether they prefer to be anonymized.
  • such judgements depend on the geographic location of the photograph, e.g., a private place, office, restaurant, or a public open space.
  • Some embodiments also include an estimated context or type of situation of the photograph, such as if people show nudity, if people are at a party which could be recognized using crowd counting techniques or a scene classification algorithm that can be re-trained using a database of party images, if people are well dressed and posing in a structured way, or if the photo includes well known buildings or places.
  • Detection of certain buildings in a scene relates to scene or landmark classification and therefore algorithms could be trained for the given purpose.
  • any other standard classification technique in the machine learning domain, such as ResNet or similar, can be trained and used to detect certain scenes or landmarks. All these aspects can be included in the initial detection of objects to be replaced.
  • the user might set certain preferences on how certain factors shall impact the decision about whether to replace objects, and in some embodiments a person's public profile can include such preferences (e.g.: I am willing to be included in photos in public places but not in private places and not when some nudity is involved).
  • the sequence of events during viewfinder mode - i.e. when the preview stream is displayed - is taken into consideration when determining which persons to include when the photo is taken. For example, if there are people passing by in the background, but two people are stationary in front of the camera, the faces of the people in the background will be predicted not to belong to the target group, as has been discussed in the above with reference to figure 3F for example. This can be done by extracting optical flow from the sequence of images and then removing the moving regions (e.g., moving people) from the image based on pre-defined thresholds on optical flow.
  • FlowNet is an example of an algorithm for estimating optical flow that can be utilized.
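  • A simple classical alternative to a learned optical-flow network such as FlowNet is dense Farneback optical flow from OpenCV; the sketch below derives a mask of moving regions from two consecutive grayscale preview frames, with an illustrative motion threshold.

```python
import cv2
import numpy as np

def moving_mask(prev_gray, curr_gray, flow_threshold=2.0):
    """Estimate dense optical flow between two preview frames and return a
    boolean mask of regions moving faster than the threshold (pixels/frame)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude = np.linalg.norm(flow, axis=2)
    return magnitude > flow_threshold  # True where e.g. passers-by are moving

# Faces whose bounding boxes overlap the moving mask can then be treated as
# outside the target group and proposed for replacement.
```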
  • the image capturing arrangement 100 is configured to adapt the contextual parameters and how the determination is to be made based on past operation - the arrangement can learn.
  • the image capturing arrangement 100 is configured to learn from the selections made by the user when accepting, adding and/or cancelling people (and/or) objects in photos.
  • selections are tied to contextual information (such as geographic location) and a custom database could be created for the user to keep track of selections to simplify future selections; office people, home people, etc.
  • the image capturing arrangement 100 is configured to apply a feature extraction algorithm (such as SIFT, SURF) to the historical dataset to extract a set of feature vectors for each sample in the dataset. These feature vectors form a bag of visual words (BoVW) for the given sample.
  • the image capturing arrangement 100 is configured to thereafter apply a clustering algorithm (such as k-means) to the collection of feature vectors extracted to create a finite number of clusters (e.g., 10 clusters) where a cluster center represents all similar samples (i.e., user selected patches) in the past (such as faces, people, buildings etc.) in a compact way.
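  • The feature extraction and clustering steps could, for instance, be sketched as follows using SIFT from OpenCV and k-means from scikit-learn; the number of clusters is the illustrative value mentioned above, and the patches are assumed to be grayscale crops of past user selections.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def descriptors_for(patches_gray):
    """Extract SIFT descriptors for each previously selected image patch."""
    descs = []
    for patch in patches_gray:
        _, d = sift.detectAndCompute(patch, None)
        if d is not None:
            descs.append(d)
    return np.vstack(descs)

def build_visual_vocabulary(patches_gray, n_clusters=10):
    """Cluster all descriptors with k-means; the cluster centres form the
    compact 'visual word' representation of past user selections."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    kmeans.fit(descriptors_for(patches_gray))
    return kmeans.cluster_centers_
```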
  • the image capturing arrangement 100 is further configured to, given a new image taken by the camera, detect all possible objects (with their bounding box regions) in the image using an object detection algorithm such as YOLO, and to feed the detected objects into a feature extraction algorithm (such as SIFT or SURF) to assign feature vectors to each object.
  • the image capturing arrangement 100 is further configured to match the feature vectors corresponding to the object(s) to the cluster centers. If a match is found with high similarity (using a pre-defined threshold), then the bounding box (or other marking 315) corresponding to the object is recommended to the user for further actions (for example replacing).
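Matching newly detected objects against the learned cluster centers could then be sketched as below; the similarity (distance) threshold is an assumed value, and `kmeans` and `patch_descriptors` are taken from the previous sketch.

```python
import numpy as np

def recommend_for_replacement(detected_boxes, frame, kmeans, distance_threshold=250.0):
    """Return the bounding boxes whose features lie close to a learned cluster center,
    i.e. the objects to be recommended to the user for further actions (e.g. replacing)."""
    recommended = []
    for (x, y, w, h) in detected_boxes:
        desc = patch_descriptors(frame[y:y + h, x:x + w])
        if desc.size == 0:
            continue
        # Distance from each descriptor to its nearest cluster center
        distances = np.min(kmeans.transform(desc), axis=1)
        if np.median(distances) < distance_threshold:
            recommended.append((x, y, w, h))  # candidate for the marking 315
    return recommended
```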
  • in some embodiments the feature extraction algorithm is a neural network with an input layer that accepts image pixel values, a plurality of intermediate (also known as hidden) layers that process these values, and an output layer that predicts a vector (or vectors) of features.
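A neural-network feature extractor of the kind described above can be sketched by truncating a pre-trained image classifier so that its output is a feature vector rather than class scores; torchvision is assumed, and the specific backbone is an illustrative choice.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Drop the final classification layer so the network outputs a 512-dimensional feature vector
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def image_features(path):
    """Return a feature vector for one image patch, usable for clustering or matching."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return feature_extractor(x).flatten()
```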
  • the training of the clustering algorithm can be continuous; that is, when new feedback from the user (such as image patches) is provided, the clustering algorithm is updated to enhance its accuracy and/or to extract new clusters that were not learned before.
  • non-visual sensor data, such as a sensorial recording [e.g., selecting what to include among separate audio tracks; voice, bird, nsfw sounds, misc. sounds], can also be detected and presented as objects to be replaced.
  • an audio editing tool such as Audacity could be applied to filter out background voices.
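For non-visual sensor data, replacing a detected audio object can be illustrated with a minimal sketch that silences (or substitutes) a detected segment of an audio track before the recording is stored; the segment boundaries would come from an audio event detector and are assumed inputs here, as is the WAV file format.

```python
import numpy as np
from scipy.io import wavfile

def replace_audio_segment(in_path, out_path, start_s, end_s, replacement=None):
    """Replace the samples between start_s and end_s with silence or a supplied clip."""
    rate, samples = wavfile.read(in_path)
    start, end = int(start_s * rate), int(end_s * rate)
    if replacement is None:
        samples[start:end] = 0                   # anonymize by silencing the segment
    else:
        clip = replacement[: end - start]        # replacement assumed to match dtype/channels
        samples[start:start + len(clip)] = clip  # substitute a replacement sound
    wavfile.write(out_path, rate, samples)
```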
  • the user can manually change the recommendation also between photos.
  • the user has a selfie-stick and cannot reach the touchscreen while in the target viewfinder position. The user can then take a photo, look at the anonymization that resulted from the recommendation, change that setting, and re-take the photo; the system would identify that the scene is very similar to the previous one and then use the new, updated recommendation.
  • biometric data are captured instead of or in addition to image data.
  • the flow for processing other data is similar to the flow for processing image data as discussed herein; the main difference is that a digital signal processor may be used to process the biometric data instead of an image-centric image signal processor.
  • Some examples of biometric data are given herein and include speech in a video feed that should be filtered (replaced) because, for example, the user does not want to hear anyone except the person holding a speech or talk.
  • biometric data could be other types of biometric data such as olfactory data.
  • the smell of a certain individual could be marked using, for example, visual or haptic cues, in order to enable user selections.
  • in some embodiments the controller, including the various circuitry 101, resides in the sensor data capturing arrangement, for example the smartphone of figure 1B
  • the controller may be in a separate device, such as a server connected to the smartphone, wherein both the smartphone and the server are considered to be included in the arrangement 100.
  • Figure 4 shows a flowchart for a general method according to the teachings herein.
  • the method is to be executed on a sensor data capturing arrangement as discussed in figure 1A or in figure 1B in a manner as discussed in relation to figure 2 and any, some or all of figures 3A to 3K.
  • the method comprises receiving 410 a stream of sensor data 103A/300 from at least one sensor 103 arranged to provide the stream of sensor data 103A/300 and detecting 420 an object 310 in the stream of sensor data. It should be noted that the method comprises detecting 420 the object 310 based on contextual information.
  • the method also comprises capturing 440 an instance of the stream of sensor data and replacing 450 the object 310 with a replacement object 310-R in the captured instance of the stream. It should be noted that the object to be replaced may be replaced with the replacement object already in the data stream prior to the capture or as the capture is made.
  • the method also comprises storing 460 the captured instance after the object has been replaced.
  • the method also comprises determining 430 the replacement object.
  • the method may also comprise any, some or all of the embodiments discussed herein, specifically with regards to figures 2 and any of figures 3A to 3L.
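The overall flow of the method (receiving 410, detecting 420, determining 430, capturing 440, replacing 450 and storing 460) can be summarized in the following sketch; the detector, replacement generator and storage backend are placeholders to be filled in with the techniques discussed herein, and the dictionary-based context object is an illustrative assumption.

```python
def capture_with_replacement(sensor_stream, detector, make_replacement, storage, context):
    """Receive a stream, detect objects from context, capture, replace, then store (410-460)."""
    for preview_frame, full_res_frame in sensor_stream:            # 410: receive the stream
        objects = detector(preview_frame, context)                  # 420: detect based on context
        if context.get("capture_requested"):
            captured = full_res_frame.copy()                        # 440: capture an instance
            for obj in objects:
                replacement = make_replacement(obj, captured)       # 430: determine replacement
                captured = replace_object(captured, obj, replacement)  # 450: replace in the capture
            storage.save(captured)                                  # 460: store only the altered capture
            return captured

def replace_object(frame, obj, replacement):
    """Overwrite the object's bounding box region with the replacement object."""
    x, y, w, h = obj["box"]
    frame[y:y + h, x:x + w] = replacement
    return frame
```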
  • Figure 5 shows a schematic view of a computer-readable medium 500 carrying computer instructions 510 that when loaded into and executed by a controller of a sensor data capturing arrangement 100 enables the sensor data capturing arrangement 100 to implement the present invention.
  • the computer-readable medium 500 may be tangible such as a hard drive or a flash memory, for example a USB memory stick or a cloud server.
  • the computer-readable medium 500 may be intangible, such as a signal carrying the computer instructions enabling the computer instructions to be downloaded through a network connection, such as an internet connection.
  • a computer-readable medium 500 is shown as being a computer disc 500 carrying computer-readable computer instructions 510, being inserted in a computer disc reader 520.
  • the computer disc reader 520 may be part of a cloud server 530 - or other server - or the computer disc reader may be connected to a cloud server 530 - or other server.
  • the cloud server 530 may be part of the internet or at least connected to the internet.
  • the cloud server 530 may alternatively be connected through a proprietary or dedicated connection.
  • the computer instructions may be stored at a remote server 530 and downloaded to the memory 102 of the sensor data capturing arrangement 100 for being executed by the controller 101.
  • the computer disc reader 520 may also or alternatively be connected to (or possibly inserted into) a sensor data capturing arrangement 100 for transferring the computer-readable computer instructions 510 to a controller of the sensor data capturing arrangement (presumably via a memory of the sensor data capturing arrangement 100).
  • Figure 5 shows both the situation when a sensor data capturing arrangement 100 receives the computer-readable computer instructions 510 via a server connection and the situation when another sensor data capturing arrangement 100 receives the computer-readable computer instructions 510 through a wired interface. This enables computer-readable computer instructions 510 to be downloaded into a sensor data capturing arrangement 100, thereby enabling the sensor data capturing arrangement 100 to operate according to and implement the teachings as disclosed herein.
  • Figure 6 shows a schematic view of a software component arrangement 600 for use in a sensor data capturing arrangement 100 as discussed herein.
  • the software component arrangement 600 comprises software code 610 for receiving a stream of sensor data 103A/300 from at least one sensor 103 arranged to provide the stream of sensor data 103A/300 and software code 620 for detecting an object 310 in the stream of sensor data.
  • the software component arrangement 600 also comprises software code 620 for detecting the object 310 based on contextual information.
  • the software component arrangement 600 also comprises software code 640 for capturing an instance of the stream of sensor data and software code 650 for replacing the object 310 with a replacement object 310-R in the captured instance of the stream.
  • the software component arrangement 600 also comprises software code 660 for storing the captured instance after the object has been replaced.
  • the software component arrangement 600 further comprises software code 630 for determining the replacement object.
  • the software component arrangement 600 also comprises software code 670 for further functionality as discussed herein, specifically as discussed herein with reference to figure 2 and figures 3A to 3L.
  • Figure 7 shows a schematic view of a sensor data capturing arrangement 700, such as the sensor data capturing arrangement of figure 1A or figure IB as discussed herein.
  • the sensor data capturing arrangement 700 comprises circuitry 710 for receiving a stream of sensor data 103A/300 from at least one sensor 103 arranged to provide the stream of sensor data 103A/300 and circuitry 720 for detecting an object 310 in the stream of sensor data.
  • the sensor data capturing arrangement 700 comprises circuitry 720 for detecting the object 310 based on contextual information.
  • the sensor data capturing arrangement 700 also comprises circuitry 740 for capturing an instance of the stream of sensor data and circuitry 750 for replacing the object 310 with a replacement object 310-R in the captured instance of the stream.
  • the sensor data capturing arrangement 700 also comprises circuitry 760 for storing the captured instance after the object has been replaced.
  • the sensor data capturing arrangement 700 further comprises circuitry 730 for determining the replacement object.
  • the sensor data capturing arrangement 700 also comprises circuitry 770 for further functionality as discussed herein, specifically as discussed herein with reference to figure 2 and figures 3A to 3K.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Studio Devices (AREA)

Abstract

A sensor data capturing arrangement (100) comprising a memory (102), wherein the sensor data capturing arrangement (100) comprises: receiving circuitry (101A) configured to receive a stream of sensor data (103A/300) from at least one sensor (103) arranged to provide the stream of sensor data (103A/300); object detection circuitry (101B) configured to detect an object (310) in the stream of sensor data based on contextual information; capturing circuitry (101C) configured to capture an instance of the stream of sensor data; processing circuitry (101D) configured to replace the object (310) with a replacement object (310-R) in the captured instance of the stream and then to store the captured instance.

Description

SENSOR DATA CAPTURING ARRANGEMENT AND A METHOD FOR CAPTURING SENSOR DATA
TECHNICAL FIELD
The present invention relates to a sensor data capturing arrangement, a method for capturing sensor data, a computer-readable medium carrying computer instructions that when loaded into and executed by a controller of the capturing arrangement enable the capturing arrangement to implement the method. Moreover, it relates to a software component arrangement for use in the sensor data capturing arrangement. All these aspects are related to a system supported user selection of persons to include in captured images.
BACKGROUND
Issues relating to privacy and personal integrity are becoming increasingly more important in today's society where data such as audio recordings and images are captured almost everywhere and at any given time. These issues are important to both persons capturing such data and to people accidentally captured. For example, a person taking a photo may not want some specific person or a random person making rude or offensive gestures or expressions to be captured in the photograph. This would ruin the photograph and also potentially inflict emotional harm to the person taking the photo or being at the center of the photo. Similarly, there may be persons that are accidentally or unintentionally captured in the photograph against their will.
Similar problems may arise in other types of data capturing, such as recording video, recording audio or capturing biometrics, to mention a few areas.
There is thus a need for improved techniques keeping captured data free from unwanted objects or artefacts.
SUMMARY
Even if prior art provides for different techniques for identifying objects in images and deleting or blurring such objects, there is a problem in that, as the inventors have realized, such anonymization of the captured data is performed only after the data has been captured. This means that despite the fact that an object or some objects are anonymized in the final representation of the captured data, there will be a copy - at least locally - of the captured data where the object is not anonymized. As the inventors have also realized, there is a risk that such a copy - even if temporary - is shared for example through malware or by accident in cases where temporary copies are not timely deleted.
It is therefore one aspect of the teachings herein to provide a sensor data capturing arrangement, wherein the sensor data capturing arrangement comprises: receiving circuitry configured to receive a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; object detection circuitry configured to detect an object in the stream of sensor data based on contextual information; capturing circuitry configured to capture an instance of the stream of sensor data; processing circuitry configured to replace the object with a replacement object in the captured instance of the stream and then to store the captured instance.
In one embodiment the sensor data capturing arrangement further comprises a controller, wherein the controller comprises at least one of the receiving circuitry, the object detection circuitry, the capturing circuitry and the processing circuitry.
In one embodiment the sensor data capturing arrangement further comprises a memory, wherein the processing circuitry is further configured to store the captured data in the memory.
In one embodiment the stream of sensor data is a continuous stream of sensor data.
In one embodiment the at least one sensor is an image sensor, wherein the stream of sensor data comprises a preview image stream and a full-resolution image stream, the preview image stream having an image resolution/frame rate lower than the maximum resolution/frame rate of the sensor, the full-resolution image stream having an image resolution/frame rate equal to (or lower than, but higher than the low resolution) the maximum resolution of the sensor, and wherein the object comprises a person, whereby the sensor data capturing arrangement comprises an image capturing arrangement, and wherein the object detection circuitry is further configured to detect the person based on contextual information by: performing object detection on the low-resolution preview image stream; determining a context (FE, P, d1) of the person to be detected; identifying one or more other persons; determining a context of the one or more other persons; and determining that at least one aspect of the context of the person is different from the context of the one or more other persons, and wherein the processing circuitry is further configured to replace the object in the full-resolution image stream and to store the instance of the full-resolution image stream.
In one embodiment the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining a pose of the person and determining if the pose is different from poses of the one or more other persons.
In one embodiment the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining a facial expression of the person and determining if the facial expression is different from other facial expressions of the one or more other persons.
In one embodiment the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining an identity of the person and determining that the identity is unassociated with identities of the one or more other persons and/or of the owner of the image capturing device.
In one embodiment the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by: determining a first distance between the person and the one or more other persons; determining a second distance between the one or more other persons; and determining that the first distance exceeds the second distance.
In one embodiment the processing circuitry is further configured to generate a replacement person and wherein the replacement object comprises the replacement person.
In one embodiment the processing circuitry is further configured to generate the replacement person by retrieving a person from a stored person image.
In one embodiment the processing circuitry is further configured to generate a replacement face and wherein the replacement object comprises the replacement face.
In one embodiment the processing circuitry is further configured to generate the replacement face by utilizing a Generative Adversarial Network, GAN.
In one embodiment the processing circuitry is further configured to generate the replacement face by retrieving a face from a stored face image.
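Purely as an illustration of how a generated replacement face could be composited into the captured instance: the generator call below is a hypothetical stand-in for a pre-trained GAN (or for retrieval from a stored face image), while the resizing and pasting use standard OpenCV/NumPy operations.

```python
import cv2
import numpy as np

def paste_replacement_face(frame, face_box, generate_face):
    """Replace the face inside face_box with a generated (or stored) replacement face.

    generate_face(width, height) is a hypothetical callable wrapping, e.g., a GAN
    generator or a lookup in a library of stored face images; it is assumed to
    return an HxWx3 image.
    """
    x, y, w, h = face_box
    replacement = cv2.resize(generate_face(w, h), (w, h))
    frame[y:y + h, x:x + w] = replacement.astype(frame.dtype)
    return frame
```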
In one embodiment the processing circuitry is further configured to generate an image of a physical object and wherein the replacement object comprises the image of a physical object wherein the physical object has a geographic location that corresponds to a geographic location of the image capturing device.
In one embodiment the processing circuitry is further configured to generate an image of a background and wherein the replacement object comprises the image of the background wherein the background is an estimate of the background behind the face to be replaced.
In one embodiment the processing circuitry is further configured to provide a marking of the detected object, provide a candidate for a replacement object, and to receive user input indicating an acceptance of the candidate as the replacement object.
In one embodiment the processing circuitry is further configured to receive user input indicating a request for a further candidate, and in response thereto provide a further candidate.
In one embodiment the image capturing arrangement further comprises a user interface and the object detection circuitry is further configured to receive user input via the user interface indicating an area and to perform object detection in the indicated area in order to detect further objects.
In one embodiment the image capturing arrangement comprises a telecommunications User Equipment (UE). In one such embodiment, the image capturing arrangement comprises a smart phone or a tablet computer.
According to one aspect of the teachings herein there is provided a method for capturing sensor data, wherein the method comprises: receiving a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; detecting an object in the stream of sensor data; capturing an instance of the stream of sensor data; replacing the object with a replacement object in the captured instance of the stream; and then storing the captured instance, wherein the method further comprises detecting the object based on contextual information.
According to one aspect of the teachings herein there is provided a computer-readable medium carrying computer instructions that when loaded into and executed by a controller of a capturing arrangement enables the capturing arrangement to implement the method according to the teachings herein.
According to one aspect of the teachings herein there is provided a software component arrangement for use in a sensor data capturing arrangement, wherein the software component arrangement comprises: software code for receiving a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; software code for detecting an object in the stream of sensor data; software code for capturing an instance of the stream of sensor data; software code for replacing the object with a replacement object in the captured instance of the stream; and software code for storing the captured instance after the object has been replaced, and software code for detecting the object based on contextual information.
According to one aspect of the teachings herein there is provided a sensor data capturing arrangement comprising: circuitry for receiving a stream of sensor data from at least one sensor arranged to provide the stream of sensor data; circuitry for detecting an object in the stream of sensor data; circuitry for capturing an instance of the stream of sensor data; circuitry for replacing the object with a replacement object in the captured instance of the stream; and circuitry for storing the captured instance after the object has been replaced, and circuitry for detecting the object based on contextual information.
The proposed solution thus enables for supporting a user for example taking a photo and quickly selecting which people to include in said photo using a combination of contextual information (possibly including location and/or historical data), and user input selection on a device screen, by providing a preprocessing arrangement and method for enhancing privacy aspects of collection of information, such as in photos or video recordings, before they are composed into storable or transmittable data objects such as a photos or video files.
The prior art discloses different techniques to identify faces or people, and also techniques to segment and remove, blur, or grey out selected people so such techniques are considered known and will not be discussed in detail herein. The prior art, however, does not disclose how to select the objects based on multi-contextual data and to do so already before encoding, storing or communicating such data beyond the local subsystem. One example of the subsystem may be the signal processor processing the received sensor data or the combination of sensor and signal processor processing the received sensor data. For an embodiment where the sensor data capturing device is an image capturing device, the local subsystem may be the chipset controlling image acquisition.
The proposed solution may utilize a combination of contextual information such as location, historical data, friend data, consent information, etc., to propose to the user - for example a user taking a photo - which objects (people) to include and which objects (people) to remove in said photo before said photo or video is encoded, compressed and stored, possibly to be shared with other users or systems outside the processing subsystem of the apparatus, for example a mobile device.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1A shows a schematic view of a sensor data capturing arrangement according to some embodiments of the teachings herein;
Figure 1B shows a schematic view of a sensor data capturing arrangement according to some embodiments of the teachings herein;
Figure 2 shows a schematic view of components and modules of a sensor data capturing arrangement according to some embodiments of the teachings herein;
Figures 3A to 3K each shows a schematic view of a sensor data capturing arrangement where objects are detected or replaced according to some embodiments of the teachings herein;
Figure 4 shows a flowchart of a general method according to some embodiments of the teachings herein;
Figure 5 shows a schematic view of a computer-readable medium carrying computer instructions that when loaded into and executed by a controller of an arrangement enables the arrangement to implement some embodiments of the teachings herein;
Figure 6 shows a component view for a software component arrangement according to some embodiments of the teachings herein; and
Figure 7 shows a component view for an arrangement comprising circuits according to some embodiments of the teachings herein.
DETAILED DESCRIPTION
Figure 1A shows a schematic view of a sensor data capturing arrangement 100 according to some embodiments of the teachings herein. It should be noted that the sensor data capturing arrangement 100 may comprise a single device or may be distributed across several devices and apparatuses. Also, the following embodiments described herein are non-limiting and for illustration purposes only.
The sensor data capturing arrangement 100 comprises or is operably connected to a controller 101 and a memory 102. The controller 101 is configured to control the overall operation of the sensor data capturing arrangement 100. The controller 101 comprises a combination of circuits that enables the controller 101 to control the operation of the sensor data capturing arrangement 100. As a skilled person would understand there are many alternatives for how to implement a controller 101, such as using Field - Programmable Gate Arrays circuits, ASICs, processors, etc. in addition or as an alternative. For the purpose of this application, all such possibilities and alternatives will be referred to simply as the controller 101. In some embodiments the sensor data capturing arrangement 100 comprises a circuit 101A for receiving sensor data 103A. Such a circuit 101A may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101. In some embodiments, the sensor data capturing arrangement 100 comprises a circuit for detecting objects 101B. Such a circuit may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101. In some embodiments the sensor data capturing arrangement 100 comprises a circuit for capturing data 101C. Such a circuit may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101. In some embodiments the sensor data capturing arrangement 100 comprises a circuit for processing data 101D. Such a circuit may be a standalone circuit connected to other circuits or it may be implemented as part of a circuit or processor possibly being part of the controller 101.
The memory 102 is configured to store data, such as sensor data, and computer-readable instructions that when loaded into the controller 101 indicate how the sensor data capturing arrangement 100 is to be controlled. The memory 102 may comprise several memory units or devices, but they will be perceived as being part of the same overall memory 102. As a skilled person would understand, there are many possibilities of how to select where data should be stored and a general memory 102 for the sensor data capturing arrangement 100 is therefore seen to comprise any and all such memory units for the purpose of this application. As a skilled person would understand, there are many alternatives of how to implement a memory, for example using non-volatile memory circuits, such as EEPROM memory circuits, or using volatile memory circuits, such as RAM memory circuits. The memory 102 may also be external to the sensor data capturing arrangement 100, such as an external physical memory in the form of an external hard drive (NVM, SSD, or disk-based) or in the form of a cloud storage solution. For the purpose of this application all such alternatives will be referred to simply as the memory 102.
In some embodiments the sensor data capturing arrangement 100 also comprises one or more sensors 103. In one embodiment at least one sensor 103 is an image sensor (possibly comprised in a camera module). In one embodiment, at least one sensor 103 is an audio sensor for recording sounds, such as voice input. In one embodiment at least one sensor 103 is a biometric sensor for capturing biometric data such as for example retina scans, fingerprint scans or other biometric data. In one embodiment at least one sensor 103 is a tactile sensor for capturing tactile or haptic input.
The one or more sensors 103, possibly in combination with the controller 101, are configured to receive at least one stream of sensor data. As mentioned above, the sensor data capturing arrangement 100 comprises circuitry for receiving such data stream(s). In some embodiments the stream of sensor data is a continuous stream of data received (directly) from the corresponding sensor, such as raw sensor data. Raw sensor data is the unprocessed data received directly from the sensor. In some embodiments the stream of sensor data is a regular stream of processed data from the corresponding sensor. In some embodiments, where the sensor data capturing arrangement is an image data capturing arrangement, the stream of sensor data is the image sensor data provided (at lower resolution) to the viewfinder. Upon capturing, a full resolution image is captured and stored.
In some embodiments the stream of sensor data comprises a first stream being of a lower quality or resolution and a second stream being of a higher quality or resolution up to the maximum resolution of the sensor 103. The stream of lower quality or resolution may simply offer lower resolution than the maximum resolution of the sensor 103. In some such embodiments, the first stream may be utilized for preprocessing or previewing of the data and the second stream may be utilized for capturing and final processing of the data. This enables a preprocessing of the data that requires less computational resources than processing the data at full quality (or resolution), while still capturing the data at a high quality (or resolution). One example is where the data stream received is from an image sensor 103, in which example the first stream is a preview stream and the second stream is the full resolution stream. In some embodiments the preview stream is the stream of a viewfinder pipeline.
It should be noted that the use of full resolution herein does not necessarily relate to the maximum resolution that an image sensor is capable of but rather the full resolution, i.e. highest resolution set to be used by the system being the resolution that a capture should be made at.
It should also be noted that instead of or in addition to the low or high resolution, a low or high frame rate could be used. A low frame rate thus being lower than the high frame rate, and the high frame rate being the frame rate that a capture should be made at.
For the embodiments where the sensor is an image sensor, the resolution is defined as pixels per inch (dpi), pixels in a vertical alignment and pixels in a horizontal alignment, or simply the number of pixels. In such embodiments a low resolution would generally be 230,000 to 920,000 dots (less than a megapixel) and a high resolution would be several megapixels, for example 8, 10, 12 or even more. A high resolution may also refer to the maximum resolution of the sensor. The maximum resolution can be the highest resolution that the sensor is designed to operate at or the highest resolution that the sensor is set to operate at through (user) settings.
As is also discussed in the above, the sensor data capturing arrangement 100 also comprises circuitry for detecting objects in the data stream. As is also discussed in the above, the sensor data capturing arrangement 100 also comprises circuitry for capturing an instance of the data stream, whereby the data at a given time is captured and stored. In embodiments where the data stream comprises a first and a second stream, the circuitry for detecting objects is configured to detect the object(s) in the first stream before a selection has been made of which object is to be replaced, and then to detect the object in the second stream so that the object can be replaced prior to capturing an instance of the data stream as per the second stream, in which the object to be replaced has been replaced. As is also discussed above, the sensor data capturing arrangement 100 also comprises circuitry for processing data in the data stream, and more specifically for processing the detected objects to select which object should be replaced and to generate a replacement object to replace the object in the captured data stream 103C.
In some embodiments the sensor data capturing arrangement 100 also comprises a user interface 110, for receiving user input and for providing information to the user. In some embodiments, as is indicated in figure 1A through the dashed lines, the sensor data capturing arrangement 100 is operably connected to such a user interface 110.
In some embodiments the sensor data capturing arrangement 100 also comprises a communications interface 104 for communicating with other arrangements 100, sensors 103 or servers (not shown). In some embodiments, as is indicated in figure 1A through the dashed lines, the sensor data capturing arrangement 100 is operably connected to such a communications interface 104.
In some embodiments the communications interface 104 comprises a radio frequency (RF) communications interface. In one such embodiment the communication interface 104 comprises a Bluetooth™ interface, a WiFi™ interface, a ZigBee™ interface, an RFID™ (Radio Frequency IDentifier) interface, a Wireless Display (WiDi) interface, a Miracast interface, and/or another RF interface commonly used for short range RF communication. In an alternative or supplementary such embodiment the communication interface 104 comprises a cellular communications interface such as a fifth generation (5G) cellular communication interface, an LTE (Long Term Evolution) interface, a GSM (Global System for Mobile Communications) interface and/or another interface commonly used for cellular communication. In some embodiments the communications interface 104 is configured to communicate using the UPnP (Universal Plug n Play) protocol. In some embodiments the communications interface 104 is configured to communicate using the DLNA (Digital Living Network Alliance) protocol.
In some embodiments the communications interface 104 comprises a wired interface. In some such embodiments the communication interface 104 comprises a USB (Universal Serial Bus) interface. In some alternative or additional such embodiments the communication interface 104 comprises a HDMI (High Definition Multimedia Interface) interface, a Display Port interface, an Ethernet interface, a MIPI (Mobile Industry Processor Interface) interface, an analog interface, a CAN (Controller Area Network) bus interface, an I2C (Inter-Integrated Circuit) interface, or other interface.
Figure 1B shows a schematic view of a sensor data capturing arrangement 100 as in figure 1A according to some embodiments of the teachings herein. In the example embodiment shown in figure 1B, the sensor data capturing arrangement 100 is a telecommunications User Equipment. In one such embodiment, the sensor data capturing arrangement 100 is a smartphone. In one alternative such embodiment, the sensor data capturing arrangement 100 is a tablet computer or simply a tablet.
Furthermore, the sensor data capturing embodiment 100 according to figure 1B also comprises a user interface 110 comprising a display 110 and one or more buttons 110A. In embodiments where the display 110 is a touch display, at least one, some or all of the one or more buttons 110A are virtual buttons implemented through the touch display 110.
The one or more sensors 103 includes an image sensor 103 possibly as part of a camera module for receiving and capturing image data. The one or more sensors 103 also include (at least in some embodiments) a microphone for receiving and recording audio data (such as voice data). In embodiments where the sensors 103 comprise an image sensor, the sensor data capturing embodiment 100 is also referred to as an image capturing arrangement 100.
Figure 2 shows a schematic view of how a sensor data stream 103A is received and generally processed by modules in a sensor data capturing embodiment.
A sensor provides a sensor data stream 103A to a processing module 210. It should be noted that the processing module 210 may be implemented by or as part of the controller 101 or specific circuitry discussed in the above for a sensor data capturing embodiment 100. In some embodiments, the sensor data stream 103A comprises a first stream 103A-1 and a second stream 103A-2.
The processing module 210 is configured to pre-process 210-1 the data stream 103A, wherein - in some embodiments - the first stream 103A-1 is rendered to be displayed as a preview to a user, for example on the display 110. The processing is usually performed already on the chipset or other component of the circuitry receiving the sensor data stream 103A; i.e. on the signal processor for general sensor data and in the image processor for image sensor data.
According to the teachings herein, at least one object to be replaced is detected during the preprocessing by the circuitry for detecting object detection as discussed in the above. The object(s) to be replaced are indicated in the data stream 103A, whereby an altered data stream 103B is provided. In embodiments where the data stream 103A comprises a first and a second stream, the first stream 103A-1 is altered as is indicated in figure 2. This allows for a preview to show the alteration of the stream prior to capturing the data.
The processing module 210 is also configured to capture 210-2 an instance of the data stream 103A, whereby the sensor data 103A at that time instance is captured as a stored capture 103C. In embodiments where the data stream 103A comprises a first and a second data stream, the capture is of the second (full resolution) data stream 103A-2 as is indicated in figure 2. Neither the original data in the first stream nor the altered data stream 103B is thus captured or stored.
The processing module 210 may also be configured to post process 210-3 the captured data, such as being configured to apply various filters or compressing the image.
The processing module 210 thus receives a sensor data stream 103A from a sensor 103 and provides a capture of the sensor data 103D.
As mentioned in the above, the sensor data stream 103A is first stored after the capture is executed. And, as the inventors have realized, by detecting objects to be replaced already as part of or in connection to the pre-processing of the sensor data stream 103A, and to also replace them as part of or in connection to the pre-processing a copy of the original data stream will not be captured or otherwise stored and can thus not be misused or abused.
The inventors are therefore proposing to provide a sensor data capturing arrangement 100 configured to receive a sensor data stream 103A, to detect objects in the stream, capture the stream and replace the object(s) in the capture before the capture is stored. Specifically, the detection of objects and the determination of replacement objects are made or performed already as part of the pre-processing and, in embodiments with a first and second data stream, based on the first stream, whereby the capture is of the second stream.
More specifically and with reference to figure 1A, the inventors are proposing to provide a sensor data capturing arrangement 100 comprising receiving circuitry 101A configured to receive a stream of sensor data 103A from at least one sensor 103 arranged to provide the stream of sensor data 103A. The sensor data capturing arrangement 100 also comprises object detection circuitry 101B configured to detect an object in the stream of sensor data 103A based on contextual information, and capturing circuitry 101C configured to capture an instance of the stream of sensor data. The sensor data capturing arrangement 100 also comprises processing circuitry 101D configured to replace the object with a replacement object in the captured instance of the stream and then to store the captured instance.
It should be noted that even if the following disclosure is primarily focused on receiving and capturing image data in an image capturing arrangement 100, such as in a smartphone or tablet computer of figure 1B, the teachings herein also apply to other sensor data being received and captured.
It should be noted that in further embodiments, the teachings herein can also apply to other (biometric) information such as sound recordings before being composed into storable or transmittable audio recordings. Further embodiments also include composite biometric information such as photo or video including smell or tactile information where all or selected parts of the composite information about objects can be anonymized or removed.
Figure 3A shows a schematic view of an example of an image capturing arrangement of figure 1B, such as a smartphone, where graphical information 300 representing an image stream 103A received from a camera 103 is displayed on the display 110. In the following, no difference will be made between the image stream and the graphical representation 300 thereof; a skilled person would understand the difference and which of the two is actually referred to. In the example of figure 3A it is assumed that the image stream 103A comprises a preview stream (first stream) and a full resolution stream (second stream), wherein the graphical representation is of the preview stream.
In this example the image stream comprises three objects 310, which in the example of figure 3A are three persons, represented by the three faces being displayed on the display 110 in the example of figure 3A. Each person 310 exhibits one or more visible traits, for example a pose P and/or a posture, such as a body posture, a gesture or a facial expression FE. In the example of figure 3A these traits are exemplified as a pose P and a facial expression FE. The pose P, which includes a position and a general direction of interest, is indicated by an arrow indicating a line of sight for the faces. A facial expression FE is also shown for each face 310. The pose P may be determined based on the direction of the line of sight or the direction of the eyes. The pose may alternatively or additionally be determined based on (the direction of) the nose or other facial feature. A facial expression may be determined through smile detection. A body posture may be detected signaling an emotion. For example a closed fist would signal anger.
It should be noted that these visible traits are examples of the context for the objects 310, individually or in combination. Other examples of such context are the identity of a person, the geographic location of the image, the time of the image stream, to mention a few.
In some embodiments contextual information related to the geographic location where the photo is taken is used to adjust the probability of people in said photo belonging to the same group and thus help the system deduce which people/faces to keep and which to replace. This could e.g., be geographic locations like "office", "tourist attraction", "home", etc. Photos available publicly and/or from friends/Facebook/etc. from the same geographic location could be used as input to said system. Existing scene classification algorithms based on vision and acoustics can be used to determine the context, which can then be used in a rule-based system for additional behavior, such as replacing.
As discussed in the above, the image capturing arrangement 100 is configured through the circuitry for object detection 101B to detect such objects 310 in the (preview) image stream. The circuitry for object detection 101B is specifically configured to detect at least one object to be replaced 310-1 based on the contextual information.
Figure 3B shows an example where an object to be replaced 310-1 has been detected. In this example the object is a person represented by a face. The object may also be only a face. In this example the object to be replaced has been detected based on the contextual information related to the identity of the object 310-1.
In some embodiments, an object may be detected to be an object to be replaced based on an identity of the object, wherein the identity is a blocked identity. Examples of such blocked identities are identities that have been blocked in a contact application or in a social media application.
In some embodiments, an object may be detected to be an object to be replaced based on the identity of the object, wherein the identity is an identity that is not associated with identities of other objects 310-2 in the image stream. One example of such identities is where the person to be replaced 310-1 is not associated (friends) with the other persons 310-2 in the preview, for example based on a social media app. Another example of such identities is where the identity of the person to be replaced 310-1 is not present in a contact list stored in the memory 102 of the smartphone 100.
The circuitry for detecting objects 101B is thus also, in some embodiments, configured to determine an identity associated with the detected object. Alternatively, such determination of identity may be performed by another circuitry, such as the processing circuitry 101D or the controller 101. In some embodiments the identity may be determined based on facial recognition.
In some embodiments, the image capturing device is further configured to indicate which object 310-1 that is detected is to be replaced, for example through a graphic indication. In the example of figure 3B, the person to be replaced 310-1 is indicated by a graphical object (in this example a frame) 315 being displayed around or on top of the person to be replaced 310-1. This allows for a user to quickly see and ascertain which person is to be replaced.
In some embodiments also the other detected objects are indicated by a graphical indication to indicate to a user that these objects have been detected but are not proposed to be replaced. In some such embodiments, the graphical indication for indicating an object to be replaced is displayed differently (for example in one color) from the graphical indication for the objects detected, but not proposed to be replaced (for example in a different color).
In some embodiments also undetected objects are indicated by a graphical indication to indicate to a user that these objects have not been successfully detected or categorized and are unknown. In some such embodiments, the graphical indication for indicating an unknown object is displayed differently (for example in one color) to the graphical indication for the objects detected. In some embodiments unknown objects are automatically determined to be objects to be replaced 310-1. In some embodiments unknown objects are automatically determined to be other objects 310-2 (i.e. to base the context upon). In some embodiments, this automatic determination is based on user settings.
In some embodiments, the determination or a change of an automatic determination may be changed by a user and in such embodiments, the image capturing arrangement 100 is further configured to receive user input indicating such an object.
Figure 3C shows an example where a person (being an example of an object) to be replaced 310-1 has been detected. In this example the person to be replaced has been detected based on the contextual information that the pose of the person to be replaced is different from that of the other persons 310-2 in the preview. As can be seen, the person to be replaced 310-1 is looking to the side, while the other persons 310-2 are looking straight into the camera (indicated by the arrows pointing downwards). In one such embodiment, the image capturing arrangement 100 is configured to determine a pose of each detected object, to determine if the majority (two or more) of the objects have a similar pose and if at least one object has a different pose, and if that is the case, detect the object(s) with the different pose as the object to be replaced 310-1.
The circuitry for detecting objects 101B is thus also, in some embodiments, configured to determine a pose associated with an object. Alternatively, such determination of pose may be performed by another circuitry, such as the processing circuitry 101D or the controller 101.
Figure 3D shows an example where a person (being an example of an object) to be replaced 310-1 has been detected. In this example the person to be replaced has been detected based on the contextual information that the posture of the person to be replaced is different from the other persons 310-2 in the preview. The posture of a person is in some embodiments a body posture, a gesture and/or a facial expression.
In the example of figure 3D, the person to be replaced 310-1 is presenting an unhappy facial expression, while the other persons 310-2 are presenting happy facial expressions (they are smiling). In some embodiments, the circuitry for object detection is thus further configured to perform facial recognition to determine facial expressions.
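Smile detection of the kind referred to above can be sketched with the Haar cascades that ship with OpenCV; the scale and neighbour parameters are typical illustrative values and a dedicated facial-expression classifier could be substituted.

```python
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def facial_expressions(frame_bgr):
    """Return a list of (face_box, is_smiling) tuples for the faces found in the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[y:y + h, x:x + w]
        smiles = smile_cascade.detectMultiScale(roi, 1.7, 20)
        results.append(((x, y, w, h), len(smiles) > 0))
    return results
```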
In another example (not shown), the person to be replaced 310-1 is presenting a gesture. In some embodiments, the object presenting a rude or otherwise unallowed gesture is detected to be replaced. In such embodiments, the circuitry for object detection 101B is further configured to detect gestures, possibly based on image analysis by comparing them to an image library of known gestures, wherein some gestures are indicated to be unallowed.
In some embodiments, the image library is stored locally in the memory 102. In some embodiments the image library is stored on a portion of the memory being dedicated to or even comprised in the circuitry for object detection. And, in some embodiments the image library is stored remotely, for example on a server, and accessed through the communication interface 104.
In some embodiments the circuitry for object detection 101B is further configured to determine that the other persons 310-2 are not making similar gestures and if that is the case, detect the person making the gesture as the person to be replaced 310-1. This allows for a group photo where the whole group presents rude gestures to be unaltered, while situations where only a by-passer is presenting the gesture is altered.
In another example (not shown), the person to be replaced 310-1 is exhibiting a body posture (for example standing), while the other persons 310-2 are exhibiting a different body posture (for example laying down). In some embodiments, the circuitry for object detection is thus further configured to perform body posture detection.
In one such embodiment, the image capturing arrangement 100 is configured to determine a posture of each detected object, to determine if the other objects 310-2 have a similar posture and if at least one object has a different posture, and if so, detect the object(s) with the different posture as the object to be replaced 310-1.
Figure 3E shows an example where a person (being an example of an object) to be replaced 310-1 has been detected. In this example, the person to be replaced has been detected based on the contextual information that the geographic location of the person to be replaced is different from the other persons 310-2 in the preview. That the geographic location of the person to be replaced is different from the others may in some embodiments be determined based on a difference in distance between objects/persons.
In the example embodiment shown in figure 3E the person to be replaced 310-1 is at a distance dl from the other persons 310-2, while the other persons 310-2 are at a distance d2 from one another. In some embodiments the distance d2 between the other persons 310-2 is determined based on an average of the distances between the other persons 310-2. And, in some embodiments the distance d2 between the other persons 310-2 is determined based on a median of the distances between the other persons 310-2.
In some embodiments it is determined that the person to be replaced 310-1 is at a different geographic location if the distance dl exceeds a threshold distance. In some embodiments the threshold distance is based on the distance d2 between the other persons, wherein the threshold distance is a factor (for example 1.5, 2, 3, 4, 5) of the distance d2 between the other persons 310-2.
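The distance comparison described above can be sketched as follows, where person positions are taken to be image-plane coordinates (or estimated world positions) and the factor is one of the example values mentioned.

```python
import numpy as np

def persons_to_replace_by_distance(positions, factor=1.5):
    """Flag persons whose distance d1 to the group exceeds factor * typical in-group distance d2."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    if n < 3:
        return []  # not enough persons to define a group distance
    flagged = []
    for i in range(n):
        others = np.delete(positions, i, axis=0)
        d1 = np.min(np.linalg.norm(others - positions[i], axis=1))  # distance to the group
        # typical distance d2 within the remaining group (mean of pairwise distances)
        pair_d = [np.linalg.norm(a - b) for j, a in enumerate(others) for b in others[j + 1:]]
        d2 = np.mean(pair_d)
        if d1 > factor * d2:
            flagged.append(i)
    return flagged
```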
In some embodiments the circuitry for object detection is further configured to determine geographic locations of objects. The geographic location may be determined by detecting objects corresponding to landmark and identifying these landmarks. The geographic location may be determined by retrieving a location from a location sensor such as a GPS (Global Positioning System) device.
Figure 3F shows an example where a person (being an example of an object) to be replaced 310-1 has been detected. In this example, the person to be replaced has been detected based on the contextual information that the movement (speed and/or direction) of the person to be replaced is different from movement of the other persons 310-2 in the preview. In the example of figure 3F the person to be replaced is moving fast to the left of the image, whereas the other persons 310-2 are moving slowly to the right. The movements are thus different.
In some embodiments the circuitry for object detection 101B is further configured to determine movements of objects, such as by tracking an object and determining a difference in location in the image at different (subsequent) times.
The other persons 310-2 are in some embodiments defined as the majority of detected persons. The other persons 310-2 are in some embodiments defined as detected persons having associated identities. The other persons 310-2 are in some embodiments defined as persons exhibiting similar poses, or postures. The other persons 310-2 are in some embodiments defined as persons being at a location close to one another, for example less than 1.5 times the distance d2. The other persons 310-2 are in some embodiments defined as persons exhibiting a similar movement.
In some embodiments, and as discussed in the above, the contextual information is related to the geographic location of the sensor 103. In such embodiments the circuitry for object detection 101B is further configured to determine that if the geographic location of the sensor (i.e. where the image stream is received) is a specific geographic location, then specific determinations for the context are to be applied. For example, if the geographic location is a geographic location marked as sensitive, all persons with clearly recognizable faces are to be replaced.
In some embodiments, and as also discussed in the above, the contextual information is related to the time of the data stream. In such embodiments, the circuitry for object detection 101B is further configured to determine that if the time is a specific time, then specific determinations should be applied.
Specifically, the context of a geographic location and a time, may infer specific contextual determinations such as that any stream received at a specific geographic location at a specific time is subjected to specific contextual determinations. One example is where it is forbidden to take photos of license plates of cars at night, whereby all license plates are removed or blurred in images taken during night hours in such streets. Another example is where it is not allowed to take photos identifying children at schools during school hours, whereby all faces are blurred or replaced by anonymous faces from images taken during such hours.
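A rule set combining geographic location and time, as in the examples above, could be represented as a simple table consulted during pre-processing; the place tags, hours and policy names below are illustrative assumptions only.

```python
from datetime import time

# (location tag, time window) -> replacement policy applied before the capture is stored
CONTEXT_RULES = [
    {"location": "street_zone_a", "from": time(22, 0), "to": time(6, 0), "replace": "license_plates"},
    {"location": "school",        "from": time(8, 0),  "to": time(16, 0), "replace": "all_faces"},
]

def active_policies(location_tag, now):
    """Return the replacement policies that apply for the given location and time of day."""
    policies = []
    for rule in CONTEXT_RULES:
        if rule["from"] > rule["to"]:  # window wraps around midnight
            in_window = now.time() >= rule["from"] or now.time() <= rule["to"]
        else:
            in_window = rule["from"] <= now.time() <= rule["to"]
        if rule["location"] == location_tag and in_window:
            policies.append(rule["replace"])
    return policies
```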
It should be noted that the embodiments and examples discussed in the above may be combined in any manner where any, some or all of the contexts discussed are combined for one or more objects in the data stream. For example, there may be a person that has a blocked identity that is performing a rude gesture, while at the same time, there may be a person moving away from the group and exhibiting a different pose in the same data stream, wherein both such persons will be detected as being persons to be replaced.
As alternative examples of identities to persons with identities, it should be mentioned that objects with identities may be logos and/or textual identities. A logo, such as a trademark, may easily be detected and identified by the circuitry for object detection. Similarly, any textual information, such as a name, may also be easily detected and identified. A logo or textual information may also easily be replaced.
It should be noted that even though most examples given herein are related to images and detecting objects in such images giving some examples of contexts, other contexts may also be used to determine if an object is to be replaced or not.
In some embodiments, the image capturing arrangement 100 may also learn and use information such as geographic location, social circles etc. to adapt the contextual determinations that determine which objects to be replaced.
Returning to figure 3B, in the example shown in figure 3B (and also in figures 3C-3F) a graphical indication is used in some embodiments to indicate or mark the object 310-1 to be replaced. Figure 3G shows an example situation where an object, in this example a person represented by a face, is marked as being an object to be replaced 310-1. The marking is achieved by displaying a graphical indication 315 marking the object 310-1. In this example the graphical indication 315 is a frame encompassing the face 310-1. In some embodiments, the graphical indication 315 is in a color that contrasts with the background. In some embodiments, the graphical indication 315 is in a color that indicates the context of the object, indicating to a user why the object is to be replaced. For example, a first color (red) can be used to indicate a first context (for example a blocked identity) and a second color (yellow) can be used to indicate a second context (for example an unknown identity). This enables a user to understand why an object is proposed to be replaced. In some embodiments, the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating an acceptance of an object to be replaced. In some such embodiments, utilizing a touch display 110, the input may be received as a double tap on the object 310-1 or anywhere else inside the marking 315. Alternatively, in some such embodiments utilizing physical keys 110A, the input may be received as a press on a softkey or other key indicated to be for acceptance. As a skilled person would realize, a softkey is a key whose function differs depending on the current execution state and whose function may be indicated by being displayed on the display 110.
In some embodiments, the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating a detected object 310-1 to be an object to be replaced. In some such embodiments utilizing a touch sensitive display 110, the input may be received as a long press or double tap on the object 310-2 or anywhere else inside its marking (if a marking is displayed).
In some embodiments, the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating an undetected object to be an object to be replaced or to be an object to be kept (i.e. another object). In some such embodiments utilizing a touch display 110, the input may be received as a double tap or a long press on the object 310 or anywhere else inside its marking (if a marking is displayed).
In some embodiments all proposals are accepted by receiving a command to execute the capture, i.e. to take the picture.
In some embodiments, the graphical marking 315 is displayed differently in response to receiving an acceptance. As an example, a graphical frame 315 being displayed as dashed lines for a proposed object may be changed to solid lines for an accepted object. Alternatively, the color of the marking may be changed as the object is accepted.
Similarly, in some embodiments, the image capture arrangement 100 is further configured to receive user input through the user interface 110 indicating a rejection or cancellation of an object to be replaced. In some such embodiments utilizing a touch sensitive display 110, the input may be received as a long tap on the object 310-1 or anywhere else inside the marking 315. Alternatively, in some such embodiments utilizing physical keys 110A, the input may be received as a press on a cancellation or clear key. Alternatively, the input may be received as a press on a softkey 110A being indicated to be for cancellation.
In some embodiments, the graphical marking 315 is no longer displayed or removed in response to receiving a cancellation.
As discussed in the above, the object to be replaced 310-1 is to be replaced by a replacement object (referenced 310-R in figure 3H), and in some embodiments, the object to be replaced 310-1 or rather a proposed object to be replaced 310-1 is replaced already in the preview to indicate to the user what the final image will look like. Alternatively, the proposed object to be replaced 310-1 may be replaced first upon receiving an acceptance of the replacement object 310-R. In some embodiments, the image capture arrangement 100 is further configured to receive user input through the user interface 110 requesting a (next or further) proposed replacement object (referenced 310-R in figure 3H) to be displayed.
In some such embodiments utilizing a touch display 110, the input may be received as a single tap on the object 310-1 or anywhere else inside the marking 315, whereby a (next) proposed replacement object (referenced 310-R in figure 3H) is displayed for later acceptance. Alternatively, in some such embodiments utilizing physical keys 110A, the input may be received as a press on a navigation (arrow) key. Alternatively, the input may be received as a press on a softkey 110A being indicated to be for proposing a next replacement object (referenced 310-R in figure 3H).
In response to receiving such input, a replacement object (referenced 310-R in figure 3H) is displayed instead of the original object, in case the original object is currently displayed, or a further or second replacement object is displayed instead of the first replacement object (referenced 310-R in figure 3H), in case a first replacement object is currently displayed.
Figure 3H shows an example of a replacement object 310-R to be used to replace the object to be replaced 310-1. In the example of figure 3H an alternative or modified object, in this example, a modified face 310-R, is shown. This is indicated by the replacement face 310-R being (slightly) different from the original face 310-1 of figure 3G. In figure 3H this is indicated by the face having a different nose which indicates a different pose.
In some embodiments, the replacement face is an autogenerated face that in some embodiments is generated utilizing a generative adversarial network (GAN), for example StyleGAN. In some embodiments, the replacement face is a replacement face retrieved from an image library stored either locally or remotely.
This also applies to persons (full or partial bodies), where the replacement person is an autogenerated image of a person or retrieved from an image library.
Figure 3I shows an example of an alternative where the replacement object - in this example the face - is a blurred object - in this case a blurring of the face.
Figure 3J shows an example of an alternative where the replacement object - in this example the face - is a deletion of the object - in this case a deletion or blocking of the face. The deletion may be achieved by overlaying the face with a different object, such as an estimation or generation of a continuation of the background. The deletion may alternatively be achieved by overlaying the face (the object) with an empty space.
Figure 3K shows an example of an alternative where the replacement object - in this example the face - is an alternative object. The alternative object is in some embodiments selected from an image library to be an object common to the context of the photograph. In some embodiments, the context of the photo may be based on the geographic location, such as being in a forest, wherein an object commonly found in a forest is inserted to replace the face; in the example of figure 3K, the replacement object is a plant. In some embodiments, the context of the photo may be based on an analysis of the background of the image, wherein an object similar to other objects in the background of the image is selected from an image library. Alternatively, a copy of an object present in the image may be used as the replacement object, whereby a copy of, for example, a plant may be used instead of the face to be replaced.
In the examples of figures 3H to 3K a user may be able to scroll through the different alternatives of these figures by providing the user input for displaying a next proposal, effectively scrolling through the examples of figures 3H to 3K.
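For the blurring alternative (figure 3I) and the deletion alternative (figure 3J), a minimal OpenCV sketch is given below. The file name, bounding box coordinates and the flat-fill stand-in for a generated background continuation are assumptions for illustration only.

```python
import cv2
import numpy as np

def blur_region(image, box, ksize=51):
    """Replace the region in `box` (x, y, w, h) with a blurred copy of itself."""
    x, y, w, h = box
    roi = image[y:y + h, x:x + w]
    image[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (ksize, ksize), 0)
    return image

def delete_region(image, box):
    """Overlay the region with 'empty space': here a flat fill with the mean
    colour of the image, a crude stand-in for a generated background."""
    x, y, w, h = box
    fill = image.mean(axis=(0, 1)).astype(image.dtype)
    image[y:y + h, x:x + w] = fill
    return image

frame = cv2.imread("preview.jpg")       # assumed path to a preview frame
face_box = (220, 120, 96, 96)           # assumed detector output (x, y, w, h)
cv2.imwrite("blurred.jpg", blur_region(frame.copy(), face_box))
cv2.imwrite("deleted.jpg", delete_region(frame.copy(), face_box))
```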
In alternative embodiments, the marking may display numbers, letters or any character, where the character is associated with a command (scroll, accept, decline, change) and the user inputs the command by simply pressing the key for the corresponding character.
In some embodiments, a different object detection algorithm is used in the indicated area to allow for a different object detection to be performed.
As is discussed in the above, various contextual information, such as pose and body posture or facial expression, is used to determine which object(s) is to be replaced, and several embodiments and examples are discussed. Supplemental to these embodiments and examples, and seen to be included in them, the inventors also propose to detect an object to be replaced (or not) based on additional aspects discussed below.
In some embodiments, a clustering algorithm (such as K-means) can be used to create groups of objects to be kept or replaced. In such embodiments, given an image, the image (or rather a representation of the image, such as a bag of visual words) is matched with cluster centers to find if the image matches a group. In some alternative embodiments, a neural network is utilized to detect objects to be kept or replaced. The neural network is trained on a database of photos that have been annotated to reflect what is judged as being relevant groups, main targets, etc. The neural network accepts an image (e.g., the face of a person) as an input vector, processes the image through a plurality of hidden layers, and classifies the image in an output layer as relevant or non-relevant to the main group (i.e., a binary classification neural network). If there are multiple main or target groups, then the neural network performs a multi-class classification.
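A minimal sketch of such a binary classification network is given below in PyTorch; the architecture, input size and class labels are illustrative assumptions, not the network proposed by the disclosure.

```python
import torch
import torch.nn as nn

class RelevanceClassifier(nn.Module):
    """Toy binary classifier: does a detected face/person crop belong to the
    main (target) group or not?  Architecture is illustrative only."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)     # two classes: relevant / non-relevant

    def forward(self, x):                # x: (N, 3, H, W) image crops
        f = self.features(x).flatten(1)
        return self.head(f)              # use more outputs for multi-class

model = RelevanceClassifier()
crop = torch.randn(1, 3, 64, 64)         # dummy face crop
print(model(crop).softmax(dim=1))        # probabilities: relevant / non-relevant
```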
In some embodiments, a generative adversarial network (GAN) is used. A GAN is a class of machine learning frameworks in which two neural networks contest with each other. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. A GAN is based on "indirect" training through a so-called discriminator (another neural network that can tell how realistic an input is, and which itself is also being updated dynamically). This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner. A GAN thus comprises a generator algorithm and a discriminator algorithm. The generator algorithm, in one form, is a neural network receiving a sample from a random distribution as input, processing the input through one or more hidden layers and producing output (such as an image, a face etc.) in the output layer. The discriminator algorithm, in one form, is a neural network receiving the output generated by the generator algorithm as its input, processing the input through one or more hidden layers, and producing a classification of its input as real or fake. The two algorithms are trained jointly using the known labels of real samples (e.g. a dataset of real images with known labels). At the deployment phase (aka the inference phase), the discriminator is discarded and only the generator network is used to produce output (e.g. human faces, or other types of data).
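The following is a toy PyTorch sketch of this generator/discriminator training scheme, using small flattened images so it stays self-contained; the sizes, architectures and training schedule are illustrative assumptions and not the GAN used by the arrangement.

```python
import torch
import torch.nn as nn

latent, img_dim = 64, 32 * 32 * 3          # toy sizes, assumed for illustration

G = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())           # generator
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))                             # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_batch = torch.rand(16, img_dim) * 2 - 1   # stand-in for real face images

for step in range(100):                        # toy training loop
    # Discriminator: push real samples towards "real", generated towards "fake".
    fake = G(torch.randn(16, latent)).detach()
    d_loss = bce(D(real_batch), torch.ones(16, 1)) + \
             bce(D(fake), torch.zeros(16, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator classify its output as "real".
    g_loss = bce(D(G(torch.randn(16, latent))), torch.ones(16, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# At inference only G is kept: G(torch.randn(1, latent)) yields a new sample.
```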
In order to base the detection on an identity, some embodiments make an assessment of whether detected persons can be recognized and whether they are known to the user. Furthermore, some embodiments also include algorithms to determine whether those persons have previously been marked as persons that should be included as original or should be replaced. Furthermore, some embodiments include algorithms to determine whether there is profile information from recognized people indicating their willingness to be included in photos or whether they prefer to be anonymized.
In some embodiments, such judgements depend on the geographic location of the photograph, e.g., a private place, office, restaurant, or a public open space. Some embodiments also include an estimated context or type of situation of the photograph, such as if people show nudity, if people are at a party which could be recognized using crowd counting techniques or a scene classification algorithm that can be re-trained using a database of party images, if people are well dressed and posing in a structured way, or if the photo includes well known buildings or places.
These aspects relate to scene classification and there are several methods described in the prior art for different purposes, for example place classification.
Detection of certain buildings in a scene relates to scene or landmark classification and therefore algorithms could be trained for the given purpose. In general, any other standard classification technique in the machine learning domain, such as ResNet or the like, can be trained and used to detect certain scenes or landmarks. All these aspects can be included into the initial detection of objects to be replaced.
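As a sketch of how such a scene classifier could be set up with a standard ResNet backbone (here via torchvision), consider the following; the scene label set, the number of classes and the lack of pretrained weights are assumptions for illustration, and the backbone would be (re-)trained on an annotated scene database as described above.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_SCENES = 4   # e.g. party, office, public_open_space, private_place (assumed labels)

# Standard ResNet-18 backbone with its final layer replaced for scene classes.
backbone = models.resnet18(weights=None)   # pretrained weights could be loaded instead
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_SCENES)

def classify_scene(image_tensor):
    """image_tensor: (1, 3, 224, 224), normalised as the backbone expects."""
    backbone.eval()
    with torch.no_grad():
        logits = backbone(image_tensor)
    return int(logits.argmax(dim=1))

print(classify_scene(torch.randn(1, 3, 224, 224)))   # index of the predicted scene
```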
In some embodiments, the user might set certain preferences on how certain factors shall impact the decision about whether to replace objects, and in some embodiments a person's public profile can include such preferences (e.g.: I am willing to be included in photos in public places but not in private places and not when some nudity is involved).
In some embodiments, also the sequence of events during viewfinder mode - i.e. when the preview stream is displayed - is taken into consideration when determining which persons to include when the photo is taken. For example, if there are people passing by in the background, but two people are stationary in front of the camera, the faces of the people in the background will be predicted not to belong to the target group, as has been discussed in the above with reference to figure 3F for example. This can be done by extracting optical flow from the sequence of images and then removing the moving regions (e.g., moving people) from the image based on pre-defined thresholds on the optical flow. FlowNet is an example of an algorithm for estimating optical flow that can be utilized. As mentioned in the above, the image capturing arrangement 100 is configured to adapt the contextual parameters and how the determination is to be made based on past operation - the arrangement can learn.
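A minimal sketch of this moving-region filtering is given below, using OpenCV's classical Farneback optical flow as a simple stand-in for FlowNet; the frame file names, thresholds and bounding box are assumptions for illustration.

```python
import cv2
import numpy as np

def moving_mask(prev_gray, next_gray, threshold=2.0):
    """Boolean mask of pixels whose optical-flow magnitude exceeds `threshold`;
    detections falling mostly inside the mask can be treated as passers-by."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    return magnitude > threshold

def is_moving(box, mask, ratio=0.5):
    x, y, w, h = box
    return mask[y:y + h, x:x + w].mean() > ratio   # most of the box is moving

prev = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)   # assumed preview frames
curr = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)
mask = moving_mask(prev, curr)
print(is_moving((300, 80, 60, 120), mask))                # assumed person bounding box
```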
In some such embodiments, the image capturing arrangement 100 is configured to learn from the selections made by the user when accepting, adding and/or cancelling people (and/or) objects in photos. In some embodiments, such selections are tied to contextual information (such as e.g., geographic location) and a custom database could be created for the user to keep track of selections to simplify future selections; office people, home people, etc.
This can be achieved in the image capturing arrangement by noting objects selected by the user as feedback and building a historical dataset for training of a machine learning algorithm to automatically recommend objects in future images.
In some embodiments, the image capturing arrangement 100 is configured to apply a feature extraction algorithm (such as SIFT, SURF) to the historical dataset to extract a set of feature vectors for each sample in the dataset. These feature vectors form a bag of visual words (BoVW) for the given sample. The image capturing arrangement 100 is configured to thereafter apply a clustering algorithm (such as k-means) to the collection of feature vectors extracted to create a finite number of clusters (e.g., 10 clusters) where a cluster center represents all similar samples (i.e., user selected patches) in the past (such as faces, people, buildings etc.) in a compact way.
The image capturing arrangement 100 is further configured to, given a new image taken by the camera, detect all possible objects (with their bounding box regions) in the image (using an object detection algorithm such as YOLO), and to feed the detected objects into a feature extraction algorithm (such as SIFT or SURF) to assign feature vectors to each object.
The image capturing arrangement 100 is further configured to match the feature vectors corresponding to the object(s) to the cluster centers. If a match is found with high similarity (using a pre-defined threshold), then the bounding box (or other marking 315) corresponding to the object is recommended to the user for further actions (for example replacing).
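The following is a compact sketch of this bag-of-visual-words pipeline, using OpenCV SIFT and scikit-learn k-means; the file names, the number of clusters, and the similarity threshold are assumptions for illustration only.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def descriptors(patch_gray):
    """SIFT descriptors for one patch (face, person, building, ...)."""
    _, desc = sift.detectAndCompute(patch_gray, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

# 1) Build cluster centres from the historical dataset of user-selected patches.
history = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
           for p in ["sel_0.png", "sel_1.png"]]              # assumed files
all_desc = np.vstack([descriptors(p) for p in history])
kmeans = KMeans(n_clusters=10, n_init=10).fit(all_desc)

# 2) Match a newly detected object's descriptors against the cluster centres.
def matches_history(patch_gray, max_mean_dist=250.0):
    desc = descriptors(patch_gray)
    if len(desc) == 0:
        return False
    dists = np.linalg.norm(desc[:, None, :] - kmeans.cluster_centers_[None], axis=2)
    return dists.min(axis=1).mean() < max_mean_dist          # pre-defined threshold

new_patch = cv2.imread("detected_object.png", cv2.IMREAD_GRAYSCALE)  # assumed file
print(matches_history(new_patch))   # True -> recommend the bounding box to the user
```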
In one embodiment, the feature extraction algorithm is a neural network with an input layer that accepts image pixel values, processed through a plurality of intermediate (aka hidden) layers, and an output layer that predicts a vector (or vectors) of features.
In another embodiment, the training of the clustering algorithm can be continuous; that is, when new feedback from the user (such as image patches) is provided, the clustering algorithm is updated to enhance its accuracy and/or extract new clusters that were not learned before.
As discussed in the above, other objects can be detected and not only persons and faces of persons. Other such examples are license plates, houses, and localization information (such as a number on a house or an address). Existing solutions for visual images, such as Inpaint and Cutout, could be used for replacing visual data, such as when an object is to be deleted or replaced by a blurring.
Non-visual sensor data, such as sensorial recordings (e.g., selecting what to include among separate audio tracks: voice, birds, nsfw sounds, miscellaneous sounds), can also be detected and presented as objects to be replaced. For audio data, Audacity could be applied to filter background voices.
As mentioned in the above, a person or a face can be replaced by being deleted, blurred or replaced by suitable background/content/objects to make it appear as if no person was present at that spot in the photo. An existing solution based on generative adversarial networks (GAN) is GMCNN, which learns to inpaint masked regions of an image. The baseline algorithm is trained to inpaint objects with the background. For additional wanted features, such as replacing with certain effects or content, GMCNN could be re-trained and used to produce the wanted output.
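As a minimal sketch of this kind of background fill, the example below uses OpenCV's classical inpainting as a simple stand-in for a GAN-based inpainter such as GMCNN; the file name and bounding box are assumptions for illustration.

```python
import cv2
import numpy as np

def inpaint_object(image, box):
    """Fill the object's bounding box with an estimate of the surrounding
    background using classical (non-GAN) inpainting."""
    x, y, w, h = box
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[y:y + h, x:x + w] = 255              # mask marks the pixels to replace
    return cv2.inpaint(image, mask, 5, cv2.INPAINT_TELEA)

photo = cv2.imread("captured.jpg")            # assumed captured instance
cv2.imwrite("anonymized.jpg", inpaint_object(photo, (220, 120, 96, 96)))
```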
In some embodiments, the user can manually change the recommendation also between photos. In one example, the user has a selfie stick and cannot reach the touchscreen while in the target viewfinder position. The user can then take a photo, look at the anonymization that resulted from the recommendation, change that setting, and re-take the photo, and the system will identify that the scene is very similar to the previous one and then use the new, updated recommendation.
As mentioned herein, not only image data is received, but also other types of data may be received. And, in some embodiments, other biometric data are captured instead of or in addition to image data. The flow for processing other data is similar to the flow for processing image data as discussed herein, and the main difference is that a digital signal processor may be used to process the biometric data instead of an image-centric image signal processor. Some examples of biometric data are given herein and include speech in a video feed, for example, that should be filtered (replaced) because the user does not want to hear anyone except the one that holds a speech/talk.
In an alternative embodiment, biometric data could be other types of biometric data such as olfactory data. In this embodiment, the smell of a certain individual would be possible to mark using for example visual or haptic cues, in order to make user selections.
It should be noted that even though the disclosure herein has focused on the controller (including various circuitry) 101 residing in the sensor data capturing arrangement, for example the smartphone of figure 1B, the controller may be in a separate device, such as a server connected to the smartphone, wherein both the smartphone and the server are considered to be included in the arrangement 100.
Figure 4 shows a flowchart for a general method according to the teachings herein. The method is to be executed on a sensor data capturing arrangement as discussed in figure 1A or in figure 1B in a manner as discussed in relation to figure 2 and any, some or all of figures 3A to 3K. The method comprises receiving 410 a stream of sensor data 103A/300 from at least one sensor 103 arranged to provide the stream of sensor data 103A/300 and detecting 420 an object 310 in the stream of sensor data. It should be noted that the method comprises detecting 420 the object 310 based on contextual information.
The method also comprises capturing 440 an instance of the stream of sensor data and replacing 450 the object 310 with a replacement object 310-R in the captured instance of the stream. It should be noted that the object to be replaced may be replaced with the replacement object already in the data stream prior to the capture or as the capture is made. The method also comprises storing 460 the captured instance after the object has been replaced.
In some embodiments, the method also comprises determining 430 the replacement object.
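A minimal sketch of the overall flow of figure 4 is given below; every callable is a placeholder for the circuitry and algorithms described above, and the names and call signatures are assumptions for illustration only.

```python
def capture_with_replacement(sensor, detect, choose_replacement, replace, store):
    """Skeleton of steps 410-460: receive, detect, (optionally) determine a
    replacement, capture, replace, and only then store the captured instance."""
    stream = sensor.stream()                          # 410: receive the data stream
    objects = detect(stream)                          # 420: detect, based on context
    replacements = {o: choose_replacement(o) for o in objects}   # 430 (optional)
    instance = stream.capture()                       # 440: capture an instance
    for obj, new_obj in replacements.items():         # 450: replace detected objects
        instance = replace(instance, obj, new_obj)
    store(instance)                                   # 460: store only after replacing
    return instance
```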
The method may also comprise any, some or all of the embodiments discussed herein, specifically with regards to figures 2 and any of figures 3A to 3L.
Figure 5 shows a schematic view of a computer-readable medium 500 carrying computer instructions 510 that when loaded into and executed by a controller of a sensor data capturing arrangement 100 enables the sensor data capturing arrangement 100 to implement the present invention.
The computer-readable medium 500 may be tangible such as a hard drive or a flash memory, for example a USB memory stick or a cloud server. Alternatively, the computer-readable medium 500 may be intangible such as a signal carrying the computer instructions enabling the computer instructions to be downloaded through a network connection, such as an internet connection.
In the example of figure 5, a computer-readable medium 500 is shown as being a computer disc 500 carrying computer-readable computer instructions 510, being inserted in a computer disc reader 520. The computer disc reader 520 may be part of a cloud server 530 - or other server - or the computer disc reader may be connected to a cloud server 530 - or other server. The cloud server 530 may be part of the internet or at least connected to the internet. The cloud server 530 may alternatively be connected through a proprietary or dedicated connection. In one example embodiment, the computer instructions are stored at a remote server 530 and downloaded to the memory 102 of the sensor data capturing arrangement 100 for being executed by the controller 101.
The computer disc reader 520 may also or alternatively be connected to (or possibly inserted into) a sensor data capturing arrangement 100 for transferring the computer-readable computer instructions 510 to a controller of the sensor data capturing arrangement (presumably via a memory of the sensor data capturing arrangement 100).
Figure 5 shows both the situation when a sensor data capturing arrangement 100 receives the computer-readable computer instructions 510 via a server connection and the situation when another sensor data capturing arrangement 100 receives the computer-readable computer instructions 510 through a wired interface. This enables the computer-readable computer instructions 510 to be downloaded into a sensor data capturing arrangement 100, thereby enabling the sensor data capturing arrangement 100 to operate according to and implement the teachings as disclosed herein.
Figure 6 shows a schematic view of a software component arrangement 600 for use in a sensor data capturing arrangement 100 as discussed herein. The software component arrangement 600 comprises software code 610 for receiving a stream of sensor data 103A/300 from at least one sensor 103 arranged to provide the stream of sensor data 103A/300 and software code 620 for detecting an object 310 in the stream of sensor data. The software component arrangement 600 also comprises software code 620 for detecting the object 310 based on contextual information. The software component arrangement 600 also comprises software code 640 for capturing an instance of the stream of sensor data and software code 650 for replacing the object 310 with a replacement object 310-R in the captured instance of the stream. The software component arrangement 600 also comprises software code 660 for storing the captured instance after the object has been replaced.
In some embodiments, the software component arrangement 600 further comprises software code 630 for determining the replacement object.
The software component arrangement 600 also comprises software code 670 for further functionality as discussed herein, specifically as discussed herein with reference to figure 2 and figures 3A to 3L.
Figure 7 shows a schematic view of a sensor data capturing arrangement 700, such as the sensor data capturing arrangement of figure 1A or figure IB as discussed herein. The sensor data capturing arrangement 700 comprises circuitry 710 for receiving a stream of sensor data 103A/300 from at least one sensor 103 arranged to provide the stream of sensor data 103A/300 and circuitry 720 for detecting an object 310 in the stream of sensor data. The sensor data capturing arrangement 700 comprises circuitry 720 for detecting the object 310 based on contextual information.
The sensor data capturing arrangement 700 also comprises circuitry 740 for capturing an instance of the stream of sensor data and circuitry 750 for replacing the object 310 with a replacement object 310-R in the captured instance of the stream. The sensor data capturing arrangement 700 also comprises circuitry 760 for storing the captured instance after the object has been replaced.
In some embodiments, the sensor data capturing arrangement 700 further comprises circuitry 730 for determining the replacement object.
The sensor data capturing arrangement 700 also comprises circuitry 770 for further functionality as discussed herein, specifically as discussed herein with reference to figure 2 and figures 3A to 3K.

Claims

1. A sensor data capturing arrangement (100), wherein the sensor data capturing arrangement (100) comprises: receiving circuitry (101A) configured to receive a stream of sensor data (103A/300) from at least one sensor (103) arranged to provide the stream of sensor data (103A/300); object detection circuitry (101B) configured to detect an object (310) in the stream of sensor data based on contextual information; capturing circuitry configured (101C) to capture an instance of the stream of sensor data; processing circuitry (101D) configured to replace the object (310) with a replacement object (310-R) in the captured instance of the stream and then to store the captured instance.
2. The sensor data capturing arrangement (100) according to claim 1, further comprising a controller (101), wherein the controller comprises at least one of the receiving circuitry, the object detection circuitry, the capturing circuitry and the processing circuitry.
3. The sensor data capturing arrangement (100) according to claim 1 or 2, further comprising a memory (102), wherein the processing circuitry is further configured to store the captured instance of the stream in the memory (102).
4. The sensor data capturing arrangement (100) according to any one of claims 1 to 3, wherein the stream of sensor data is a continuous stream of sensor data.
5. The sensor data capturing arrangement (100) according to any preceding claim, wherein at least one of the at least one sensor (103) is an image sensor, wherein the stream of sensor data comprises a preview image stream (103A-1) and a full-resolution image stream (103A-2), the preview image stream having an image resolution lower than the maximum image resolution of the sensor (103) and the full-resolution image stream (103A-2) having a resolution equal to the maximum image resolution of the sensor (103), and wherein the object comprises a person (310), whereby the sensor data capturing arrangement (100) comprises an image capturing arrangement, and wherein the object detection circuitry (101B) is further configured to detect the person based on contextual information by: performing object detection on the preview image stream (103A-1); determining a context (FE, P, d1) of the person (310-1) to be detected; identifying one or more other persons (310-2); determining a context (FE, P, d2) of the one or more other persons; determining that at least one aspect of the context of the person (310-1) is different from the context of the one or more other persons (310-2), and wherein the processing circuitry (101D) is further configured to replace the object in the full-resolution image stream (103A-2) and then to store the instance of the full-resolution image stream (103A-2).
6. The sensor data capturing arrangement (100) according to claim 5, wherein the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining a pose (P) of the person (310-1) and determining if the pose is different from poses (P) of the one or more other persons (310-2).
7. The sensor data capturing arrangement (100) according to claim 5 or 6, wherein the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining a facial expression (FE) of the person and determining if the facial expression is different from other facial expressions of the one or more other persons.
8. The sensor data capturing arrangement (100) according to any of claims 5 to 7, wherein the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by determining an identity of the person and determining that the identity is unassociated with identities of the one or more other persons and/or of the owner of the image capturing device.
9. The sensor data capturing arrangement (100) according to any of claims 5 to 8, wherein the object detection circuitry is further configured to determine that at least one aspect of the context of the person is different from the context of the one or more other persons by: determining a first location of the person; determining a second location of the other persons; and determining that the first location is different from the second location.
10. The sensor data capturing arrangement (100) according to claim 9, wherein the object detection circuitry is further configured to determine that at least one aspect of the location of the person is different from the location of the one or more other persons by: determining a first distance between the person and the other persons; determining a second distance between the one or more other persons; and determining that the first distance exceeds the second distance.
11. The sensor data capturing device according to any of claims 5 to 10, wherein the processing circuitry is further configured to generate a replacement person and wherein the replacement object comprises the replacement person.
12. The sensor data capturing device according to claim 11, wherein the processing circuitry is further configured to generate the replacement person by retrieving a person from a stored person image.
13. The sensor data capturing device according to any of claims 5 to 12, wherein the processing circuitry is further configured to generate a replacement face and wherein the replacement object comprises the replacement face.
14. The sensor data capturing device according to claim 13, wherein the processing circuitry is further configured to generate the replacement face by utilizing a Generative Adversarial Network, GAN.
15. The sensor data capturing device according to claim 13 or 14, wherein the processing circuitry is further configured to generate the replacement face by retrieving a face from a stored face image.
16. The sensor data capturing device according to any of claims 5 to 15, wherein the processing circuitry is further configured to generate an image of a physical object and wherein the replacement object is the image of a physical object, wherein the physical object has a geographic location that corresponds to a geographic location of the image capturing device.
17. The sensor data capturing device according to any of claims 5 to 16, wherein the processing circuitry is further configured to generate an image of a background and wherein the replacement object comprises the image of the background wherein the background is an estimate of the background behind the face to be replaced.
18. The sensor data capturing arrangement (100) according to any preceding claim, wherein the processing circuitry is further configured to provide a marking (315) of the detected object, provide a candidate for a replacement object, and to receive user input indicating an acceptance of the candidate as the replacement object.
19. The sensor data capturing arrangement (100) according to claim 18, wherein the processing circuitry is further configured to receive user input indicating a request for a further candidate, and in response thereto provide a further candidate.
20. The sensor data capturing arrangement (100) according to any preceding claim, wherein the image capturing arrangement (100) further comprises a user interface (110) and the object detection circuitry is further configured to receive user input via the user interface indicating an area and to perform object detection in the indicated area in order to detect further objects.
21. A method for capturing sensor data, wherein the method comprises: receiving (410) a stream of sensor data (103A/300) from at least one sensor (103) arranged to provide the stream of sensor data (103A/300); detecting (420) an object (310) in the stream of sensor data; capturing (440) an instance of the stream of sensor data; replacing (450) the object (310) with a replacement object (310-R) in the captured instance of the stream; and storing (460) the captured instance, wherein the method further comprises detecting the object (310) based on contextual information.
22. A computer-readable medium (120) carrying computer instructions (121) that when loaded into and executed by a controller (101) of a capturing arrangement (100) enables the capturing arrangement (100) to implement the method according to claim 21.
23. A software component arrangement (600) for use in a sensor data capturing arrangement (100), wherein the software component arrangement (600) comprises: software code (610) for receiving a stream of sensor data (103A/300) from at least one sensor (103) arranged to provide the stream of sensor data (103A/300); software code (620) for detecting an object (310) in the stream of sensor data; software code (640) for capturing an instance of the stream of sensor data; software code (650) for replacing the object (310) with a replacement object (310-R) in the captured instance of the stream; and software code (660) for storing the captured instance after the object has been replaced, and software code (620) for detecting the object (310) based on contextual information.
24. A sensor data capturing arrangement (100, 700) comprising circuitry (710) for receiving a stream of sensor data (103A/300) from at least one sensor (103) arranged to provide the stream of sensor data (103A/300); circuitry (720) for detecting an object (310) in the stream of sensor data; circuitry (740) for capturing an instance of the stream of sensor data; circuitry (750) for replacing the object (310) with a replacement object (310-R) in the captured instance of the stream; and circuitry (760) for storing the captured instance after the object has been replaced, and circuitry (720) for detecting the object (310) based on contextual information.