LIVE SCENE RECOGNITION ALLOWING SCENE DEPENDENT IMAGE MODIFICATION BEFORE IMAGE RECORDING OR DISPLAY
Field of the invention
The invention relates to a device for processing a time sequence of images, an imaging system, an image display system, and a method for processing a live sequence of images.
Background of the invention
Over the last ten years, the capturing, processing, displaying and filtering of digital images have developed considerably. Currently, most devices allow capturing of digital images at high resolution, and capturing and displaying of high definition digital video at high frame rates. Most devices comprise image capturing or storing, and comprise an image processor allowing for pre-processing of images, like performing noise reduction, color adjustment, white balancing, image encoding and decoding, and other basic pre-processing. In fact, this image processing may be done on images while they are being recorded or while they are being displayed. Image filtering to obscure visual content is described by for instance US2007/297641 (Linda Criddle et al). There, the content has been recorded and stored previously. In order to apply the filtering, the reviewing and analyzing of the content is performed by a server and not by the display itself. As a result the display depends on the server, and on the communication with the server, with regard to the correctness of the filtering.
Photographic filters modify recorded images. Sometimes they are used to make only subtle changes to images; other times the image would simply not be possible without them. Coloring filters affect the relative brightness of different colors; red lipstick may be rendered as anything from almost white to almost black with different filters. Others change the color balance of images, so that photographs under incandescent lighting show colors as they are perceived, rather than with a reddish tinge. There are filters that distort the image in a desired way, diffusing an otherwise sharp image, adding a starry effect, blurring or masking an image, etc.
Photographic filters are well known as they are provided today by popular apps like Instagram, Camera+, EyeEm, Hipstamatic, Aviary, and so on. These photographic filters typically adjust, locally or globally in the image, the intensity, hue, saturation, contrast, or color curves per red, green or blue color channel; apply color lookup tables; overlay one or more masking filters such as a vignetting mask (darker edges and corners); crop the image to adjust the width and height; add borders to the images, thereby generating for example the Polaroid effect; or apply combinations thereof. Different filters are best applied to different types of images in order to obtain an aesthetically pleasing picture; see for instance http://mashable.com/2012/07/19/instagram-filters/. Well-known examples of photographic filters provided by e.g. the Instagram app are the:
Rise filter for close-up shots of people;
Hudson filter for outdoor photos of buildings;
Sierra filter for nature outdoor shots;
Lo-Fi filter for shots of food;
Sutro filter for photos of summer events, nights out, BBQs, picnics;
Brannan filter if the image has strong shadows;
Inkwell filter if light and shadow are prominent in the image;
Hefe filter if the image has vibrant colors (rainbows), and so on.
Once a user has snapped an image, a photographic filter operation or combination thereof can be applied to the image in an interactive mode, where the user manually selects the filter that gives the best aesthetic effect. Editing a captured photograph is known for instance from European patent application EP 1695548 and US2006/0023077 (Benjamin N. Alton et al).
Summary of the invention
An aspect of the invention is to provide a new and/or enhanced use of digital image capturing and/or displaying. The invention further, or in combination, allows live prevention of recording and/or displaying of unwanted types of images, such as scenes displaying torture or sexual intercourse, child pornography, or classified military objects, and the invention allows for capturing and/or displaying aesthetically pleasing pictures.
The invention provides a device for processing a time sequence of images, said device adapted for retrieving an image from said time sequence of images from a memory, performing scene recognition on said retrieved image, and based upon the result of said scene recognition, performing an action on said image before the images are being recorded.
In an embodiment, said action comprises image modification comprising adapting at least part of said image.
In an embodiment, said action comprises modifying said image into a modified image.
In an embodiment, said action comprises blocking storage of said image.
In an embodiment, said action comprises blocking display of said image.
In an embodiment, said action comprises erasing said image from said memory.
In an embodiment, said action comprises encrypting said image.
These actions may be combined. Each action may have its advantages or applications or use.
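By way of illustration only, the combinable actions listed above may be represented as a set of flags. The following minimal Python sketch uses hypothetical names (Action, may_store, may_display) that do not appear elsewhere in this application; it merely shows how a combination of actions could be evaluated before an image is stored or displayed.

```python
from enum import Flag, auto


class Action(Flag):
    """Hypothetical flags, one per action named in the embodiments above."""
    NONE = 0
    MODIFY = auto()
    BLOCK_STORAGE = auto()
    BLOCK_DISPLAY = auto()
    ERASE = auto()
    ENCRYPT = auto()


def may_store(actions: Action) -> bool:
    """Storage is allowed unless it is blocked or the image is erased."""
    return not (actions & (Action.BLOCK_STORAGE | Action.ERASE))


def may_display(actions: Action) -> bool:
    """Display is allowed unless it is blocked or the image is erased."""
    return not (actions & (Action.BLOCK_DISPLAY | Action.ERASE))


# Actions may be combined, for instance modifying an image and blocking its storage.
combined = Action.MODIFY | Action.BLOCK_STORAGE
```

In this sketch, the combined value still permits display of the modified image while preventing its storage.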
By understanding the scene, including recognizing objects within the scene, also including recognizing an event in the scene, it can be prevented that unwanted scenes and/or objects and/or events are being displayed or even being recorded. For example, a display device (such as screens, monitors, and the like) provided with the invention would not be able to show child pornography even though it would receive an input signal containing these images. In the same manner a camera device (such as digital cameras) pointing at a child pornography scene would not be able to record the image. Furthermore, it allows automation of image improvement and/or filtering.
In this application, image refers to a digital image. Usually, such an image is composed of pixels that each have a digital value representing a quantity of light. An image can be represented by a picture or a photograph. It can be part of a set of subsequent images. In this application when an image is being captured it has not been recorded yet. An image is only being recorded after it is captured and processed.
By adapting Machine Learning methods and software compilation techniques, the invention allows embedding scene recognition within a computer program comprising software code portions, which are able to run on a data processor. Such a processor can fit the dimensions of portable devices such as, but not limited to, cameras, (smart) phones and digital tablets. By tuning the performance of the scene recognition, images can be captured and processed faster than a human eye is able to. As a result the processed images can be adapted and blocked in real-time. Applications according to the invention comprise the automated enhancing of images and filtering of images based upon the understanding of their contents.
Another advantage of the invention is that by understanding a scene the user is relieved from the burden where the user has to manually select a photographic filter resulting in an aesthetically improved image or video recording.
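As a non-limiting illustration of how the manual selection could be replaced, a recognized scene label may be looked up in a table of photographic filters. In the following Python sketch, the scene labels and the function name select_filter are hypothetical; only the filter names and their typical uses are taken from the examples given in the background above.

```python
# Hypothetical scene labels mapped to the filter names listed in the background.
FILTER_FOR_SCENE = {
    "close_up_people": "Rise",
    "outdoor_buildings": "Hudson",
    "nature_outdoor": "Sierra",
    "food": "Lo-Fi",
    "summer_event": "Sutro",
    "strong_shadows": "Brannan",
    "prominent_light_and_shadow": "Inkwell",
    "vibrant_colors": "Hefe",
}


def select_filter(scene_label, default=None):
    """Return the filter name for a recognized scene, or a default if none applies."""
    return FILTER_FOR_SCENE.get(scene_label, default)
```

A scene label for which no filter is defined simply yields the default, so that the image can be left unmodified.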
Scene recognition comprises recognition of different types of images or videos. This became possible using computer vision and/or machine learning algorithms. Known algorithms are for example:
- Calculating the unique digital signature of an image and then matching that signature against those of other photos [see, for particular embodiments, Microsoft PhotoDNA Fact Sheet December 2009, or Heo et al, "Spherical hashing", in Computer Vision Pattern Recognition Conference, 2012];
- Discriminative feature mining [see, for particular embodiments, Bangpeng Yao, Khosla, Li Fei-Fei, "Combining randomization and discrimination for fine-grained image categorization", in Computer Vision Pattern Recognition Conference, 2011] or contour-based shape descriptors [see, for particular embodiments, Hu, Jia, Ling, Huang, "Multiscale Distance Matrix for Fast Plant Leaf Recognition", IEEE Trans. on Image Processing (T-IP), 21(11):4667-4672, 2012];
- Deep Fisher networks [see, for particular embodiments, Simonyan, Vedaldi, Zisserman, "Deep Fisher Networks for Large-Scale Image Classification", in Advances in Neural Information Processing Systems, 2013];
- Bag of Words/Support vector machines [see, for particular embodiments, Snoek et al, "The MediaMill TRECVID 2012 Semantic Video Search Engine", in Proceedings of the 10th TRECVID Workshop, Gaithersburg, USA, 2012];
- Deep learning [see, for particular embodiments, Krizhevsky, A., Sutskever, I. and Hinton, G. E., "ImageNet Classification with Deep Convolutional Neural Networks", Advances in Neural Information Processing 25, MIT Press, Cambridge, MA];
- Template matching based on the characteristic shapes and colors of objects [see, for particular embodiments, R. Brunelli, Template Matching Techniques in Computer Vision: Theory and Practice, Wiley];
- Face detection [see, for particular embodiments, Viola, Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision, 2004] and face recognition [see, for particular embodiments, R. Brunelli and T. Poggio, "Face Recognition: Features versus Templates", IEEE Trans. on PAMI, 1993];
- or a combination thereof [see, for particular embodiments, Snoek et al, "MediaMill at TRECVID 2013: Searching Concepts, Objects, Instances and Events in Video", in Proceedings of the 11th TRECVID Workshop, Gaithersburg, USA, 2013].
In this respect scene recognition relates to processing an image. In such processing, a setting, object, event or a combination thereof is identified. In order to process the image or images after scene recognition, in an embodiment a label, identifier or hash is applied to the image. In this respect, in an embodiment such a label or identifier relates to or correlates to the result of the scene recognition.
The scene recognition allows for instance recognition of known child sexual abuse images.
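Recognition of known images by signature matching, as mentioned among the known algorithms above, can be sketched as follows. The sketch is an assumption made for illustration: it uses a simple average hash with a Hamming distance, which is neither PhotoDNA nor spherical hashing, and the function names and the distance threshold are hypothetical. Images are taken to be small two-dimensional lists of grayscale values.

```python
def average_hash(pixels):
    """Signature of a grayscale image: one bit per pixel, set if above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Number of positions at which two equally long signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def matches_known(pixels, known_signatures, max_distance=1):
    """True if the image signature is close to any previously stored signature."""
    signature = average_hash(pixels)
    return any(
        len(signature) == len(known) and hamming(signature, known) <= max_distance
        for known in known_signatures
    )


# A stored signature, and a slightly brighter copy of the same image.
known = [average_hash([[10, 200], [220, 15]])]
brighter_copy = [[20, 210], [230, 25]]
unrelated = [[200, 10], [15, 220]]
```

Because the signature depends on each pixel relative to the image mean, a uniformly brightened copy yields the same signature in this sketch, whereas an unrelated image does not.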
The scene recognition for instance allows:
Coarse-grained recognition of scenes such as indoor or outdoor, food, people, sunsets, mountains, dogs, and so on.
Fine-grained recognition of leaves from hundreds of plant species or of different dog types such as shepherds, Afghan hounds, terriers, spaniels, American foxhounds, and so on.
Recognition of acts or recognition of relations between objects such as a person changing a car tyre, individuals performing a wedding ceremony, a person making a sandwich, a person cleaning an appliance, a team while rock climbing.
Recognition of book covers or wine labels.
Recognition of known objects such as license plates and traffic signs.
Based upon the results of scene recognition algorithms, an action is performed on said image. In an embodiment, said action is selected from the group consisting of scene modification comprising adapting at least part of said scene, of modifying said image into a modified image, of blocking storage of said image, of blocking display of said image, of erasing said image from said memory, of encrypting said image, and a combination thereof.
In an embodiment, the family of filters described above, and provided by popular apps, or combinations thereof, can be applied.
The actions, in particular the image modification algorithms, can be used in real time to adapt an image. Also or in combination, the actions, in particular the image modification algorithms, can be applied to a time sequence of images, for instance images forming a video film being recorded, in particular while filming. In other or related embodiments, the action of image modification may be performed before an image or sequence of images is displayed, broadcast or stored. In this respect, the image recognition may be performed on all images that are captured and presented in a live preview, or for instance performed on a subset of the captured images from that time sequence, and the action may be performed on each of the images that is displayed in the preview.
In the application, reference may be made to a server. Such a server may be one server device, for instance a computer device, located at a location. Alternatively, a server may refer to at least one server device, connected via one or more data connections, at the same location and/or located at remote, in particular physically/geographically remote locations.
In an image recording device, an image sensor captures an image. Currently, an image sensor often is a CMOS device, but also other devices may be considered. These image sensors may also be referred to as spatial image sensors. These sensors allow capturing of one or more at least two-dimensional images.
In current technology, a captured image is clocked out or read out of the image sensor, and digitized into a stream of digital values representing a digital pixel image. In some cases, the image recording device may comprise an image processor for providing some basic processing and temporary storage of a captured image. Examples of the pre-processing comprise color correction, white balancing, noise reduction, and even image conversion for converting and/or compressing an image into a different digital file format.
In an image display device, an image, a set of images or a sequence of images is stored into a memory, and may be converted to allow it to be displayed. The image display device may comprise a display screen, for instance an OLED panel, an LCD panel, or the like, or may comprise a projector for projecting a picture or a film on a remote screen. Often, the image, the set of images or the sequence of images is encoded or decoded.
In an embodiment of the current invention, the image or at least a subset of the set of images or of the sequence of images is subjected to the scene recognition algorithms and resulting identifiers are provided. Based upon an identifier, one of the actions is performed on the image or the set of images or the sequence of images following and/or including the image that is provided with the specific identifier. In particular, the actions are performed before an image, the set of images or the sequence of images is presented to a user via the display panel or projector.
Image recording and image display may be combined. Many image recording devices also comprise a display that allows a direct view of images while being captured in real-time. Thus, the display functions as a viewer, allowing a user to compose an image. Once the user selects, for instance shoots a picture, or films a piece of film, the image sensor captures an image or a sequence of images. That image is then pre-processed by the image processor, and stored in a memory. Often, the captured image is also displayed on the display. There, a user may manually apply further image processing, like filtering, red-eye reduction, and the like.
Scene recognition and even the image modification action may be performed before an image or images are provided for preview, displayed, or stored.
In another mode, the image recording device may be in a so-called 'burst mode', or 'continuous capture mode', allowing a video to be captured. In this 'burst mode', at a video frame rate images are being captured, providing a film. Often, such a frame rate is at least 20 frames per second (fps), in particular at least 30 fps.
The device relates to a time sequence of images. An example of a time sequence of images is the recording of a film. Another example is a functionally live view through a viewer of a digital camera. In particular when a digital viewer is used, a functionally live sequence of images is displayed via the viewer. The device may for instance apply the action on each of the images that are displayed on the viewer. The time sequence of images may have a time base. The time between the images may be constant, like for instance in a film. The time sequence of images may also comprise subsequent bursts of images, each burst having the same or different time between subsequent bursts.
In an embodiment, the action comprises an action on a subset of images from said time sequence of images, said subset including said image. The scene recognition may for instance be done on an image. Subsequently, images that in time follow or precede the image may be processed using the action. Thus, if the time between images that are subjected to scene recognition is relatively small, for instance small with respect to the vision capabilities of a human, for instance a time interval smaller than 0.2 seconds, and a following set of images within this time interval is processed, then an almost constant visual sequence of images is processed.
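As a worked example of the time interval mentioned above, the largest spacing between recognized images that keeps the interval strictly below 0.2 seconds can be computed from the frame rate. The function name max_recognition_stride is hypothetical, and the sketch assumes an integer frame rate; exact fractions are used to avoid rounding errors at the boundary.

```python
from fractions import Fraction


def max_recognition_stride(fps, max_interval=Fraction(1, 5)):
    """Largest n such that n frames span strictly less than max_interval seconds.

    At least 1 is returned, i.e. every image is recognized, even if a single
    frame period already exceeds max_interval.
    """
    n = 1
    while Fraction(n + 1, fps) < max_interval:
        n += 1
    return n


# At 30 fps, 5 frames span 1/6 s (below 0.2 s), whereas 6 frames span exactly 0.2 s.
stride_30 = max_recognition_stride(30)
# At 20 fps, 3 frames span 0.15 s, whereas 4 frames span exactly 0.2 s.
stride_20 = max_recognition_stride(20)
```

Under these assumptions, scene recognition on every fifth image at 30 fps, or on every third image at 20 fps, would remain within the stated interval.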
In an embodiment, the device is adapted for performing scene recognition on at least a subset of said time sequence of images. For instance a set of continuous images can be subjected to scene recognition. Alternatively, each n-th image can be subjected to scene recognition.
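Subjecting each n-th image to scene recognition, while applying the action to every image, can be sketched as follows. The names process_sequence, recognize and act are hypothetical placeholders; any scene recognition algorithm and any action could be substituted for them.

```python
def process_sequence(frames, recognize, act, n=5):
    """Recognize every n-th frame and apply the action to every frame.

    The most recent recognition result is reused for the frames that follow,
    until the next recognized frame replaces it.
    """
    label = None
    processed = []
    for index, frame in enumerate(frames):
        if index % n == 0:
            label = recognize(frame)  # scene recognition on each n-th image
        processed.append(act(frame, label))  # action on each image
    return processed
```

With n equal to 1, every image of the sequence is subjected to scene recognition; larger values of n reduce the processing load at the cost of a delayed response to a scene change.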
In an embodiment, the device allows the action to be dependent upon the result of the scene recognition.
In an embodiment, the device is adapted for providing an identifier based upon the result of said scene recognition. An identifier can be a number or a letter. An identifier may also be another type of label, for instance allowing the application of a hash function. In a further embodiment, if said identifier matches a predefined identifier, based upon the identifier, the device performs an action on said images. Thus, for instance, if the scene, object or event changes, it may be possible to also change the action in response of the change. The action may be selected from the group consisting of image modification comprising adapting at least part of said image, of modifying said image into a modified image, of blocking storage of said image, of erasing said image from said memory, of encrypting said image, and a combination thereof.
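One possible way of deriving an identifier by means of a hash function, and of matching it against predefined identifiers, is sketched below. The use of SHA-256, the labels and the action names are assumptions made for illustration only; the application does not prescribe a particular hash function.

```python
import hashlib


def identifier_for(label):
    """Derive an identifier from a scene recognition label using SHA-256."""
    return hashlib.sha256(label.encode("utf-8")).hexdigest()


# Hypothetical predefined identifiers, each associated with an action name.
PREDEFINED = {
    identifier_for("unwanted_event"): "block_storage",
    identifier_for("portrait"): "modify",
}


def action_for(label):
    """Return the action for a matching predefined identifier, else None."""
    return PREDEFINED.get(identifier_for(label))
```

If the recognized scene changes from one image to a later image, the derived identifier changes with it, and a different action, or no action, is returned accordingly.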
In an embodiment, the time sequence of images is selected from the group of a sequence of live images and a sequence of images forming a video film. One image or all the images of the entire sequence may be subjected to scene recognition.
In an embodiment, the scene recognition comprises applying an algorithm selected from the group consisting of calculating the unique digital signature of an image and then matching that signature against those of other photos, of discriminative feature mining, of contour-based shape descriptors, of deep Fisher networks, of Bag of Words, of support vector machines, of deep learning, of face detection, of template matching based on the characteristic shapes and colors of objects, and a combination thereof.
In an embodiment, modifying said image comprises blurring at least a part of said image. For instance, a part of a scene that has been recognized, an object in the scene that has been recognized, or an event in the scene that has been recognized may be blurred. It may thus be possible to blur parts before displaying or before (permanent) storage. Thus, it may be possible to provide an image recorder, digital camera or computer display that cannot record or display unwanted scenes and events and/or objects within scenes.
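Blurring only a recognized part of an image can be illustrated with a simple box blur restricted to a rectangular region. This is a minimal sketch under stated assumptions: the image is a two-dimensional list of grayscale values, the region is given by hypothetical coordinates as would be supplied by the scene recognition, and a box blur is only one of many possible blurring techniques.

```python
def blur_region(pixels, top, left, bottom, right, radius=1):
    """Return a copy of the image with a box blur inside rows top..bottom-1
    and columns left..right-1; pixels outside the region are left unchanged."""
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            rows = range(max(y - radius, 0), min(y + radius + 1, height))
            cols = range(max(x - radius, 0), min(x + radius + 1, width))
            window = [pixels[j][i] for j in rows for i in cols]
            out[y][x] = sum(window) // len(window)  # mean of the neighborhood
    return out


# A 3x3 image with one bright pixel; only the center pixel is blurred.
image = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
blurred = blur_region(image, 1, 1, 2, 2)
```

The original image is not altered by this function, so that the unblurred image can still be discarded, or handled by another action, before display or storage.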
In an embodiment, the action is image processing by applying photographic filters. As mentioned, examples of these filters are filters that adjust locally or globally in the image at least one selected from the intensity, hue, saturation, contrast, and color curves per red, green or blue color channel. These filters may apply color lookup tables. These filters may overlay one or more masking filters such as a vignetting mask (darker edges and corners), crop the image to adjust the width and height, or add borders to the images. In an embodiment, these filters are selected from the group of Rise filter, Hudson filter, Sierra filter, Lo-Fi filter, Sutro filter, Brannan filter, Inkwell filter, Hefe filter, and a combination thereof.
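Two of the filter operations mentioned above, a color lookup table per color channel and a vignetting mask, can be sketched as follows. The function names, the quadratic falloff of the vignette and the strength parameter are assumptions; the sketch does not reproduce any of the named commercial filters. Images are taken to be two-dimensional lists of (red, green, blue) tuples with values from 0 to 255.

```python
def apply_lut(image, lut_r, lut_g, lut_b):
    """Map each channel value through its own 256-entry lookup table."""
    return [[(lut_r[r], lut_g[g], lut_b[b]) for (r, g, b) in row] for row in image]


def vignette(image, strength=0.5):
    """Darken pixels towards the corners; the center is left unchanged."""
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    max_d2 = (cy * cy + cx * cx) or 1.0  # squared distance from center to corner
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            d2 = ((y - cy) ** 2 + (x - cx) ** 2) / max_d2  # 0 at center, 1 at corner
            k = 1.0 - strength * d2
            new_row.append((int(r * k), int(g * k), int(b * k)))
        out.append(new_row)
    return out


identity = list(range(256))
inverted = [255 - v for v in range(256)]
gray = [[(100, 100, 100)] * 3 for _ in range(3)]
darkened = vignette(gray, strength=0.5)
```

Such operations can be chained, for instance a lookup table followed by a vignetting mask, to obtain a combination of filters.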
In an embodiment, the device comprises an image sensor adapted for capturing an image, in particular said series of images forming a film, wherein said scene recognition is performed on said image, and said action is performed on said captured image, in particular before a next image is captured.
In an embodiment, the device comprises a data storage, wherein said device is adapted for performing said action before recording said image in said data storage. Such data storage may comprise a hard disk or a solid state disk (SSD), but may also relate to external storage, for instance remote external storage like cloud storage.
In an embodiment, the device comprises a display for displaying said image, wherein said device is adapted for performing said action before displaying said image.
In an embodiment, the invention relates to an imaging system comprising an image sensor for capturing an image, a memory for storing said image, and the device of the invention.
In an embodiment, the invention relates to an image display system, comprising a memory for receiving an image for displaying, a display for displaying said image, and the device of the invention.
The invention further relates to a computer program comprising software code portions which, when running on a data processor, configure said data processor to:
- retrieve an image from a memory;
- perform scene recognition on said image, and
- based upon the result of said scene recognition, perform an action selected from the group consisting of image modification comprising adapting at least part of said image, of modifying said image into a modified image, of blocking storage of said image, of erasing said image from said memory, of encrypting said image, and a combination thereof.
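The three steps listed above can be sketched as a single function. This is an illustrative sketch only: the memory is modeled as a dictionary, the names run_once, recognize and unwanted_labels are hypothetical, and of the listed actions only erasing from the memory is shown.

```python
def run_once(memory, key, recognize, unwanted_labels):
    """Retrieve an image, recognize its scene and, if unwanted, erase it."""
    image = memory[key]           # retrieve an image from a memory
    label = recognize(image)      # perform scene recognition on said image
    if label in unwanted_labels:  # act based upon the result
        del memory[key]           # erase said image from said memory
        return None
    return image                  # image may proceed to display or recording


# Usage with a stand-in recognizer that labels images by a text marker.
memory = {"a": "frame_ok", "b": "frame_bad"}
recognizer = lambda image: "unwanted" if image.endswith("bad") else "ok"
kept = run_once(memory, "a", recognizer, {"unwanted"})
erased = run_once(memory, "b", recognizer, {"unwanted"})
```

In the usage shown, the first image is returned unchanged, while the second image is removed from the memory before it can be recorded or displayed.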
The invention further pertains to a data carrier provided with this computer program.
The invention further pertains to a signal carrying at least part of this computer program.
The invention further pertains to a signal sequence representing a program for being executed on a computer, said signal sequence representing this computer program.
The invention further pertains to a method for processing a live sequence of images, said method comprising performing scene recognition on at least a set of images of said sequence of images, and based upon the result of said scene recognition, performing an action on subsequent images of said sequence of images.
In an embodiment, said action comprises image modification comprising adapting at least part of said image.
In an embodiment, said action comprises modifying said image into a modified image.
In an embodiment, said action comprises blocking storage of said image.
In an embodiment, said action comprises erasing said image from said memory.
In an embodiment, said action comprises encrypting said image.
These actions may be combined.
In an embodiment, the method further comprises providing an identifier based upon the result of said scene recognition.
In an embodiment, the method further comprises, if said identifier matches a predefined identifier, performing, based upon the identifier, an action on subsequent images of said sequence of images, said action selected from the group consisting of image modification comprising adapting at least part of said image, of modifying said image into a modified image, of blocking storage of said image, of erasing said image from said memory, of encrypting said image, and a combination thereof.
The invention further pertains to a method for processing a set of images, said method comprising performing scene recognition on at least a subset of images of said set of images, and based upon the result of said scene recognition, performing an action on subsequent images of said sequence of images. In an embodiment said action comprises image modification. In an embodiment said action is selected from the group consisting of image modification comprising adapting at least part of said image, of modifying said image into a modified image, of blocking storage of said image, of erasing said image from said memory, of encrypting said image, and a combination thereof.
Thus, in this embodiment, actions on a large set of images or on a database of images can be automated.
The term "substantially" herein, like in "substantially consists", will be understood by and clear to a person skilled in the art. The term "substantially" may also include embodiments with "entirely", "completely", "all", etc. Hence, in embodiments the adjective substantially may also be removed. Where applicable, the term "substantially" may also relate to 90% or higher, such as 95% or higher, especially 99% or higher, even more especially 99.5% or higher, including 100%. The term "comprise" includes also embodiments wherein the term "comprises" means "consists of".
The term "functionally", when used for instance in "functionally coupled" or "functionally direct communication", will be understood by and clear to a person skilled in the art. The term "substantially" may also include embodiments with "entirely", "completely", "all", etc. Hence, in embodiments the adjective substantially may also be removed. Thus, for instance "functionally direct communication" comprises direct, live communication. It may also comprise communication that, from a perspective of the parties' communication, is experienced as "live". Thus, like for instance Voice Over IP (VOIP), there may be a small amount of time between various data packages comprising digital voice data, but these amounts of time are so small that for users it seems as if there is an open communication line or telephone line available.
Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
The devices or apparatus herein are amongst others described during operation.
As will be clear to the person skilled in the art, the invention is not limited to methods of operation or devices in operation.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "to comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device or apparatus claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The invention further applies to an apparatus or device comprising one or more of the characterizing features described in the description and/or shown in the attached drawings. The invention further pertains to a method or process comprising one or more of the characterizing features described in the description and/or shown in the attached drawings.
The various aspects discussed in this patent can be combined in order to provide additional advantages. Furthermore, some of the features can form the basis for one or more divisional applications.
Brief description of the drawings
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, and in which:
FIG. 1 schematically depicts a device for processing a time sequence of images;
FIG. 2 schematically depicts an imaging system;
FIG. 3 schematically depicts a display system;
FIG. 4 depicts a camera applying a photographic filter on an outdoor scene;
FIG. 5 depicts a camera applying a photographic filter on a portrait;
FIG. 6 depicts a camera which blocks the recording of an unwanted event, and
FIG. 7 depicts a display screen device which blocks the scene of an unwanted event.
The drawings are not necessarily to scale.
Description of preferred embodiments
Figure 1 schematically depicts a device which receives digitized images through module 201. The image or images are a representation of scene 100. These images are stored in a temporary memory 202. Next, the image or images are subjected to scene recognition in module 203. Based on the result of the scene recognition in module 204, an identifier 205 may be provided to the images. An action alters the images in module 206, and/or identifier 205' prevents the altering of the images, and the images are stored in a temporary memory 202. By then, the images represent scene 100'. In this altered scene 100', parts of the scene may be blurred.
Figure 2 schematically depicts an imaging system which captures images through camera 200. These images represent scene 100. The images are stored in a temporary memory 202. Next, these images are subjected to scene recognition in module 203. Based on the result of the scene recognition in module 204, an identifier 205 may be provided to the images. Based upon the identifier, one or more actions may be performed on the images in module 206. For instance, identifier 205' may prevent the altering of the images. Next, the images may be stored in a temporary memory 202 and recorded in module 207 where the images, by then, represent scene 100'.
Figure 3 schematically depicts a display system which receives digitized images through module 201. These images represent scene 100. The images may be stored in a temporary memory 202. Next, scene recognition is applied in module 203. Based on the result of the scene recognition in module 204, an identifier 205 may be provided to the images. An action may be performed on the images in module 206, and/or identifier 205' prevents the altering of the images. Next, the images may be stored in a temporary memory 202 and displayed on screen 210. By then, the images may represent a scene 100'.
Figure 4 depicts a camera 200 which recognizes an outdoor scene 101. The camera automatically applies a specific photographic filter on the captured images of scene 101. The modified images are then displayed on the viewer of camera 200 which shows the aesthetically enhanced scene 101'. Additionally, camera 200 allows for instance blurring of part of a scene. Unwanted parts of a scene can be blurred functionally live. Thus, a viewer will not be confronted with unwanted scenes.
Figure 5 depicts a camera 200 which recognizes a portrait scene 102. The camera automatically applies a specific photographic filter on the captured images of scene 102 and displays the modified images on the viewer of camera 200 which shows the aesthetically enhanced scene 102'. The camera 200 thus allows an action on a functionally live image or on a sequence of live images.
Figure 6 schematically shows a camera 200 which recognizes an unwanted event 103. Next, camera 200 automatically blocks the captured images of event 103 and does not record the event on camera 200. For instance, it can be prevented that children see horrible details in a film. The scene recognition thus in fact each time interprets an image and identifies the unwanted part. It then allows blocking or altering or blurring, for instance, of that unwanted part. Blocking may even be done if such an unwanted part or object or event is present in the scene while playing a movie or film. This is even possible when the object moves in the scene, or the event changes. Thus, scene recognition provides for instance an interpretation of objects in their surroundings or in events, and interprets them in an almost humanly intelligent way.
Figure 7 depicts a display screen device 210, which recognizes an unwanted event 103. The display screen device automatically erases the incoming images of event 103 and does not show the event on display screen 210 or display panel of the display screen device 210.
It will also be clear that the above description and drawings are included to illustrate some embodiments of the invention, and not to limit the scope of protection. Starting from this disclosure, many more embodiments will be evident to a skilled person. These embodiments are within the scope of protection and the essence of this invention and are obvious combinations of prior art techniques and the disclosure of this patent.