WO2012085330A1 - Picture rotation based on object detection - Google Patents

Picture rotation based on object detection

Info

Publication number
WO2012085330A1
Authority
WO
WIPO (PCT)
Prior art keywords
orientation
picture
angle
cause
evaluation
Prior art date
Application number
PCT/FI2011/050971
Other languages
French (fr)
Inventor
Veldandi Muninder
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Publication of WO2012085330A1 publication Critical patent/WO2012085330A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof

Definitions

  • the present invention generally relates to picture rotation and, more particularly, relates to picture rotation based on detection of an object such as a face in the picture.
  • the user may rotate the device at an angle (e.g., ninety degrees).
  • the rotated shots of the person/object may be at the rotated angle instead of the more-desirable original angle.
  • the viewer of the image or video may therefore have a less desirable user experience.
  • example embodiments of the present invention provide an improved apparatus, method and computer-readable storage medium for rotating one or more pictures, or more particularly in one instance one or more frames of respective picture(s), based on an angle of an object detected in the picture(s).
  • One aspect of example embodiments of the present invention is directed to an apparatus including at least one processor and at least one memory including computer program code.
  • the memory/memories and computer program code are configured to, with processor(s), cause the apparatus to at least perform a number of operations.
  • the apparatus is caused to receive multimedia content including a picture, and receive an indication of an angle of orientation of an object depicted in the picture relative to a frame of reference.
  • the angle of orientation of the object depicted in the picture is stored with the picture, and in this example, the apparatus may be caused to read the angle of orientation of the object from the picture.
  • the apparatus being caused to receive an indication of an angle of orientation may include being caused to detect an object depicted in the picture, and determine an angle of orientation of the detected object.
  • the apparatus is also caused to direct rotation of the picture based on the angle to thereby align the orientation of the object with the frame of reference.
  • the apparatus may be caused to divide the picture into a plurality of portions, and perform an evaluation of each of the portions based on a classification function at each of a plurality of angles of orientation of an object. The apparatus may then be caused to detect an object depicted in the picture and determine an angle of orientation of the detected object based on the evaluation.
  • the apparatus being caused to perform an evaluation of each of the portions may include the apparatus being caused to calculate a classification score for each portion for the classification function at each of the plurality of angles of orientation of an object.
  • the apparatus being caused to detect an object depicted in the picture and determine an angle of orientation of the detected object may include the apparatus being caused to select the portion and angle of orientation having the largest classification score.
  • the apparatus may be further caused to perform a second evaluation of each of the portions based on a plurality of classification functions at the determined angle of orientation. In these instances, the apparatus may be caused to further detect the object depicted in the picture based on the second evaluation.
  • FIG. 1 is a block diagram of a system, in accordance with example embodiments of the present invention.
  • FIG. 2 is a schematic block diagram of the apparatus of the system of FIG. 1, in accordance with example embodiments of the present invention.
  • FIG. 3a is a schematic diagram of a picture illustrating a face that may be detected in accordance with one example embodiment of the present invention.
  • FIG. 4 and FIG. 5 are flowcharts illustrating various operations in methods of rotating or otherwise directing rotation of one or more pictures of multimedia content based on an orientation of an object such as a face depicted in the respective picture(s), according to example embodiments of the present invention.
  • FIG. 6 is a schematic diagram illustrating selection of an angle of orientation of a face from a number of different angles of orientation, in accordance with one example embodiment of the present invention.
  • FIG. 7 is a schematic diagram illustrating detection of a face in a picture, in accordance with example embodiments of the present invention.
  • the terms “data,” “content,” “information,” and similar terms may be used interchangeably, according to some example embodiments of the present invention, to refer to data capable of being transmitted, received, operated on, and/or stored.
  • the term “network” may refer to a group of interconnected computers or other computing devices. Within a network, these computers or other computing devices may be interconnected directly or indirectly by various means including via one or more switches, routers, gateways, access points or the like.
  • circuitry refers to any or all of the following: (a) to hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software and memory/memories that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • circuitry applies to all uses of this term in this application, including in any claims.
  • circuitry would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware.
  • circuitry would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.
  • various messages or other communication may be transmitted or otherwise sent from one component or apparatus to another component or apparatus. It should be understood that transmitting a message or other communication may include not only transmission of the message or other communication, but may also include preparation of the message or other communication by a transmitting apparatus or various means of the transmitting apparatus.
  • system, method and computer-readable storage medium of exemplary embodiments of the present invention will be primarily described without respect to the environment within which the system, method and computer-readable storage medium operate. It should be understood, however, that the system, method and computer-readable storage medium may operate in a number of different environments, including mobile and/or fixed environments, wireline and/or wireless environments, standalone and/or networked environments or the like.
  • the system, method and computer-readable storage medium of exemplary embodiments of the present invention can operate in mobile communication environments whereby mobile terminals operating within one or more mobile networks include or are otherwise in communication with one or more sources of multimedia content.
  • exemplary embodiments of the present invention will be described below in the context of video content including frames of pictures. It should be understood, however, that example embodiments of the present invention may be generally applicable to other forms of multimedia content including one or more pictures or sequences of pictures.
  • the system 100 includes a content source 102, processing apparatus 104 and display apparatus 106.
  • the content source can include, for example, an image capture device (e.g., camera, camcorder), camera phone, video cassette recorder (VCR), digital versatile disc (DVD) player, a multimedia file stored in memory or downloaded from a network, or the like.
  • each video provided by the content source may include a plurality of frames, each of which may include an image, picture, slice or the like (generally referred to as a "picture") of a shot or scene (generally referred to as a "scene") that may or may not depict one or more objects.
  • the content source can be configured to provide multimedia content in a number of different formats.
  • suitable formats include, for example, Third Generation Platform (3GP), AVI (Audio Video Interleave), Windows Media®, MPEG (Moving Picture Experts Group), QuickTime®, RealVideo®, Shockwave® (Flash®) or the like.
  • the processing apparatus 104 can comprise any of a number of different components configured to process multimedia content from the content source according to example embodiments of the present invention.
  • the display apparatus 106 can comprise any of a number of different components configured to display or otherwise present the processed multimedia content from the content source according to example embodiments of the present invention.
  • a single apparatus may support multiple ones of the content source 102, processing apparatus 104 and display apparatus 106, logically separated but co-located within the respective entity.
  • a mobile terminal may support a logically separate, but co-located, content source, processing apparatus and display apparatus.
  • a camera phone may support a logically separate, but co-located, content source, processing apparatus and display apparatus.
  • FIG. 2 illustrates an apparatus 200 that may be configured to function as the processing apparatus 104 and display apparatus 106 to perform example methods of the present invention.
  • the apparatus may be embodied as, or included as a component of, a communications device with wired or wireless communications capabilities.
  • the example apparatus may include or otherwise be in communication with one or more processors 202, memory devices 204, Input/Output (I/O) interfaces 206, communications interfaces 208 and/or user interfaces 210 (one of each being shown).
  • the processor 202 may be embodied as various means for implementing the various functionalities of example embodiments of the present invention including, for example, one or more of a microprocessor, a coprocessor, a controller, a special-purpose integrated circuit such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), DSP (digital signal processor), or a hardware accelerator, processing circuitry or other similar hardware.
  • the processor may be representative of a plurality of processors, or one or more multi-core processors, operating individually or in concert.
  • a multi-core processor enables multiprocessing within a single physical package. Examples of a multi-core processor include processors having two, four, eight, or more processing cores.
  • the processor may be comprised of a plurality of transistors, logic gates, a clock (e.g., oscillator), other circuitry, and the like to facilitate performance of the functionality described herein.
  • the processor may, but need not, include one or more accompanying digital signal processors (DSPs).
  • a DSP may, for example, be configured to process real-world signals in real time independent of the processor.
  • an accompanying ASIC may, for example, be configured to perform specialized functions not easily performed by a more general purpose processor.
  • the processor is configured to execute instructions stored in the memory device or instructions otherwise accessible to the processor.
  • the processor may be configured to operate such that the processor causes the apparatus to perform various functionalities described herein.
  • the processor 202 may be an apparatus configured to perform operations according to embodiments of the present invention while configured accordingly.
  • the processor is embodied as, or is part of, an ASIC, FPGA, or the like, the processor is specifically configured hardware for conducting the operations described herein.
  • the processor is embodied as an executor of instructions stored on a computer-readable storage medium
  • the instructions specifically configure the processor to perform the algorithms and operations described herein.
  • the processor is a processor of a specific device configured for employing example embodiments of the present invention by further configuration of the processor via executed instructions for performing the algorithms, methods, and operations described herein.
  • the memory device 204 may be one or more computer-readable storage media that may include volatile and/or non-volatile memory.
  • the memory device includes Random Access Memory (RAM) including dynamic and/or static RAM, on-chip or off-chip cache memory, and/or the like.
  • the memory device may include non-volatile memory, which may be embedded and/or removable, and may include, for example, Read-Only Memory (ROM), flash memory, magnetic storage devices (e.g., hard disks, floppy disk drives, magnetic tape, etc.), optical disc drives and/or media, non-volatile random access memory (NVRAM), and/or the like.
  • the memory device may include a cache area for temporary storage of data. In this regard, at least a portion or the entire memory device may be included within the processor 202.
  • the memory device 204 may be configured to store information, data, applications, computer-readable program code instructions, and/or the like for enabling the processor 202 and the example apparatus 200 to carry out various functions in accordance with example embodiments of the present invention described herein.
  • the memory device may be configured to buffer input data for processing by the processor.
  • the memory device may be configured to store instructions for execution by the processor.
  • the memory may be securely protected, with the integrity of the data stored therein being ensured. In this regard, data access may be checked with authentication and authorized based on access control policies.
  • the I/O interface 206 may be any device, circuitry, or means embodied in hardware, software or a combination of hardware and software that is configured to interface the processor 202 with other circuitry or devices, such as the communications interface 208 and/or the user interface 210.
  • the processor may interface with the memory device via the I/O interface.
  • the I/O interface may be configured to convert signals and data into a form that may be interpreted by the processor.
  • the I/O interface may also perform buffering of inputs and outputs to support the operation of the processor.
  • the processor and the I/O interface may be combined onto a single chip or integrated circuit configured to perform, or cause the apparatus 200 to perform, various functionalities of an example embodiment of the present invention.
  • the communication interface 208 may be any device or means embodied in hardware, software or a combination of hardware and software that is configured to receive and/or transmit data from/to one or more networks 212 and/or any other device or module in communication with the example apparatus 200.
  • the processor 202 may also be configured to facilitate communications via the communications interface by, for example, controlling hardware included within the communications interface.
  • the communication interface may include, for example, one or more antennas, a transmitter, a receiver, a transceiver and/or supporting hardware, including, for example, a processor for enabling communications.
  • the example apparatus may communicate with various other network elements in a device-to-device fashion and/or via indirect communications.
  • the communications interface 208 may be configured to provide for communications in accordance with any of a number of wired or wireless communication standards.
  • the communications interface may be configured to support communications in multiple antenna environments, such as multiple input multiple output (MIMO).
  • the communications interface may be configured to support orthogonal frequency division multiplexed (OFDM) signaling.
  • the communications interface may be configured to communicate in accordance with various techniques including, as explained above, any of a number of second generation (2G), third generation (3G), fourth generation (4G) or higher generation mobile communication technologies, radio frequency (RF), infrared data association (IrDA) or any of a number of different wireless networking techniques.
  • the communications interface may also be configured to support communications at the network layer, possibly via Internet Protocol (IP).
  • the user interface 210 may be in communication with the processor 202 to receive user input via the user interface and/or to present output to a user as, for example, audible, visual, mechanical or other output indications.
  • the user interface may include, for example, a keyboard, a mouse, a joystick, a display (e.g., a touch screen display), a microphone, a speaker, or other input/output mechanisms.
  • the processor may comprise, or be in communication with, user interface circuitry configured to control at least some functions of one or more elements of the user interface.
  • the processor and/or user interface circuitry may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor (e.g., the memory device 204).
  • the user interface circuitry is configured to facilitate user control of at least some functions of the apparatus 200 through the use of a display (e.g., display apparatus 106) and configured to respond to user inputs.
  • the processor may also comprise, or be in communication with, display circuitry configured to display at least a portion of a user interface, the display and the display circuitry configured to facilitate user control of at least some functions of the apparatus.
  • the apparatus 200 of example embodiments may be implemented on a chip or chip set.
  • the chip or chip set may be programmed to perform one or more operations of one or more methods as described herein and may include, for instance, one or more processors 202, memory devices 204, I/O interfaces 206 and/or other circuitry components incorporated in one or more physical packages (e.g., chips).
  • a physical package includes an arrangement of one or more materials, components, and/or wires on a structural assembly (e.g., a baseboard) to provide one or more characteristics such as physical strength, conservation of size, and/or limitation of electrical interaction. It is contemplated that in certain embodiments the chip or chip set can be implemented in a single chip.
  • the chip or chip set can be implemented as a single "system on a chip.” It is further contemplated that in certain embodiments a separate ASIC may not be used, for example, and that all relevant operations as disclosed herein may be performed by a processor or processors.
  • a chip or chip set, or a portion thereof, may constitute a means for performing one or more operations of one or more methods as described herein.
  • the chip or chip set includes a communication mechanism, such as a bus, for passing information among the components of the chip or chip set.
  • the processor 202 has connectivity to the bus to execute instructions and process information stored in, for example, the memory device 204.
  • the processors may be configured to operate in tandem via the bus to enable independent execution of instructions, pipelining, and multithreading.
  • the chip or chip set includes merely one or more processors and software and/or firmware supporting and/or relating to and/or for the one or more processors.
  • Example embodiments of the present invention therefore provide an improved apparatus, method and computer-readable storage medium for rotating one or more pictures of multimedia content based on an angle of an object detected in the picture(s).
  • the detected object is the face of a person or animal depicted in the picture(s). It should be understood, however, that the detected object may be any of a number of different objects that have one or more features that distinguish the objects from other objects. Examples of other suitable objects include the entireties or portions of persons, animals, plants, buildings, automobiles or the like.
  • FIG. 3a is a schematic diagram of a picture 300 illustrating a face 302 that may be detected in accordance with one example embodiment.
  • One or more frames in the multimedia content subsequent to the illustrated frame may also include the face.
  • the face may be detected in one frame and thereafter tracked in one or more subsequent frames.
  • Example embodiments may, but need not, highlight the detected face such as by placing a box 304 around the detected face.
  • the box may refer to a rectangular or other shaped outline configured to identify a region including the face in the frame. As the face is tracked in subsequent frames, the box may follow movement of the face in the respective subsequent frames.
  • the face 302 may be oriented at an angle Θ with respect to a frame of reference R.
  • the face and its angle of orientation may be detected, and the picture including the face may be correspondingly rotated so that its orientation aligns with the frame of reference.
  • the angle of orientation (or simply orientation) of the face may be detected as an intermediate operation to a multi-stage detection of the face. It should therefore be understood that the face need not be detected with a high degree of accuracy.
  • a face in a picture may be detected in accordance with any of a number of different techniques.
  • One example technique, referred to as the Viola-Jones method, is disclosed in U.S. Patent No. 7,099,510, the content of which is hereby incorporated by reference in its entirety.
  • a number of suitable face detection techniques employ binary classification in which portions of the picture may be transformed into one or more features, from which a classification function may be trained on example faces to determine whether or not the respective portions of the picture include a face.
  • These features may number from one feature up to hundreds of features, and the features may more simply include, for example, a pair of eyes, a mouth or the like; generally, the larger the number of features, the more accurate the detection.
  • These face detection techniques may scan the picture to classify the portions of the picture, at all locations and at one or more scales, as either including a face or constituting a non-face portion. More particularly, for example, the face detection technique may calculate classification scores for the portions of the pictures, and classify the portions having scores above a predetermined threshold as including a face, with the portions having scores at or below the predetermined threshold being classified as non-face portions.
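  • By way of illustration only, the following minimal Python sketch shows how such a classification score might be computed over a portion of a picture and compared against a predetermined threshold. The integral image is the summed-area table used by Viola-Jones-style detectors; the single two-rectangle feature and the threshold value here are hypothetical stand-ins for a trained classification function, not the patent's literal method.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table of a grayscale picture (H x W array)."""
    return gray.astype(np.float64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of pixels in the h x w rectangle at (top, left), via 4 lookups."""
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def classification_score(ii, top, left, m, n):
    """Hypothetical single two-rectangle feature; a real classification
    function would sum many such trained features."""
    upper = rect_sum(ii, top, left, m // 2, n)           # e.g., darker eye band
    lower = rect_sum(ii, top + m // 2, left, m // 2, n)  # e.g., lighter cheek band
    return lower - upper

THRESHOLD = 0.0  # assumed value; in practice learned during training

def includes_face(ii, top, left, m, n):
    """Classify the portion as including a face (True) or non-face (False)."""
    return classification_score(ii, top, left, m, n) > THRESHOLD
```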
  • the classification function of example embodiments of the present invention may be trained on sets of example faces positioned in their respective pictures at each of a plurality of different orientations with respect to a frame of reference R.
  • the classification function may be trained on a set of example faces positioned at a first angle, a set of example faces positioned at a second angle, a set of example faces positioned at a third angle, and so forth.
  • the number of orientations x may be selected in a number of different manners.
  • the distance between orientations may be selected in a number of different manners, and may be uniform across all of the orientations or may differ between one or more pairs of orientations.
  • the classification function may therefore be considered to have been trained at each of a plurality of angles of orientation of the face.
  • This training may be performed by the processing apparatus 104 or another device (e.g., another device as which the apparatus 200 may be configured to function).
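  • A rough sketch of such per-angle training follows, assuming (hypothetically) that upright example faces are available and that `fit` is any binary trainer returning a scoring callable (e.g., a wrapped scikit-learn estimator); the 30° spacing is likewise an assumption, since the patent leaves the number of orientations x open.

```python
import numpy as np
from scipy import ndimage  # used here only to synthesize rotated training examples

ANGLES = range(0, 360, 30)  # assumed uniform spacing; x = 12 orientations

def extract_features(window):
    """Hypothetical feature extractor; raw pixels stand in for real features."""
    return window.reshape(-1).astype(np.float64)

def train_per_angle(upright_faces, non_faces, fit):
    """Train one classification function per angle of orientation."""
    classifiers = {}
    for angle in ANGLES:
        positives = [extract_features(ndimage.rotate(f, angle, reshape=False))
                     for f in upright_faces]
        negatives = [extract_features(w) for w in non_faces]
        X = np.array(positives + negatives)
        y = np.array([1] * len(positives) + [0] * len(negatives))
        classifiers[angle] = fit(X, y)  # returns a callable scoring function
    return classifiers
```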
  • FIG. 4 illustrates various operations in a method of rotating or otherwise directing rotation of one or more pictures of multimedia content based on an orientation of an object such as a face depicted in the respective picture(s), according to example embodiments of the present invention. It may be understood that for describing the method of FIG. 4, references may be made to FIGS. 3a and 3b for purposes of example.
  • the method may include means such as the processor 202 for receiving a picture 300 such as in a frame of multiple frames of multimedia content. Then, as shown in block 402, the method may include means such as the processor for dividing the picture into a plurality of equally-sized portions (referred to as "detection windows"), each of which may include a set of M x N pixels in the picture of W x H pixels.
  • the detection windows may have any of a number of different sizes relative to the size of the picture; but in many instances, the detection window is smaller than the picture (i.e., M < W and N < H).
  • the picture may be divided into the detection windows in a number of different manners, but in one example, the picture may be divided so that the resulting detection windows cover most, if not all, possible sets of M x N pixels. In these instances, many detection windows may at least partially if not substantially overlap.
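  • A minimal sketch of this division, assuming M and N denote a window's height and width and W and H the picture's width and height; with a step of one pixel the windows cover all possible sets of M x N pixels and thus overlap substantially.

```python
def detection_windows(W, H, M, N, step=1):
    """Yield the (top, left) corner of every M x N detection window in a
    W x H picture; a smaller step yields more (overlapping) windows."""
    for top in range(0, H - M + 1, step):
        for left in range(0, W - N + 1, step):
            yield top, left
```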
  • the method may include means such as the processor 202 for evaluating the detection windows of the picture based on the classification function at each of a plurality of different angles of orientation.
  • each detection window may be evaluated based on the classification function at each angle of orientation (e.g., first angle, second angle, third angle, etc.) to calculate a classification score for each detection window at each angle of orientation.
  • each detection window may thus have x classification scores, one for the classification function at each of the x angles of orientation.
  • the method may then include means such as the processor for determining the angle of orientation of the face such as by selecting, across the detection windows of the picture, the angle of orientation having the largest classification score, as shown in block 406.
  • FIG. 6 is a schematic diagram illustrating selection of an angle of orientation of a face from a number of different angles of orientation, in accordance with one example embodiment.
  • FIG. 6 depicts a schematic representation of a logical sequence 600 for selecting the angle of orientation of a face depicted in a picture.
  • the detection windows are depicted as block 602 in FIG. 6.
  • the detection windows may be evaluated based on the classification function for each of a number of different angles of orientation.
  • block 604 refers to evaluation of the detection windows based on the classification function at a first angle.
  • block 606 refers to evaluation of the detection windows based on the classification function at a second angle; and block 608 refers to evaluation of the detection windows based on the classification function at the xth angle.
  • the detection windows may be evaluated for each of a number of different angles of orientation over the classification function.
  • the angle of orientation having the largest classification score, then, may be determined as the angle of orientation of a face of the picture (the selected angle being shown as α), as referenced by block 610.
  • the detection window having the largest classification score may in various instances also be considered to include a face, and may therefore be the detected face.
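  • Pulling these pieces together, a hedged sketch of the selection of block 610 follows; it reuses the hypothetical helpers from the earlier sketches, and the per-classifier scoring interface is an assumption rather than a defined API.

```python
import numpy as np

def select_orientation(picture, classifiers, M, N, step=4):
    """Return the detection window and angle of orientation having the
    largest classification score across all windows and all angles."""
    ii = integral_image(picture)  # from the earlier sketch
    H, W = picture.shape
    best_window, best_angle, best_score = None, None, -np.inf
    for top, left in detection_windows(W, H, M, N, step):
        for angle, score_fn in classifiers.items():
            score = score_fn(ii, top, left, M, N)  # assumed interface
            if score > best_score:
                best_window, best_angle, best_score = (top, left), angle, score
    return best_window, best_angle
```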
  • the method may include means such as the processor 202 for directing rotation of the picture based on the angle, such as to align the orientation of the face with the frame of reference. Given a selected angle of orientation of α, the picture may be rotated by the negative of the selected angle (-α) so that the rotated angle of orientation of the face is approximately if not absolutely 0°.
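  • As one sketch of the rotation itself, using OpenCV's affine warp (any image library with an equivalent would do); the sign convention for α (counter-clockwise positive) is an assumption and must match the convention used when the angles of orientation were trained.

```python
import cv2

def rotate_to_reference(picture, alpha_degrees):
    """Rotate the picture by -alpha so the face's rotated angle of
    orientation is approximately, if not absolutely, 0 degrees."""
    h, w = picture.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), -alpha_degrees, 1.0)
    return cv2.warpAffine(picture, matrix, (w, h))
```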
  • the method may (but need not) include means such as the processor 202 for further detection of a face in the picture, as shown in block 410.
  • This further detection may be accomplished in any of a number of different manners, but in one example embodiment, may be accomplished by evaluating one or more of the detection windows based on a plurality of classification functions, which may include the aforementioned classification function ("initial classification function") and one or more additional classification functions.
  • These additional classification function(s) may be trained in a manner similar to the initial classification function, including training the additional classification function(s) for different angles of orientation.
  • the additional classification function(s) though, may be trained on different features or different numbers of features than the initial classification function.
  • the further detection may be performed by evaluation of the picture through a cascade of a number of classification functions of increasing numbers of features.
  • FIG. 7 depicts a schematic representation of a logical sequence 700 for detecting a face in the picture, in accordance with example embodiments of the present invention.
  • the detection windows may again be depicted as block 602.
  • the detection windows may be evaluated over n classification functions at the selected angle of orientation α.
  • a first classification function may correspond to the classification function from which the selected angle of orientation is selected; although in various instances, the first classification function may be a different classification function.
  • block 702 refers to evaluation of the detection windows based on a first classification function at the selected angle of orientation, including calculation of a classification score for each of the detection windows.
  • in an instance in which the classification score of a detection window is greater than a predetermined threshold - indicating a higher likelihood of a face in the respective detection window (depicted as "YES" in the figure) - the detection window may be passed to the next classification function for evaluation.
  • in an instance in which the classification score of a detection window is less than or equal to the predetermined threshold - indicating a lower likelihood of a face in the respective detection window (depicted as "NO" in the figure) - the detection window may be categorized as a non-face portion of the picture in block 704, and evaluation of the respective detection window may cease. It should be understood, though, that depending on how the thresholds are calculated during training of the classification functions, the above inequalities may be reversed to distinguish a higher likelihood and lower likelihood of a face in a detection window.
  • the detection window may be similarly evaluated by the next classification function.
  • the detection window may be evaluated for the remaining classification functions until the nth classification function depicted by block 706.
  • evaluation of the respective detection window may cease, and the detection window may be categorized as a non-face portion as depicted in block 704.
  • the face may be detected in the set of pixels at block 708.
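  • The cascade of blocks 702-708 might be sketched as follows, with the classification functions assumed to be ordered by increasing numbers of features and paired with their trained thresholds; both names are placeholders of this sketch.

```python
def cascade_detect(ii, top, left, M, N, stages):
    """Evaluate one detection window through n classification functions.

    `stages` is a list of (classification_function, threshold) pairs.
    The window is categorized as a non-face portion (block 704) as soon
    as any score fails its threshold; a window surviving the nth stage
    is detected as a face (block 708).
    """
    for score_fn, threshold in stages:
        if score_fn(ii, top, left, M, N) <= threshold:
            return False  # "NO": non-face portion; evaluation ceases
    return True           # face detected in this set of pixels
```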
  • the detection of the face may be performed in pictures of a plurality of frames of multimedia content.
  • the method may repeat for successive frames of the content. In an instance in which the same face is depicted in successive frames, this may permit tracking of the face in the content.
  • performing the method across a number of frames may result in various frames being rotated different angles relative to various other frames, depending on the angle of orientation of a detected face in the respective frames.
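  • Applied frame by frame, a sketch of this repetition (grayscale frames assumed, reusing the earlier hypothetical helpers); as noted above, successive frames may come out rotated by different angles, depending on the detected orientation in each frame.

```python
def process_frames(frames, classifiers, M, N):
    """Detect the face orientation in each frame and rotate accordingly."""
    for frame in frames:
        _, alpha = select_orientation(frame, classifiers, M, N)
        yield rotate_to_reference(frame, alpha)
```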
  • a picture including a face may be rotated after detection of the face. It should be understood that the picture may be rotated at any time after detection of the face.
  • the face may be detected and picture rotated in continuous operations.
  • the face may be detected at some time prior to rotation of the picture.
  • the face may be detected and its angle of orientation may be selected or otherwise determined, and the angle of orientation may be stored with the picture - such as in metadata of the picture.
  • the picture and its metadata may be later received, the metadata read and picture rotated in accordance with the angle stored in the metadata.
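  • A sketch of this deferred rotation, with a JSON sidecar file standing in (hypothetically) for the picture's metadata; a real implementation might instead use EXIF or a container-level tag.

```python
import json

def store_orientation(metadata_path, alpha_degrees):
    """Store the determined angle of orientation with the picture."""
    with open(metadata_path, "w") as f:
        json.dump({"face_orientation_degrees": alpha_degrees}, f)

def rotate_from_metadata(picture, metadata_path):
    """Later: read the stored angle and direct rotation of the picture."""
    with open(metadata_path) as f:
        alpha = json.load(f)["face_orientation_degrees"]
    return rotate_to_reference(picture, alpha)  # from the earlier sketch
```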
  • the face may be detected and angle of orientation determined by a source 102, and the picture may be rotated by a processing apparatus 104 such as for output to a user interface (e.g., user interface 210).
  • FIG. 5 is a flowchart that illustrates various operations in a method of rotating or otherwise directing rotation of one or more pictures of multimedia content, according to another example embodiment of the present invention.
  • the method may include means such as the processor 202 for receiving and dividing a picture of multimedia content into detection windows, evaluating the detection windows of the picture, and determining the angle of orientation of the face based on the evaluation, such as in a manner similar to that described above with respect to respective ones of blocks 400, 402, 404 and 406.
  • the method may include means such as the processor 202 for storing or directing storage of the angle of orientation with the picture, such as in metadata of the picture. In appropriate instances, the method may then repeat for any other pictures of the multimedia content.
  • the method may include means such as the processor 202 of the same or another apparatus, for receiving the picture and its stored angle of orientation, and reading the angle of orientation of a face in the picture, as shown in blocks 510 and 512 of FIG. 5b.
  • the method may include means such as the processor for directing rotation of the picture based on the angle, such as to align the orientation of the face with the frame of reference. This rotation may be accomplished in a manner similar to that described above with respect to block 408.
  • the method may (but need not) include means such as the processor 202 for further detection of a face in the picture, as shown in block 516. This further detection may be accomplished in any of a number of different manners, such as in a manner similar to that described above with respect to block 410. And again, in appropriate instances, the method may then repeat for any other pictures of the multimedia content.
  • the picture(s) may be evaluated based on one or more further parameters that may facilitate realization of a more robust evaluation of the face(s).
  • Examples of these parameters may include, for example, the scale or scales of a face that may be detected within one frame or from one frame to the next successive frame, step size between neighboring sets of pixels within one frame or from one frame to the next successive frame, starting point of the detection window from the picture of one frame to that of the next successive frame, or the like. Additionally, one or more further processing operations may be performed on the picture(s), such as application of a skin filter and/or a texture filter.
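  • As one hedged example of such further processing, a crude skin filter in the YCrCb color space; the Cr/Cb ranges below are common heuristics from the face-detection literature, not values taken from this patent.

```python
import cv2

def skin_mask(bgr_picture):
    """Boolean mask of likely-skin pixels; detection windows containing
    little skin can be pruned before the costlier cascade evaluation."""
    ycrcb = cv2.cvtColor(bgr_picture, cv2.COLOR_BGR2YCrCb)
    cr, cb = ycrcb[:, :, 1], ycrcb[:, :, 2]
    return (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)
```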
  • functions performed by the processing apparatus 104 and/or apparatus 200 may be performed by various means. It will be understood that each block or operation of the flowcharts, and/or combinations of blocks or operations in the flowcharts, can be implemented by various means. Means for implementing the blocks or operations of the flowcharts, combinations of the blocks or operations in the flowcharts, or other functionality of example embodiments of the present invention described herein may include hardware, alone or under direction of one or more computer program code instructions, program instructions or executable computer-readable program code instructions from a computer-readable storage medium.
  • program code instructions may be stored on a memory device, such as the memory device 204 of the example apparatus, and executed by a processor, such as the processor 202 of the example apparatus.
  • any such program code instructions may be loaded onto a computer or other programmable apparatus (e.g., processor, memory device, or the like) from a computer-readable storage medium to produce a particular machine, such that the particular machine becomes a means for implementing the functions specified in the flowcharts' block(s) or operation(s).
  • These program code instructions may also be stored in a computer-readable storage medium that can direct a computer, a processor, or other programmable apparatus to function in a particular manner to thereby generate a particular machine or particular article of manufacture.
  • the instructions stored in the computer-readable storage medium may produce an article of manufacture, where the article of manufacture becomes a means for implementing the functions specified in the flowcharts' block(s) or operation(s).
  • the program code instructions may be retrieved from a computer-readable storage medium and loaded into a computer, processor, or other programmable apparatus to configure the computer, processor, or other programmable apparatus to execute operations to be performed on or by the computer, processor, or other programmable apparatus.
  • Retrieval, loading, and execution of the program code instructions may be performed sequentially such that one instruction is retrieved, loaded, and executed at a time. In some example embodiments, retrieval, loading and/or execution may be performed in parallel such that multiple instructions are retrieved, loaded, and/or executed together.
  • Execution of the program code instructions may produce a computer-implemented process such that the instructions executed by the computer, processor, or other programmable apparatus provide operations for implementing the functions specified in the flowcharts' block(s) or operation(s).

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An example apparatus is caused to receive multimedia content including a picture, and receive an indication of an angle of orientation of an object depicted in the picture relative to a frame of reference. In this regard, the apparatus may be caused to divide the picture into a plurality of portions, perform an evaluation of each of the portions based on a classification function at each of a plurality of angles of orientation of an object, and detect an object depicted in the picture and determine an angle of orientation of the detected object based on the evaluation. The apparatus is also caused to direct rotation of the picture based on the angle to thereby align the orientation of the object with the frame of reference.

Description

PICTURE ROTATION BASED ON OBJECT DETECTION
TECHNICAL FIELD
The present invention generally relates to picture rotation and, more particularly, relates to picture rotation based on detection of an object such as a face in the picture.
BACKGROUND
The modern communications era has brought about a tremendous expansion of wireline and wireless networks. Computer networks, television networks, and telephony networks are experiencing an unprecedented technological expansion, fueled by consumer demand. Wireless and mobile networking technologies have addressed related consumer demands, while providing more flexibility and immediacy of information transfer.
Current and future networking technologies continue to facilitate ease of information transfer and convenience to users by expanding the capabilities of mobile electronic devices. One such expansion in the capabilities of mobile electronic devices relates to an ability of such devices to capture images. In fact, camera functionality has become a popular feature of devices such as mobile telephones. Mobile telephones having image capturing functionality may be referred to as camera phones. Camera phones and other mobile electronic devices such as camcorders capable of capturing images typically enable a person to capture an image which can then be saved, deleted, transmitted to another device, etc.
It is currently common for users of such devices to aim a view finder at an object. In various instances, in order to obtain the full view of a nearby person or other object, the user may rotate the device at an angle (e.g., ninety degrees). When the image or video is played back, however, the rotated shots of the person/object may be at the rotated angle instead of the more-desirable original angle. The viewer of the image or video may therefore have a less desirable user experience.
BRIEF SUMMARY
In light of the foregoing background, example embodiments of the present invention provide an improved apparatus, method and computer-readable storage medium for rotating one or more pictures, or more particularly in one instance one or more frames of respective picture(s), based on an angle of an object detected in the picture(s). One aspect of example embodiments of the present invention is directed to an apparatus including at least one processor and at least one memory including computer program code. The memory/memories and computer program code are configured to, with processor(s), cause the apparatus to at least perform a number of operations.
The apparatus is caused to receive multimedia content including a picture, and receive an indication of an angle of orientation of an object depicted in the picture relative to a frame of reference. In one example, the angle of orientation of the object depicted in the picture is stored with the picture, and in this example, the apparatus may be caused to read the angle of orientation of the object from the picture. In another example, the apparatus being caused to receive an indication of an angle of orientation may include being caused to detect an object depicted in the picture, and determine an angle of orientation of the detected object. In either example, the apparatus is also caused to direct rotation of the picture based on the angle to thereby align the orientation of the object with the frame of reference.
In instances in which the apparatus is caused to detect an object and determine an angle of orientation of the detected object, the apparatus may be caused to divide the picture into a plurality of portions, and perform an evaluation of each of the portions based on a classification function at each of a plurality of angles of orientation of an object. The apparatus may then be caused to detect an object depicted in the picture and determine an angle of orientation of the detected object based on the evaluation.
In a more particular example, the apparatus being caused to perform an evaluation of each of the portions may include the apparatus being caused to calculate a classification score for each portion for the classification function at each of the plurality of angles of orientation of an object. In such instances, the apparatus being caused to detect an object depicted in the picture and determine an angle of orientation of the detected object may include the apparatus being caused to select the portion and angle of orientation having the largest classification score.
In various instances, the apparatus may be further caused to perform a second evaluation of each of the portions based on a plurality of classification functions at the determined angle of orientation. In these instances, the apparatus may be caused to further detect the object depicted in the picture based on the second evaluation.
BRIEF DESCRIPTION OF THE DRAWINGS
Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
FIG. 1 is a block diagram of a system, in accordance with example embodiments of the present invention;
FIG. 2 is a schematic block diagram of the apparatus of the system of FIG. 1, in accordance with example embodiments of the present invention;
FIG. 3a is a schematic diagram of a picture illustrating a face that may be detected in accordance with one example embodiment of the present invention;
FIG. 4 and FIG. 5 (including FIGS. 5a and 5b) are flowcharts illustrating various operations in methods of rotating or otherwise directing rotation of one or more pictures of multimedia content based on an orientation of an object such as a face depicted in the respective picture(s), according to example embodiments of the present invention;
FIG. 6 is a schematic diagram illustrating selection of an angle of orientation of a face from a number of different angles of orientation, in accordance with one example embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating detection of a face in a picture, in accordance with example embodiments of the present invention;
FIGS. 8a and 8b illustrate a picture having a detected face at an angle of orientation of Θ = 270°, and a resulting rotated picture, in accordance with an example embodiment of the present invention;
FIGS. 9a and 9b illustrate a picture having a detected face at an angle of orientation of Θ = 300°, and a resulting rotated picture, in accordance with an example embodiment of the present invention; and
FIGS. 10a and 10b illustrate a picture having a detected face at an angle of orientation of Θ = 330°, and a resulting rotated picture, in accordance with an example embodiment of the present invention.
DETAILED DESCRIPTION
Example embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. Reference may be made herein to terms specific to a particular system, architecture or the like, but it should be understood that example embodiments of the present invention may be equally applicable to other similar systems, architectures or the like.
The terms "data," "content," "information," and similar terms may be used interchangeably, according to some example embodiments of the present invention, to refer to data capable of being transmitted, received, operated on, and/or stored. The term "network" may refer to a group of interconnected computers or other computing devices. Within a network, these computers or other computing devices may be interconnected directly or indirectly by various means including via one or more switches, routers, gateways, access points or the like.
Further, as used herein, the term "circuitry" refers to any or all of the following: (a) to hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software and memory/memories that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
This definition of "circuitry" applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term
"circuitry" would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term "circuitry" would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in server, a cellular network device, or other network device.
Further, as described herein, various messages or other communication may be transmitted or otherwise sent from one component or apparatus to another component or apparatus. It should be understood that transmitting a message or other communication may include not only transmission of the message or other communication, but may also include preparation of the message or other communication by a transmitting apparatus or various means of the transmitting apparatus.
Referring to FIG. 1, an illustration of one system that may benefit from the present invention is provided. The system, method and computer-readable storage medium of exemplary embodiments of the present invention will be primarily described without respect to the environment within which the system, method and computer-readable storage medium operate. It should be understood, however, that the system, method and computer-readable storage medium may operate in a number of different environments, including mobile and/or fixed environments, wireline and/or wireless environments, standalone and/or networked environments or the like. For example, the system, method and computer-readable storage medium of exemplary embodiments of the present invention can operate in mobile communication environments whereby mobile terminals operating within one or more mobile networks include or are otherwise in communication with one or more sources of multimedia content.
As another example, the system, method and computer-readable storage medium of exemplary embodiments of the present invention will be described below in the context of video content including frames of pictures. It should be understood, however, that example embodiments of the present invention may be generally applicable to other forms of multimedia content including one or more pictures or sequences of pictures.
The system 100 includes a content source 102, processing apparatus 104 and display apparatus 106. The content source can include, for example, an image capture device (e.g., camera, camcorder), camera phone, video cassette recorder (VCR), digital versatile disc (DVD) player, a multimedia file stored in memory or downloaded from a network, or the like. In the context of video content, each video provided by the content source may include a plurality of frames, each of which may include an image, picture, slice or the like (generally referred to as a "picture") of a shot or scene (generally referred to as a "scene") that may or may not depict one or more objects. In this regard, the content source can be configured to provide multimedia content in a number of different formats. In the context of video, suitable formats include, for example, Third Generation Platform (3GP), AVI (Audio Video Interleave), Windows Media®, MPEG (Moving Picture Experts Group), QuickTime®, RealVideo®, Shockwave® (Flash®) or the like.
Like the content source 102, the processing apparatus 104 can comprise any of a number of different components configured to process multimedia content from the content source according to example embodiments of the present invention. Further, the display apparatus 106 can comprise any of a number of different components configured to display or otherwise present the processed multimedia content from the content source according to example embodiments of the present invention.
Although shown as separate components, it should be understood that in some embodiments, a single apparatus may support multiple ones of the content source 102, processing apparatus 104 and display apparatus 106, logically separated but co-located within the respective entity. For example, a mobile terminal may support a logically separate, but co-located, content source, processing apparatus and display apparatus. In a more particular example, in various instances, a camera phone may support a logically separate, but co-located, content source, processing apparatus and display apparatus.
Reference is now made to FIG. 2, which illustrates an apparatus 200 that may be configured to function as the processing apparatus 104 and display apparatus 106 to perform example methods of the present invention. In some example embodiments, the apparatus may be embodied as, or included as a component of, a communications device with wired or wireless communications capabilities. The example apparatus may include or otherwise be in communication with one or more processors 202, memory devices 204, Input/Output (I/O) interfaces 206, communications interfaces 208 and/or user interfaces 210 (one of each being shown).
The processor 202 may be embodied as various means for implementing the various functionalities of example embodiments of the present invention including, for example, one or more of a microprocessor, a coprocessor, a controller, a special-purpose integrated circuit such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), DSP (digital signal processor), or a hardware accelerator, processing circuitry or other similar hardware. According to one example embodiment, the processor may be representative of a plurality of processors, or one or more multi-core processors, operating individually or in concert. A multi-core processor enables multiprocessing within a single physical package. Examples of a multi-core processor include processors having two, four, eight, or more processing cores. Further, the processor may be comprised of a plurality of transistors, logic gates, a clock (e.g., oscillator), other circuitry, and the like to facilitate performance of the functionality described herein. The processor may, but need not, include one or more accompanying digital signal processors (DSPs). A DSP may, for example, be configured to process real-world signals in real time independent of the processor. Similarly, an accompanying ASIC may, for example, be configured to perform specialized functions not easily performed by a more general purpose processor. In some example embodiments, the processor is configured to execute instructions stored in the memory device or instructions otherwise accessible to the processor. The processor may be configured to operate such that the processor causes the apparatus to perform various functionalities described herein.
Whether configured as hardware alone or via instructions stored on a computer-readable storage medium, or by a combination thereof, the processor 202 may be an apparatus configured to perform operations according to embodiments of the present invention while configured accordingly. Thus, in example embodiments where the processor is embodied as, or is part of, an ASIC, FPGA, or the like, the processor is specifically configured hardware for conducting the operations described herein.
Alternatively, in example embodiments where the processor is embodied as an executor of instructions stored on a computer-readable storage medium, the instructions specifically configure the processor to perform the algorithms and operations described herein. In some example embodiments, the processor is a processor of a specific device configured for employing example embodiments of the present invention by further configuration of the processor via executed instructions for performing the algorithms, methods, and operations described herein.
The memory device 204 may be one or more computer-readable storage media that may include volatile and/or non-volatile memory. In some example embodiments, the memory device includes Random Access Memory (RAM) including dynamic and/or static RAM, on-chip or off-chip cache memory, and/or the like. Further, the memory device may include non-volatile memory, which may be embedded and/or removable, and may include, for example, Read-Only Memory (ROM), flash memory, magnetic storage devices (e.g., hard disks, floppy disk drives, magnetic tape, etc.), optical disc drives and/or media, non-volatile random access memory (NVRAM), and/or the like. The memory device may include a cache area for temporary storage of data. In this regard, at least a portion or the entire memory device may be included within the processor 202.
Further, the memory device 204 may be configured to store information, data, applications, computer-readable program code instructions, and/or the like for enabling the processor 202 and the example apparatus 200 to carry out various functions in accordance with example embodiments of the present invention described herein. For example, the memory device may be configured to buffer input data for processing by the processor. Additionally, or alternatively, the memory device may be configured to store instructions for execution by the processor. The memory may be securely protected, with the integrity of the data stored therein being ensured. In this regard, data access may be authenticated and authorized based on access control policies.
The I/O interface 206 may be any device, circuitry, or means embodied in hardware, software or a combination of hardware and software that is configured to interface the processor 202 with other circuitry or devices, such as the communications interface 208 and/or the user interface 210. In some example embodiments, the processor may interface with the memory device via the I/O interface. The I/O interface may be configured to convert signals and data into a form that may be interpreted by the processor. The I/O interface may also perform buffering of inputs and outputs to support the operation of the processor. According to some example embodiments, the processor and the I/O interface may be combined onto a single chip or integrated circuit configured to perform, or cause the apparatus 200 to perform, various functionalities of an example embodiment of the present invention.
The communication interface 208 may be any device or means embodied in hardware, software or a combination of hardware and software that is configured to receive and/or transmit data from/to one or more networks 212 and/or any other device or module in communication with the example apparatus 200. The processor 202 may also be configured to facilitate communications via the communications interface by, for example, controlling hardware included within the communications interface. In this regard, the communication interface may include, for example, one or more antennas, a transmitter, a receiver, a transceiver and/or supporting hardware, including, for example, a processor for enabling communications. Via the communication interface, the example apparatus may communicate with various other network elements in a device-to-device fashion and/or via indirect communications.
The communications interface 208 may be configured to provide for communications in accordance with any of a number of wired or wireless communication standards. The communications interface may be configured to support communications in multiple antenna environments, such as multiple input multiple output (MIMO) environments. Further, the communications interface may be configured to support orthogonal frequency division multiplexed (OFDM) signaling. In some example embodiments, the communications interface may be configured to communicate in accordance with various techniques including, as explained above, any of a number of second generation (2G), third generation (3G), fourth generation (4G) or higher generation mobile communication technologies, radio frequency (RF), infrared data association (IrDA) or any of a number of different wireless networking techniques. The communications interface may also be configured to support communications at the network layer, possibly via Internet Protocol (IP).
The user interface 210 may be in communication with the processor 202 to receive user input via the user interface and/or to present output to a user as, for example, audible, visual, mechanical or other output indications. The user interface may include, for example, a keyboard, a mouse, a joystick, a display (e.g., a touch screen display), a microphone, a speaker, or other input/output mechanisms. Further, the processor may comprise, or be in communication with, user interface circuitry configured to control at least some functions of one or more elements of the user interface. The processor and/or user interface circuitry may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor (e.g., the memory device 204). In some example embodiments, the user interface circuitry is configured to facilitate user control of at least some functions of the apparatus 200 through the use of a display (e.g., display apparatus 106) and configured to respond to user inputs. The processor may also comprise, or be in communication with, display circuitry configured to display at least a portion of a user interface, the display and the display circuitry configured to facilitate user control of at least some functions of the apparatus.
In some cases, the apparatus 200 of example embodiments may be implemented on a chip or chip set. In an example embodiment, the chip or chip set may be programmed to perform one or more operations of one or more methods as described herein and may include, for instance, one or more processors 202, memory devices 204, I/O interfaces 206 and/or other circuitry components incorporated in one or more physical packages (e.g., chips). By way of example, a physical package includes an arrangement of one or more materials, components, and/or wires on a structural assembly (e.g., a baseboard) to provide one or more characteristics such as physical strength, conservation of size, and/or limitation of electrical interaction. It is contemplated that in certain embodiments the chip or chip set can be implemented in a single chip. It is further contemplated that in certain embodiments the chip or chip set can be implemented as a single "system on a chip." It is further contemplated that in certain embodiments a separate ASIC may not be used, for example, and that all relevant operations as disclosed herein may be performed by a processor or processors. A chip or chip set, or a portion thereof, may constitute a means for performing one or more operations of one or more methods as described herein.
In one example embodiment, the chip or chip set includes a communication mechanism, such as a bus, for passing information among the components of the chip or chip set. In accordance with one example embodiment, the processor 202 has connectivity to the bus to execute instructions and process information stored in, for example, the memory device 204. In instances in which the apparatus 200 includes multiple processors, the processors may be configured to operate in tandem via the bus to enable independent execution of instructions, pipelining, and multithreading. In one example embodiment, the chip or chip set includes merely one or more processors and software and/or firmware supporting and/or relating to and/or for the one or more processors.
As explained in the background section, it is currently common for users of image capture devices to rotate the device at an angle in order to obtain the full view of a nearby person or other object. When the image or video is played back, however, the rotated shots of the person/object may be at the rotated angle instead of the more-desirable original angle. The viewer of the image or video may therefore have a less desirable user experience. Example embodiments of the present invention therefore provide an improved apparatus, method and computer-readable storage medium for rotating one or more pictures of multimedia content based on an angle of an object detected in the picture(s). As described herein, the detected object is the face of a person or animal depicted in the picture(s). It should be understood, however, that the detected object may be any of a number of different objects that have one or more features that distinguish the objects from other objects. Examples of other suitable objects include the entireties or portions of persons, animals, plants, buildings, automobiles or the like.
FIG. 3a is a schematic diagram of a picture 300 illustrating a face 302 that may be detected in accordance with one example embodiment. One or more frames in the multimedia content subsequent to the illustrated frame may also include the face. The face may be detected in one frame and thereafter tracked in one or more subsequent frames. Example embodiments may, but need not, highlight the detected face such as by placing a box 304 around the detected face. The box may refer to a rectangular or other shaped outline configured to identify a region including the face in the frame. As the face is tracked in subsequent frames, the box may follow movement of the face in the respective subsequent frames.
In various instances, the face 302 may be oriented at an angle Θ with respect to a frame of reference R. FIG. 3b illustrates an example of the frame of FIG. 3a in which the picture and corresponding face are oriented at an angle Θ = 270° (or -90°) with respect to the corresponding picture of FIG. 3a. In accordance with example embodiments, the face and its angle of orientation may be detected, and the picture including the face may be correspondingly rotated so that its orientation aligns with the frame of reference. As explained herein, the angle of orientation (or simply orientation) of the face may be detected as an intermediate operation to a multi-stage detection of the face. It should therefore be understood that the face need not be detected with a high degree of accuracy.
In accordance with example embodiments of the present invention, a face in a picture may be detected in accordance with any of a number of different techniques. One example technique, referred to as the Viola-Jones method, is disclosed in U.S. Patent No. 7,099,510, the content of which is hereby incorporated by reference in its entirety. A number of suitable face detection techniques employ binary classification in which portions of the picture may be transformed into one or more features, from which a classification function may be trained on example faces to determine whether or not the respective portions of the picture include a face. These features may number from one up to hundreds, and may more simply include, for example, a pair of eyes, a mouth or the like; generally, the larger the number of features, the more accurate the detection. These face detection techniques may scan the picture to classify the portions of the picture, at all locations and at one or more scales, as either including a face or constituting a non-face portion. More particularly, for example, the face detection technique may calculate classification scores for the portions of the picture, and classify the portions having scores above a predetermined threshold as including a face, with the portions having scores at or below the predetermined threshold being classified as non-face portions.
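By way of illustration only, the window-level decision just described might be sketched as follows; classify_score stands in for a trained classification function and threshold for the predetermined threshold, neither being tied to any particular detector:

```python
# Minimal sketch of the binary window classification described above.
# `classify_score` and `threshold` are illustrative stand-ins, not
# values taken from the disclosure.
def classify_window(window_pixels, classify_score, threshold):
    """Classify one portion of the picture as face or non-face."""
    score = classify_score(window_pixels)
    # Scores above the threshold mean the portion is classified as
    # including a face; at or below, it is a non-face portion.
    return score > threshold
```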
The classification function of example embodiments of the present invention may be trained on sets of example faces positioned in their respective pictures at each of a plurality of different orientations with respect to a frame of reference R. Thus, the classification function may be trained on a set of example faces positioned at a first angle, a set of example faces positioned at a second angle, a set of example faces positioned at a third angle, and so forth. The number of orientations x may be selected in a number of different manners. Similarly, the distance between orientations may be selected in a number of different manners, and may be uniform across all of the orientations or may differ between one or more pairs of orientations. In one example, the classification function may be trained on sets of example faces positioned at 30° increments from 0° to 360° for a total of twelve orientations (x = 12) (0°, 30°, 60°, 90°, 120°, 150°, 180°, 210°, 240°, 270°, 300° and 330°). The classification function may therefore be considered to have been trained at each of a plurality of angles of orientation of the face. This training may be performed by the processing apparatus 104 or another device (e.g., another device as which the apparatus 200 may be configured to function).
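For the twelve-orientation example above, the set of training angles, and one hedged way of deriving per-angle training sets by rotating upright example faces, might look as follows (the use of PIL images and the rotation sign convention are assumptions for illustration, not part of the disclosure):

```python
# The x = 12 training orientations of the example: 30-degree
# increments from 0 to 330 degrees.
ORIENTATIONS = list(range(0, 360, 30))

def build_training_sets(example_faces):
    """Rotate each (assumed upright) example face to every training
    orientation, yielding one training set per angle.

    example_faces: iterable of PIL.Image objects.
    Returns: dict mapping angle -> list of rotated face images.
    """
    faces = list(example_faces)
    return {
        angle: [face.rotate(-angle, expand=True) for face in faces]
        for angle in ORIENTATIONS
    }
```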
Reference is now made to the flowchart of FIG. 4, which illustrates various operations in a method of rotating or otherwise directing rotation of one or more pictures of multimedia content based on an orientation of an object such as a face depicted in the respective picture(s), according to example embodiments of the present invention. In describing the method of FIG. 4, reference may be made to FIGS. 3a and 3b by way of example.
As shown in block 400, the method may include means such as the processor 202 for receiving a picture 300 such as in a frame of multiple frames of multimedia content. Then, as shown in block 402, the method may include means such as the processor for dividing the picture into a plurality of equally-sized portions (referred to as "detection windows"), each of which may include a set of M x N pixels in the picture of W x H pixels. The detection windows may have any of a number of different sizes relative to the size of the picture; but in many instances, the detection window is smaller than the picture (i.e., M, N < W, H). The picture may be divided into the detection windows in a number of different manners, but in one example, the picture may be divided so that the resulting detection windows cover most, if not all, possible sets of M x N pixels. In these instances, many detection windows may at least partially if not substantially overlap.
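A minimal sketch of this window enumeration, assuming window width M and height N inside a W x H picture, with a hypothetical step parameter (step = 1 covers every possible set of M x N pixels), might read:

```python
def detection_windows(w, h, m, n, step=1):
    """Yield the top-left (x, y) corner of every M x N detection
    window inside a W x H picture (M is taken as window width and N
    as window height, an assumed mapping). With step = 1 the windows
    cover all possible sets of M x N pixels and overlap heavily."""
    for y in range(0, h - n + 1, step):
        for x in range(0, w - m + 1, step):
            yield (x, y)
```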
As shown in block 404, the method may include means such as the processor 202 for evaluating the detection windows of the picture based on the classification function at each of a plurality of different angles of orientation. In this regard, each detection window may be evaluated based on the classification function at each angle of orientation (e.g., first angle, second angle, third angle, etc.) to calculate a classification score for each detection window at each angle of orientation. Thus, each detection window may have x classification scores for x angles of orientation. The method may then include means such as the processor for determining the angle of orientation of the face such as by selecting, across the detection windows of the picture, the angle of orientation having the largest classification score, as shown in block 406.
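The evaluate-then-select logic of blocks 404 and 406 might be sketched as below, where classify_score_at is a hypothetical callable wrapping the classification function trained at a given angle:

```python
def select_orientation(windows, orientations, classify_score_at):
    """Score every detection window at every angle of orientation and
    return the (window, angle) pair with the largest classification
    score; the angle of that pair is the selected orientation.

    classify_score_at(window, angle) is a hypothetical callable that
    evaluates the classification function trained at `angle`."""
    best_window, alpha = max(
        ((w, a) for w in windows for a in orientations),
        key=lambda pair: classify_score_at(pair[0], pair[1]),
    )
    return best_window, alpha
```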
The evaluation of the detection windows is further shown in FIG. 6, which is a schematic diagram illustrating selection of an angle of orientation of a face from a number of different angles of orientation, in accordance with one example embodiment. FIG. 6 depicts a schematic representation of a logical sequence 600 for selecting the angle of orientation of a face depicted in a picture. The detection windows are depicted as block 602 in FIG. 6. The detection windows may be evaluated based on the classification function for each of a number of different angles of orientation. As shown, for example, block 604 refers to evaluation of the detection windows based on the classification function at a first angle. Similarly, for example, block 606 refers to evaluation of the detection windows based on the classification function at a second angle; and block 608 refers to evaluation of the detection windows based on the classification function at the xth angle. In this manner, the detection windows may be evaluated based on the classification function at each of a number of different angles of orientation. The angle of orientation having the largest classification score, then, may be determined as the angle of orientation of a face of the picture (the selected angle being shown as α), as referenced by block 610. The detection window having the largest classification score may in various instances also be considered to include a face, and may therefore be the detected face.
As shown in block 408, after determining the angle of orientation of a face in the picture, the method may include means such as the processor 202 for directing rotation of the picture based on the angle, such as to align the orientation of the face with the frame of reference. Given a selected angle of orientation α, the picture may be rotated by the negative of the selected angle (-α) so that the rotated angle of orientation of the face is approximately if not absolutely 0°.
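Assuming the picture is held as a PIL image and that the selected angle shares PIL's counter-clockwise sign convention (an assumption, not stated above), block 408 might be sketched as:

```python
def align_picture(picture, alpha):
    """Rotate the picture by the negative of the selected angle so the
    face's rotated orientation is approximately 0 degrees.

    picture: PIL.Image; alpha: selected angle in degrees. PIL rotates
    counter-clockwise for positive angles, so rotating by -alpha
    assumes the selected angle follows the same convention;
    expand=True grows the canvas so corners are not cropped."""
    return picture.rotate(-alpha, expand=True)
```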
Before, after or as the picture is rotated, the method may (but need not) include means such as the processor 202 for further detection of a face in the picture, as shown in block 410. This further detection may be accomplished in any of a number of different manners, but in one example embodiment, may be accomplished by evaluating one or more of the detection windows based on a plurality of classification functions, which may include the aforementioned classification function ("initial classification function") and one or more additional classification functions. These additional classification function(s) may be trained in a manner similar to the initial classification function, including training the additional classification function(s) for different angles of orientation. The additional classification function(s), though, may be trained on different features or different numbers of features than the initial classification function. In one example, the further detection may be performed by evaluation of the picture through a cascade of a number of classification functions of increasing numbers of features.
FIG. 7 depicts a schematic representation of a logical sequence 700 for detecting a face in the picture, in accordance with example embodiments of the present invention. The detection windows may again be depicted as block 602. The detection windows may be evaluated over n classification functions at the selected angle of orientation α. In this example, a first classification function may correspond to the classification function from which the selected angle of orientation is selected; although in various instances, the first classification function may be a different classification function.
In FIG. 7, block 702 refers to evaluation of the detection windows based on a first classification function at the selected angle of orientation, including calculation of a classification score for each of the detection windows. In instances in which the classification score of a detection window is greater than a predetermined threshold - indicating a higher likelihood of a face in the respective detection window (depicted as "Yes" in FIG. 7), the detection window may be passed to the next classification function for evaluation. In instances in which the classification score of a detection window is less than or equal to the predetermined threshold - indicating a lower likelihood of a face in the respective detection window (depicted as "NO" in FIG. 7), the detection window may be categorized as a non-face portion of the picture in block 704, and evaluation of the respective detection window may cease. It should be understood, though, that depending on how the thresholds are calculated during training of the classification functions, the above inequalities may be reversed to distinguish a higher likelihood and lower likelihood of a face in a detection window.
In an instance in which a face is detected in a detection window during the evaluation based on the first classification function, and the detection window is passed to the next classification function, the detection window may be similarly evaluated by the next classification function. Thus, the detection window may be evaluated for the remaining classification functions until the nth classification function depicted by block 706. In an instance in which, for evaluation of a detection window based upon a particular classification function, a face is not detected, evaluation of the respective detection window may cease, and the detection window may be categorized as a non-face portion as depicted in block 704. However, in an instance in which the presence of a face is detected in a detection window for each of the classification function(s), the face may be detected in the set of pixels at block 708.
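One hedged reading of the cascade of FIG. 7 in code, with stages as a list of (classification function, threshold) pairs of increasing feature counts:

```python
def cascade_detect(window, stages):
    """Evaluate one detection window through a cascade of n
    classification functions of increasing feature counts.

    stages: list of (classify_score, threshold) pairs. The window is
    categorized as non-face at the first stage whose score does not
    exceed its threshold, and reported as a face only if it passes
    every stage (cf. blocks 702-708 of FIG. 7)."""
    for classify_score, threshold in stages:
        if classify_score(window) <= threshold:
            return False  # non-face portion; evaluation ceases
    return True  # face detected in this set of pixels
```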
As indicated above, the detection of the face may be performed in pictures of a plurality of frames of multimedia content. In such instances, referring again to FIG. 4, the method may repeat for successive frames of the content. In an instance in which the same face is depicted in successive frames, this may permit tracking of the face in the content. Also, performing the method across a number of frames may result in various frames being rotated by different angles relative to various other frames, depending on the angle of orientation of a detected face in the respective frames. FIGS. 8a and 8b, for example, illustrate a picture having a detected face at an angle of orientation of Θ = 270°, and the resulting rotated picture. FIGS. 9a and 9b, for example, illustrate a picture having a detected face at an angle of orientation of Θ = 300°, and the resulting rotated picture. And FIGS. 10a and 10b, for example, illustrate a picture having a detected face at an angle of orientation of Θ = 330°, and the resulting rotated picture.
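Repeating the method frame by frame, so that different frames may come out rotated by different angles, might be sketched as follows, reusing the hypothetical helpers above:

```python
def rotate_frames(frames, detect_orientation):
    """Repeat detection and rotation for successive frames; frames
    with differently oriented faces are rotated by different angles
    (cf. FIGS. 8a-10b). Assumes PIL.Image frames and a hypothetical
    detect_orientation(frame) returning the selected angle in
    degrees."""
    for frame in frames:
        alpha = detect_orientation(frame)
        yield frame.rotate(-alpha, expand=True)
```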
As explained above, a picture including a face may be rotated after detection of the face. It should be understood that the picture may be rotated at any time after detection of the face. In one example, the face may be detected and the picture rotated in continuous operations. In another example, the face may be detected at some time prior to rotation of the picture. In this example, the face may be detected and its angle of orientation may be selected or otherwise determined, and the angle of orientation may be stored with the picture - such as in metadata of the picture. The picture and its metadata may be later received, the metadata read and the picture rotated in accordance with the angle stored in the metadata. In a more particular example, the face may be detected and the angle of orientation determined by a source 102, and the picture may be rotated by a processing apparatus 104 such as for output to a user interface (e.g., user interface 210).
FIG. 5 (including FIGS. 5a and 5b) is a flowchart that illustrates various operations in a method of rotating or otherwise directing rotation of one or more pictures of multimedia content, according to another example embodiment of the present invention. As shown in blocks 500, 502, 504 and 506, the method may include means such as the processor 202 for receiving and dividing a picture of multimedia content into detection windows, evaluating the detection windows of the picture, and determining the angle of orientation of the face based on the evaluation, such as in a manner similar to that described above with respect to respective ones of blocks 400, 402, 404 and 406. As shown in block 508, after determining the angle of orientation of a face in the picture, the method may include means such as the processor 202 for storing or directing storage of the angle of orientation with the picture, such as in metadata of the picture. In appropriate instances, the method may then repeat for any other pictures of the multimedia content.
After the angle of orientation is stored with a picture, the method may include means such as the processor 202, of the same or another apparatus, for receiving the picture and its stored angle of orientation, and reading the angle of orientation of a face in the picture, as shown in blocks 510 and 512 of FIG. 5b. As shown in block 514, after reading the angle of orientation of the face in the picture, the method may include means such as the processor for directing rotation of the picture based on the angle, such as to align the orientation of the face with the frame of reference. This rotation may be accomplished in a manner similar to that described above with respect to block 408.
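One possible metadata carrier for the stored angle, offered purely as an illustration, is a PNG text chunk; the key name below is hypothetical:

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

ANGLE_KEY = "face-orientation-angle"  # hypothetical metadata key

def store_angle(picture, alpha, path):
    """Blocks 500-508: store the determined angle with the picture,
    here as a PNG text chunk (one possible metadata carrier)."""
    info = PngInfo()
    info.add_text(ANGLE_KEY, str(alpha))
    picture.save(path, pnginfo=info)  # path should name a .png file

def read_and_rotate(path):
    """Blocks 510-514: read the stored angle and rotate accordingly."""
    picture = Image.open(path)
    alpha = float(picture.info.get(ANGLE_KEY, 0.0))
    return picture.rotate(-alpha, expand=True)
```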
Further, before, after or as the picture is rotated, the method may (but need not) include means such as the processor 202 for further detection of a face in the picture, as shown in block 516. This further detection may be accomplished in any of a number of different manners, such as in a manner similar to that described above with respect to block 410. And again, in appropriate instances, the method may then repeat for any other pictures of the multimedia content.
In various instances, the picture(s) may be evaluated based on one or more further parameters that may facilitate a more robust evaluation of the face(s). These parameters may include, for example, the scale or scales of a face that may be detected within one frame or from one frame to the next successive frame, step size between neighboring sets of pixels within one frame or from one frame to the next successive frame, starting point of the detection window from the picture of one frame to that of the next successive frame, or the like. Additionally, one or more further processing operations may be performed on the picture(s), such as application of a skin filter and/or a texture filter. For more information on a number of example parameters and operations, see Indian Patent Application No. 1769/CHE/2010, entitled Method for Robust and Realtime Face Tracking, filed June 23, 2010, the content of which is hereby incorporated by reference in its entirety.
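These further parameters might be bundled into a single configuration object; every field name and default value below is illustrative only:

```python
from dataclasses import dataclass

@dataclass
class TrackingParams:
    """Hypothetical bundle of the further evaluation parameters noted
    above; every field name and default is illustrative only."""
    scales: tuple = (1.0, 1.25, 1.5)  # face scales searched per frame
    step_size: int = 2                # pixels between neighboring sets of pixels
    start_offset: tuple = (0, 0)      # detection-window starting point per frame
    apply_skin_filter: bool = True
    apply_texture_filter: bool = True
```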
According to one aspect of the example embodiments of the present invention, functions performed by the processing apparatus 104 and/or apparatus 200, such as those illustrated by the flowcharts of FIGS. 4 and 5, may be performed by various means. It will be understood that each block or operation of the flowcharts, and/or combinations of blocks or operations in the flowcharts, can be implemented by various means. Means for implementing the blocks or operations of the flowcharts, combinations of the blocks or operations in the flowcharts, or other functionality of example embodiments of the present invention described herein may include hardware, alone or under direction of one or more computer program code instructions, program instructions or executable computer-readable program code instructions from a computer-readable storage medium. In this regard, program code instructions may be stored on a memory device, such as the memory device 204 of the example apparatus, and executed by a processor, such as the processor 202 of the example apparatus. As will be appreciated, any such program code instructions may be loaded onto a computer or other programmable apparatus (e.g., processor, memory device, or the like) from a computer-readable storage medium to produce a particular machine, such that the particular machine becomes a means for implementing the functions specified in the flowcharts' block(s) or operation(s). These program code instructions may also be stored in a computer-readable storage medium that can direct a computer, a processor, or other programmable apparatus to function in a particular manner to thereby generate a particular machine or particular article of manufacture. The instructions stored in the computer-readable storage medium may produce an article of manufacture, where the article of manufacture becomes a means for implementing the functions specified in the flowcharts' block(s) or operation(s). The program code instructions may be retrieved from a computer-readable storage medium and loaded into a computer, processor, or other programmable apparatus to configure the computer, processor, or other programmable apparatus to execute operations to be performed on or by the computer, processor, or other programmable apparatus. Retrieval, loading, and execution of the program code instructions may be performed sequentially such that one instruction is retrieved, loaded, and executed at a time. In some example embodiments, retrieval, loading and/or execution may be performed in parallel such that multiple instructions are retrieved, loaded, and/or executed together. Execution of the program code instructions may produce a computer-implemented process such that the instructions executed by the computer, processor, or other programmable apparatus provide operations for implementing the functions specified in the flowcharts' block(s) or operation(s).
Accordingly, execution of instructions associated with the blocks or operations of the flowcharts by a processor, or storage of instructions associated with the blocks or operations of the flowcharts in a computer-readable storage medium, supports combinations of operations for performing the specified functions. It will also be understood that one or more blocks or operations of the flowcharts, and combinations of blocks or operations in the flowcharts, may be implemented by special purpose hardware-based computer systems and/or processors which perform the specified functions, or combinations of special purpose hardware and program code instructions.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions other than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

We Claim:
1. An apparatus comprising:
at least one processor; and
at least one memory including computer program code,
the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to at least:
receive multimedia content including a picture;
receive an indication of an angle of orientation of an object depicted in the picture relative to a frame of reference; and
direct rotation of the picture based on the angle to thereby align the orientation of the object with the frame of reference.
2. The apparatus of Claim 1, wherein the angle of orientation of the object depicted in the picture is stored with the picture, and
wherein being configured to cause the apparatus to receive an indication of an angle of orientation of an object includes being configured to cause the apparatus to read the angle of orientation of the object from the picture.
3. The apparatus of Claim 1, wherein being configured to cause the apparatus to receive an indication of an angle of orientation of an object includes being configured to cause the apparatus to at least:
detect an object depicted in the picture; and
determine an angle of orientation of the detected object.
4. The apparatus of Claim 3, wherein being configured to cause the apparatus to detect an object and determine an angle of orientation of the detected object includes being configured to cause the apparatus to at least:
divide the picture into a plurality of portions;
perform an evaluation of each of the portions based on a classification function at each of a plurality of angles of orientation of an object; and
detect an object depicted in the picture and determine an angle of orientation of the detected object based on the evaluation.
5. The apparatus of Claim 4, wherein being configured to cause the apparatus to perform an evaluation of each of the portions includes being configured to cause the apparatus to calculate a classification score for each portion for the classification function at each of the plurality of angles of orientation of an object, and
wherein being configured to cause the apparatus to detect an object depicted in the picture and determine an angle of orientation of the detected object includes being configured to cause the apparatus to select the portion and angle of orientation having the largest classification score.
6. The apparatus of Claim 4, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to further:
perform a second evaluation of each of the portions based on a plurality of classification functions at the determined angle of orientation; and
further detect the object depicted in the picture based on the second evaluation.
7. An apparatus comprising:
means for receiving multimedia content including a picture;
means for receiving an indication of an angle of orientation of an object depicted in the picture relative to a frame of reference; and
means for directing rotation of the picture based on the angle to thereby align the orientation of the object with the frame of reference.
8. The apparatus of Claim 7, wherein the angle of orientation of the object depicted in the picture is stored with the picture, and
wherein the means for receiving an indication of an angle of orientation of an object includes means for reading the angle of orientation of the object from the picture.
9. The apparatus of Claim 7, wherein the means for receiving an indication of an angle of orientation of an object includes means for at least:
detecting an object depicted in the picture; and
determining an angle of orientation of the detected object.
10. The apparatus of Claim 9, wherein the means for detecting an object and determining an angle of orientation of the detected object include means for at least:
dividing the picture into a plurality of portions;
performing an evaluation of each of the portions based on a classification function at each of a plurality of angles of orientation of an object; and
detecting an object depicted in the picture and determining an angle of orientation of the detected object based on the evaluation.
11. The apparatus of Claim 10, wherein the means for performing an evaluation of each of the portions includes means for calculating a classification score for each portion for the classification function at each of the plurality of angles of orientation of an object, and
wherein the means for detecting an object depicted in the picture and determining an angle of orientation of the detected object include means for selecting the portion and angle of orientation having the largest classification score.
12. The apparatus of Claim 10 further comprising:
means for performing a second evaluation of each of the portions based on a plurality of classification functions at the determined angle of orientation; and
means for further detecting the object depicted in the picture based on the second evaluation.
13. A method comprising:
receiving multimedia content including a picture;
receiving an indication of an angle of orientation of an object depicted in the picture relative to a frame of reference; and
directing rotation of the picture based on the angle to thereby align the orientation of the object with the frame of reference.
14. The method of Claim 13, wherein the angle of orientation of the object depicted in the picture is stored with the picture, and
wherein receiving an indication of an angle of orientation of an object comprises reading the angle of orientation of the object from the picture.
15. The method of Claim 13, wherein receiving an indication of an angle of orientation of an object comprises:
detecting an object depicted in the picture; and
determining an angle of orientation of the detected object.
16. The method of Claim 15, wherein detecting an object and determining an angle of orientation of the detected object comprise:
dividing the picture into a plurality of portions;
performing an evaluation of each of the portions based on a classification function at each of a plurality of angles of orientation of an object; and
detecting an object depicted in the picture and determining an angle of orientation of the detected object based on the evaluation.
17. The method of Claim 16, wherein performing an evaluation of each of the portions comprises calculating a classification score for each portion for the classification function at each of the plurality of angles of orientation of an object, and
wherein detecting an object depicted in the picture and determining an angle of orientation of the detected object comprises selecting the portion and angle of orientation having the largest classification score.
18. The method of Claim 16 further comprising:
performing a second evaluation of each of the portions based on a plurality of classification functions at the determined angle of orientation; and
further detecting the object depicted in the picture based on the second evaluation.
19. A computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable storage medium and computer-readable program code portions being configured to, with at least one processor, cause an apparatus to at least:
receive multimedia content including a picture;
receive an indication of an angle of orientation of an object depicted in the picture relative to a frame of reference; and
direct rotation of the picture based on the angle to thereby align the orientation of the object with the frame of reference.
20. The computer-readable storage medium of Claim 19, wherein the angle of orientation of the object depicted in the picture is stored with the picture, and
wherein being configured to cause an apparatus to receive an indication of an angle of orientation of an object includes being configured to cause an apparatus to read the angle of orientation of the object from the picture.
21. The computer-readable storage medium of Claim 19, wherein being configured to cause an apparatus to receive an indication of an angle of orientation of an object includes being configured to cause an apparatus to at least:
detect an object depicted in the picture; and
determine an angle of orientation of the detected object.
22. The computer-readable storage medium of Claim 21, wherein being configured to cause an apparatus to detect an object and determine an angle of orientation of the detected object includes being configured to cause an apparatus to at least:
divide the picture into a plurality of portions;
perform an evaluation of each of the portions based on a classification function at each of a plurality of angles of orientation of an object; and
detect an object depicted in the picture and determine an angle of orientation of the detected object based on the evaluation.
23. The computer-readable storage medium of Claim 22, wherein being configured to cause an apparatus to perform an evaluation of each of the portions includes being configured to cause the apparatus to calculate a classification score for each portion for the classification function at each of the plurality of angles of orientation of an object, and
wherein being configured to cause an apparatus to detect an object depicted in the picture and determine an angle of orientation of the detected object includes being configured to cause an apparatus to select the portion and angle of orientation having the largest classification score.
24. The computer-readable storage medium of Claim 22, wherein the computer- readable storage medium and computer-readable program code portions are further configured to, with the at least one processor, cause the apparatus to further:
perform a second evaluation of each of the portions based on a plurality of classification functions at the determined angle of orientation; and
further detect the object depicted in the picture based on the second evaluation.
PCT/FI2011/050971 2010-12-20 2011-11-04 Picture rotation based on object detection WO2012085330A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN3892CH2010 2010-12-20
IN3892/CHE/2010 2010-12-20

Publications (1)

Publication Number Publication Date
WO2012085330A1 true WO2012085330A1 (en) 2012-06-28

Family

ID=46313230

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2011/050971 WO2012085330A1 (en) 2010-12-20 2011-11-04 Picture rotation based on object detection

Country Status (1)

Country Link
WO (1) WO2012085330A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6128397A (en) * 1997-11-21 2000-10-03 Justsystem Pittsburgh Research Center Method for finding all frontal faces in arbitrarily complex visual scenes
US20060067591A1 (en) * 2004-09-24 2006-03-30 John Guzzwell Method and system for classifying image orientation
US20060204110A1 (en) * 2003-06-26 2006-09-14 Eran Steinberg Detecting orientation of digital images using face detection information
US20080253664A1 (en) * 2007-03-21 2008-10-16 Ricoh Company, Ltd. Object image detection method and object image detection device
US20090202175A1 (en) * 2008-02-12 2009-08-13 Michael Guerzhoy Methods And Apparatus For Object Detection Within An Image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11851512

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11851512

Country of ref document: EP

Kind code of ref document: A1