US20120039515A1 - Method and system for classifying scene for each person in video - Google Patents


Info

Publication number
US20120039515A1
Authority
US
United States
Prior art keywords
person
representation frame
scene
frame
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/317,509
Inventor
Jin Guk Jeong
Ji Yeun Kim
Sang Kyun Kim
San Ko
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G06V40/173 Classification, e.g. identification face re-identification, e.g. recognising unknown faces across different face tracks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Definitions

  • the present invention relates to a method and system for classifying a scene for each person in a video, and more particularly, to a method and system for classifying a scene for each person in a video based on person information and background information in video data.
  • a scene is a unit between when video contents are changed.
  • scenes are classified by using low level information such as color information or edge information.
  • shots are clustered using low level information such as color information extracted in all frames, and a scene segmentation is detected in a conventional automatic scene segmentation algorithm.
  • when a person in a video moves or a camera moves, low level information changes. Accordingly, a degree of accuracy decreases.
  • An aspect of the present invention provides a method and system for classifying a scene for each person in a video which may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
  • An aspect of the present invention also provides a method and system for classifying a scene for each person in a video which may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
  • a method of classifying a scene for each person in a video including: detecting a face within input video frames; detecting a shot change of the input video frames; extracting a person representation frame in the shot; performing a person clustering in the extracted person representation frame based on time information; detecting a scene change by separating a person portion from a background based on face extraction information, and comparing the person portion and the background; and merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
  • a system for classifying a scene for each person in a video including: a face detection unit detecting a face within input video frames; a shot change detection unit detecting a shot change of the input video frames; a person representation frame extraction unit extracting a person representation frame in the shot; a person clustering unit performing a person clustering in the extracted person representation frame based on time information; a scene change detection unit detecting a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background; and a scene clustering unit merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
  • FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention
  • FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention
  • FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention
  • FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention
  • FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention
  • FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention.
  • FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
  • FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention.
  • the system for classifying a scene for each person in a video 100 includes a face detection unit 110, a shot change detection unit 120, a person representation frame extraction unit 130, a person clustering unit 140, a scene change detection unit 150, and a scene clustering unit 160.
  • the face detection unit 110 detects a face within input video frames. Specifically, the face detection unit 110 analyzes the input video frames, and detects the face within the input video frames.
  • the shot change detection unit 120 detects a shot change within the input video frames. Specifically, the shot change detection unit 120 detects the shot change of the input video frames to segment the input video frames into a shot which is a basic unit of the video.
  • the person representation frame extraction unit 130 extracts a person representation frame in the shot. Using all person frames for a person clustering is inefficient. Accordingly, after performing a clustering of the frames in the shot that include a face, the person representation frame extraction unit 130 extracts, as the person representation frame, the frame in each cluster which is closest to a center frame having a greatest similarity. Specifically, the person representation frame extraction unit 130 extracts one such frame from each cluster and may set it as a person representation frame in the shot, since at least one person may be included in the shot.
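The selection of a person representation frame can be sketched as follows. This is a minimal illustration only: it assumes each face-containing frame in a shot has already been reduced to a numeric feature vector and assigned a cluster label. The function name `representation_frames`, the feature vectors, and the use of Euclidean distance to the cluster mean as the "center" are assumptions, not details given in the text.

```python
from collections import defaultdict
from math import dist


def representation_frames(features, labels):
    """Return {cluster_label: frame_index} for the frame closest to each cluster center.

    features: list of equal-length numeric feature vectors, one per face frame (assumed).
    labels:   cluster label of each frame, from any prior clustering step (assumed).
    """
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)

    chosen = {}
    for label, indices in members.items():
        dims = len(features[indices[0]])
        # Cluster center: component-wise mean of the member feature vectors.
        center = [sum(features[i][d] for i in indices) / len(indices) for d in range(dims)]
        # Representation frame: the member nearest to that center.
        chosen[label] = min(indices, key=lambda i: dist(features[i], center))
    return chosen
```

One frame is returned per cluster, so a shot that contains several persons yields several representation frames.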
  • the person clustering unit 140 performs the person clustering in the extracted person representation frame based on time information.
  • an algorithm based on face information alone may not be robust to various poses or lighting. Accordingly, the person clustering unit 140 performs the person clustering by using the time information to start clustering based on various forms of each person.
  • a single person generally wears same clothes within a similar time period in same video data, and such clothes information has a clearer difference than face information. Accordingly, the person clustering unit 140 obtains various forms of the single person by using the clothes information.
  • FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention.
  • the size of clothes is determined in proportion to a size of a key person in the person representation frames 210, 220, 230, 240, and 250 in the shot.
  • the person clustering unit 140 receives current cluster information, a current person representation frame, and a comparison person representation frame, i.e. a person representation frame to be compared, and extracts clothes information from the current person representation frame and the comparison person representation frame.
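As a rough sketch of how a clothes region might be derived from a detected face, the following places a rectangle below the face and scales it with the face size. The box convention `(x, y, w, h)`, the function name `clothes_region`, and both scale factors are illustrative assumptions; the text only states that the clothes size is proportional to the size of the person.

```python
def clothes_region(face_box, frame_size, width_scale=2.0, height_scale=2.0):
    """Estimate a clothes rectangle directly below a detected face.

    face_box:   (x, y, w, h) of the face, with a top-left origin (assumed convention).
    frame_size: (frame_width, frame_height).
    The scale factors are illustrative assumptions, not values from the text.
    Returns (x, y, w, h) clipped to the frame, or None if nothing remains.
    """
    x, y, w, h = face_box
    frame_w, frame_h = frame_size
    # The region size grows in proportion to the face size.
    region_w = int(w * width_scale)
    region_h = int(h * height_scale)
    # Center the region horizontally under the face, starting just below it.
    left = max(x + w // 2 - region_w // 2, 0)
    top = y + h
    right = min(x + w // 2 - region_w // 2 + region_w, frame_w)
    bottom = min(top + region_h, frame_h)
    if right <= left or bottom <= top:
        return None
    return (left, top, right - left, bottom - top)
```

Reusing the face location this way avoids a separate search for the clothes, which is the time saving the description attributes to referring to the face information.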
  • the person clustering unit 140 compares the current person representation frame and the comparison person representation frame, and determines whether the current person representation frame is similar to the comparison person representation frame as a result of the comparing.
  • the person clustering unit 140 extends a time window when the current person representation frame is similar to the comparison person representation frame, and includes the person representation frame which has been currently compared in the current cluster information.
  • the person clustering unit 140 sets a subsequent person representation frame as another comparison person representation frame on the time window.
  • the person clustering unit 140 determines whether the current person representation frame and the comparison person representation frame are at an end of the time window, when the current person representation frame is different from the comparison person representation frame. The person clustering unit 140 sets the subsequent person representation frame in the time window as the other comparison person representation frame, when the current person representation frame and the comparison person representation frame are not at the end of the time window.
  • a scene change detection unit 150 detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background. Specifically, the scene change detection unit 150 may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted.
  • the scene change detection unit 150 receives current scene information, a current shot representation frame, and a comparison shot representation frame, and extracts background information from the current shot representation frame and the comparison shot representation frame.
  • the scene change detection unit 150 compares the current shot representation frame and the comparison shot representation frame, and determines whether the current shot representation frame is similar to the comparison shot representation frame.
  • the scene change detection unit 150 extends the time window when the current shot representation frame is similar to the comparison shot representation frame, and marks that the comparing of the current shot representation frame is completed.
  • the scene change detection unit 150 assigns the comparison shot representation frame to the current shot representation frame, and assigns a subsequent shot representation frame in the time window to the comparison shot representation frame.
  • the scene change detection unit 150 marks that the comparing of the current shot representation frame is completed, when the current shot representation frame is different from the comparison shot representation frame, and determines whether comparing all frames in the time window is completed.
  • the scene change detection unit 150 assigns a subsequent shot representation frame where the comparing is incomplete to the current shot representation frame, and assigns the subsequent shot representation frame to the comparison shot representation frame, when the comparing is incomplete.
  • a scene clustering unit 160 merges similar clusters from the extracted person representation frame and performs a scene clustering for each person. Specifically, the scene clustering unit 160 may perform the scene clustering for each person by comparing the person representation frame in the shot and merging the similar clusters according to the comparison, as illustrated in FIG. 3.
  • the scene clustering unit 160 receives time information-based clusters, and selects two clusters having a minimum difference value.
  • the scene clustering unit 160 compares the minimum difference value and a threshold value, and merges the two clusters when the minimum difference value is less than the threshold value.
  • the scene clustering unit 160 connects scenes including a person frame in a same cluster, when the minimum difference value is equal to or greater than the threshold value. A scene clustering method for each person is described in greater detail with reference to FIG. 3.
  • FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention.
  • a scene clustering unit 160 compares a first person representation frame 310 and a second person representation frame 320, and performs a first merge of similar clusters based on a result of the comparison.
  • the scene clustering unit 160 compares a fifth person representation frame 350 and a sixth person representation frame 360, and performs a second merge of similar clusters based on a result of the comparison.
  • the scene clustering unit 160 compares a third person representation frame 330 and a seventh person representation frame 370, and performs a third merge of similar clusters based on a result of the comparison.
  • the scene clustering unit 160 compares the first merge and the second merge, and performs a fourth merge of similar clusters based on a result of the comparison.
  • FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention.
  • a system for classifying a scene for each person in a video detects a face within input video frames. Specifically, the system for classifying a scene for each person in a video analyzes the input video frames via a face detector and thereby may detect the face within the input video frames.
  • the system for classifying a scene for each person in a video detects a shot change within the input video frames. Specifically, the system for classifying a scene for each person in a video detects the shot change within the input video frames to segment the input video frames into a shot which is a basic unit of the video.
  • the system for classifying a scene for each person in a video extracts a person representation frame in the shot. Since using all person frames for a person clustering is inefficient, the system for classifying a scene for each person in a video extracts a frame which is closest to a center in each cluster as the person representation frame, after performing a clustering of frames including a face in the shot. Specifically, the system for classifying a scene for each person in a video extracts one such frame from each cluster and may set it as a person representation frame in the shot, since at least one person may be included in the shot.
  • the system for classifying a scene for each person in a video performs the person clustering in the extracted person representation frame based on time information.
  • an algorithm based on face information alone may not be robust to various poses or lighting.
  • the system for classifying a scene for each person in a video performs the person clustering by using the time information to start clustering based on various forms of each person.
  • a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information has a clearer difference than face information.
  • the system for classifying a scene for each person in a video obtains various forms of the single person by using the clothes information. An operation of a time information-based person clustering is described in greater detail with reference to FIG. 5.
  • FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention.
  • the system for classifying a scene for each person in a video receives current cluster information, a current person representation frame, and a comparison person representation frame.
  • the comparison person representation frame is a person representation frame to be compared.
  • the system for classifying a scene for each person in a video extracts clothes information of each of the current person representation frame and the comparison person representation frame. Specifically, the system for classifying a scene for each person in a video may extract the clothes information by referring to the location and size of the face from the face information as illustrated in FIG. 2 to reduce a time to extract clothes information.
  • the system for classifying a scene for each person in a video compares the current person representation frame and the comparison person representation frame. Specifically, when comparing, the system for classifying a scene for each person in a video adds a comparison value of color information corresponding to the clothes information and a weighted comparison value corresponding to the face information.
  • the system for classifying a scene for each person in a video determines whether the current person representation frame is similar to the comparison person representation frame, as a result of the comparing.
  • the system for classifying a scene for each person in a video includes the comparison person representation frame which has been currently compared in the current cluster information. Specifically, the system for classifying a scene for each person in a video includes the comparison person representation frame, which has been compared with the current person representation frame, in the current cluster information.
  • the system for classifying a scene for each person in a video sets a subsequent person representation frame in the time window T_fw as another comparison person representation frame, and performs operation S502. Specifically, the system for classifying a scene for each person in a video continues to compare using the subsequent person representation frame in the time window T_fw.
  • the system for classifying a scene for each person in a video determines whether the current person representation frame and the comparison person representation frame are at an end of the time window T_fw. Specifically, when the current person representation frame is different from the comparison person representation frame, the system for classifying a scene for each person in a video determines whether all frames in the time window T_fw have been compared, based on whether the current person representation frame and the comparison person representation frame are at the end of the time window T_fw.
  • in operation S510, when the current person representation frame and the comparison person representation frame are not at the end of the time window T_fw, the system for classifying a scene for each person in a video sets the subsequent person representation frame as the comparison person representation frame, and performs operation S502, since not all person representation frames corresponding to the current cluster have been detected.
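The time information-based person clustering described above can be sketched roughly as follows. This is one possible reading under stated assumptions, not the patented implementation: frames are dictionaries with hypothetical `time`, `clothes`, and `face` keys, differences are simple L1 distances, the face term is weighted by an assumed `face_weight`, and a match extends the window from the time of the matched frame.

```python
def frame_difference(a, b, face_weight=0.5):
    """Clothes-color difference plus a weighted face difference (L1 distances, assumed)."""
    clothes = sum(abs(p - q) for p, q in zip(a["clothes"], b["clothes"]))
    face = sum(abs(p - q) for p, q in zip(a["face"], b["face"]))
    return clothes + face_weight * face


def cluster_by_time_window(frames, window, threshold, face_weight=0.5):
    """Group time-ordered person representation frames into clusters.

    frames: list of dicts with "time", "clothes", and "face" keys, sorted by time.
    window: length of the time window (T_fw), in the same units as "time".
    Returns a list of clusters, each a list of frame indices.
    """
    assigned = [False] * len(frames)
    clusters = []
    for seed in range(len(frames)):
        if assigned[seed]:
            continue
        assigned[seed] = True
        cluster = [seed]
        window_end = frames[seed]["time"] + window
        for other in range(seed + 1, len(frames)):
            if frames[other]["time"] > window_end:
                break  # end of the time window: the current cluster is complete
            if assigned[other]:
                continue
            if frame_difference(frames[seed], frames[other], face_weight) < threshold:
                cluster.append(other)
                assigned[other] = True
                # A similar frame extends the window from the matched frame.
                window_end = frames[other]["time"] + window
        clusters.append(cluster)
    return clusters
```

Because the window only grows when a similar frame is found, a person who disappears for longer than the window starts a new cluster; such clusters are candidates for the later merge step.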
  • the system for classifying a scene for each person in a video detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background.
  • the system for classifying a scene for each person in a video may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted.
  • a scene change detection operation is described in greater detail with reference to FIG. 6.
  • FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention.
  • the system for classifying a scene for each person in a video receives current scene information, a current shot representation frame P_f, and a comparison shot representation frame C_f.
  • the system for classifying a scene for each person in a video extracts background information of the current shot representation frame P_f and the comparison shot representation frame C_f.
  • the background information is information about a pixel of another location excluding a face location and a clothes location.
  • the system for classifying a scene for each person in a video compares the current shot representation frame P_f and the comparison shot representation frame C_f. Specifically, when comparing, the system for classifying a scene for each person in a video adds the comparison value of the color information corresponding to the clothes information and the weighted comparison value corresponding to the face information. Also, when comparing the background information, a normalized color histogram and a hue, saturation, value (HSV) color space are used.
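A background comparison of this kind might look like the sketch below. It is illustrative only: the pixel container (a dictionary from coordinates to RGB tuples), the choice of a hue-only histogram with 8 bins, and the half-L1 histogram distance are all assumptions; the text states only that a normalized color histogram and HSV are used on pixels outside the face and clothes locations.

```python
import colorsys


def background_histogram(pixels, excluded, bins=8):
    """Normalized hue histogram over pixels outside the excluded boxes.

    pixels:   dict mapping (x, y) to an (r, g, b) tuple with 0-255 channels (assumed layout).
    excluded: list of (x, y, w, h) boxes, e.g. the face and clothes regions.
    """
    def is_excluded(x, y):
        return any(bx <= x < bx + bw and by <= y < by + bh for bx, by, bw, bh in excluded)

    counts = [0] * bins
    for (x, y), (r, g, b) in pixels.items():
        if is_excluded(x, y):
            continue
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(hue * bins), bins - 1)] += 1
    total = sum(counts)
    # Normalizing makes frames with different background areas comparable.
    return [c / total for c in counts] if total else counts


def histogram_difference(h1, h2):
    """Half the L1 distance between two normalized histograms, in the range [0, 1]."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2
```

Excluding the person regions before building the histogram is what keeps a moving person from changing the background signature of an otherwise unchanged scene.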
  • the system for classifying a scene for each person in a video determines whether the current shot representation frame P_f is similar to the comparison shot representation frame C_f, as a result of the comparing.
  • the system for classifying a scene for each person in a video extends a time window T_sw. Specifically, the system for classifying a scene for each person in a video resets the time window T_sw to extend the scene again, since the same scene continues up to the point in time at which the current shot representation frame P_f is similar to the comparison shot representation frame C_f.
  • the system for classifying a scene for each person in a video marks that the comparing of the current shot representation frame P_f is completed, and sets the comparison shot representation frame C_f as the current shot representation frame P_f.
  • the system for classifying a scene for each person in a video sets a subsequent shot representation frame in the time window T_sw as the comparison shot representation frame C_f, and performs operation S602. Specifically, the system for classifying a scene for each person in a video continues to compare using the subsequent shot representation frame in the time window T_sw.
  • in operation S609, the system for classifying a scene for each person in a video determines whether comparing all frames in the time window T_sw is completed.
  • the system for classifying a scene for each person in a video determines a shot, which is examined last and determined to be a similar shot, as a last shot of a current scene, since all shots corresponding to the current scene are detected. Also, the system for classifying a scene for each person in a video performs a detection operation of a subsequent scene.
  • the system for classifying a scene for each person in a video sets a subsequent shot representation frame where the comparing is incomplete as the current shot representation frame P_f, and sets the subsequent shot representation frame as the comparison shot representation frame C_f. Also, the system for classifying a scene for each person in a video performs operation S602.
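The scene change detection loop can be approximated by the following sketch. It is a simplified interpretation of the flow described above rather than a faithful transcription of FIG. 6: each shot already in the current scene is compared with later shots that fall inside its time window, a similar later shot extends the scene up to that shot, and the scene ends when no shot in any window is similar. The function name `segment_scenes` and the `difference` callable are hypothetical.

```python
def segment_scenes(shot_times, difference, window, threshold):
    """Split time-ordered shots into scenes.

    shot_times: time of each shot representation frame, ascending (assumed input).
    difference: callable (i, j) -> float comparing two shots (assumed, e.g. a
                combined person and background difference).
    window:     length of the time window (T_sw), in the units of shot_times.
    Returns a list of (first_shot, last_shot) index pairs, both inclusive.
    """
    scenes = []
    start = 0
    count = len(shot_times)
    while start < count:
        last = start      # last shot known to belong to the current scene
        current = start   # plays the role of the current shot frame P_f
        while current <= last:
            window_end = shot_times[current] + window
            for candidate in range(last + 1, count):  # comparison frame C_f
                if shot_times[candidate] > window_end:
                    break
                if difference(current, candidate) < threshold:
                    # A similar shot means the same scene continues up to it.
                    last = candidate
            current += 1
        scenes.append((start, last))
        start = last + 1
    return scenes
```

In this reading, a dissimilar shot sandwiched between two similar ones (for example a cutaway) stays inside the scene, because the scene boundary is set by the last similar shot rather than by the first dissimilar one.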
  • the system for classifying a scene for each person in a video merges similar clusters from the extracted person representation frame and performs the scene clustering for each person.
  • the system for classifying a scene for each person in a video may perform the scene clustering by comparing and merging as illustrated in FIG. 3. An operation of a scene clustering for each person is described in greater detail with reference to FIG. 7.
  • FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
  • the system for classifying a scene for each person in a video receives time information-based clusters.
  • the system for classifying a scene for each person in a video selects two clusters having a minimum difference value from difference values from among all clusters. Specifically, the difference values of all clusters may be compared using an average value of each cluster. Also, the minimum difference value may be used after comparing all objects of a corresponding cluster and all objects of a comparison cluster.
  • the system for classifying a scene for each person in a video compares the minimum difference value and a threshold value and determines whether the minimum difference value is less than the threshold value.
  • in operation S704, when the minimum difference value is less than the threshold value, the system for classifying a scene for each person in a video merges the two clusters, as illustrated in FIG. 3, since the two clusters include a similar person. Also, the system for classifying a scene for each person in a video performs operation S702 again.
  • the system for classifying a scene for each person in a video connects scenes including a person frame in a same cluster. Specifically, the system for classifying a scene for each person in a video determines that all clustering is completed when the minimum difference value is equal to or greater than the threshold value. Also, when the scenes including a same person have been connected, the operation of a scene clustering for each person is completed. Each scene may be included in many clusters since various persons may exist in a single scene.
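The merge loop and the final scene connection can be sketched as below. This is a hedged illustration: clusters are lists of frame identifiers, `features` and `scene_of_frame` are hypothetical lookup tables, and the cluster difference is taken as the Euclidean distance between cluster averages, which is only one of the two comparison options the text mentions.

```python
from itertools import combinations
from math import dist


def merge_person_clusters(clusters, features, threshold):
    """Merge the two closest clusters repeatedly until none is closer than threshold.

    clusters: list of lists of frame ids (the time information-based clusters).
    features: dict mapping frame id to a numeric feature vector (assumed).
    """
    clusters = [list(c) for c in clusters]

    def center(cluster):
        vectors = [features[f] for f in cluster]
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    def difference(pair):
        a, b = pair
        return dist(center(clusters[a]), center(clusters[b]))

    while len(clusters) > 1:
        # Select the two clusters having the minimum difference value.
        a, b = min(combinations(range(len(clusters)), 2), key=difference)
        if difference((a, b)) >= threshold:
            break  # no remaining pair is similar enough: clustering is complete
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def scenes_for_each_person(clusters, scene_of_frame):
    """Map each cluster index to the sorted scene ids that contain its frames."""
    return {k: sorted({scene_of_frame[f] for f in members})
            for k, members in enumerate(clusters)}
```

A scene id can appear under several cluster indices in the result, which matches the observation that a single scene may contain various persons.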
  • the method and system for classifying a scene for each person in a video may be recorded in computer-readable media including program instructions to implement various operations embodied by a computer.
  • the media may also include, alone or in combination with the program instructions, data files, data structures, and the like.
  • Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVD; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like.
  • the media may also be a transmission medium such as optical or metallic lines, wave guides, etc.
  • program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
  • the described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described exemplary embodiments of the present invention.
  • a method and system for classifying a scene for each person in a video may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
  • a method and system for classifying a scene for each person in a video may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
  • a method and system for classifying a scene for each person in a video may replay video data for each person, and thereby may enable a user to selectively view a scene including a person that the user likes.
  • a method and system for classifying a scene for each person in a video may classify a person by a scene unit, which is a story unit in video data, and thereby may improve a scene classification accuracy and enable a scene-based navigation.
  • a method and system for classifying a scene for each person in a video may perform a video data analysis more easily by improving a scene classification accuracy in video data.

Abstract

Described is a method of classifying a scene for each person in a video, the method including: detecting a face within input video frames; detecting a shot change of the input video frames; extracting a person representation frame in the shot; performing a person clustering in the extracted person representation frame based on time information; detecting a scene change by separating a person portion from a background based on face extraction information, and comparing the person portion and the background; and merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of application Ser. No. 11/882,733 filed on Aug. 3, 2007, which claims the priority of Korean Patent Application No. 10-2007-0000957, filed on Jan. 4, 2007, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
  • BACKGROUND
  • 1. Field
  • The present invention relates to a method and system for classifying a scene for each person in a video, and more particularly, to a method and system for classifying a scene for each person in a video based on person information and background information in video data.
  • 2. Description of the Related Art
  • Generally, a scene is a unit delimited by changes in video content. In the conventional art, scenes are classified by using low-level information such as color information or edge information.
  • Specifically, in a conventional automatic scene segmentation algorithm, shots are clustered using low-level information, such as color information extracted from all frames, and a scene segmentation is then detected. However, when a person in the video moves or the camera moves, the low-level information changes. Accordingly, the degree of accuracy decreases.
  • Also, in a conventional person classification method, persons in a video are clustered using face information, and the persons are thereby classified. However, face information changes depending on pose, lighting, and the like, which results in low accuracy.
  • Accordingly, a method and system for classifying a scene for each person in a video is required.
  • SUMMARY
  • An aspect of the present invention provides a method and system for classifying a scene for each person in a video which may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
  • An aspect of the present invention also provides a method and system for classifying a scene for each person in a video which may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
  • According to an aspect of the present invention, there is provided a method of classifying a scene for each person in a video, the method including: detecting a face within input video frames; detecting a shot change of the input video frames; extracting a person representation frame in the shot; performing a person clustering in the extracted person representation frame based on time information; detecting a scene change by separating a person portion from a background based on face extraction information, and comparing the person portion and the background; and merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
  • According to another aspect of the present invention, there is provided a system for classifying a scene for each person in a video, the system including: a face detection unit detecting a face within input video frames; a shot change detection unit detecting a shot change of the input video frames; a person representation frame extraction unit extracting a person representation frame in the shot; a person clustering unit performing a person clustering in the extracted person representation frame based on time information; a scene change detection unit detecting a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background; and a scene clustering unit merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
  • Additional aspects and/or advantages of the invention will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
  • These and/or other aspects, features, and advantages of the invention will become apparent and more readily appreciated from the following description of exemplary embodiments, taken in conjunction with the accompanying drawings of which:
  • FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention;
  • FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention;
  • FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention;
  • FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention;
  • FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention;
  • FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention; and
  • FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
  • DETAILED DESCRIPTION
  • Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
  • FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention.
  • Referring to FIG. 1, the system for classifying a scene for each person in a video 100 includes a face detection unit 110, a shot change detection unit 120, a person representation frame extraction unit 130, a person clustering unit 140, a scene change detection unit 150, and a scene clustering unit 160.
  • The face detection unit 110 detects a face within input video frames. Specifically, the face detection unit 110 analyzes the input video frames and detects the face within the input video frames.
  • The shot change detection unit 120 detects a shot change within the input video frames. Specifically, the shot change detection unit 120 detects the shot change of the input video frames to segment the input video frames into a shot which is a basic unit of the video.
  • The person representation frame extraction unit 130 extracts a person representation frame in the shot. Using all person frames for a person clustering is inefficient. Accordingly, after performing a clustering of frames including a face in the shot, the person representation frame extraction unit 130 extracts, as the person representation frame, a frame which is closest to a center frame having a greatest similarity in each cluster. Specifically, since at least one person may be included in the shot, the person representation frame extraction unit 130 extracts one frame from each cluster and may set each extracted frame as a person representation frame in the shot.
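  • As a non-limiting illustration, the selection of one person representation frame per cluster may be sketched in Python as follows. The sketch assumes that the face frames of a shot have already been clustered and that each frame is described by a numeric feature vector; the function names, the data layout, and the use of a Euclidean distance to the cluster mean are assumptions made for illustration and are not specified in the disclosure.

```python
from math import dist


def representative_index(features):
    """Return the index of the feature vector closest to the cluster mean."""
    n = len(features)
    dims = len(features[0])
    centroid = [sum(f[d] for f in features) / n for d in range(dims)]
    return min(range(n), key=lambda i: dist(features[i], centroid))


def person_representation_frames(clusters):
    """Pick one representation frame per cluster.

    clusters: dict mapping a cluster id to a list of
              (frame_number, feature_vector) pairs (hypothetical layout).
    Returns a dict mapping each cluster id to the chosen frame number.
    """
    reps = {}
    for cluster_id, members in clusters.items():
        idx = representative_index([feat for _, feat in members])
        reps[cluster_id] = members[idx][0]
    return reps
```

  • Because one frame is chosen per cluster, a shot containing several persons yields several person representation frames.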
  • The person clustering unit 140 performs the person clustering in the extracted person representation frame based on time information. When a clustering is simply performed on all person representation frames, the algorithm may not be robust to various poses or lighting conditions. Accordingly, the person clustering unit 140 performs the person clustering by using the time information to start clustering based on various forms of each person. Specifically, as illustrated in FIG. 2, a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information has a clearer difference than face information. Accordingly, the person clustering unit 140 obtains various forms of the single person by using the clothes information.
  • FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention.
  • A location and size of a face 211, 221, 231, 241, and 251, automatically detected in a person representation frame 210, 220, 230, 240, and 250 in a shot, and a location and size of clothes 212, 222, 232, 242, and 252, extracted in the person representation frame 210, 220, 230, 240, and 250, are illustrated in FIG. 2. The size of clothes is determined in proportion to a size of a key person in the person representation frame 210, 220, 230, 240, and 250 in the shot.
  • The person clustering unit 140 receives current cluster information, a current person representation frame, and a comparison person representation frame, i.e. a person representation frame to be compared, and extracts clothes information from the current person representation frame and the comparison person representation frame. The person clustering unit 140 compares the current person representation frame and the comparison person representation frame, and determines whether the current person representation frame is similar to the comparison person representation frame as a result of the comparing. The person clustering unit 140 extends a time window when the current person representation frame is similar to the comparison person representation frame, and includes the comparison person representation frame, which has just been compared, in the current cluster information. The person clustering unit 140 sets a subsequent person representation frame in the time window as another comparison person representation frame. Also, the person clustering unit 140 determines whether the current person representation frame and the comparison person representation frame are at an end of the time window, when the current person representation frame is different from the comparison person representation frame. The person clustering unit 140 sets the subsequent person representation frame in the time window as the other comparison person representation frame, when the current person representation frame and the comparison person representation frame are not at the end of the time window.
  • The scene change detection unit 150 detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background. Specifically, the scene change detection unit 150 may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted.
  • The scene change detection unit 150 receives current scene information, a current shot representation frame, and a comparison shot representation frame, and extracts background information from the current shot representation frame and the comparison shot representation frame. The scene change detection unit 150 compares the current shot representation frame and the comparison shot representation frame, and determines whether the current shot representation frame is similar to the comparison shot representation frame. The scene change detection unit 150 extends the time window when the current shot representation frame is similar to the comparison shot representation frame, and marks that the comparing of the current shot representation frame is completed. The scene change detection unit 150 assigns the comparison shot representation frame to the current shot representation frame, and assigns a subsequent shot representation frame in the time window to the comparison shot representation frame. The scene change detection unit 150 marks that the comparing of the current shot representation frame is completed, when the current shot representation frame is different from the comparison shot representation frame, and determines whether comparing all frames in the time window is completed. The scene change detection unit 150 assigns a subsequent shot representation frame where the comparing is incomplete to the current shot representation frame, and assigns the subsequent shot representation frame to the comparison shot representation frame, when the comparing is incomplete.
  • The scene clustering unit 160 merges similar clusters from the extracted person representation frame and performs a scene clustering for each person. Specifically, the scene clustering unit 160 may perform the scene clustering for each person by comparing the person representation frame in the shot and merging the similar clusters according to the comparison, as illustrated in FIG. 3.
  • The scene clustering unit 160 receives time information-based clusters, and selects two clusters having a minimum difference value. The scene clustering unit 160 compares the minimum difference value and a threshold value, and merges the two clusters when the minimum difference value is less than the threshold value. The scene clustering unit 160 connects scenes including a person frame in a same cluster, when the minimum difference value is equal to or greater than the threshold value. A scene clustering method for each person is described in greater detail with reference to FIG. 3.
  • FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention.
  • In operation S1, the scene clustering unit 160 compares a first person representation frame 310 and a second person representation frame 320, and performs a first merge of similar clusters based on a result of the comparison. In operation S2, the scene clustering unit 160 compares a fifth person representation frame 350 and a sixth person representation frame 360, and performs a second merge of similar clusters based on a result of the comparison. In operation S3, the scene clustering unit 160 compares a third person representation frame 330 and a seventh person representation frame 370, and performs a third merge of similar clusters based on a result of the comparison. In operation S4, the scene clustering unit 160 compares the first merge and the second merge, and performs a fourth merge of similar clusters based on a result of the comparison.
  • FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention.
  • Referring to FIG. 4, in operation S410, a system for classifying a scene for each person in a video detects a face within input video frames. Specifically, the system for classifying a scene for each person in a video analyzes the input video frames via a face detector and thereby may detect the face within the input video frames.
  • In operation S420, the system for classifying a scene for each person in a video detects a shot change within the input video frames. Specifically, the system for classifying a scene for each person in a video detects the shot change within the input video frames to segment the input video frames into a shot which is a basic unit of the video.
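  • The disclosure does not specify how the shot change is detected. As a non-limiting illustration only, one commonly used technique, thresholding the difference between the normalized color histograms of consecutive frames, may be sketched in Python as follows. The function name, the histogram representation, and the threshold value are assumptions and are not part of the original disclosure.

```python
def shot_boundaries(histograms, threshold=0.5):
    """Return the frame indices at which a new shot is assumed to start.

    histograms: list of normalized color histograms, one per frame
                (each a list of floats summing to 1.0).
    threshold:  hypothetical cut-off on the histogram difference, in [0, 1].
    """
    boundaries = [0] if histograms else []
    for i in range(1, len(histograms)):
        # Half the L1 distance between two normalized histograms lies in [0, 1].
        diff = 0.5 * sum(abs(a - b) for a, b in zip(histograms[i - 1], histograms[i]))
        if diff > threshold:
            boundaries.append(i)
    return boundaries
```

  • Each returned index marks the first frame of a shot, so consecutive indices delimit the shots into which the input video frames are segmented.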
  • In operation S430, the system for classifying a scene for each person in a video extracts a person representation frame in the shot. Since using all person frames for a person clustering is inefficient, the system for classifying a scene for each person in a video extracts a frame which is closest to a center in each cluster as the person representation frame, after performing a clustering of frames including a face in the shot. Specifically, since at least one person may be included in the shot, the system for classifying a scene for each person in a video extracts one frame from each cluster and may set each extracted frame as a person representation frame in the shot.
  • In operation S440, the system for classifying a scene for each person in a video performs the person clustering in the extracted person representation frame based on time information. When a clustering is simply performed on all person representation frames, the algorithm may not be robust to various poses or lighting conditions. Accordingly, the system for classifying a scene for each person in a video performs the person clustering by using the time information to start clustering based on various forms of each person. Specifically, a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information has a clearer difference than face information. Accordingly, the system for classifying a scene for each person in a video obtains various forms of the single person by using the clothes information. An operation of a time information-based person clustering is described in greater detail with reference to FIG. 5.
  • FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention.
  • Referring to FIG. 5, in operation S501, the system for classifying a scene for each person in a video receives current cluster information, a current person representation frame, and a comparison person representation frame. The comparison person representation frame is a person representation frame to be compared.
  • In operation S502, the system for classifying a scene for each person in a video extracts clothes information of each of the current person representation frame and the comparison person representation frame. Specifically, the system for classifying a scene for each person in a video may extract the clothes information by referring to the location and size of the face from the face information as illustrated in FIG. 2 to reduce a time to extract clothes information.
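  • As a non-limiting illustration, deriving a clothes region from the location and size of a detected face may be sketched in Python as follows. The disclosure states only that the clothes region is determined in proportion to the size of the person; the placement directly below the face and the scale factors used here are assumptions chosen for illustration, and the function name is hypothetical.

```python
def clothes_region(face, frame_w, frame_h, width_scale=2.0, height_scale=1.5):
    """Estimate a clothes box (x, y, w, h) located below a face box.

    face: (x, y, w, h) of the detected face, in pixels.
    width_scale, height_scale: hypothetical proportions relative to the face.
    Returns None when the estimated box falls outside the frame.
    """
    x, y, w, h = face
    cw = w * width_scale
    ch = h * height_scale
    cx = x + w / 2 - cw / 2   # horizontally centered under the face
    cy = y + h                # starts at the bottom edge of the face
    left, top = max(0, int(cx)), max(0, int(cy))
    right = min(frame_w, int(cx + cw))
    bottom = min(frame_h, int(cy + ch))
    if right <= left or bottom <= top:
        return None
    return (left, top, right - left, bottom - top)
```

  • Reusing the face box in this way avoids a separate search for the clothes, which is consistent with the stated aim of reducing the time needed to extract the clothes information.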
  • In operation S503, the system for classifying a scene for each person in a video compares the current person representation frame and the comparison person representation frame. Specifically, when comparing, the system for classifying a scene for each person in a video adds a comparison value of color information corresponding to the clothes information and a weighted comparison value corresponding to the face information.
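  • As a non-limiting illustration, the comparison of operation S503 may be read as a sum of a clothes color comparison value and a weighted face comparison value, and may be sketched in Python as follows. The use of histogram intersection for the clothes colors, the weight, the threshold, and all function names are assumptions made for illustration and are not specified in the disclosure.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def person_similarity(clothes_a, clothes_b, face_similarity, face_weight=0.5):
    """Clothes color comparison value plus a weighted face comparison value.

    clothes_a, clothes_b: normalized color histograms of the clothes regions.
    face_similarity:      face comparison value in [0, 1] from any face matcher.
    face_weight:          hypothetical weight applied to the face term.
    """
    return histogram_intersection(clothes_a, clothes_b) + face_weight * face_similarity


def is_same_person(clothes_a, clothes_b, face_similarity, threshold=1.0):
    """Decide similarity against a hypothetical threshold."""
    return person_similarity(clothes_a, clothes_b, face_similarity) >= threshold
```

  • A face weight below 1 lets the clothes term dominate, in line with the statement that clothes information has a clearer difference than face information.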
  • In operation S504, the system for classifying a scene for each person in a video determines whether the current person representation frame is similar to the comparison person representation frame, as a result of the comparing.
  • In operation S505, when the current person representation frame is similar to the comparison person representation frame, the system for classifying a scene for each person in a video extends a time window Tfw. Specifically, when the current person representation frame is similar to the comparison person representation frame, the system for classifying a scene for each person in a video resets the time window Tfw from a present point in time, since a same person exists up to the present point in time.
  • In operation S506, the system for classifying a scene for each person in a video includes the comparison person representation frame which has been currently compared in the current cluster information. Specifically, the system for classifying a scene for each person in a video includes the comparison person representation frame, which has been compared with the current person representation frame, in the current cluster information.
  • In operation S507, the system for classifying a scene for each person in a video sets a subsequent person representation frame in the time window Tfw as another comparison person representation frame, and performs operation S502. Specifically, the system for classifying a scene for each person in a video continues to compare using the subsequent person representation frame in the time window Tfw.
  • In operation S508, when the current person representation frame is different from the comparison person representation frame, the system for classifying a scene for each person in a video determines whether the current person representation frame and the comparison person representation frame are at an end of the time window Tfw. Specifically, when the current person representation frame is different from the comparison person representation frame, the system for classifying a scene for each person in a video determines whether all frames in the time window Tfw have been compared, based on a result of determining whether the current person representation frame and the comparison person representation frame are at the end of the time window Tfw.
  • In operation S509, when the current person representation frame and the comparison person representation frame are at the end of the time window Tfw, the system for classifying a scene for each person in a video moves to a subsequent cluster and performs a time information-based person clustering for the subsequent cluster, since all person representation frames corresponding to a current cluster are extracted.
  • In operation S510, when the current person representation frame and the comparison person representation frame are not at the end of the time window Tfw, the system for classifying a scene for each person in a video sets the subsequent person representation frame as the comparison person representation frame, and performs operation S502, since not all person representation frames corresponding to the current cluster have been detected.
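  • As a non-limiting illustration, the time information-based person clustering of operations S501 through S510 may be sketched in Python as follows. The sketch assumes that the person representation frames are sorted by time, that the current frame stays fixed while successive comparison frames are examined, and that a similarity predicate is supplied by the caller; these choices, together with the function and parameter names, are assumptions and represent only one possible reading of FIG. 5.

```python
def time_window_person_clusters(frames, is_similar, window):
    """Group person representation frames using a sliding time window.

    frames:     list of (timestamp, descriptor) pairs sorted by timestamp.
    is_similar: callable(descriptor_a, descriptor_b) -> bool.
    window:     length of the time window Tfw, in the same unit as timestamps.
    Returns a list of clusters, each a list of indices into `frames`.
    """
    assigned = [False] * len(frames)
    clusters = []
    for i, (t_i, d_i) in enumerate(frames):
        if assigned[i]:
            continue
        assigned[i] = True
        cluster = [i]
        window_end = t_i + window
        for j in range(i + 1, len(frames)):
            t_j, d_j = frames[j]
            if t_j > window_end:
                break  # end of the window reached: move to the next cluster (S509)
            if assigned[j]:
                continue
            if is_similar(d_i, d_j):
                assigned[j] = True
                cluster.append(j)            # include the compared frame (S506)
                window_end = t_j + window    # restart the window from this point (S505)
        clusters.append(cluster)
    return clusters
```

  • For example, with a window of 10 and frames at times 0, 8, and 15 showing the same clothes, the frame at time 15 is still reached because the match at time 8 restarts the window, whereas a matching frame at time 40 would start a new cluster.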
  • In operation S450, the system for classifying a scene for each person in a video detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background. Specifically, the system for classifying a scene for each person in a video may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted. A scene change detection operation is described in greater detail with reference to FIG. 6.
  • FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention.
  • Referring to FIG. 6, in operation S601, the system for classifying a scene for each person in a video receives current scene information, a current shot representation frame Pf, and a comparison shot representation frame Cf.
  • In operation S602, the system for classifying a scene for each person in a video extracts background information of the current shot representation frame Pf and the comparison shot representation frame Cf. The background information is information about a pixel of another location excluding a face location and a clothes location.
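  • As a non-limiting illustration, collecting the background information as the pixels lying outside the face location and the clothes location may be sketched in Python as follows. The representation of a frame as a list of pixel rows, the rectangular boxes, and the function names are assumptions made for illustration.

```python
def inside(box, x, y):
    """Return True when pixel (x, y) lies inside box = (x, y, w, h)."""
    bx, by, bw, bh = box
    return bx <= x < bx + bw and by <= y < by + bh


def background_pixels(frame, person_boxes):
    """Return the pixels of `frame` that fall outside every person box.

    frame:        list of rows, each row a list of pixel values.
    person_boxes: face and clothes boxes, each given as (x, y, w, h).
    """
    return [
        pixel
        for y, row in enumerate(frame)
        for x, pixel in enumerate(row)
        if not any(inside(box, x, y) for box in person_boxes)
    ]
```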
  • In operation S603, the system for classifying a scene for each person in a video compares the current shot representation frame Pf and the comparison shot representation frame Cf. Specifically, when comparing, the system for classifying a scene for each person in a video adds the comparison value of the color information corresponding to the clothes information and the weighted comparison value corresponding to the face information. Also, when comparing the background information, a normalized color histogram and a hue, saturation, value (HSV) color space are used.
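  • As a non-limiting illustration, a normalized color histogram in the HSV color space, and a comparison of two such histograms, may be sketched in Python as follows. The bin counts, the use of histogram intersection as the comparison value, and the function names are assumptions; the disclosure names only the normalized histogram and HSV.

```python
import colorsys


def normalized_hsv_histogram(pixels, bins=(8, 4, 4)):
    """Build a normalized HSV histogram from 8-bit RGB pixels.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    bins:   hypothetical number of bins for hue, saturation, and value.
    """
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
    total = sum(hist)
    return [count / total for count in hist] if total else hist


def histogram_similarity(h1, h2):
    """Histogram intersection of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

  • Normalizing the histogram makes the comparison independent of the number of background pixels, which varies with the size of the person portion removed from each frame.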
  • In operation S604, the system for classifying a scene for each person in a video determines whether the current shot representation frame Pf is similar to the comparison shot representation frame Cf, as a result of the comparing.
  • In operation S605, when the current shot representation frame Pf is similar to the comparison shot representation frame Cf, the system for classifying a scene for each person in a video extends a time window Tsw. Specifically, the system for classifying a scene for each person in a video resets the time window Tsw to extend a scene again, since a same scene is continued up to a point in time when the current shot representation frame Pf is similar to the comparison shot representation frame Cf.
  • In operation S606, the system for classifying a scene for each person in a video marks that the comparing of the current shot representation frame Pf is completed, and sets the comparison shot representation frame Cf as the current shot representation frame Pf.
  • In operation S607, the system for classifying a scene for each person in a video sets a subsequent shot representation frame in the time window Tsw as the comparison shot representation frame Cf, and performs operation S602. Specifically, the system for classifying a scene for each person in a video continues to compare using the subsequent shot representation frame in the time window Tsw.
  • In operation S608, when the current shot representation frame Pf is different from the comparison shot representation frame Cf, the system for classifying a scene for each person in a video marks that the comparing of the current shot representation frame Pf is completed.
  • In operation S609, the system for classifying a scene for each person in a video determines whether comparing all frames in the time window Tsw is completed.
  • In operation S610, when the comparing of all frames in the time window Tsw is completed, the system for classifying a scene for each person in a video determines the shot which was examined last and determined to be a similar shot as a last shot of a current scene, since all shots corresponding to the current scene have been detected. Also, the system for classifying a scene for each person in a video performs a detection operation of a subsequent scene.
  • In operation S611, when the comparing is incomplete, the system for classifying a scene for each person in a video sets a subsequent shot representation frame where the comparing is incomplete as the current shot representation frame Pf, and sets the subsequent shot representation frame as the comparison shot representation frame Cf. Also, the system for classifying a scene for each person in a video performs operation S602.
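  • As a non-limiting illustration, the scene change detection of operations S601 through S611 may be sketched in Python as follows. The sketch takes each shot representation frame in the time window in turn as the current frame, compares it with the later frames in the window, and extends the window whenever a later similar frame is found. This is only one possible reading of FIG. 6; the function name, the data layout, and the caller-supplied similarity predicate are assumptions.

```python
def detect_scenes(shots, is_similar, window):
    """Split a sequence of shots into scenes using a sliding time window.

    shots:      list of (start_time, descriptor) pairs sorted by start_time.
    is_similar: callable(descriptor_a, descriptor_b) -> bool.
    window:     length of the time window Tsw, in the same unit as start_time.
    Returns a list of (first_shot_index, last_shot_index) pairs.
    """
    scenes = []
    start, n = 0, len(shots)
    while start < n:
        last = start                           # last shot known to belong to the scene
        window_end = shots[start][0] + window
        p = start                              # index of the current frame Pf
        while p < n and shots[p][0] <= window_end:
            c = p + 1                          # index of the comparison frame Cf
            while c < n and shots[c][0] <= window_end:
                if c > last and is_similar(shots[p][1], shots[c][1]):
                    last = c                              # scene continues up to Cf
                    window_end = shots[c][0] + window     # extend the window (S605)
                c += 1
            p += 1
        scenes.append((start, last))           # last similar shot closes the scene (S610)
        start = last + 1
    return scenes
```

  • In this reading, alternating shots of two speakers in the same setting remain in one scene, because each similar shot found inside the window moves the end of the window forward.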
  • In operation S460, the system for classifying a scene for each person in a video merges similar clusters from the extracted person representation frame and performs the scene clustering for each person. Specifically, the system for classifying a scene for each person in a video may perform the scene clustering by comparing and merging as illustrated in FIG. 3. An operation of a scene clustering for each person is described in greater detail with reference to FIG. 7.
  • FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
  • Referring to FIG. 7, in operation S701, the system for classifying a scene for each person in a video receives time information-based clusters.
  • In operation S702, the system for classifying a scene for each person in a video selects two clusters having a minimum difference value from difference values from among all clusters. Specifically, the difference values of all clusters may be compared using an average value of each cluster. Also, the minimum difference value may be used after comparing all objects of a corresponding cluster and all objects of a comparison cluster.
  • In operation S703, the system for classifying a scene for each person in a video compares the minimum difference value and a threshold value and determines whether the minimum difference value is less than the threshold value.
  • In operation S704, when the minimum difference value is less than the threshold value, the system for classifying a scene for each person in a video merges the two clusters, as illustrated in FIG. 3, since the two clusters include a similar person. Also, the system for classifying a scene for each person in a video performs operation S702.
  • In operation S705, when the minimum difference value is equal to or greater than the threshold value, the system for classifying a scene for each person in a video connects scenes including a person frame in a same cluster. Specifically, the system for classifying a scene for each person in a video determines that all clustering is completed when the minimum difference value is equal to or greater than the threshold value. Also, when the scenes including a same person have been connected, the operation of a scene clustering for each person is completed. Each scene may be included in many clusters since various persons may exist in a single scene.
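  • As a non-limiting illustration, operations S701 through S705 may be sketched in Python as follows. The sketch compares clusters by the Euclidean distance between their average feature vectors, which corresponds to the average value option mentioned for operation S702; the data layout, the distance measure, and the function names are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def mean_vector(members):
    """Average feature vector of a cluster of (frame_id, vector) pairs."""
    vectors = [vec for _, vec in members]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def merge_person_clusters(clusters, threshold):
    """Repeatedly merge the two closest clusters while they differ by less than threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # S702: select the pair of clusters having the minimum difference value.
        diff, i, j = min(
            (dist(mean_vector(clusters[a]), mean_vector(clusters[b])), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if diff >= threshold:   # S703/S705: no remaining pair is similar enough
            break
        clusters[i].extend(clusters[j])   # S704: merge the two clusters
        del clusters[j]
    return clusters


def scenes_per_person(clusters, scene_of_frame):
    """S705: for each merged cluster, list the scenes containing its frames."""
    return [sorted({scene_of_frame[fid] for fid, _ in members}) for members in clusters]
```

  • The same scene index may appear in the lists of several clusters, which reflects that various persons may exist in a single scene.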
  • The method and system for classifying a scene for each person in a video according to the above-described exemplary embodiments of the present invention may be recorded in computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVD; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. The media may also be a transmission medium such as optical or metallic lines, wave guides, etc. including a carrier wave transmitting signals specifying the program instructions, data structures, etc. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described exemplary embodiments of the present invention.
  • A method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
  • Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
  • Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may replay video data for each person, and thereby may enable a user to selectively view a scene including a person that the user likes.
  • Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may classify a person by a scene unit, which is a story unit in video data, and thereby may improve a scene classification accuracy and enable a scene-based navigation.
  • Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may perform a video data analysis more easily by improving a scene classification accuracy in video data.
  • Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (6)

What is claimed is:
1. A method of classifying a scene for each person in a video, the method comprising:
extracting a person representation frame in a shot;
comparing a first person representation frame and a second person representation frame;
performing a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame;
merging similar clusters using a person cluster extracted from a representation frame and performing a scene clustering for each person based on a scene change,
wherein the scene change is determined using a person portion and a background portion.
2. The method of claim 1, wherein the performing of the person clustering further comprises:
receiving cluster information, the first person representation frame, and the second person representation frame to be compared;
including the second person representation frame which has been currently compared in the current cluster information when the first person representation frame is similar to the second person representation frame; and
setting a subsequent person representation frame as a third person representation frame to be compared on the time window.
3. The method of claim 2, further comprising:
moving to a subsequent cluster when the first person representation frame and the second person representation frame are at the end of the time window; or
setting the subsequent person representation frame in the time window as the other person representation frame to be compared on the time window, when the first person representation frame and the second person representation frame to be compared are not at the end of the time window.
4. The method of claim 1, wherein the performing of the scene clustering comprises:
receiving time information-based clusters;
selecting two clusters having a minimum difference value;
comparing the minimum difference value and a threshold value; and
merging the two clusters when the minimum difference value is less than the threshold value.
5. A non-transitory computer-readable recording medium storing a program for implementing a method of classifying a scene for each person in a video, the method comprising:
extracting a person representation frame in a shot;
comparing a first person representation frame and a second person representation frame;
performing a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame;
merging similar clusters using a person cluster extracted from a representation frame and performing a scene clustering for each person based on a scene change,
wherein the scene change is determined using a person portion and a background portion.
6. A system for classifying a scene for each person in a video, the system comprising:
a person representation frame extracting unit to extract a person representation frame in a shot;
a person clustering unit to compare a first person representation frame and a second person representation frame and to perform a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame;
a scene clustering unit to merge similar clusters using a person cluster extracted from a representation frame and to perform a scene clustering for each person based on a scene change,
wherein the scene change is determined using a person portion and a background portion.
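
The person clustering of claims 1 to 3 and the cluster merging of claim 4 can be illustrated with a minimal Python sketch. It is only one possible reading of the claim language, not the applicant's implementation. The following are assumptions made for illustration and do not appear in the claims: each person representation frame is reduced to a (time, feature vector) pair, similarity is a Euclidean distance below a threshold, the "difference value" between two clusters is the distance between their mean feature vectors, and the merge step is repeated until no pair falls below the threshold. The names cluster_persons, merge_clusters, window, sim_threshold, and merge_threshold are hypothetical. Scene change detection from the person portion and the background portion is not modeled here.

```python
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_persons(frames, window, sim_threshold):
    """Group time-ordered (time, feature) frames into person clusters.

    The first unassigned frame seeds a cluster. Later frames inside the
    time window are compared with it; each similar frame joins the cluster
    and extends the window (assumed reading of claims 1 to 3).
    Returns a list of clusters, each a list of frame indices.
    """
    clusters = []
    assigned = [False] * len(frames)
    for i, (t_first, f_first) in enumerate(frames):
        if assigned[i]:
            continue
        cluster = [i]
        assigned[i] = True
        window_end = t_first + window
        for j in range(i + 1, len(frames)):
            t_second, f_second = frames[j]
            if t_second > window_end:
                break  # end of the time window: move to the next cluster
            if not assigned[j] and distance(f_first, f_second) < sim_threshold:
                cluster.append(j)
                assigned[j] = True
                window_end = t_second + window  # extend the time window
        clusters.append(cluster)
    return clusters


def merge_clusters(clusters, features, merge_threshold):
    """Merge the two closest clusters while their difference is below the threshold.

    The difference value is assumed to be the distance between cluster
    mean feature vectors (claim 4 does not fix the measure).
    """
    clusters = [list(c) for c in clusters]

    def centroid(cluster):
        dim = len(features[cluster[0]])
        return [sum(features[i][d] for i in cluster) / len(cluster) for d in range(dim)]

    def difference(pair):
        return distance(centroid(clusters[pair[0]]), centroid(clusters[pair[1]]))

    while len(clusters) > 1:
        # Select the two clusters having the minimum difference value.
        a, b = min(combinations(range(len(clusters)), 2), key=difference)
        if difference((a, b)) >= merge_threshold:
            break  # no remaining pair is similar enough to merge
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return clusters


if __name__ == "__main__":
    # Toy data: one-dimensional features, two people (near 0.0 and near 5.0).
    frames = [(0, [0.0]), (1, [0.1]), (2, [5.0]), (3, [0.05]), (20, [0.0]), (21, [5.1])]
    time_clusters = cluster_persons(frames, window=2, sim_threshold=0.5)
    features = [f for _, f in frames]
    print(time_clusters)
    print(merge_clusters(time_clusters, features, merge_threshold=0.5))
```

In this toy run, the frame at time 20 falls outside the extended window of the first cluster and therefore starts its own time-based cluster; the merge step then joins it with the earlier cluster of the same person because their mean features are close.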
US13/317,509 2007-01-04 2011-10-20 Method and system for classifying scene for each person in video Abandoned US20120039515A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/317,509 US20120039515A1 (en) 2007-01-04 2011-10-20 Method and system for classifying scene for each person in video

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR1020070000957A KR100804678B1 (en) 2007-01-04 2007-01-04 Method for classifying scene by personal of video and system thereof
KR10-2007-0000957 2007-01-04
US11/882,733 US8073208B2 (en) 2007-01-04 2007-08-03 Method and system for classifying scene for each person in video
US13/317,509 US20120039515A1 (en) 2007-01-04 2011-10-20 Method and system for classifying scene for each person in video

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/882,733 Continuation US8073208B2 (en) 2007-01-04 2007-08-03 Method and system for classifying scene for each person in video

Publications (1)

Publication Number Publication Date
US20120039515A1 (en) 2012-02-16

Family

ID=39382421

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/882,733 Expired - Fee Related US8073208B2 (en) 2007-01-04 2007-08-03 Method and system for classifying scene for each person in video
US13/317,509 Abandoned US20120039515A1 (en) 2007-01-04 2011-10-20 Method and system for classifying scene for each person in video

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/882,733 Expired - Fee Related US8073208B2 (en) 2007-01-04 2007-08-03 Method and system for classifying scene for each person in video

Country Status (2)

Country Link
US (2) US8073208B2 (en)
KR (1) KR100804678B1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104394422A (en) * 2014-11-12 2015-03-04 华为软件技术有限公司 Video segmentation point acquisition method and device
WO2015135277A1 (en) * 2014-03-14 2015-09-17 小米科技有限责任公司 Clustering method and related device
US9449216B1 (en) * 2013-04-10 2016-09-20 Amazon Technologies, Inc. Detection of cast members in video content
CN108446390A (en) * 2018-03-22 2018-08-24 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN110807368A (en) * 2019-10-08 2020-02-18 支付宝(杭州)信息技术有限公司 Injection attack identification method, device and equipment

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009076982A (en) * 2007-09-18 2009-04-09 Toshiba Corp Electronic apparatus, and face image display method
JP2009081699A (en) * 2007-09-26 2009-04-16 Toshiba Corp Electronic apparatus and method of controlling face image extraction
JP2009089065A (en) * 2007-09-28 2009-04-23 Toshiba Corp Electronic device and facial image display apparatus
US8121358B2 (en) * 2009-03-06 2012-02-21 Cyberlink Corp. Method of grouping images by face
US8531478B2 (en) * 2009-03-19 2013-09-10 Cyberlink Corp. Method of browsing photos based on people
JP2012039524A (en) * 2010-08-10 2012-02-23 Sony Corp Moving image processing apparatus, moving image processing method and program
US8726161B2 (en) 2010-10-19 2014-05-13 Apple Inc. Visual presentation composition
US20120155717A1 (en) * 2010-12-16 2012-06-21 Microsoft Corporation Image search including facial image
CN102682281A (en) * 2011-03-04 2012-09-19 微软公司 Aggregated facial tracking in video
US9179201B2 (en) * 2011-08-26 2015-11-03 Cyberlink Corp. Systems and methods of detecting significant faces in video streams
US9417756B2 (en) 2012-10-19 2016-08-16 Apple Inc. Viewing and editing media content
US20140181668A1 (en) 2012-12-20 2014-06-26 International Business Machines Corporation Visual summarization of video for quick understanding
KR20160011532A (en) * 2014-07-22 2016-02-01 삼성전자주식회사 Method and apparatus for displaying videos
EP3570207B1 (en) 2018-05-15 2023-08-16 IDEMIA Identity & Security Germany AG Video cookies
US11127221B1 (en) 2020-03-18 2021-09-21 Facebook Technologies, Llc Adaptive rate control for artificial reality
CN115103223B (en) * 2022-06-02 2023-11-10 咪咕视讯科技有限公司 Video content detection method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040062520A1 (en) * 2002-09-27 2004-04-01 Koninklijke Philips Electronics N.V. Enhanced commercial detection through fusion of video and audio signatures

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7742641B2 (en) * 2004-12-06 2010-06-22 Honda Motor Co., Ltd. Confidence weighted classifier combination for multi-modal identification
KR101195613B1 (en) * 2005-08-04 2012-10-29 삼성전자주식회사 Apparatus and method for partitioning moving image according to topic
US7555149B2 (en) * 2005-10-25 2009-06-30 Mitsubishi Electric Research Laboratories, Inc. Method and system for segmenting videos using face detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chaisorn et al, "A multi-modal approach to story segmentation for news video", IWI system, 2003. *

Also Published As

Publication number Publication date
KR100804678B1 (en) 2008-02-20
US8073208B2 (en) 2011-12-06
US20080166027A1 (en) 2008-07-10

Similar Documents

Publication Publication Date Title
US8073208B2 (en) Method and system for classifying scene for each person in video
US8316301B2 (en) Apparatus, medium, and method segmenting video sequences based on topic
US7555149B2 (en) Method and system for segmenting videos using face detection
CN106937114B (en) Method and device for detecting video scene switching
US6195458B1 (en) Method for content-based temporal segmentation of video
US20090274364A1 (en) Apparatus and methods for detecting adult videos
US8401313B2 (en) Image processing apparatus and image processing method
US20100188580A1 (en) Detection of similar video segments
US9773322B2 (en) Image processing apparatus and image processing method which learn dictionary
US7813552B2 (en) Methods of representing and analysing images
KR100717402B1 (en) Apparatus and method for determining genre of multimedia data
JP2015536094A (en) Video scene detection
US20120148157A1 (en) Video key-frame extraction using bi-level sparsity
Han et al. Video scene segmentation using a novel boundary evaluation criterion and dynamic programming
EP2270748A2 (en) Methods of representing images
JP6557592B2 (en) Video scene division apparatus and video scene division program
KR101195613B1 (en) Apparatus and method for partitioning moving image according to topic
JP2009123095A (en) Image analysis device and image analysis method
KR100779074B1 (en) Method for discriminating a obscene video using characteristics in time flow and apparatus thereof
US8666175B2 (en) Method and apparatus for detecting objects
KR100656373B1 (en) Method for discriminating obscene video using priority and classification-policy in time interval and apparatus thereof
KR102221792B1 (en) Apparatus and method for extracting story-based scene of video contents
Yilmaz et al. Shot detection using principal coordinate system
Shah et al. Shot boundary detection using logarithmic intensity histogram: An application for video retrieval
Bailer et al. Detecting and clustering multiple takes of one scene

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION