WO2015125815A1 - Video image editing apparatus - Google Patents

Video image editing apparatus

Info

Publication number
WO2015125815A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
image
moving image
digest moving
unit
Prior art date
Application number
PCT/JP2015/054406
Other languages
French (fr)
Japanese (ja)
Inventor
Tadashi Uchiumi
Masanobu Yasugi
Takaya Yamamoto
Tomohiro Ikai
Tomoyuki Yamamoto
Original Assignee
Sharp Kabushiki Kaisha
Priority date
Filing date
Publication date
Application filed by Sharp Kabushiki Kaisha
Priority to JP2016504128A priority Critical patent/JPWO2015125815A1/en
Publication of WO2015125815A1 publication Critical patent/WO2015125815A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/91Television signal processing therefor
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/91Television signal processing therefor
    • H04N5/93Regeneration of the television signal or of selected parts thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/79Processing of colour television signals in connection with recording
    • H04N9/80Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/82Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
    • H04N9/8205Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal

Definitions

  • the present invention relates to a video editing apparatus that automatically edits video information such as moving images and still images.
  • A digest moving image is a relatively short moving image reconstructed from an input of many moving images, or of long moving images, so that all or part of their content can be viewed in summarized form.
  • Patent Document 1 discloses an image display device that simultaneously and continuously displays a digest moving image and a still image.
  • In that device, continuously arranged still images or moving images are assigned to areas laid out like frames on movie film, so that a plurality of images can be viewed simultaneously.
  • In Patent Document 1, a still image or moving image to be displayed must first appear on the screen as a thumbnail image and be selected from the displayed images. When the number of captured images is small, this poses no particular problem; however, when a large number of images have been taken and left unorganized, the user must choose from an enormous set of thumbnail images. As the number of images grows, the time and labor of this selection work grow with it, increasing the burden on the user. Moreover, while the content of a still image is easy to grasp from its thumbnail, the content of a moving image often is not. In such cases the user, despite the effort spent, cannot select appropriate images, which leads to dissatisfaction.
  • Patent Document 1 also has the problem that the display is monotonous and tiresome, because still images and moving images are displayed separately in display areas fixedly arranged on the display device.
  • On a display device with a small screen, such as a smartphone or small tablet PC, there is the further problem that each separately displayed image is hard to see, because each display area is small.
  • The present invention has been made in view of the above points, and provides a video editing apparatus or method with which the contents of a large number of still images and moving images, whose confirmation or appreciation would otherwise require a long time or troublesome operations, can be confirmed and viewed in a short time.
  • A video editing apparatus according to the invention includes a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating per-scene features, and a digest moving image generation unit that generates a digest moving image of the image data based on the scene information. Based on the scene information, the digest moving image generation unit decides whether to use each scene when generating the digest moving image, whether to place multiple scenes in the same frame, and the spatial arrangement pattern of scenes when multiple scenes are placed in the same frame.
  • Another video editing apparatus according to the invention includes a playback time candidate derivation unit that derives playback time candidates for a digest moving image based on an image data group; a playback time candidate display unit that presents the candidates to the user and sets a designated time based on a user event; a scene information generation unit that divides an image data group including moving images into one or more scenes; and a digest moving image generation unit that generates image clips based on the scenes and generates a digest moving image by temporally combining the image clips, wherein the digest moving image generation unit adjusts the digest moving image so that its playback time equals the designated time.
  • A further video editing apparatus according to the invention includes a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating per-scene features; an output control unit that determines a digest moving image generation policy and notifies the digest moving image generation unit of it; a digest moving image generation unit that generates a digest moving image of the image data group, arranging a plurality of scenes on the screen based on the scene information and the generation policy; a video display unit that displays video and operation information; a digest moving image editing control unit that reproduces the digest moving image and outputs it to the video display unit; and an operation unit that detects operation input from the outside, wherein the configuration of the digest moving image is changed according to the operation input detected by the operation unit.
  • FIG. 1 is a schematic diagram showing the configuration of a video editing apparatus according to the first embodiment of the present invention.
  • the video editing apparatus 100 includes an image data classification unit 101, a scene information generation unit 102, a digest moving image generation unit 103, an event selection unit 104, and an output control unit 105.
  • The video editing apparatus 100 may further include a data recording unit that stores image data and a video display unit that displays images, or may be configured so that a data recording device and a video display device having equivalent functions can be connected externally.
  • the image data classification unit 101 classifies image data.
  • The image data is electronic data that records a moving image and includes metadata such as the playback time of the moving image, date and time information indicating when the image was shot or created, position information indicating the location where the image was shot or created, and creator information indicating the user or device that performed the shooting or creation.
  • Each image data may be an electronic file stored in a recording medium (not shown), or may be digital data including an image / audio signal input from the photographing apparatus.
  • the image data may include a still image.
  • the image data classification unit 101 classifies each image into one or more image data groups that match a predetermined condition based on metadata included in the image data. For example, image data captured on the same date is classified as one image data group. Furthermore, referring to position information at the time of shooting, a plurality of image data having the same shooting date and time and position information within a predetermined range may be classified as one image data group. Alternatively, a plurality of pieces of image data whose position information at the time of shooting is within a predetermined range even when the shooting dates and times are different may be classified as one image data group. Further, for example, a plurality of pieces of image data having position information within a predetermined range and the same creator information may be classified as one image data group.
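  • The following Python sketch illustrates one way such metadata-based grouping could be implemented. It is a minimal sketch, not the classification unit 101 itself: the record fields and the same-date criterion are illustrative assumptions.

      from dataclasses import dataclass
      from itertools import groupby

      # Hypothetical metadata record; the field names are illustrative only.
      @dataclass
      class ImageData:
          path: str        # e.g. "/data/DSC_1001.mov"
          shoot_date: str  # "YYYY/MM/DD" date and time information
          position: tuple  # (latitude, longitude) position information
          creator: str     # creator information

      def classify_by_date(images):
          """Group images shot on the same date into one image data group."""
          images = sorted(images, key=lambda im: im.shoot_date)
          return [list(g) for _, g in groupby(images, key=lambda im: im.shoot_date)]
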
  • FIG. 15 shows an example of the image data groups classified by the image data classification unit 101. It is assumed that image data 11, 12, 13, … 1n, 21, 22, 23, … have been input. The image data group 10 includes the image data 11, 12, 13, … 1n, and the image data group 20 includes the image data 21, 22, 23, ….
  • The image data 11, 12, 13, …, 1n have metadata 11a, 12a, 13a, … in which the date and time information is "01/01/2014" and the position information is "around home"; these points are common to all of them.
  • The image data group 10 is an example in which image data 11, 12, 13, … having the same date information (shooting date) and position information are classified as one image data group. The image data group 20 is an example in which image data 21, 22, 23, … whose shooting dates differ but whose position information falls within a predetermined range are classified as one image data group.
  • the image data classification unit 101 generates image data group identification information 10A and 20A as information indicating the image data groups classified in this way.
  • The image data group identification information 10A and 20A includes, in order to identify each image data group, the name of the image data group and information indicating the image data included in it. In the example of FIG. 15, the image data classification unit 101 gives the character string "01/01/2014, around home" as the name of the image data group 10 and the character string "January 02 to 05, 2014, Hawaii Island" as the name of the image data group 20. The image data group identification information 10A and 20A also lists the name of each image data included in the group (a file name such as /data/DSC_1001.mov in the figure).
  • the image data group identification information may be configured to include the shooting date and time.
  • The scene information generation unit 102 analyzes the image data, divides it into one or more scenes characterized by the image signal or audio signal, and generates scene information, that is, information indicating the features of each scene.
  • The scene information includes, for example, motion information indicating changes in the time direction of the image, person information indicating the number or size of person areas appearing in the image, and conversation information indicating the presence or length of utterance sections in the audio signal. Details of the scene information generation unit 102 and of the generated scene information are described later.
  • The digest moving image generation unit 103 reads the scene information generated by the scene information generation unit 102 in units of the image data groups classified by the image data classification unit 101, and generates a digest moving image following the time series in which the image data was shot or created. When there are multiple image data groups, a digest moving image is generated for the image data group selected by the event selection unit 104 described later. In generating a digest moving image, the digest moving image generation unit 103 follows the generation policy notified by the output control unit 105 described later.
  • The video editing apparatus 100 outputs the generated digest moving image to the video display unit built into the video editing apparatus 100 or to an externally connected video display device, or to a built-in data recording unit or an externally connected data recording device. Details of the operation of the digest moving image generation unit 103 are described later.
  • the event selection unit 104 determines which image data group among the image data groups classified by the image data classification unit 101 is to be edited. For example, the image data group captured on the previous day, that is, the image data group whose shooting date and time is the previous day of the editing date, is determined as an editing target based on the editing date on which the digest moving image is automatically edited. Further, on the basis of the designated date and time designated by the user instead of the editing date, the image data group with the shooting date and time before and after the designated date and time may be determined as the editing target.
  • the image data group that the event selection unit 104 determines to be edited may be based not only on date / time information but also on position information and creator information.
  • an image data group including image data having position information specified by the user or position information within a predetermined range including the position may be determined as an editing target.
  • only an image data group having specific creator information may be determined as an editing target.
  • an image data group excluding an image data group having specific creator information may be determined as an editing target.
  • the number of image data groups that the event selection unit 104 determines to be edited is not limited to one, and may be two or more.
  • As the timing at which the event selection unit 104 determines the image data group to be edited, the change of the day may be used as a trigger.
  • the event selection unit 104 may determine the image data group according to a user selection. For example, the event selection unit 104 displays information indicating one or more image data groups classified by the image data classification unit 101 on a display unit (not shown).
  • The information indicating an image data group may be, for example, a character string representing the shooting date or creator of the group, or an icon or thumbnail image placed on a map image indicating the range of the shooting position information included in the group.
  • the user designates an image data group desired to be edited for the digest moving image.
  • the event selection unit 104 determines the image data group designated by the user as the image data group to be edited.
  • the event selection unit 104 notifies the digest moving image generation unit 103 of information (selection information) indicating the determined image data group.
  • FIG. 16 shows an example of a display screen by the event selection unit 104 when the user selects and determines an image data group to be edited.
  • The event selection unit 104 uses the image data group identification information 10A, 20A, … indicating the image data groups 10, 20, … to output a selection display screen 40 including the names 41, 42, 43, ….
  • The user designates the name (41, 42, 43, or the like) of the image data group to be edited via operation means (for example, a touch panel, mouse, or keyboard) connected to or incorporated in the video editing apparatus 100.
  • The event selection unit 104 sends the name of the image data group designated by the user ("January 02 to 05, 2014, Hawaii Island" in the example of FIG. 16), or the corresponding image data group identification information (10A, 20A, or the like), to the digest moving image generation unit 103 as selection information indicating the image data group.
  • the output control unit 105 determines an output destination and a generation policy of the digest moving image generated by the digest moving image generating unit 103.
  • the output control unit 105 inputs capability information indicating the number of display pixels of a video display device (not shown), audio output specifications, and the like, and determines a digest moving image generation policy based on the capability information.
  • Suppose the video editing apparatus of this embodiment has a built-in video display unit capable of displaying the digest moving image, and another video display device is also connected externally. In this case, a digest moving image generation policy is determined for each of the built-in video display unit and the externally connected video display device.
  • the output control unit 105 notifies the digest video generation unit 103 of the determined generation policy.
  • the output control unit 105 determines a digest moving image generation policy based on information constituting a digest moving image generation policy given by an input unit (not shown).
  • The digest moving image generation policy is a parameter set including output destination information, output image specifications, output audio specifications, scene selection criteria, and information indicating whether simultaneous arrangement of a plurality of scenes is permitted.
  • the process of determining the digest moving image generation policy in the output control unit 105 will be described for each parameter.
  • As the output destination information, the output control unit 105 determines information indicating whether the output destination is the video display unit built into the video editing apparatus 100 or an externally connected video display device. Whether the output destination is built-in or external is determined by electrically detecting whether a display device is connected to the outside of the video editing apparatus 100, or is specified via an input unit (not shown). When the video editing apparatus 100 has a built-in video display unit and an external video display device is also connected, the output control unit 105 may determine the output destination information so that both are output destinations.
  • The output control unit 105 refers to information indicating the image display specifications, such as the number of display pixels, of the output destination, that is, the video display unit built into the video editing apparatus or an externally connected video display device, and determines the output image specifications from it.
  • The output image specification includes at least the number of output horizontal pixels and the number of output vertical pixels. Basically, the numbers of display pixels in the horizontal and vertical directions of the output destination video display device are used as they are and set as the numbers of output horizontal and vertical pixels, respectively. However, if it is known that the output destination video display device will not display the image full-screen, for example when the image is displayed in a window, values smaller than the numbers of display pixels of the output destination device may be set as the numbers of output horizontal and vertical pixels.
  • The output control unit 105 refers to information indicating the audio reproduction capability of the output destination audio output device, that is, the audio output unit built into the video editing apparatus or the audio output unit of an externally connected video display device, and determines the output audio specification from it. The output audio specification may include the sampling frequency, the number of quantization bits, and the like; each item is set from the information indicated by the audio reproduction capability of the output destination device. Example sampling frequencies are 32 kHz, 44.1 kHz, 48 kHz, and 96 kHz, and example numbers of quantization bits are 8, 16, and 24.
  • the output control unit 105 receives information indicating the user's preference regarding the digest moving image generation by an input unit (not shown), and determines scene selection criteria such as “person main” and “landscape main”.
  • the information indicating the user's preference may be language information such as “person” or “landscape”, or may be information indicating the image itself obtained by selecting from thumbnail images having different tendencies, for example.
  • The information indicating user preferences is not limited to "person main" or "landscape main"; it may be information indicating major subjects such as "specific persons", "animals", or "flowers" based on face recognition or shape recognition, or information indicating the scene content such as "seaside" or "forest" based on analysis of the distribution of pixel values.
  • When no preference is given, the output control unit 105 may set "person main" as the standard scene selection criterion.
  • The output control unit 105 sets, as "simultaneous arrangement of multiple scenes", information indicating whether multiple scenes may be placed in the same image frame, according to the output destination video display device. For example, if the video display unit built into the video editing apparatus is the active output destination, simultaneous arrangement of multiple scenes is set to "No"; if an externally connected video display device is the active output destination, it is set to "Yes".
  • This criterion is based on the premise that the display unit built into the video editing apparatus is small (for example, when the video editing apparatus is a smartphone), whereas an externally connected display device is large.
  • When the size of the output destination display device is known in advance, or when information from which its size can be calculated is available (for example, information indicating pixel density in dpi), the size is calculated; if it is larger than a predetermined threshold, simultaneous arrangement of multiple scenes is set to "Yes", and otherwise to "No".
  • FIG. 2 is an example of scene information generated by the scene information generation unit.
  • the scene information 200 shown in FIG. 2 describes scene-related information in units of lines, and each line 201, 202, 203,... Is configured to correspond to one scene.
  • Information described in each row 201, 202, 203,... Indicates an image file name, shooting date, shooting time, scene start frame number, scene end frame number, person information, motion information, and conversation information in order from the left.
  • the image file name is a character string indicating a storage location of still image data or moving image data including each scene.
  • the shooting date and shooting time are basically character strings indicating the date and time when the image file including each scene is recorded.
  • the scene start frame number and the scene end frame number are information indicating the time range (scene length) of the scene in the corresponding image file. For example, when the scene start frame number is 0 and the scene end frame number is 149, if the corresponding image file is moving image data of 30 fps, it indicates that the scene is 5 seconds from the start of the file.
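  • As a minimal check of this frame-to-time relationship, the conversion can be written as follows (assuming inclusive start and end frame numbers, as in the example above).

      def scene_duration_seconds(start_frame, end_frame, fps):
          """Scene length from inclusive start/end frame numbers."""
          return (end_frame - start_frame + 1) / fps

      assert scene_duration_seconds(0, 149, 30.0) == 5.0  # the 5-second example above
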
  • The person information, motion information, and conversation information indicate the characteristics of the image signal and audio signal of each scene; they are described next.
  • Person information is information including the presence or absence of a person in the scene. Furthermore, information indicating the number of persons, the personal name, the posture, the size of the person area, and the distribution pattern of a plurality of persons may be included.
  • the motion information is information indicating the presence / absence and type of motion in the scene. The movement of each object may be shown, and the movement for each area may be shown.
  • Conversation information is information indicating the volume and type of sound (silence, human voice, music, etc.) for a scene. Furthermore, information for identifying a speaker and information on a sound source such as music type may be included. In FIG. 2, the three types of information are represented as index numbers corresponding to predetermined types.
  • The person information can take, for example, three values: "no person (0)", "main person (1)", and "other person (2)". No person (0) means that no figure of a person, or almost none, appears throughout the scene.
  • Main person (1) means that one or two persons are shown in the scene and their area is larger than a predetermined size. For example, a scene in which the photographer intentionally photographed a specific person corresponds to this.
  • Other person (2) means that persons are captured in the scene but their number is large or the captured area is smaller than a predetermined size. For example, a group-photo-like scene including a specific person, or a scene shot to show how people move without identifying who they are, corresponds to this.
  • The motion information can take, for example, three values: "no motion (0)", "partial motion (1)", and "whole motion (2)".
  • No motion (0) means that there is almost no change in the image throughout the scene.
  • Partial motion (1) means that there is motion in part of the image area in the scene. For example, a scene in which a person is dancing in front of a fixed camera corresponds to this.
  • Whole motion (2) means that there is motion over the entire image area in the scene. For example, a scene shot while the camera itself is moved horizontally corresponds to this.
  • The conversation information can take, for example, three values: "no sound (0)", "with conversation (1)", and "other sound (2)".
  • No sound (0) means that no usable sound signal is recorded, for example, the sound signal level is extremely low throughout the scene.
  • “with conversation (1)” means that a voice including a conversation of a person is recorded in the scene.
  • Other sound (2) means that a sound signal of a predetermined level or higher is recorded continuously although it is not conversation. For example, a scene in which music is playing corresponds to this.
  • The scene information generation unit 102 determines and generates the scene information described above by analyzing the image signal and audio signal in the image data. For example, the image and audio signals are analyzed in units of one second; as long as no change occurs in the three types of information indicating the signal characteristics, the scene is treated as one continuous scene and its scene information is generated. When any of the three types of information changes, the change is regarded as a scene break, and the moving image data is divided into multiple scenes, each receiving its own scene information.
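  • A minimal sketch of this segmentation rule follows, assuming the analysis has already produced one (person, motion, conversation) index triple per one-second unit; the data layout is an assumption for illustration.

      def split_into_scenes(features_per_second):
          """Place a scene boundary wherever any of the three feature indices
          (person, motion, conversation) changes between adjacent 1-second units.
          Returns (start_second, end_second) pairs, end exclusive."""
          scenes, start = [], 0
          for t in range(1, len(features_per_second)):
              if features_per_second[t] != features_per_second[t - 1]:
                  scenes.append((start, t))
                  start = t
          if features_per_second:
              scenes.append((start, len(features_per_second)))
          return scenes

      # Example: a person appears from the 3rd second onward -> two scenes.
      assert split_into_scenes([(0, 0, 0), (0, 0, 0), (1, 0, 0)]) == [(0, 2), (2, 3)]
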
  • The scene information generation unit 102 may generate scene information so as to exclude scenes unsuitable for inclusion in the digest moving image. For example, for scenes in which a viewer would likely be unable to tell what was captured, such as when the entire image moves abruptly or the image is out of focus, the scene information generation unit 102 either generates no scene information at all or generates a digest-inappropriate flag indicating that the scene is unsuitable for the digest moving image. In this way, scenes that are not useful for the digest moving image, for example those with large camera shake or focus loss when one starts shooting with a digital camera or smartphone, can be eliminated.
  • In the scene information 200, the rows 209 and 211 correspond to scenes that are still images. Since a still image has no time element, the scene start frame number and scene end frame number, which indicate the time range of a scene, do not exist; and since there is no motion or sound in the image, neither motion information nor conversation information exists. Such non-existent information is represented by the symbol * in FIG. 2.
  • For a still image, the scene information generation unit 102 analyzes the image signal and assigns person information of "no person (0)", "main person (1)", or "other person (2)", just as for a moving image.
  • the shooting time of the image file is normally recorded as the time when the image is recorded as a file, that is, the time when shooting is completed.
  • When the scene information generation unit 102 does not divide an image file into multiple scenes in the process of generating its scene information, the shooting time of the image file is used as the shooting time of the corresponding scene as it is.
  • When the scene information generation unit 102 divides one image file into multiple scenes, however, the time corresponding to the shooting time of each scene may not match the time indicated by the shooting time of the original image file.
  • When the scene information generation unit 102 divides one image file into a plurality of scenes, it therefore calculates the shooting time of each divided scene from the shooting time ShootTime recorded for the image file, the frame rate fr_rate of the moving image, the total number of frames F_ALL, and the frame numbers of each divided scene, and records it in the generated scene information. By doing so, when the scene information is referred to later, the temporal relationship between scenes can be identified simply by comparing shooting dates and times, without referring to image file names and frame numbers.
  • The scene information generation unit 102 also acquires the time length of each scene in the course of analyzing its image or audio signal, and, by comparing the acquired scene length with the shooting time of the image data containing the scene, adjusts the records so that an appropriate shooting date and shooting time are stored. For example, even if the shooting date of an image file is recorded as January 1, 2014 and its shooting time as 0:00 a.m., when the whole moving image of the file is several minutes long and its head portion is extracted as a scene, the actual shooting date of that scene is December 31, 2013, different from the file's shooting date, and its shooting time is, for example, 23:55:00. By calculating an appropriate shooting date and time for each scene in this way and recording them as scene information, the scenes included in the image data group to be edited, as determined by the event selection unit 104, are appropriately selected.
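  • One plausible reading of this calculation, assuming the recorded shooting time marks the end of recording (that is, the last frame F_ALL), is sketched below; the formula is an assumption consistent with the 23:55 example above, not a formula stated in the text.

      from datetime import datetime, timedelta

      def scene_shoot_time(file_shoot_time, fr_rate, f_all, scene_end_frame):
          """Shooting time of a divided scene, assuming the file's shooting time
          corresponds to its final frame f_all (i.e. the end of recording)."""
          return file_shoot_time - timedelta(seconds=(f_all - scene_end_frame) / fr_rate)

      # A file stamped 2014-01-01 00:00:00 (30 fps, 10800 frames = 6 minutes):
      # a head scene ending at frame 150 was actually shot late on 2013-12-31.
      t = scene_shoot_time(datetime(2014, 1, 1, 0, 0, 0), 30.0, 10800, 150)
      assert t == datetime(2013, 12, 31, 23, 54, 5)
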
  • the scene start frame number and the scene end frame number generated as the scene information may be replaced with other information specifying the temporal position in the image file.
  • an image file elapsed time indicating the scene start and an image file elapsed time indicating the scene end may be generated as the scene information.
  • the elapsed time in the image file is expressed, for example, in seconds, milliseconds, or seconds + frame number with respect to the head of the image file.
  • the information for specifying the temporal position of the scene may be represented by a character string as described above, or represented by a numerical value (for example, a numerical value representing an elapsed time with reference to a predetermined date and time or time). Also good. Further, information representing the frame rate of the moving image may be included.
  • Although the example shows the person information, motion information, and conversation information represented by numerical values, they may instead be represented by character strings expressing the feature of each item: for example, "NO_HUMAN" for "no person (0)", "HERO" for "main person (1)", and "OTHERS" for "other person (2)". Motion information and conversation information may likewise be represented by character strings.
  • FIG. 2 shows an example in which scene information, person information, motion information, and conversation information are indicated by numerical values (indexes corresponding to predetermined types).
  • Instead of a numerical index, each item may be stored as a character string corresponding to a predetermined type, or, instead of a single numerical value or character string, as a set of parameters (the number of people, motion vectors, volume at each frequency, and so on).
  • the data need not be readable text data, and may be binary data.
  • FIG. 3 is a conceptual diagram illustrating a process of generating a digest moving image by the video editing apparatus according to the present embodiment.
  • The digest moving image generation unit 103 reads the corresponding scene information 303 for the image data group 302 selected from the image data groups 301, and generates the digest moving image according to the predetermined digest moving image generation policy 305.
  • a group of image data 302 for which a digest moving image is to be generated is, for example, all image data photographed on a certain day. This image data group is determined by the image data classification unit 101 and the event selection unit 104 as described above.
  • the digest moving image generation unit 103 refers to the scene information generated by the scene information generation unit 102 from the top, and reads the scene information of the scene corresponding to the selection information.
  • The digest moving image generation unit 103 refers to the read scene information in order of shooting date and shooting time, and determines the type of each scene, such as a scene used alone or a scene used in combination with other scenes. Based on the determined scene types, the digest moving image generation unit 103 generates the image clips 306a, 306b, 306c, ….
  • notations such as S01, S02, and S03 each represent a scene.
  • the notation “S01 + S02” in the image clip 306a indicates that the image clip 306a is an image clip in which both the scene S01 and the scene S02 are spatially arranged.
  • the image clips 306a, 306b, 306c, and the like are still images or moving images that include at least one scene and have an appropriate length (for example, 1 second or longer).
  • FIG. 7 shows an internal configuration of the digest moving image generating unit 103 in the present embodiment.
  • the digest moving image generation unit 103 includes a target image extraction unit 1031, a scene type determination unit 1032, a scene space arrangement unit 1033, a scene time arrangement unit 1034, and a digest control unit 1035.
  • the target image extraction unit 1031 refers to the selection information indicating the target image data group notified from the event selection unit 104, and extracts an input image when generating a digest moving image. Information indicating the extracted image data is notified to the scene type determination unit 1032 and the scene space arrangement unit 1033.
  • The scene type determination unit 1032 refers to the scene information generated by the scene information generation unit 102, reads the scene information of the scenes corresponding to the information indicating the image data extracted by the target image extraction unit 1031, and determines the type of each scene.
  • FIG. 4 shows an example of the relationship between the scene information and the scene type determined by the scene type determination unit 1032.
  • FIG. 4 shows an example of scene information in the same manner as FIG. 2, and each row 401, 402, 403... Included in the scene information 400 describes scene information corresponding to one scene.
  • In the following, to simplify the description, the scene information 401, 402, 403, … is also used to mean the scene itself.
  • The scene type determination unit 1032 refers to the scene information 400 and compares the shooting times of two consecutive scenes in shooting-time order; when the difference ΔT between the two shooting times is within a predetermined threshold (the scene proximity determination threshold THt), the two scenes are determined to be combination scenes to be used together, and otherwise each is treated as a single scene.
  • the scene type determination unit 1032 further determines a main scene or a sub-scene for each scene determined as a combination scene as follows.
  • The scene type determination unit 1032 refers to the person information, motion information, and conversation information included in the scene information of each scene, determines whether each scene qualifies as a main scene, and classifies the scenes accordingly. In the example of FIG. 4, since the person information of both the scene 401 and the scene 402 is "main person (1)", both are determined to be main scenes and classified as such. Since the person information of the scene 403 is "main person (1)", the scene 403 is likewise classified as a main scene; since that of the scene 404 is "other person (2)", the scene 404 is judged not to be a main scene and is classified as a sub-scene. For the scenes 405 and 406, the person information is "other person (2)" and "no person (0)" respectively; the scene 405 is judged relatively more important than the scene 406, so the scene 405 is classified as a main scene and the scene 406 as a sub-scene.
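  • The decision just described can be summarized in the following sketch, which assumes the "person main" criterion and a simple rank ordering of the person-information values; the dictionary keys and the ranking are illustrative assumptions.

      HERO, OTHERS, NO_HUMAN = 1, 2, 0  # person-information index values from FIG. 2

      def classify_pair(scene_a, scene_b, th_t):
          """Classify two consecutive scenes (dicts with 'shoot_time' in seconds
          and 'person' index) under the "person main" selection criterion."""
          if abs(scene_a["shoot_time"] - scene_b["shoot_time"]) > th_t:
              return "single", "single"      # not close in time: each used alone
          pa, pb = scene_a["person"], scene_b["person"]
          if pa == HERO and pb == HERO:
              return "main", "main"          # e.g. scenes 401/402: parallel arrangement
          # Otherwise rank HERO > OTHERS > NO_HUMAN, as in the 405/406 example.
          rank = {HERO: 2, OTHERS: 1, NO_HUMAN: 0}
          return ("main", "sub") if rank[pa] >= rank[pb] else ("sub", "main")
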
  • the scene space placement unit 1033 determines the spatial placement of each scene and generates an image clip in which the scene is spatially placed.
  • FIG. 5 shows an example of scene arrangement by the scene space arrangement unit 1033.
  • The scene space arrangement unit 1033 determines the spatial arrangement (layout) of each scene based on the scene type determined by the scene type determination unit 1032 and on the relationship between the scene information of the combination scenes described above. For example, since the scene 401 and the scene 402 shown in FIG. 4 are both main scenes, the "parallel arrangement", which displays them side by side at the same size, is determined for them (FIG. 5A).
  • When one scene is a main scene and the other a sub-scene, as with the scene 403 and the scene 404, the sub-scene is displayed over the entire image frame while the central area of the main scene is cut out and superimposed in the area 503 at the central portion of the screen, so that the main scene draws attention; this arrangement is determined to be the "central arrangement" (FIG. 5B).
  • the reason why the central area of the main scene is displayed in a superimposed manner is that the person information of the scene 403 that is the main scene is “main person (1)”.
  • a scene whose person information is “main person (1)” means that one or two persons having a relatively large size are captured in the image frame.
  • In FIG. 5B, instead of the central area of the main scene, the entire image frame of the main scene may be reduced and displayed in the area 503.
  • Alternatively, an arrangement as shown in FIG. 5D may be selected. In the arrangement of FIG. 5D, the sub-scene is displayed over the entire image frame as in FIG. 5B, but the central area of the main scene is cut out larger than in FIG. 5B, so the visible display area of the sub-scene is smaller than in FIG. 5B.
  • The scene space arrangement unit 1033 selects this arrangement when, for example, the motion information of the sub-scene is "whole motion (2)".
  • The scenes 405 and 406 shown in FIG. 4 are, like the scenes 403 and 404, a main scene and a sub-scene respectively. However, since the person information of the scene 405 is "other person (2)", it is unlikely that a specific area such as the center of the image carries particular importance in the scene 405. Therefore the main scene is arranged over the entire image frame while a reduced image of the sub-scene is superimposed on it; this arrangement is determined to be the "sub-screen arrangement" (FIG. 5C). At this time, the size of the sub-screen area 506 is determined to be smaller than the main scene areas (503, 507) of the central arrangement described above.
  • This is because the scene to be noticed is basically the main scene and the sub-scene need not stand out. Specifically, the area 503 in which the main scene is arranged in the central arrangement of FIG. 5B is set to about 1/4 of the entire image frame (the area 507 in FIG. 5D has about 1/2 of the horizontal pixel count of the entire image frame), while the area 506 in which the sub-scene is arranged in the sub-screen arrangement is set to about 1/9 of the entire image frame; each area is obtained by cutting out from, or reducing, the original image. Differentiating the sizes in this way makes the scene to be noticed stand out.
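  • The proportions above can be turned into concrete pixel regions as in the sketch below; the exact coordinates (for example, placing the inset at the bottom right) are illustrative assumptions.

      def layout_regions(width, height, arrangement):
          """Regions as (x, y, w, h) for the FIG. 5 arrangements, using the
          proportions in the text: central main region ~1/4 of the frame
          (1/2 x 1/2), sub-screen region ~1/9 of the frame (1/3 x 1/3)."""
          if arrangement == "parallel":       # FIG. 5(a): two main scenes side by side
              return {"left": (0, 0, width // 2, height),
                      "right": (width // 2, 0, width // 2, height)}
          if arrangement == "central":        # FIG. 5(b): main scene centred over sub
              w, h = width // 2, height // 2
              return {"sub": (0, 0, width, height),
                      "main": ((width - w) // 2, (height - h) // 2, w, h)}
          if arrangement == "sub_screen":     # FIG. 5(c): reduced sub-scene inset
              w, h = width // 3, height // 3
              return {"main": (0, 0, width, height),
                      "sub": (width - w, height - h, w, h)}
          raise ValueError(arrangement)
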
  • FIG. 5E shows another example of the "sub-screen arrangement" of FIG. 5C. Here the main scene is arranged in the area 505 in the same manner as in FIG. 5C, but the sub-scene is arranged in the area 508, a spatial position different from the area 506 of FIG. 5C.
  • What FIG. 5C and FIG. 5E have in common is that the sub-scene is arranged in an area that does not hinder attention to the main scene. For example, when the scene analysis in the scene information generation unit 102 detects that a person area exists within the area 506 over the area 505 in which the main scene is arranged, the area on which the sub-scene is superimposed is changed from the area 506 to the area 508, so that the person area of the main scene is not hidden by the sub-scene.
  • a spatial filter may be applied to some scenes to make an image that emphasizes the difference between the main scene and the sub-scene. For example, if the sharpness of the image is reduced by applying a smoothing filter to the region 504 in FIGS. 5B and 5D, the difference between the central region displaying the main scene and the peripheral region displaying the sub-scene is at a glance. You will be able to understand and the areas of interest will become clearer. Whether or not to apply such a spatial filter is determined based on, for example, the similarity between the images of the main scene and the sub-scene.
  • the scene space arrangement unit 1033 applies a smoothing filter to the sub scene when the similarity between the main scene and the sub scene is high, and does not apply the smoothing filter to the sub scene when the similarity is low.
  • For example, the average values of the color components of the pixel values in the main scene area (503 or 507) and in the sub-scene area 504 are compared; when the difference between the averages is smaller than a predetermined value, that is, when the similarity of pixel values between the areas 503 or 507 and the area 504 is high, it is determined that the spatial filter is applied to the area 504.
  • the spatial filter is not limited to the smoothing filter, and may be a color conversion filter that changes the color tone for each region.
  • By color conversion, the scene space arrangement unit 1033 may convert the sub-scene into gray scale or so-called sepia tone; if the image of the area 504 is converted in this way, the main scene areas 503 and 507 become easy to notice. Alternatively, instead of applying a spatial filter, the scene space arrangement unit 1033 may set the change of the image in the area 504 in the time direction to zero, that is, turn it into a still image, to emphasize the difference from the main scene areas 503 and 507.
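  • The similarity test described above might be realized as follows; the per-channel mean comparison and the threshold value are illustrative assumptions.

      import numpy as np

      def should_smooth_sub(main_region, sub_region, threshold=16.0):
          """Apply the smoothing filter to the sub-scene region only when the two
          regions are similar: compare per-channel mean pixel values (H x W x 3
          arrays) and filter when every channel differs by less than the threshold."""
          diff = np.abs(main_region.mean(axis=(0, 1)) - sub_region.mean(axis=(0, 1)))
          return bool(diff.max() < threshold)
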
  • FIG. 5F shows an example in which three scenes are arranged.
  • the example shown in FIG. 5F is an arrangement example in the case where all of the person information of three scenes close in time are “main person (1)”.
  • the scene space arrangement unit 1033 determines that the three scenes are combination scenes with each other, and determines all of them as main scenes. Since three are main scenes, the central area of each scene is cut out and arranged in parallel in areas 509, 510, and 511 so that they have the same size.
  • When the scenes are not of equal temporal length, the longer scenes placed in the same image clip are matched to the shortest one by cutting off part of each of them.
  • the scene space arrangement unit 1033 outputs the image clip generated by the above method to the scene time arrangement unit 1034.
  • The scene time arrangement unit 1034 further combines, in the time direction, the image clips in which scenes have been spatially arranged as described above. In FIG. 3, the image clips 306a, 306b, 306c, … each correspond to an image clip consisting of only a single scene or an image clip in which a combination scene is arranged.
  • the scene time arrangement unit 1034 combines a plurality of image clips according to the context of the shooting time of the scene corresponding to each image clip.
  • When an image clip contains multiple scenes, the shooting time of that image clip is taken to be the shooting time recorded in the scene information of the scene with the latest shooting time among them.
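  • The temporal combination rule reduces to a sort, as sketched below; the clip dictionary layout is an assumption.

      def order_clips(clips):
          """Order image clips for temporal combination. A clip that contains
          several scenes is dated by the latest shooting time among them, as
          described above."""
          return sorted(clips, key=lambda clip: max(clip["scene_shoot_times"]))
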
  • A combination scene, as described above, is a set of scenes whose shooting times differ only slightly, that is, whose shooting times are close relative to the length of the whole event. Scenes shot close together in time are often the same or similar scenes. If similar scenes simply follow one another, the generated digest moving image becomes redundant and quickly tiresome. Therefore, by arranging highly similar scenes in parallel, or by including them within parts of the same frame, a large number of captured images can be used effectively while the display layout is diversified. This makes it possible to generate a digest moving image that does not tire the viewer and improves user satisfaction.
  • As the audio track of the digest moving image, the audio track included in the image data corresponding to each scene used in the digest moving image is basically used as it is. For an image clip consisting of a single scene, the audio track of that scene is used as it is. For an image clip in which a combination scene is arranged, the audio track to be used is determined as follows.
  • When the combination scene arrangement is other than the "parallel arrangement", that is, the "central arrangement" or the "sub-screen arrangement", the audio track of the main scene is used as the audio track of the digest moving image.
  • When the arrangement is the "parallel arrangement", the audio tracks of the respective scenes are allocated to the left and right channels of the digest moving image's audio track according to the positional relationship of the arranged scenes.
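  • These three audio rules could be expressed as follows; the clip structure and key names are illustrative assumptions.

      def digest_audio(clip):
          """Choose the audio for one image clip following the rules above."""
          if clip["arrangement"] == "single":
              return {"stereo": clip["scenes"][0]["audio"]}
          if clip["arrangement"] in ("central", "sub_screen"):
              return {"stereo": clip["main_scene"]["audio"]}   # main scene's track
          # Parallel arrangement: route each scene's audio to the channel that
          # matches its on-screen position (left scene -> left channel, etc.).
          left_scene, right_scene = clip["scenes"][0], clip["scenes"][1]
          return {"left": left_scene["audio"], "right": right_scene["audio"]}
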
  • The digest control unit 1035 changes the digest moving image generation method (generation algorithm) according to the digest moving image generation policy determined by the output control unit 105; specifically, it generates the digest moving image while switching whether to include each scene in the digest, the judgment criteria for main and sub-scenes, whether and in what pattern multiple scenes are arranged, the image coding quality, the audio coding quality, and so on. The changes to the generation method are described in detail below.
  • The scene type determination unit 1032 determines whether each scene included in the image data group targeted for digest generation is a main scene, and it may make this determination based on the scene selection criterion included in the digest moving image generation policy. The description above assumed the scene selection criterion "person main"; when the criterion differs from this, the digest control unit 1035 determines the corresponding criterion for judging main scenes and notifies the scene type determination unit 1032 of information indicating it, and the scene type determination unit 1032 determines the scene types according to that information.
  • For example, when the scene selection criterion is "landscape main", scenes other than those capturing a person's figure or conversation, that is, scenes mainly showing scenery such as nature, are determined to be main scenes. Concretely, among the combination scenes, those whose person information is "no person" or whose conversation information is other than "with conversation" are classified as main scenes, and the other combination scenes as sub-scenes. As single scenes, only scenes whose person information is "no person" are selected, and the other scenes, that is, scenes in which people appear, are not used in the digest moving image. With such a configuration, scenes matching a specified feature can be selected preferentially, and a digest moving image reflecting the user's preferences can be generated.
  • According to the "simultaneous arrangement of multiple scenes" setting in the generation policy, the digest control unit 1035 may also switch whether a plurality of scenes close in time are arranged in the same image frame.
  • the digest control unit 1035 determines whether or not to arrange a plurality of scenes in the same image frame, and notifies the scene type determination unit 1032 and the scene space arrangement unit 1033.
  • When simultaneous arrangement is permitted, the scene space arrangement unit 1033 treats multiple scenes that are close in time as combination scenes, as described above, and generates the digest moving image so as to place them in the same image frame. When it is not permitted, the scene space arrangement unit 1033 treats each scene as a single scene and generates the digest moving image without arranging multiple scenes in the same image frame.
  • In this way, when the output control unit 105 sets "simultaneous arrangement of multiple scenes" to "No", for example, layouts in which a scene is reduced to a sub-screen can be avoided, so that the visibility of the generated digest is not impaired.
  • the digest control unit 1035 determines whether to encode an image or sound based on output destination information included in the digest moving image generation policy. When not encoded, the generated digest moving image is output to a built-in video display unit or an externally connected video display device so as to be displayed and reproduced as it is.
  • When encoding is performed, the image and sound are encoded according to predetermined encoding methods at the time the digest moving image is generated, and the digest moving image is output as encoded data.
  • As the encoding methods, the images follow a method such as MPEG-2, AVC/H.264, or HEVC/H.265, and the sound follows a method such as MPEG-1, AAC-LC, or HE-AAC.
  • As the basic encoding methods, the digest control unit 1035 assumes the highest-performance methods, for example HEVC/H.265 for images and HE-AAC for audio, and determines the encoding method and encoding quality actually used based on the output image specification and output audio specification described below.
  • the digest control unit 1035 determines the image coding quality of the generated digest moving image and the arrangement pattern of a plurality of scenes based on the output image specifications included in the digest moving image generation policy.
  • the output image specification includes at least information indicating the number of display pixels of an output destination video display device.
  • Since the number of display pixels consists of the number of pixels in the horizontal direction and the number of pixels in the vertical direction, the screen aspect ratio of the display device is also known from it.
  • In that case, the digest control unit 1035 generates the digest moving image so as to maintain the number of pixels of the input image as it is.
  • The digest control unit 1035 also determines the image encoding rate based on the number of display pixels of the output destination; for example, it holds an information table of encoding rates corresponding to display pixel counts and selects the rate that matches the output destination, as in the sketch below.
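  • Such a table lookup might look like the following; the text only says that a table of rates per display pixel count is held, so the bitrate values here are invented for illustration.

      RATE_TABLE = [                 # (max pixels per frame, video bitrate in Mbps)
          (1280 * 720, 5.0),         # illustrative values only
          (1920 * 1080, 10.0),
          (3840 * 2160, 30.0),
      ]

      def encoding_rate(h_pixels, v_pixels):
          """Look up an encoding rate from the output display's pixel count."""
          pixels = h_pixels * v_pixels
          for max_pixels, mbps in RATE_TABLE:
              if pixels <= max_pixels:
                  return mbps
          return RATE_TABLE[-1][1]   # cap at the largest table entry
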
  • the digest control unit 1035 determines an encoding method for encoding the digest moving image according to the image reproduction capability.
  • For example, when the image encoding methods supported by an externally connected video display apparatus are only MPEG-2 and AVC/H.264, the digest control unit 1035 selects AVC/H.264, rather than the higher-performance HEVC/H.265, as the encoding method in accordance with the image reproduction capability indicated by the output image specification, and encodes the digest moving image.
  • the digest control unit 1035 determines the audio coding quality of the digest video to be generated and the configuration of the audio track based on the output audio specifications included in the digest video generation policy.
  • the output audio specification is information indicating an audio reproduction capability of at least an output destination audio output device, that is, an audio output unit built in the video editing apparatus 100 or an audio output unit of an externally connected video display apparatus. It is configured including the number of channels, sampling frequency, number of quantization bits, and the like.
  • the digest control unit 1035 determines whether or not to use the audio track of the scene to be included in the digest moving image and channel allocation according to the number of output audio channels in the output audio specification. Also, audio resampling and bit number conversion are performed in accordance with the sampling frequency and quantization bit number of the output audio specification.
  • the digest control unit 1035 determines the audio encoding method based on this information. For example, if the audio encoding methods supported by the externally connected video display apparatus are only MPEG-1 Audio and AAC-LC, the digest control unit 1035 selects AAC-LC rather than the higher-performance HE-AAC as the encoding method, and encodes the audio track of the digest moving image.
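  • the codec negotiation described above amounts to walking a preference list and falling back to what the output device supports; a minimal sketch follows, with the preference orders taken from the methods named above and the supported sets as illustrative output specifications.

```python
# A minimal sketch of the codec selection described above: start from the
# highest-performance method and fall back to what the output device supports.
VIDEO_PREFERENCE = ["HEVC/H.265", "AVC/H.264", "MPEG-2"]
AUDIO_PREFERENCE = ["HE-AAC", "AAC-LC", "MPEG-1 Audio"]

def select_codec(preference, supported):
    """Pick the most preferred codec that the output destination supports."""
    for codec in preference:
        if codec in supported:
            return codec
    raise ValueError("no mutually supported codec")

# The supported sets below are illustrative output image/audio specifications.
print(select_codec(VIDEO_PREFERENCE, {"MPEG-2", "AVC/H.264"}))     # AVC/H.264
print(select_codec(AUDIO_PREFERENCE, {"MPEG-1 Audio", "AAC-LC"}))  # AAC-LC
```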
  • FIG. 6 shows an arrangement example of a plurality of scenes determined by the digest moving image generation unit 103 when the screen aspect ratio of the input image is horizontally long and the screen aspect ratio of the output destination is vertically long.
  • FIG. 6A is an arrangement example in which, as in the “parallel arrangement” example of FIG. 5A, two scenes that are close in time are both main scenes and are displayed in parallel at the same size.
  • the original images are a horizontally long image including the regions 602 and 602′ and a horizontally long image including the regions 603 and 603′; to match the screen aspect ratio of the display region 601, the central area (602, 603) of each scene is cut out and arranged.
  • FIG. 6B is an example in which, as in the “center arrangement” example of FIG. 5B, the sub-scene is arranged over the entire image frame (display area) 601 and the central portion of the main scene is arranged in the area 604 at the central portion of the screen.
  • FIG. 6D shows an arrangement in which, as in FIG. 6B, the sub-scene is displayed over the entire image frame 601, while the central area of the main scene is cut out larger than in FIG. 6B and arranged in the area 608.
  • reference numeral 608′ in the figure denotes the partial area of the original image that is discarded when the area 608 is cut out.
  • FIG. 6C shows an example in which an image obtained by reducing the sub-scene is arranged as a sub-screen area 606, as in the “sub-screen arrangement” of FIG. 5C.
  • since the input image is horizontally long, the main scene is arranged in the area 604 so that the entire scene is displayed, and the sub-scene is arranged in a part of the vacant area of the screen.
  • the size of the small-screen area 606 is determined so as to be smaller than the main scene area 604 so that it can be distinguished from the main scene that is the scene to be noticed.
  • for example, the main scene area 604 is determined so that its horizontal size equals that of the entire image frame 601, and the sub-scene area 606 is determined to be about 2/3 of the horizontal size of the entire image frame 601.
  • a spatial filter may be applied to some scenes to produce an image that emphasizes the difference between the main scene and the sub-scene. For example, if the sharpness of the image is reduced by applying a smoothing filter to the region 605 in FIGS. 6B and 6D, the difference between the central region displaying the main scene and the peripheral region displaying the sub-scene can be grasped at a glance, and the area of interest becomes clearer.
  • in the region 607 of FIG. 6C, a main scene or a sub-scene to which a spatial filter is applied may be displayed. By displaying an image over the entire image frame 601 including the region 607, the size of the displayed image becomes the same as in the arrangement patterns other than FIG. 6C, so that, compared with displaying no image in the region 607, the uncomfortable feeling that may occur when image clips combined in the time direction are viewed continuously can be avoided.
  • FIG. 6E shows an example in which three scenes are arranged.
  • the example shown in FIG. 6 (e) is an arrangement example of three main scenes that are close in time as in FIG. 5 (f).
  • Each scene is arranged in parallel in the areas 609, 610, and 611 so as to include the central area.
  • the display layout of a single scene when the screen aspect ratio is vertically long will be described.
  • for example, as in FIG. 6B, the central portion of the single scene is arranged in the area 604, and the same scene is also arranged in the area 605 as the background.
  • the sharpness of the image is lowered by applying the smoothing filter as described above to the region 605.
  • in this way, a sense of spaciousness when viewing the image can be obtained, and since the size of the displayed image matches that of other image clips combined in the time direction, a sense of incongruity that may occur when continuously watching images can be avoided.
  • the above spatial filter is not limited to a smoothing filter, and may be a color conversion filter that changes the color tone of each region.
  • for example, by lowering the saturation of the regions 605 and 607, the areas 604 and 608 that display the main scenes can be made conspicuous.
  • instead of the spatial filter, the difference from the regions 604 and 608 that display the main scenes may be emphasized by making the change in the time direction of the images in the regions 605 and 607 zero, that is, by making them still images.
  • the above has described examples in which an image with a landscape screen aspect ratio is arranged on a screen with a portrait screen aspect ratio.
  • for other combinations of aspect ratios as well, the size and position of the area where each scene is arranged, the area cut out from each scene, and whether or not to apply the spatial filter can be determined based on the same concept.
  • FIG. 19 shows a scene space layout when an image with a screen aspect ratio of portrait (hereinafter referred to as “portrait image”) is placed on a screen with a screen aspect ratio of landscape (hereinafter referred to as “landscape screen”).
  • FIG. 19A is an example of “parallel arrangement” for a landscape screen, as in FIG. 5A.
  • the images arranged in FIG. 19A are a main scene A that is a portrait image including areas 1902 and 1902 ', and a main scene B that is a portrait image including areas 1903 and 1903'.
  • the scene space arrangement unit 1033 cuts out the central areas of the main scene A and the main scene B and arranges them in the areas 1902 and 1903 so that they are displayed in parallel in the display area 1901.
  • a region 1902 'and a region 1903' are regions that are not displayed in the main scene A and the main scene B, respectively.
  • FIG. 19 (b) is an example of “center arrangement” for the landscape screen, as in FIG. 5 (b).
  • the image arranged in FIG. 19B is a main scene A which is a portrait image corresponding to the area 1904 and a sub-scene B which is a portrait image including areas 1905 and 1905 '.
  • the scene space arrangement unit 1033 arranges the central part of the sub-scene B so that it is displayed in the area 1905 corresponding to the entire display area 1901, and arranges the main scene A in the area 1904 located in the central part of the display area 1901. Of the sub-scene B, the area 1905′ is an area that is not displayed.
  • FIG. 19D is another example of the “center arrangement” for the landscape screen. As with FIG. 5D relative to FIG. 5B, the display area of the main scene A is enlarged compared with FIG. 19B, and the display area of the sub-scene B is reduced accordingly.
  • the scene space arrangement unit 1033 arranges the central part of the sub-scene B so that it is displayed in the area 1905 corresponding to the entire display area 1901, and cuts out the central part of the main scene A and arranges it in the area 1906 located in the central part of the display area 1901. Of the main scene A, the area 1906′ is not displayed, and of the sub-scene B, the area 1905′ is not displayed.
  • FIG. 19C is an example of “child screen arrangement” for the landscape screen, as in FIG. 5C.
  • the images arranged in FIG. 19C are a main scene A that is a portrait image including areas 1906 and 1906 ′, and a sub-scene B that is a portrait image corresponding to the area 1907.
  • the scene space arrangement unit 1033 arranges the central part of the main scene A in the area 1906 located in the central part of the display area 1901, and reduces the sub-scene B and arranges it in the area 1907 adjacent to the area 1906 of the main scene.
  • an area 1906 ' is an area that is not displayed.
  • the scene space arrangement unit 1033 may also arrange the central part of the main scene A or the central part of the sub-scene B in the area 1908 as the background of the areas 1906 and 1907.
  • FIG. 19 (e) is an example in which three scenes are arranged for the landscape screen, as in FIG. 5 (f).
  • the images arranged in FIG. 19E are a main scene A that is a portrait image including the region 1909, a main scene B that is a portrait image including the region 1910, and a main scene C that is a portrait image including the region 1911.
  • the scene space arrangement unit 1033 cuts out the central areas of the main scene A, the main scene B, and the main scene C, and arranges them in the areas 1909, 1910, and 1911 so that they are displayed in parallel in the display area 1901.
  • FIG. 20 shows an example of scene arrangement by the scene space arrangement unit 1033 when the portrait image is arranged on the portrait screen.
  • FIG. 20A is an example of “parallel arrangement” for a portrait screen in which two scenes are arranged in the vertical direction.
  • the images arranged in FIG. 20A are a main scene A that is a portrait image including a region 2002 and a main scene B that is a portrait image including a region 2003.
  • the scene space arrangement unit 1033 cuts out the central areas of the main scene A and the main scene B and arranges them in the areas 2002 and 2003 so that they are displayed in parallel in the vertical direction in the display area 2001 corresponding to the portrait screen. Note that in FIG. 20, the areas that are not displayed when the image areas are cut out are not illustrated; they are described separately with reference to FIGS. 21 and 22.
  • FIG. 20B shows an example of “center arrangement” for a portrait screen in which the sub-scene is arranged as a background over the entire display area and the main scene is arranged so as to be superimposed on the center portion.
  • the images arranged in FIG. 20B are a main scene A that is a portrait image including a region 2004 and a sub-scene B that is a portrait image including a region 2005.
  • the scene space arrangement unit 1033 arranges the sub-scene B in the area 2005 corresponding to the entire display area 2001, cuts out the central area of the main scene A, and arranges it in the central area 2004 in the vertical direction of the display area 2001.
  • FIG. 20C shows a “child screen arrangement” for a portrait screen in which the main scene A is arranged in an area corresponding to the entire display area, and the sub scene B is arranged as a child screen area so as to be superimposed on the main scene.
  • the images arranged in FIG. 20C are a main scene A that is a portrait image corresponding to the area 2006 and a sub-scene B that is a portrait image corresponding to the area 2007.
  • the scene space arrangement unit 1033 arranges the main scene A in the area 2006 corresponding to the entire display area 2001, and arranges the sub-scene B so as to fit in the area 2007, whose size is smaller than a quarter of the area of the entire display area 2001.
  • the size of the area 2007 is, for example, about 1/9 of the area of the entire display area 2001.
  • FIG. 20D shows an example in which three scenes are arranged in the vertical direction for the portrait screen.
  • the images arranged in FIG. 20D are a main scene A that is a portrait image including the region 2008, a main scene B that is a portrait image including the region 2009, and a main scene C that is a portrait image including the region 2010.
  • the scene space arrangement unit 1033 cuts out the central areas of the main scene A, the main scene B, and the main scene C and arranges them in the areas 2008, 2009, and 2010 so that they are displayed in parallel in the vertical direction in the display area 2001.
  • FIGS. 19 and 20 are examples of scene arrangements in which both the main scene and the sub-scene to be output are images with a vertically long screen aspect ratio (portrait images), for cases where the screen aspect ratio of the output video display device is landscape (landscape screen) and portrait (portrait screen), respectively. In contrast, what was described with reference to FIGS. 5 and 6 are examples of scene arrangements in which both the main scene and the sub-scene to be output are images with a horizontally long screen aspect ratio (hereinafter referred to as “landscape images”).
  • the main scene and the sub scene arranged on the same screen are not necessarily images having the same screen aspect ratio.
  • the scene space arrangement unit 1033 when determining the arrangement of a plurality of scenes to “parallel arrangement”, “center arrangement”, “sub-screen arrangement”, etc., the size and screen aspect ratio of the display area of the video display device of the output destination and Based on the image size and aspect ratio of each scene to be arranged, an area to be displayed in the image of each scene is determined. At this time, image processing such as scaling (enlargement / reduction) and cropping (cutting) of each image is performed so that the image of each scene can be used most effectively according to the arrangement pattern. These image processing steps will be described with reference to FIGS. 21 and 22.
  • FIG. 21 shows a processing example of image scaling and cropping in the scene space arrangement unit 1033 when an image is output to the landscape screen.
  • examples of the scene arrangement were shown in FIGS. 5 and 19; FIG. 21 describes an example of how to determine which areas 2101′ to 2104′ are extracted for display from the original images 2101 to 2104.
  • the shaded area indicates a display area extracted from each original image.
  • Ho and Vo denote the horizontal and vertical sizes (numbers of pixels) of the display area of the output destination, respectively, and H and V denote the horizontal and vertical sizes (numbers of pixels) of the original image before the scaling and cropping processing.
  • FIG. 21A shows an example of how to determine the display area 2101 ′ when the landscape image 2101 is used as the main scene of the “parallel arrangement” for the landscape screen as shown in FIG. 5A.
  • the scene space arrangement unit 1033 first scales the entire original image 2101 so that the vertical size V of the original image 2101 matches the vertical size Vo of the display area of the output destination (V → Vo). Thereafter, the scene space arrangement unit 1033 crops the central portion of the scaled original image 2101 so that the horizontal size becomes Ho/2, and extracts the display area 2101′.
  • enlargement / reduction is performed so as to maintain the screen aspect ratio of the original image so as not to cause distortion of the image in the scene.
  • scaling is performed so that the size ratio in the horizontal direction and the size ratio in the vertical direction are the same before and after scaling.
  • all the scaling processing in the description related to FIGS. 21 and 22 is performed based on the same concept.
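  • the common scale-then-crop step of FIGS. 21 and 22 can be sketched as follows; this is a minimal sketch under the assumptions stated in the comments, and the function and variable names are ours, not the specification's.

```python
# Minimal sketch of the common scaling/cropping step of FIGS. 21 and 22:
# scale the original image uniformly (preserving its aspect ratio) until it
# covers the target slot, then crop the central portion to the slot size.
def scale_and_center_crop(h, v, slot_h, slot_v):
    """Return (scale, crop_x, crop_y, crop_w, crop_h) in original-image pixels."""
    # Uniform scale factor: take the larger ratio so that both dimensions
    # cover the slot (V -> Vo etc.), leaving no gap in either direction.
    scale = max(slot_h / h, slot_v / v)
    # Size of the slot mapped back into original-image coordinates.
    crop_w = round(slot_h / scale)
    crop_h = round(slot_v / scale)
    # Center the crop, discarding the left/right or top/bottom margins.
    crop_x = (h - crop_w) // 2
    crop_y = (v - crop_h) // 2
    return scale, crop_x, crop_y, crop_w, crop_h

# FIG. 21A: a 1920x1080 landscape image into the left half (Ho/2 x Vo) of a
# 1920x1080 landscape screen -> scale V to Vo, crop the central Ho/2 width.
print(scale_and_center_crop(1920, 1080, 960, 1080))  # (1.0, 480, 0, 960, 1080)
```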
  • FIG. 21B shows an example of how to determine the display area 2102′ when the portrait image 2102 is used as a main scene of the “parallel arrangement” for the landscape screen as shown in FIG. 19A.
  • the scene space arrangement unit 1033 may also determine the display area 2102′ from the original image 2102 according to FIG. 21B when using the portrait image 2102 as the main scene of the “center arrangement” for the landscape screen as shown in FIG. 5B.
  • FIG. 21C shows how to determine the display area 2103 ′ when the landscape image 2103 is used as another “center arrangement” main scene for the landscape screen as shown in FIG. 5D.
  • the scene space arrangement unit 1033 first scales the entire original image 2103 so that the vertical size V of the original image 2103 matches the vertical size Vo of the display area of the output destination (V → Vo).
  • the predetermined number of pixels δ used in this cropping is determined to be, for example, 5% of the vertical size Vo of the display area of the output destination.
  • FIG. 21D shows an example of how to determine the display area 2104′ when the portrait image 2104 is used as the main scene of the “child screen arrangement” for the landscape screen as shown in FIG. 5C. First, the scene space arrangement unit 1033 scales the entire original image 2104 so that the horizontal size H of the original image 2104 matches the horizontal size Ho of the display area of the output destination (H → Ho). Thereafter, the scene space arrangement unit 1033 crops the central portion of the scaled original image 2104 so that the vertical size becomes Vo, and extracts the display area 2104′. The scene space arrangement unit 1033 may also determine the display area 2104′ from the original image 2104 according to FIG. 21D when using the portrait image 2104 as a sub-scene of the “center arrangement” for the landscape screen.
  • FIG. 22 shows an example of scene scaling and cropping in the scene space layout unit 1033 when an image is output on the portrait screen.
  • examples of the scene arrangement were shown in FIGS. 6 and 20; FIG. 22 describes an example of how to determine the regions 2201′ to 2203′ extracted for display from the original images 2201 to 2203. The meanings of the symbols in the figure are the same as in FIG. 21.
  • FIG. 22A shows an example of how to determine the display area 2201′ when the landscape image 2201 is used as a main scene of the “parallel arrangement” for the portrait screen as shown in FIG. 6A.
  • the scene space arrangement unit 1033 first scales the entire original image 2201 so that the vertical size V of the original image 2201 matches one half (Vo/2) of the vertical size of the display area of the output destination (V → Vo/2). Thereafter, the scene space arrangement unit 1033 crops the central portion of the scaled original image 2201 so that the horizontal size becomes Ho, and extracts the display area 2201′.
  • the scene space arrangement unit 1033 may also determine the display area 2201′ from the original image 2201 according to FIG. 22A when using the landscape image 2201 as the main scene of the “center arrangement” for the portrait screen.
  • the scene space arrangement unit 1033 may also determine the display area 2201′ from the original image 2201 according to FIG. 22A when using the landscape image 2201 as the main scene of the “child screen arrangement” for the portrait screen as shown in FIG. 6C.
  • FIG. 22B shows an example of how to determine the display area 2202′ when the portrait image 2202 is used as a main scene of the “parallel arrangement” for the portrait screen.
  • the scene space arrangement unit 1033 first scales the entire original image 2202 so that the horizontal size H of the original image 2202 matches the horizontal size Ho of the display area of the output destination (H → Ho). Thereafter, the scene space arrangement unit 1033 crops the central portion of the scaled original image 2202 so that the vertical size becomes one half (Vo/2) of the vertical size of the display area of the output destination, and extracts the display area 2202′.
  • the scene space arrangement unit 1033 may also determine the display area 2202′ from the original image 2202 according to FIG. 22B when using the portrait image 2202 as the main scene of the “center arrangement” for the portrait screen as shown in FIG. 6B.
  • FIG. 22C shows an example of how to determine the display area 2203′ when the landscape image 2203 is used as the “center arrangement” sub-scene for the portrait screen as shown in FIG. 6B.
  • the scene space arrangement unit 1033 first scales the entire original image 2203 so that the vertical size V of the original image 2203 matches the vertical size Vo of the display area of the output destination (V → Vo). Thereafter, the scene space arrangement unit 1033 crops the central portion of the scaled original image 2203 so that the horizontal size becomes Ho, and extracts the display area 2203′.
  • the scene space arrangement unit 1033 may also determine the display area 2203′ from the original image 2203 according to FIG. 22C when using the landscape image 2203 as the main scene or the sub-scene of the background portion (area 607) of the “child screen arrangement” for the portrait screen as shown in FIG. 6C.
  • in this way, either the number of pixels in the horizontal direction (H) or the number of pixels in the vertical direction (V) of the original image is scaled so as to match the number of pixels in the corresponding direction of the display area of the output destination (Ho, Ho/2, Vo, Vo/2, etc.).
  • as a result, a digest moving image suitable for the output destination device can be generated.
  • the video editing apparatus according to the second embodiment is characterized in that the digest moving image generation unit 103 is different from the video editing apparatus according to the first embodiment.
  • the digest moving image generation unit in the present embodiment is configured to further include a digest moving image generation count unit and a random arrangement pattern determination unit.
  • the difference from the first embodiment will be described in detail.
  • the digest moving image generation counting unit counts the number of times a digest moving image is generated in units of image data groups indicated by the selection information notified from the event selection unit 104.
  • the digest moving image generation counting unit notifies the random arrangement pattern determination unit of the counted number of generations.
  • when the number of generations is one, the random arrangement pattern determination unit does nothing; when the number of generations is two or more, it randomly changes the arrangement pattern when the spatial arrangement pattern of a plurality of scenes is determined.
  • that is, at the first generation for the selected image data group, the arrangement pattern of the plurality of scenes is determined based on the scene types and the relationship of the scene information between the combination scenes, as described for the scene space arrangement unit 1033 in the first embodiment. At the second and subsequent generations, however, the arrangement pattern of the plurality of scenes is changed randomly for each combination scene. The combination scenes themselves are determined, as described for the scene type determination unit 1032 in the first embodiment, by selecting scenes that are close in time.
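  • a minimal sketch of this generation-count-driven randomization follows; the counter is kept per image data group, and all names and pattern labels are illustrative, not taken from the specification.

```python
import random

# Minimal sketch of the second embodiment: the generation counter is kept per
# image data group, and from the second generation onward the arrangement
# pattern is re-drawn at random for each combination scene.
PATTERNS = ["parallel", "center", "sub-screen"]
generation_count = {}  # image data group id -> number of digests generated

def begin_generation(group_id: str) -> int:
    """Count one more digest generation for this image data group."""
    generation_count[group_id] = generation_count.get(group_id, 0) + 1
    return generation_count[group_id]

def pattern_for_scene(count: int, rule_based_pattern: str) -> str:
    """First generation keeps the rule-based layout; later ones randomize."""
    return rule_based_pattern if count == 1 else random.choice(PATTERNS)

n = begin_generation("group-2014-08-03")
print(pattern_for_scene(n, "center"))  # "center" on the first generation
```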
  • FIG. 8A shows the internal configuration of the video editing apparatus 100a according to the present embodiment.
  • the video editing apparatus 100a includes an image data classification unit 101, a scene information generation unit 102a, a digest moving image generation unit 103a, an event selection unit 104, and an output control unit 105.
  • the difference from the video editing apparatus 100 according to the first embodiment will be described in detail.
  • the scene information generation unit 102a analyzes the image data, classifies the image data into one or more scenes characterized by an image signal or an audio signal, and generates scene information that is information indicating the feature of each scene.
  • the scene information is configured to include “number of persons”, “maximum person size”, and “maximum person position” as information about the feature regions in the image (hereinafter, these three types of information are collectively referred to as person information).
  • “Number of persons” represents the maximum number of image areas (person areas) of persons appearing in the image of each scene in units of image frames.
  • “Maximum person size” represents the size of the largest of the person areas in each scene, and “maximum person position” represents the position of that area.
  • the scene information generation unit 102a detects face images and whole-body images in the image as the feature regions of each scene. When a face image is detected, it generates scene information from information about the region of the face image; when no face image is detected (for example, when a person appears but faces sideways or backward), it generates scene information from information about the region of the whole-body image.
  • image feature amounts are extracted in units of regions of a predetermined size, and face regions are detected (identified) based on a face image discriminator using Haar-Like feature amounts.
  • similarly, a histogram of oriented gradients (HOG) is calculated in units of predetermined image areas, and whole-body image regions are detected by a whole-body image discriminator using the HOG feature amounts.
  • the method for detecting the face image and the whole body image is an example, and the method is not limited to the above method as long as the size and position of the area to be detected can be obtained.
  • not only face images and whole-body images but also upper-body and lower-body image areas may be detected based on discriminators using separately prepared feature amounts. When an upper-body image is detected, scene information may be generated based on information (number, size, position) about its area; when no upper-body image is detected, scene information may be generated based on information (number, size, position) about the lower-body image area.
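  • as one possible realization of the detection described above, the following sketch uses OpenCV's stock Haar-cascade face detector and HOG-based people detector; the specification does not name a library, so this pairing is an assumption.

```python
import cv2

# A sketch of the detection described above: try the Haar-cascade face
# detector first, and fall back to the HOG whole-body (people) detector
# when no face is found (e.g. a person facing sideways or backward).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_person_regions(frame):
    """Return face rectangles if any are found, else whole-body rectangles."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        return [tuple(r) for r in faces]     # (x, y, w, h) per face
    bodies, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
    return [tuple(r) for r in bodies]        # fall back to whole bodies
```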
  • FIG. 9 is a diagram showing the concept of the person information.
  • FIG. 9A shows an example of a scene in which the person area 701 is located at the coordinates 702 (x1, y1) and the size thereof is (H1 ⁇ V1).
  • the “maximum person size” and the “maximum person position” are uniquely determined as (H1 ⁇ V1) and (x1, y1), respectively.
  • FIG. 9B is an example of a scene in which two person regions 703 and 704 are located at coordinates 705 (x2, y2) and 706 (x3, y3), respectively, and the sizes of the regions are (H2 × V2) and (H3 × V3), respectively.
  • as in FIG. 9B, when the number of persons in the scene is two, information indicating the size (H2 × V2) of the area 703, which is the larger of the two person areas (703, 704), is defined as the “maximum person size”, and information indicating the coordinates (x2, y2) of the area 703 is defined as the “maximum person position”.
  • FIG. 9C is an example of a scene including four person areas 707, 708, 709, and 710. In the example of FIG. 9C, it is assumed that the person area 707 has the largest area among the four person areas. In this case, information indicating the size (H4 ⁇ V4) of the area 707 is defined as “maximum person size”, and information indicating the coordinates (x4, y4) of the area 707 is defined as “maximum person position”.
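  • deriving the person information of FIG. 9 from a list of detected rectangles reduces to counting them and picking the largest by area; a minimal sketch follows, with the rectangle values in the example chosen for illustration only.

```python
# Minimal sketch of deriving the person information of FIG. 9 from a list of
# detected person rectangles (x, y, w, h): the "number of persons" is the
# count, and the largest-area rectangle yields the "maximum person size" and
# the "maximum person position" (its upper-left coordinates).
def person_info(regions):
    if not regions:
        return {"count": 0, "max_size": None, "max_position": None}
    x, y, w, h = max(regions, key=lambda r: r[2] * r[3])
    return {"count": len(regions), "max_size": (w, h), "max_position": (x, y)}

# As in FIG. 9B: two regions; the first (region 703) has the larger area.
print(person_info([(40, 30, 120, 160), (200, 50, 60, 80)]))
# {'count': 2, 'max_size': (120, 160), 'max_position': (40, 30)}
```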
  • FIG. 10 shows an example of scene information corresponding to the examples shown in FIG. 9.
  • the scene information 800 describes information about the scene in units of lines, and each line 801, 802, 803,... Is configured to correspond to one scene.
  • the information described in each line is the image file name, shooting date, shooting time, scene start frame number, scene end frame number, number of people, maximum person size, maximum person position, motion information, conversation information in order from the left. Is shown.
  • the number of persons, the maximum person size, and the maximum person position in the scene information 800 will be described.
  • in the following, the reference numerals of the scene information lines are also used to denote the scenes themselves.
  • the scene information of the scene 801 is an example corresponding to one of the examples of FIG. 9.
  • the scene information of the scene 802 is an example corresponding to a scene having one person area, as in the example of FIG. 9A.
  • the scene information of the scene 803 is an example corresponding to one of the examples of FIG. 9.
  • the scene information of the scene 804 is an example corresponding to one of the examples of FIG. 9.
  • the scene information of the scene 805 is an example corresponding to a scene having five person areas.
  • the scene information of the scene 806 is an example corresponding to a scene where the number of persons is zero, that is, no person is detected in the image.
  • when the number of persons is zero, there is no scene information corresponding to the maximum person size and the maximum person position; in FIG. 10, such non-existent information is represented by the symbol “*”.
  • in the above description, the “maximum person size” is represented by the numbers of pixels in the horizontal and vertical directions of the rectangular area corresponding to the person area, and the “maximum person position” is expressed by the coordinates of the upper-left pixel of the rectangular area, with the upper-left pixel of the image as the origin.
  • the shape of the region corresponding to the face image in the person region may be a circle instead of a rectangle, and in this case, the “maximum person size” may be expressed by the number of pixels corresponding to the diameter of the circle.
  • the coordinates corresponding to the “maximum person position” may be the coordinates of the pixel at the center of the area instead of the upper left of the area.
  • the scene information generation unit 102a generates scene information including the person information (number of persons, maximum person size, maximum person position) described above, and outputs the generated scene information to the digest moving image generation unit 103a.
  • the digest moving image generation unit 103a reads the scene information generated by the scene information generation unit 102a, and digests the image data group classified by the image data classification unit 101 or the image data group selected by the event selection unit 104 as a target. Generate a moving image.
  • FIG. 8B shows an internal configuration of the digest moving image generating unit 103a in the present embodiment.
  • the digest moving image generation unit 103a includes a target image extraction unit 1031, a scene type determination unit 1032a, a scene space arrangement unit 1033a, a scene time arrangement unit 1034, and a digest control unit 1035.
  • the differences from the first embodiment will be mainly described.
  • the target image extraction unit 1031 refers to the selection information indicating the target image data group notified from the event selection unit 104, and extracts an input image when generating a digest moving image. Information indicating the extracted image data is notified to the scene type determination unit 1032a and the scene space arrangement unit 1033a.
  • the scene type determination unit 1032a refers to the scene information generated by the scene information generation unit 102a, reads the scene information of the scene corresponding to the information indicating the image data extracted by the target image extraction unit 1031, and determines the scene type. decide.
  • the scene type determination unit 1032a refers to the scene information 800, compares the shooting times of two scenes that are consecutive in shooting-time order, and determines, according to whether the difference ΔT between the two shooting times is within the scene proximity determination threshold THt or exceeds it, that is, whether or not the scenes are close in time, whether each scene is a “single scene” used alone or a “combination scene” used in combination with another scene.
  • for example, the scene proximity determination threshold THt is set to 300 seconds.
  • the scenes 801 and 802 are determined to be “combination scenes” because they are close in time.
  • the scenes 803 and 804 are determined as “combination scenes” because they are close in time
  • the scenes 805 and 806 are also determined as “combination scenes” because they are close in time.
  • for each scene determined to be a combination scene, the scene type determination unit 1032a refers to the person information (number of persons, maximum person size, maximum person position), the motion information, and the conversation information included in the scene information, and classifies as the main scene any scene determined to be main.
  • for example, the scene type determination unit 1032a determines that, within a combination scene, the scene whose “number of persons” is smaller (but not zero) is main, and that a scene with a relatively larger number of persons is not main. A scene whose “number of persons” is zero is determined to be less important than a scene whose “number of persons” is nonzero. If the “numbers of persons” of the combination scenes are equal, both are determined to be main and both are classified as main scenes.
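  • the two decisions just described, the proximity test against THt and the main/sub classification by person count, can be sketched as follows; THt = 300 seconds is the example value given above, and the function names are ours.

```python
# A sketch of the scene type determination described above: consecutive scenes
# whose shooting times differ by no more than THt are treated as combination
# scenes, and within a combination the scene with the smaller nonzero
# "number of persons" is classified as the main scene.
TH_T = 300  # scene proximity determination threshold THt, in seconds

def is_combination(time_a: float, time_b: float) -> bool:
    return abs(time_b - time_a) <= TH_T

def classify_main(persons_a: int, persons_b: int):
    """Return ('main', 'sub'), ('sub', 'main') or ('main', 'main')."""
    if persons_a == persons_b:
        return ("main", "main")        # equal counts: both are main
    if persons_a == 0:
        return ("sub", "main")         # zero persons is less important
    if persons_b == 0:
        return ("main", "sub")
    # Both nonzero: the smaller "number of persons" is the main scene.
    return ("main", "sub") if persons_a < persons_b else ("sub", "main")

print(classify_main(5, 0))  # ('main', 'sub'), as with scenes 805 and 806
```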
  • the scene space arrangement unit 1033a determines the spatial arrangement of each scene, generates an image clip in which the scene is arranged spatially, and outputs the image clip to the scene time arrangement unit 1034.
  • the scene space arrangement unit 1033a determines the spatial arrangement (layout) of each scene based on the scene type determined by the scene type determination unit 1032a and the scene information relationship between the combination scenes.
  • the method for determining the scene layout in the scene space arrangement unit 1033a is basically the same as that of the scene space arrangement unit 1033 described above, but the scene space arrangement unit 1033a uses the “number of persons” included in the scene information for the layout determination, by associating it with the “person information” used as the layout determination criterion in the scene space arrangement unit 1033.
  • for example, when the “number of persons” included in the scene information is 1 or 2, the scene is handled in the same way as a scene whose “person information” is “main person (1)”. When the “number of persons” is 3 or more, it is handled in the same way as a scene whose “person information” is “other person (2)”. When the “number of persons” is 0, it is handled in the same way as a scene whose “person information” is “no person (0)”.
  • the differences from the scene space arrangement unit 1033 are the control of the arrangement position of each scene according to the “maximum person position” indicated by the scene information, and the effect control according to the “maximum person size” and the “number of persons”.
  • FIG. 11 to FIG. 13 show processing examples related to scene arrangement position control and effect control by the scene space arrangement unit 1033a.
  • a scene 901 and a scene 902 in FIG. 11 correspond to the scenes 801 and 802 in FIG. 10. Since the scenes 801 and 802 are combination scenes, as described above, and both are main scenes, the scene space arrangement unit 1033a determines the layout to be “parallel arrangement”, in which the two scenes 901 and 902 are displayed in parallel at the same size (FIG. 11C).
  • at this time, the scene space arrangement unit 1033a determines the areas 921 and 922 so that each includes, near its center, the area (areas 911 and 912) indicated by the “maximum person position” in the scene information of each scene. These areas 921 and 922 are cut out from the images of the scenes 901 and 902, respectively, and arranged in the areas 931 and 932 of the output image 930.
  • scenes 903 and 904 in FIG. 12 correspond to the scenes 803 and 804 in FIG. 10.
  • Scenes 803 and 804 are combined scenes as described above, and are a main scene and a sub-scene, respectively.
  • the layout is determined to be “center arrangement”, in which the sub-scene 904 is displayed over the entire area of the output image 940 while the main scene 903 is superimposed on the area 941 in the central portion of the screen (FIG. 12C).
  • at this time, the scene space arrangement unit 1033a determines an area 923 that includes, near its center, the area (area 913) indicated by the “maximum person position” included in the scene information of the main scene 903, cuts this area 923 out of the image of the scene 903, and arranges it in the area 941 of the output image 940.
  • scenes 905 and 906 in FIG. 13 correspond to the scenes 805 and 806 in FIG. 10.
  • the scene 805 and the scene 806 are combined scenes, and are a main scene and a sub-scene, respectively.
  • since the “number of persons” of the scene 805 is five, it is treated in the same way as a scene whose person information is “other person (2)”.
  • the layout is therefore determined to be “sub-screen arrangement”, in which an image obtained by reducing the sub-scene 906 is superimposed on the main scene as the sub-screen area 952 (FIG. 13C).
  • at this time, the scene space arrangement unit 1033a determines the position of the superimposed sub-screen area 952 in the output image 950 so that the region (region 915) indicated by the “maximum person position” included in the scene information of the main scene 805 is not hidden by the superimposed sub-scene.
  • for example, the scene space arrangement unit 1033a determines the position of the sub-screen area 952 by selecting, from among the four corners of the screen, the position farthest from the area (area 915) indicated by the “maximum person position”. Note that the position of the sub-screen area 952 is not limited to the four corners of the screen as long as it does not overlap the “maximum person position” included in the scene information of the main scene.
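  • the farthest-corner selection just described can be sketched as follows; coordinates are in output-image pixels, and the sizes in the example are illustrative.

```python
# Minimal sketch of the sub-screen placement described above: among the four
# corners of the output image, pick the one farthest from the "maximum person
# position" of the main scene so that the superimposed sub-screen does not
# hide it.
def place_sub_screen(out_w, out_h, sub_w, sub_h, person_x, person_y):
    corners = [
        (0, 0),                           # top-left
        (out_w - sub_w, 0),               # top-right
        (0, out_h - sub_h),               # bottom-left
        (out_w - sub_w, out_h - sub_h),   # bottom-right
    ]
    def distance_sq(corner):
        cx, cy = corner[0] + sub_w / 2, corner[1] + sub_h / 2
        return (cx - person_x) ** 2 + (cy - person_y) ** 2
    return max(corners, key=distance_sq)

# Person area near the top-left -> the sub-screen goes to the bottom-right.
print(place_sub_screen(1920, 1080, 480, 270, 300, 200))  # (1440, 810)
```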
  • since the scene space arrangement unit 1033a determines the layout of a plurality of scenes in this way, a large subject in the main scene (for example, a person area that is likely to draw attention) can be prevented from being cut off at the boundary with another scene arranged in the same screen or hidden behind a non-main scene; as a result, a digest moving image that is easy to watch can be generated.
  • the scene space arrangement unit 1033a may further apply a spatial filter to some of the scenes to produce an image that emphasizes the difference between the main scene and the sub-scene. For example, applying a smoothing filter to the region 942 of the “center arranged” image 940 shown in FIG. 12 creates a difference in sharpness between the central region 941 displaying the main scene and the peripheral region 942 displaying the sub-scene, making the area of interest clearer. At this time, the scene space arrangement unit 1033a controls the strength of the smoothing filter according to the “maximum person size” included in the scene information.
  • for example, as the parameter Ff that controls the degree of smoothing of the smoothing filter, the scene space arrangement unit 1033a uses three parameters α, β, and γ, in descending order of smoothing strength, and performs control such that the parameter α is selected when HSratio is small and the parameter γ is selected when HSratio is large. When HSratio is large, the smoothing applied to the sub-scene display area (area 942) is thereby weakened, reducing the difference in sharpness between the main scene and the sub-scene; smoothing may also not be applied at all (when HSratio > r3 in FIG. 14A).
  • the purpose of differentiating the sharpness of the main scene and the sub-scene with the smoothing filter is mainly to increase the degree of attention paid to the main scene; however, if the difference in sharpness is too large, it becomes difficult, when watching the digest moving image, to tell what the sub-scene shows, and the effect of spatially arranging a plurality of scenes is halved.
  • in such a case, the smoothing filter is weakened so that the sharpness does not differ greatly, or the smoothing filter itself is not applied.
  • this further increases the display (appearance) variations and makes the digest moving image easier to watch.
  • FIG. 14B shows another example relating to the control of the smoothing filter strength Ff by the scene space arrangement unit 1033a.
  • FIG. 14B is a graph showing an example of the relationship between HNsub and Ff when the smoothing filter strength Ff is determined by the “number of persons” HNsub included in the scene information of the sub-scene.
  • for example, the scene space arrangement unit 1033a selects the parameter α, with a high degree of smoothing, when HNsub is small, and selects the parameter γ, with a low degree of smoothing, when HNsub is large. When HNsub is 0, control may be performed so that smoothing itself is not applied (when 0 ≤ HNsub ≤ n1 in FIG. 14B).
  • this makes it possible to determine the smoothing strength easily from the scene information of the sub-scene alone, without referring to the scene information of the main scene. Since the target of the smoothing is the sub-scene, controlling the strength of the smoothing filter according to the scene information (number of persons) of the sub-scene makes it possible to generate a digest moving image that uses the sub-scene image effectively while increasing the attention paid to the main scene. The smoothing filter strength Ff may also be controlled so as to satisfy both the relationship shown in FIG. 14A and the relationship shown in FIG. 14B; for example, Ff may be selected from a large number of coefficients, not only the three types α, β, and γ.
  • the smoothing filter strength Ff described above may be, for example, a parameter indicating the number of thinned pixels when performing simple pixel thinning as a smoothing filter.
  • the smoothing filter strength Ff may be, for example, a parameter indicating a window size corresponding to a pixel range to which a filter is applied when a moving average filter is used as the smoothing filter.
  • alternatively, the smoothing filter strength Ff may be a parameter indicating a predetermined coefficient set according to the smoothing filter method used by the scene space arrangement unit 1033a, such as a Gaussian filter or a weighted filter.
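  • a minimal sketch of the Ff control of FIGS. 14A and 14B follows, taking Ff to be the window size of a moving-average filter as suggested above; the thresholds r1 to r3 and n1, n2 and the window sizes assigned to α, β, γ are illustrative assumptions.

```python
# A sketch of the smoothing-strength control of FIGS. 14A and 14B, with Ff
# interpreted as a moving-average window size. Values are illustrative.
ALPHA, BETA, GAMMA = 9, 5, 3   # window sizes, in descending smoothing strength

def ff_from_hs_ratio(hs_ratio, r1=0.2, r2=0.5, r3=0.8):
    """FIG. 14A: weaken the smoothing as HSratio grows; none above r3."""
    if hs_ratio <= r1:
        return ALPHA
    if hs_ratio <= r2:
        return BETA
    if hs_ratio <= r3:
        return GAMMA
    return 1  # window of 1 pixel: no smoothing applied

def ff_from_hn_sub(hn_sub, n1=1, n2=3):
    """FIG. 14B: no smoothing when the sub-scene shows nobody."""
    if hn_sub < n1:
        return 1
    return ALPHA if hn_sub <= n2 else GAMMA

print(ff_from_hs_ratio(0.9), ff_from_hn_sub(0))  # 1 1
```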
  • the spatial filter applied by the scene space arrangement unit 1033a is not limited to the smoothing filter, and may be a color conversion filter that changes the color tone of each region.
  • the scene space arrangement unit 1033a may change the saturation instead of smoothing the sub-scene image.
  • the saturation of the pixels is changed so as to be proportional to the HSratio or HNsub described above.
  • specifically, as shown in FIG. 14C, a characteristic indicating the relationship between HSratio and the saturation S is defined over the range of S from 0 to Smax, and the pixel values of the sub-scene region 942 are converted so as to match that characteristic.
  • Smax means the maximum saturation in the target sub-scene before the pixel value is converted.
  • the saturation of the sub-scene may not be changed (when HSratio> r4 in FIG. 14C).
  • alternatively, the saturation of the sub-scene may be converted based on a similar characteristic indicating the relationship between HNsub and the saturation S, instead of the characteristic indicating the relationship between HSratio and the saturation S as shown in FIG. 14C.
  • by changing the pixel values so as to lower the saturation of the region where the sub-scene is arranged (for example, the region 942) according to the relationship between the scene information of the main scene and that of the sub-scene, the sub-scene region approaches or becomes a grayscale image, and the area of the main scene arranged on the same screen (for example, the area 941) can be made conspicuous.
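  • one way to realize this saturation conversion is to scale the saturation channel in HSV space; the sketch below assumes OpenCV and a simple proportional mapping as a stand-in for the characteristic of FIG. 14C.

```python
import cv2
import numpy as np

# A sketch of the saturation-based effect described above: convert the
# sub-scene region to HSV and scale its saturation channel by a factor
# derived from HSratio (the linear mapping is an illustrative stand-in
# for the characteristic of FIG. 14C).
def desaturate_sub_scene(region_bgr: np.ndarray, hs_ratio: float) -> np.ndarray:
    factor = min(1.0, max(0.0, hs_ratio))   # saturation proportional to HSratio
    hsv = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 1] *= factor                  # scale S toward 0 (grayscale)
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```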
  • instead of applying a spatial filter, the scene space arrangement unit 1033a may emphasize the difference from the main scene region 941 by making the change in the time direction of the image in the sub-scene region 942 zero, that is, by making it a still image.
  • as described above, in generating a digest moving image in which a plurality of scenes are combined and spatially arranged, the video editing apparatus 100a can provide a digest moving image in which an image area that easily draws attention, such as a person region in a main scene, is easy to see. In addition, by varying the sharpness, color, and the like according to the difference in characteristics between the main and non-main scenes spatially arranged on the same screen, the degree of attention paid to the main scene is increased while the display (viewing) variations of the various layouts of spatially arranged scenes are further increased, providing a digest moving image that is easier to watch and less tiring for the user.
  • instead of detecting a person (a face image or a whole-body image), another feature region may be detected, and information indicating the “number of regions”, “maximum region size”, and “maximum region position” corresponding to the feature regions may be included in the scene information in place of the person information.
  • the video editing apparatus according to the fourth embodiment is characterized in that the target image extraction unit, the scene space arrangement unit, and the scene time arrangement unit included in the digest moving image generation unit differ from those of the video editing apparatus according to the first embodiment.
  • the video editing apparatus 100b is configured to include a digest moving image generation unit 103b, and the digest moving image generation unit 103b includes a target image extraction unit 1031b, a scene space arrangement unit 1033b, and a scene time arrangement unit 1034b.
  • FIG. 17 shows an internal configuration of the video editing apparatus 100b and the digest moving image generating unit 103b according to the present embodiment.
  • the digest moving image generation unit 103b reads the scene information generated by the scene information generation unit 102, and uses the image data group classified by the image data classification unit 101 or the image data group selected by the event selection unit 104 as a target. Generate a moving image.
  • the digest moving image generation unit 103b includes a target image extraction unit 1031b, a scene type determination unit 1032, a scene space arrangement unit 1033b, a scene time arrangement unit 1034b, and a digest control unit 1035.
  • the differences from the first embodiment will be mainly described.
  • the target image extraction unit 1031b refers to the selection information indicating the target image data group notified from the event selection unit 104, and extracts the input image when generating the digest moving image.
  • the target image extraction unit 1031b notifies the scene type determination unit 1032 and the scene space arrangement unit 1033b of information indicating the extracted image data.
  • furthermore, the target image extraction unit 1031b extracts the name of the image data group, the image data names, and the shooting dates and times of the image data from the image data group identification information, and notifies the scene space arrangement unit 1033b of them.
  • the scene space arrangement unit 1033b determines the spatial arrangement of each scene and generates an image clip in which the scene is arranged spatially in the same manner as the scene space arrangement unit 1033 described with respect to the first embodiment.
  • the scene space arrangement unit 1033b further has a function of superimposing a text image indicating image information when generating an image clip, and a function of generating a title image as an additional image clip; these are the differences from the first embodiment.
  • FIG. 18 shows an example of an image clip generated by the scene space arrangement unit 1033b.
  • FIG. 18A shows an example of a title screen generated by the scene space layout unit 1033b.
  • the title screen 1000 is, for example, an image in which white text 1002 is superimposed on a black background 1001, and is a still image of, for example, about 5 seconds.
  • the scene space arrangement unit 1033b generates the title screen 1000 by superimposing the text 1002 indicating the name of the image data group notified via the target image extraction unit 1031b on the separately generated background image 1001.
  • FIG. 18B is an example of an image clip including text information indicating image information for each scene, which is generated by the scene space arranging unit 1033b.
  • the image clip 1003 is an image clip in which the scene 1004 and the scene 1005 are spatially arranged (corresponding to the image 930 in FIG. 11C), with text indicating shooting date/time information superimposed on the scenes 1004 and 1005.
  • the scene space arrangement unit 1033b generates the image clip 1003 by superimposing, on each scene, text (1006, 1007) indicating the shooting date/time information of the image data corresponding to the scenes 1004 and 1005 (DSC_2001.mov and DSC_2002.mov in FIG. 15) included in the image data group identification information notified via the target image extraction unit 1031b.
  • the scene time arrangement unit 1034b combines the image clips generated by the scene space arrangement unit 1033b in the time direction in the same manner as the scene time arrangement unit 1034 described with reference to the first embodiment. At that time, the scene time arrangement unit 1034b combines the image clips in the time direction so that the image clip of the title screen generated by the scene space arrangement unit 1033b is positioned at the head in time.
  • whether or not to superimpose text indicating shooting date / time information on each scene may be determined in advance according to the user's selection.
  • in that case, the digest control unit 1035 notifies the scene space arrangement unit 1033b of whether or not to superimpose text according to the user's selection, and the scene space arrangement unit 1033b switches, according to the notification, whether or not to superimpose the text indicating the shooting date/time information on each scene.
  • when superimposing the text indicating the shooting date/time information on the image clip, for example, only the shooting date/time information of the main scene may be superimposed instead of superimposing text for every scene.
  • as described above, the video editing apparatus allows the user to check and view a large number of still images and moving images in a short time and without trouble.
  • FIG. 23 is a schematic diagram showing the configuration of a video editing apparatus according to the fifth embodiment of the present invention.
  • the video editing apparatus 100c includes a target image data extraction unit 109, a scene information generation unit 102, a reproduction time candidate derivation unit 110, a reproduction time candidate display unit 111, and a digest moving image generation unit 103c.
  • the video editing apparatus 100c may further include a data recording unit that stores image data and a video display unit that displays images, or it may be configured so that a data recording device or a video display device having the same functions can be connected externally.
  • the target image data extraction unit 109 extracts image data that meets a predetermined condition based on the metadata included in the image data.
  • the extracted image data is collected as an image data group.
  • for example, the image data shot on the previous day, that is, the image data whose shooting date is the day before the editing date, is determined as the editing target.
  • alternatively, image data whose shooting date and time fall around a designated date and time may be determined as the editing target.
  • the image data determined by the target image data extraction unit 109 as the editing target may be based not only on the date / time information but also on position information and creator information. For example, image data having position information specified by the user or position information within a predetermined range including the position may be determined as an editing target.
  • the target image data extraction unit 109 may use a day change as a trigger as the timing for determining image data to be edited. For example, when midnight has passed, image data captured on the previous day may be determined as an editing target.
  • the target image data extraction unit 109 outputs the image data group to the digest moving image generation unit 103c.
  • the target image data extraction unit 109 calculates the total reproduction time by summing the reproduction times of all the extracted image data.
  • the target image data extraction unit 109 outputs the total reproduction time to the reproduction time candidate derivation unit 110.
  • the reproduction time candidate derivation unit 110 derives digest video reproduction time candidates based on the total reproduction time input from the target image data extraction unit 109.
  • the square root of the total playback time is calculated as a playback time candidate.
  • specifically, the square root of the total playback time expressed in minutes is calculated, and the value obtained by discarding the fractional part is set as the playback time candidate. For example, when the total playback time is 1 hour, the playback time candidate is 7, the value obtained by discarding the fractional part of the square root of 60.
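  • a minimal sketch of this derivation follows; the function name is ours.

```python
import math

# Minimal sketch of the playback time candidate derivation described above:
# the candidate is the square root of the total playback time in minutes,
# with the fractional part discarded.
def playback_time_candidate(total_seconds: float) -> int:
    total_minutes = total_seconds / 60
    return int(math.sqrt(total_minutes))  # truncate the decimal part

print(playback_time_candidate(3600))  # 1 hour -> sqrt(60) = 7.74... -> 7
```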
  • the reproduction time candidate derivation unit 110 outputs the derived reproduction time candidate to the reproduction time candidate display unit 111.
  • the playback time candidate display unit 111 displays the playback time candidates input from the playback time candidate derivation unit 110 on a display device (not shown). It is assumed that the display device includes user input means such as a touch panel and a mouse.
  • the reproduction time candidate display unit 111 receives a user event via the input means, and sets the reproduction time candidate selected by the user event as a designated time.
  • the reproduction time candidate display unit 111 outputs the designated time to the digest moving image generation unit 103c.
  • FIG. 24 is an example of a user interface for designating the playback time of the digest moving image in the video editing apparatus 100c of the present embodiment.
  • the user can select a desired reproduction time by sliding the button 32 of the bar 31 displayed on the lower side of the “digest moving image reproduction time” display to the left and right.
  • Below the bar 31 the minimum value and the maximum value of the playback time that can be specified are displayed.
  • in this example, the minimum value is 1 minute, and the maximum value is 7 minutes, which is the derived playback time candidate. Since the button 32 is slid to the middle of the bar 31, the specified time is 4 minutes, the intermediate value between 1 minute and 7 minutes.
  • alternatively, the playback time may be selected from a pull-down menu, or a numerical value may be input directly.
  • FIG. 25 is a conceptual diagram showing a digest moving image generation process by the video editing apparatus 100c of the present embodiment.
  • the video editing apparatus 100c reads, from the image data 301, the scene information 303 corresponding to the image data group 302, which is a set of selected image data, and generates a digest moving image according to the specified time input from the reproduction time candidate display unit 111.
  • a group of image data 302 for which a digest moving image is to be generated is, for example, all image data photographed on a certain day. This image data group is determined by the target image data extraction unit 109.
  • the image data group 302 is classified into one or more scenes by the scene information generation unit 102, and scene information, which is information indicating the feature of each scene, is generated.
  • the digest moving image generation unit 103c refers to the scene information in the order of shooting date and shooting time, and determines the type of each scene, such as a scene to be used alone or a scene to be used in combination with another scene. Based on the determined scene types, the digest moving image generation unit 103c generates image clips 306a, 306b, 306c, and so on, and combines them to generate a digest moving image 307.
  • the image clips 306a, 306b, 306c, and the like are moving images including at least one scene, but may include still images.
  • the digest moving image generating unit 103c adjusts the digest moving image so that the reproduction time of the generated digest moving image becomes the specified time.
• Here, “the playback time becomes the specified time” may mean either that the playback time exactly matches the specified time, or that only a slight difference remains between the playback time and the specified time.
• For example, suppose the digest moving image 50A is composed of image clips 51 to 57 and the specified time elapses during reproduction of the last image clip 57; in this case, the playback time may still be regarded as having reached the specified time.
• Likewise, suppose the digest moving image 50B is composed of image clips 51 to 56 and its playback time is shorter than the specified time, but combining one more image clip, the image clip 57, would make the playback time of the digest moving image 50B longer than the specified time; in this case as well, the playback time may be regarded as having reached the specified time.
• As the allowable difference, a specific value such as 30 seconds or 1 minute may be used, or a ratio with respect to the designated time, for example 1% of the designated time, may be used.
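• A sketch of this tolerance check, taking the 30-second and 1% figures above as example parameters (the function name is an assumption):

```python
def reached_specified_time(playback_s: float, specified_s: float,
                           abs_tol_s: float = 30.0,
                           ratio_tol: float = 0.01) -> bool:
    # The playback time is treated as "the specified time" when the
    # difference falls within an absolute or a relative tolerance.
    allowed = max(abs_tol_s, specified_s * ratio_tol)
    return abs(playback_s - specified_s) <= allowed

print(reached_specified_time(175.0, 180.0))  # True: within 30 s of 3 min
```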
  • FIG. 27 shows the internal configuration of the digest moving image generating unit 103c in the present embodiment.
  • the digest moving image generating unit 103c includes a scene type determining unit 1032, a scene space arranging unit 1033, a scene time arranging unit 1034, and a digest moving image editing unit 1036.
  • the processing contents of the scene type determination unit 1032, the scene space arrangement unit 1033, and the scene time arrangement unit 1034 are the same as those in the first embodiment.
  • the digest moving image editing unit 1036 adjusts the reproduction time of the digest moving image by editing the digest moving image output from the scene time arranging unit 1034.
  • the digest moving image editing unit 1036 outputs the input digest moving image as it is when the reproduction time of the digest moving image is the designated time.
  • the digest moving image editing unit 1036 edits the digest moving image so that the reproduction time of the digest moving image becomes the specified time when the reproduction time of the digest moving image is not the specified time.
• When the playback time of the digest moving image is longer than the specified time, the digest moving image editing unit 1036 shortens the image clips included in the digest moving image.
• Specifically, it adjusts the playback time by shortening image clips with no motion: referring, in order from the beginning of the digest moving image, to the motion information in the scene information of the scenes included in each image clip, it shortens an image clip when the motion information of all the scenes included in that clip is “no motion (0)”.
  • the image clip 60A is an image clip composed of only a scene whose motion information is “no motion (0)”.
• Frames 61 to 66 are the frames constituting the image clip 60A, arranged in chronological order from frame 61 to frame 66.
• The digest moving image editing unit 1036 keeps one frame out of every two frames in the image clip 60A; in the case of FIG. 28, frames 62, 64, and 66 are kept.
• The resulting image clip 60B has half as many frames as the image clip 60A; since the display frame rate is unchanged, the image clip 60B corresponds to the image clip 60A with its playback speed doubled.
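• A minimal sketch of this frame-thinning step (frames are represented abstractly; the helper name is an assumption):

```python
def halve_by_thinning(frames: list) -> list:
    # Keep one frame out of every two (frames 62, 64, 66, ... in the
    # FIG. 28 example); at an unchanged display frame rate the clip
    # then plays in half the time, i.e. at double speed.
    return frames[1::2]

clip_60a = [61, 62, 63, 64, 65, 66]   # frame identifiers as stand-ins
print(halve_by_thinning(clip_60a))    # [62, 64, 66] -> "clip 60B"
```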
  • the digest moving image editing unit 1036 repeats the above processing until the digest moving image reproduction time reaches the specified time.
• The digest moving image editing unit 1036 performs the above processing up to the last image clip of the digest moving image. If the digest moving image playback time is still not the specified time, it adjusts the playback time by cutting out parts of the image clips in which not all scenes have the motion information “no motion (0)”. More specifically, with the digest moving image playback time denoted Td and the specified time Ts, the digest moving image editing unit 1036 cuts each image clip so that its playback time Ti becomes Ts/Td times its original length. For example, a portion corresponding to (1 − Ts/Td) times the playback time Ti is cut from the head of the image clip. The cut portion need not be the head; it may be the tail of the image clip, or portions may be cut from both the head and the tail.
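• A sketch of this proportional trimming, with head-side cutting as in the example (names are assumptions):

```python
def trim_clip_durations(clip_durations_s, td_s, ts_s):
    # Scale each clip's playback time Ti to Ts/Td of its original
    # length by cutting (1 - Ts/Td) * Ti from the head of the clip.
    ratio = ts_s / td_s
    return [ti * ratio for ti in clip_durations_s]

# A 480 s digest must fit a 360 s specified time: each clip keeps 3/4.
print(trim_clip_durations([120.0, 200.0, 160.0], 480.0, 360.0))
```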
• Conversely, when the digest moving image playback time is shorter than the specified time, the digest moving image editing unit 1036 adjusts it by lengthening the playback time of image clips with no motion. Specifically, referring to the motion information of the scenes included in each image clip in order from the beginning of the digest moving image, it extends the playback time of an image clip by interpolating frames when the motion information of all the scenes in that clip is “no motion (0)”. For example, interpolating one frame between each pair of frames doubles the playback time of the image clip, that is, halves its playback speed.
  • FIG. 29 is a conceptual diagram for explaining processing for extending the playback time of an image clip in the digest moving image editing unit 1036.
  • the image clip 70A is an image clip composed of only a scene whose motion information is “no motion (0)”.
  • Frames 71, 74, and 77 are frames constituting the image clip 70A, and are arranged in time series in the order of the frames 71, 74, and 77.
• When the digest moving image playback time is shorter than the specified time, the digest moving image editing unit 1036 first interpolates two frames between each pair of frames in the image clip 70A (in the case of FIG. 29, frames 72, 73, 75, 76, and so on), obtaining an image clip 70B with three times as many frames as the image clip 70A. Next, the digest moving image editing unit 1036 deletes one of every two frames of the image clip 70B (in the case of FIG. 29, frames 72, 74, 76, and so on), obtaining an image clip 70C with half as many frames as the image clip 70B.
• Since the image clip 70C has 3/2 times as many frames as the image clip 70A and is displayed at the same frame rate, the image clip 70C corresponds to the image clip 70A slowed to 2/3 of its playback speed, that is, with a playback time 3/2 times as long.
• The specific method of frame interpolation is not particularly limited; for example, linear interpolation, or a method that estimates motion between frames and interpolates based on that motion, may be used.
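• A sketch of this interpolate-then-thin procedure; numeric values stand in for frames, and plain linear blending stands in for whichever interpolation method is used:

```python
def blend(a, b, t):
    # Placeholder linear interpolation between two "frames"
    # (real frames would be pixel arrays).
    return a * (1 - t) + b * t

def stretch_to_three_halves(frames):
    # Step 1: interpolate two frames between each pair (3x the frames).
    tripled = []
    for a, b in zip(frames, frames[1:]):
        tripled += [a, blend(a, b, 1 / 3), blend(a, b, 2 / 3)]
    tripled.append(frames[-1])
    # Step 2: delete every second frame, leaving about 3/2 the original
    # count; at the same frame rate the clip plays at 2/3 speed.
    return tripled[::2]

print(stretch_to_three_halves([0.0, 3.0, 6.0]))  # 3 frames -> 4 frames
```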
  • the digest moving image editing unit 1036 repeats the above processing until the digest moving image reproduction time reaches the specified time.
• The digest moving image editing unit 1036 performs the above processing up to the last image clip of the digest moving image. If the digest moving image playback time is still not the specified time, it combines an image clip selected at random from the clips included in the digest moving image at the end of the digest moving image. However, so that the same image clip is not reproduced twice in a row, the combining may be skipped when the randomly selected image clip is the same as the last image clip of the digest moving image.
  • the digest moving image editing unit 1036 repeats the above processing until the digest moving image reproduction time reaches the specified time.
• Another method is to use a video effect such as a transition when switching between image clips. Since reproduction of the image clips pauses while the video effect plays, this can extend the digest moving image playback time.
  • the digest moving image editing unit 1036 extends the digest moving image reproduction time Td by inserting video effects in order from the location where the difference in shooting time between image clips is large.
• The digest moving image editing unit 1036 repeats the above processing until the digest moving image playback time reaches the specified time or video effects have been inserted between all the image clips. If the playback time still does not reach the specified time with effects between all the clips, the above-described method of combining a randomly selected image clip at the end of the digest moving image is used.
• The specific video effect to be inserted is not particularly limited; for example, a crossfade, a dissolve, or a wedge-shaped wipe, which are types of transitions, may be used.
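• A sketch of this largest-gap-first ordering, assuming each clip carries a shooting time and a duration (the tuple format and the effect length are assumptions):

```python
def plan_transitions(clips, ts_s, effect_s=1.0):
    # clips: list of (shooting_time_s, duration_s) pairs.
    # Insert effects at the boundaries with the largest shooting-time
    # gaps first, until the digest reaches Ts or every boundary is used.
    total = sum(d for _, d in clips)
    gaps = sorted(range(len(clips) - 1),
                  key=lambda i: clips[i + 1][0] - clips[i][0],
                  reverse=True)
    chosen = []
    for i in gaps:
        if total >= ts_s:
            break
        chosen.append(i)       # an effect goes after clip i
        total += effect_s      # playback pauses while the effect runs
    return chosen, total
```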
• The video editing apparatus according to the sixth embodiment differs from the video editing apparatus according to the fifth embodiment in that a digest moving image generation unit 103d is provided instead of the digest moving image generation unit 103c.
• This video editing apparatus provides an adjustment method in which the playback time of the digest moving image becomes the specified time without cutting image clips.
  • FIG. 30 shows an internal configuration of the digest moving image generating unit 103d in the present embodiment.
  • the digest moving image generation unit 103d includes a scene type determination unit 1032d, a scene space arrangement unit 1033, and a scene time arrangement unit 1034.
  • the processing contents of the scene space arrangement unit 1033 and the scene time arrangement unit 1034 are the same as those in the first embodiment.
  • Scene type determination unit 1032d determines the type of scene based on the scene information and threshold value THt in the same manner as scene type determination unit 1032.
  • the scene type determination unit 1032d calculates the digest video playback time Td from the scene information and the scene type.
• The initial value of the playback time Td is 0. For a scene whose scene type is “single scene”, the playback time of that scene is added to Td; for scenes combined as “multiple scenes”, the playback time of the shortest scene among them is added to Td.
  • the scene type determination unit 1032d adjusts the digest video playback time Td to be the specified time when the calculated digest video playback time Td is not the specified time.
• When the digest moving image playback time Td is longer than the specified time, the scene type determination unit 1032d changes the threshold THt so that “multiple scenes” are selected more easily than “single scene”; for example, the threshold THt is changed from 5 minutes to 10 minutes. Because this increases the ratio of “multiple scenes” in the digest moving image, the playback time can be adjusted to be shorter.
• Conversely, when the playback time Td is shorter than the specified time, the scene type determination unit 1032d changes the threshold THt so that “single scene” is selected more easily than “multiple scenes”; for example, the threshold THt is changed from 5 minutes to 3 minutes. Because this increases the ratio of “single scene” in the digest moving image, the playback time can be adjusted to be longer.
  • the scene type determination unit 1032d determines the scene type based on the changed threshold value THt, and calculates the digest moving image playback time Td again.
  • the scene type determination unit 1032d repeats the above processing until the digest moving image playback time Td reaches the specified time.
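• A sketch of this feedback loop. The rule used below for typing scenes (treating a group as “single scenes” when its gap measure exceeds THt, otherwise as one “multiple scenes” group) is purely illustrative; the actual decision is the one made by the scene type determination unit of the first embodiment:

```python
def digest_playback_time(groups, tht_s):
    # groups: list of (gap_measure_s, [scene durations in seconds]).
    td = 0.0
    for gap_s, durations in groups:
        if gap_s > tht_s:
            td += sum(durations)   # used as single scenes
        else:
            td += min(durations)   # combined: shortest member only
    return td

def adjust_threshold(groups, ts_s, tht_s=300.0, step_s=60.0, tol_s=5.0):
    for _ in range(100):                        # safety bound
        td = digest_playback_time(groups, tht_s)
        if abs(td - ts_s) <= tol_s:
            break
        if td > ts_s:
            tht_s += step_s   # favour "multiple scenes": shorter digest
        else:
            tht_s -= step_s   # favour "single scene": longer digest
    return tht_s
```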
• FIG. 31 is a schematic diagram showing the configuration of the video editing apparatus according to the seventh embodiment of the present invention.
• The video editing apparatus 100e includes an image data classification unit 101, a scene information generation unit 102, a digest moving image generation unit 103, an event selection unit 104, an output control unit 105, a video display unit 106, a digest moving image editing control unit 107, and an operation unit 108.
• The video editing apparatus 100e may include an internal data recording unit that stores image data, or may be configured so that an external data recording apparatus with the same function is connected to it.
  • the basic processing contents of the image data classification unit 101, the scene information generation unit 102, the digest moving image generation unit 103, the event selection unit 104, and the output control unit 105 are the same as those in the first embodiment.
  • the video display unit 106 outputs a video including a digest moving image generated by the video editing apparatus 100e and a user interface (UI) used for operation to a display device.
  • the display device is built in the video editing apparatus 100e or connected to the outside.
  • the digest moving image editing control unit 107 reproduces the digest moving image generated by the digest moving image generating unit 103, and outputs it to the video display unit 106 while synchronizing the image and sound and adjusting the frame rate. In parallel with this, video editing processing is performed based on the input from the user.
  • the digest moving image to be reproduced may be image data once stored in the recording medium by the digest moving image generation unit 103 or may be image data directly input from the digest moving image generation unit 103. Further, it may be image data in which a digest moving image generated by another video editing device equivalent to the video editing device 100e is stored in a recording medium.
  • the digest moving image is converted into display data that can be used by the video display unit 106.
• When the digest moving image is compressed with an encoding method such as HEVC or AAC, the digest moving image editing control unit 107 decodes the image data and outputs the decoded data to the video display unit 106.
• When the digest moving image data is stored in a format that requires video generation processing during playback, including video layout, transformation, segmentation, and overlay processing, the digest moving image editing control unit 107 controls the digest moving image generation unit 103 to perform the generation processing, and acquires and reproduces the resulting video.
  • the digest moving image editing control unit 107 can perform reproduction control including pause / fast forward / rewind / movement between scenes in reproduction.
• The operation unit 108 detects input operations from the user, including position designation on the display screen, using, for example, a touch sensor integrated with the video display unit 106 or an externally connected mouse or keyboard.
• With a touch sensor integrated with the video display unit 106, the user can input with general operations such as tap, flick, or pinch.
• Dedicated buttons and keys for recording and playback control may also be provided.
• The output control unit 105 sets the generation policy based on conditions including the image display specifications of the video display unit 106, the audio output specifications of an audio output device (not shown), and information indicating the user's preferences for the digest moving image.
• Information indicating the user's preferences is received via the operation unit 108 or other input means. For example, options such as “person main” and “landscape main” may be displayed on the video display unit 106 for the user to select, but the method is not limited to this. When no information indicating user preferences has been input, a standard value, for example “person main”, may be set.
• The output control unit 105 sets “multiple scene simultaneous arrangement”, information indicating whether multiple scenes may be arranged simultaneously in the same image frame, according to the video display device of the output destination. For example, when the resolution or screen size of the display device used by the video display unit 106 is smaller than a certain threshold, simultaneous arrangement of multiple scenes is set to “No”, and when it is larger, it is set to “Yes”.
  • the digest moving image editing control unit 107 plays back and edits the digest moving image generated as described above. This may be started by an instruction from the user, or may be started when the generation of the digest moving image is completed.
  • the digest moving image editing control unit 107 reproduces the digest moving image, and receives an input from the user and further edits the digest moving image being reproduced.
  • FIG. 32 is a diagram showing an editing process when a digest moving image is reproduced. Although the reproduction process itself and the reproduction control process such as fast forward / rewind are not shown, they are executed in parallel with the editing process. Hereinafter, steps S101 to S104 in the figure will be described.
• In step S101, the digest moving image editing control unit 107 starts reproduction of the digest moving image, and also starts a process that interprets and executes input from the operation unit 108 as editing operations.
  • Step S102 is a check of whether or not a moving image is being reproduced. If it is detected that the reproduction of the moving image is finished or interrupted, the editing process is finished.
  • Step S103 is an input operation check. It is checked whether an operation that can be interpreted as an instruction for editing processing is input. If the operation has not been input, the process returns to step S102.
• Note that steps S102 and S103 can be realized by periodic or aperiodic interrupts and therefore do not necessarily have to be executed in the order shown in FIG. 32. A waiting period for a change in reproduction state or for the occurrence of input may also be inserted before each check step.
  • Step S104 is execution of an editing operation.
  • a process corresponding to the type of editing operation is executed with the scene being reproduced as the scene to be edited.
  • the reproduction of the digest moving image is temporarily stopped at the start of the editing process, and resumed using the edited data after the editing process is completed.
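• A skeleton of this loop; the player and operation-queue interfaces are assumptions introduced only to show the control flow of steps S101 to S104:

```python
import time

def editing_loop(player, op_queue):
    player.start()                      # S101: start reproduction and
                                        # begin interpreting user input
    while player.is_playing():          # S102: end when playback stops
        op = op_queue.poll()            # S103: check for an edit input
        if op is None:
            time.sleep(0.05)            # optional wait before re-check
            continue
        player.pause()                  # S104: edit the current scene,
        player.apply_edit(op)           # then resume with edited data
        player.resume()
```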
• Some editing operations in step S104 are described in more detail below.
• The type of editing operation is distinguished by the input from the operation unit.
• When the video display unit 106 includes a touch panel, direct designation of coordinates on the screen and editing operations by gestures can be realized; the same distinction can also be made with a pointing device such as a mouse.
• Although the input device is not necessarily a touch panel, the description here uses operation input via a touch panel, which allows the most intuitive operation for the user. Note that windows, icons, and other GUI components may be displayed on the screen for operations other than the editing operations.
• Various touch panel operations (hereinafter, touch operations) are in general use, for example: tap (touch the screen briefly with a fingertip), double tap (tap the screen twice with a fingertip), flick (touch the screen and sweep the fingertip quickly away), swipe (move the fingertip across the screen in a fixed direction while touching it), drag (move the fingertip while it is in contact with the screen, not necessarily in a fixed direction), pinch in (touch the screen with two or more fingertips and bring them together), pinch out (touch the screen with two or more fingertips and spread them apart), and twist or rotate (touch the screen with two or more fingertips and move them in a twisting motion).
• Functions may also be distinguished by the number of fingers used in each operation and by the position, shape, and speed of the fingertip trajectory. The above describes general touch operations; not all of them are used in the editing operations of the video editing apparatus 100e, and other touch operations may be assigned to the editing operations described below.
  • FIG. 33 schematically shows an example of an operation performed on the digest moving image.
  • the thick frame indicates the entire image area of the digest moving image, and when there is a rectangular frame in the thick frame, it indicates that the main scene or the sub scene of the combination scene is displayed.
  • a dotted frame indicates a change caused by editing.
  • the arrow indicates the approximate trajectory and length of the touch operation. Further, the coordinates where the touch operation is started are called start point coordinates, and the coordinates where the touch operation is finished are called end point coordinates.
  • FIG. 33A shows a flick operation on the screen 81.
  • the flick operation is associated with scene deletion in the video editing apparatus 100e.
• When a flick is detected, the scene being reproduced is set as the scene to be deleted.
• The digest moving image editing control unit 107 deletes the deletion target scene's data from the digest moving image, or marks the deletion target scene so that it is not reproduced.
• If a scene follows the deleted scene, playback resumes from that scene; if the deleted scene was the last one, playback stops.
• When the deletion operation is accepted, it is preferable to present a visual effect that is easy for the user to understand, such as the deletion target scene moving off in the flicked direction. In this way, scenes the user feels are unnecessary can easily be deleted during playback.
  • FIG. 33 (b) is an example in which a twist operation is performed in a combination scene arranged in parallel as shown in FIG. 5 (a).
  • the twist operation is associated with the change of the arrangement pattern in the video editing apparatus 100e.
  • FIG. 33B shows an arrangement pattern in which two element scenes 82 and 83 are present on the screen, but the two element scenes 82 and 83 can be switched left and right by a twist operation.
• When the digest moving image is encoded, the digest moving image editing control unit 107 generates the editing target scene with the new layout and encodes it again.
• When a twist operation is accepted, it is preferable to present a visual effect showing the element scenes 82 and 83 being switched. In this way, even when the user feels during reproduction that the left-right arrangement of the element scenes is unnatural, it can easily be changed to a more preferable arrangement.
  • FIG. 33 (c) is an example in which a twist operation is performed in a combination scene obtained by dividing the screen into three equal parts as shown in FIG. 5 (f).
• In this case, the video editing apparatus 100e selects and applies the spatial arrangements of the element scenes in order from the possible combinations.
• For example, suppose element scenes 84, 85, and 86 (denoted A, B, and C) are arranged in this order from the left of the screen.
• Each time a twist operation is executed, the digest moving image editing control unit 107 changes this arrangement, cycling through the possible orderings, for example {A, B, C} → {A, C, B} → {B, A, C} → and so on.
• The twist operation in the examples of FIGS. 33(b) and 33(c) may be performed anywhere on the screen, but for a twist operation performed near a boundary of the combination scene, only the element scenes touching that boundary may be swapped. In this way, a more preferable arrangement for the user can be selected quickly even in a combination scene containing three or more element scenes.
• FIG. 33(d) shows an example of a pinch-out operation in a centrally arranged combination scene in which a reduced main scene 88 is placed at the center of a sub-scene 87 arranged over the entire screen, as shown in FIG. 5(b).
  • the pinch out operation is associated with an increase in size with respect to the element scene.
• The enlargement ratio of the element scene is determined according to the distance between the start point and end point coordinates of the pinch-out operation; the minimum is the size of the main scene 88 before the operation, and the maximum is the size of the entire screen, that is, the size of the sub-scene 87.
• The digest moving image editing control unit 107 extracts the area of the main scene 88 from the editing target scene, enlarges it according to the enlargement ratio, generates an image rearranged on the editing target scene, and re-encodes it.
• The position of the main scene 88 is kept at the center so that the pre-edit main scene 88 is completely hidden by the enlarged main scene 88. In this way, the contents and persons of the main scene 88 can be made more conspicuous.
• FIG. 33(e) shows a central arrangement similar to the example of FIG. 33(d), but is an example of a pinch-out operation in a combination scene in which the main scene 89 is cut out and arranged at the center of the screen, as shown in FIG. 5(d).
• In this case, the upper limit of the scene enlargement ratio is determined as w0/w1, using the horizontal pixel count of the digest moving image, that is, of the sub-scene 87 (denoted w0), and the horizontal pixel count of the area of the main scene 89 (denoted w1).
• The digest moving image editing control unit 107 generates and encodes an image in which the main scene 89, enlarged and trimmed in this way, is rearranged on the editing target scene. As in the example of FIG. 33(d), this makes the contents and persons of the main scene 89 more conspicuous.
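• A sketch mapping a pinch-out gesture to an enlargement ratio clamped to the w0/w1 upper limit above (the gesture-to-ratio mapping itself is an assumption):

```python
def enlargement_ratio(pinch_dist, pinch_dist_max, w0, w1):
    # w0: horizontal pixels of the sub-scene 87 (full digest frame);
    # w1: horizontal pixels of the main scene 89 region.
    upper = w0 / w1                    # upper limit from the text
    t = min(max(pinch_dist / pinch_dist_max, 0.0), 1.0)
    return 1.0 + t * (upper - 1.0)     # ranges over 1.0 .. w0/w1

print(enlargement_ratio(80.0, 100.0, 1920, 640))  # 2.6, capped at 3.0
```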
• The examples of FIGS. 33(b) to 33(e) presume that the editing target scene is a combination scene. If the digest moving image being played carries no information on whether the editing target scene is a combination scene, or if the scene information related to the digest moving image cannot be acquired, whether the editing target scene is a combination scene can be determined using, for example, per-frame pixel value histograms, contour extraction (particularly of straight-line portions), or per-region motion detection.
  • FIG. 33 (f) shows an example of a drag operation having a complicated trajectory.
  • a drag operation is associated with a filter effect for an area near the locus in the video editing apparatus 100e.
• A trajectory like the one in the example of FIG. 33(f) is judged, for example, as one whose points have a wide, unbiased distribution of coordinate values in the horizontal direction while their vertical distribution is concentrated near its maximum and minimum values.
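• One possible reading of this judgment as code; the spans, the flip count, and the thresholds are assumptions:

```python
def is_vertical_scribble(points):
    # points: list of (x, y) trajectory samples.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_span = max(xs) - min(xs)
    y_span = max(ys) - min(ys)
    # Count direction reversals in y as evidence of reciprocation
    # between the vertical maximum and minimum.
    flips = sum((b - a) * (c - b) < 0
                for a, b, c in zip(ys, ys[1:], ys[2:]))
    return x_span > 0 and y_span > 0 and flips >= 2

zigzag = [(0, 0), (5, 40), (10, 0), (15, 40), (20, 0)]
print(is_vertical_scribble(zigzag))  # True
```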
  • the digest moving image editing control unit 107 sets an area including the start point and end point of the drag operation as a filter target area.
  • the digest moving image editing control unit 107 filters the pixels in the filter target region in all frames included in such an editing target scene, and updates the digest moving image. At this time, if necessary, the filter result is encoded again.
• If the drag operation shows a trajectory that erases or disturbs a certain area, the filter used here is desirably one with a function that makes the target area inconspicuous, including an unsharpening filter or simple filling with a predetermined pixel value.
• Conversely, for a drag trajectory that can be interpreted as indicating attention, such as one encircling a region, a filter with a function that makes the target region stand out is desirable, including a sharpening filter or a filter that increases luminance.
  • the filter target area needs to be changed for each frame when the object or the camera is moving. For this reason, it is desirable that the digest moving image editing control unit 107 performs a motion detection process of the filter target region for the editing target scene and adjusts the position, shape, and size of the filter target region.
• A plurality of filters serving a similar purpose can also be switched automatically: for example, an unsharpening filter may be applied for a trajectory that reciprocates in the vertical direction as in FIG. 33(f), and a fill filter for a trajectory that reciprocates in the horizontal direction.
  • FIG. 33 (h) is an example in which a button 90 indicating the start of video shooting is displayed on the screen, and a scene can be added.
• When the button 90 is operated, the digest moving image editing control unit 107 stops reproduction of the digest moving image and, if a camera (not shown) is built into the video editing apparatus 100e or connected externally, switches the display screen to the input image from the camera and starts shooting. Shooting is ended by the user's operation, and the shot image data V is stored in a recording medium.
• The digest moving image editing control unit then performs the same processing as the digest moving image generation already described, with only the image data V as input.
• The resulting digest moving image of the image data V is added to the end of the digest moving image already being reproduced, and the whole is stored as a new digest moving image. More simply, the captured image data V may be appended as-is to the end of the digest moving image.
• In this way, a digest moving image is generated whose configuration can be edited with simple operations at the time of reproduction.
• The video editing apparatus according to the eighth embodiment has the same configuration as the video editing apparatus according to the seventh embodiment, but stores the information indicating the spatial and temporal arrangement of scenes in the digest moving image that was used for generation (hereinafter, arrangement information), together with the input image data, in a recording medium or memory so that they can be used during reproduction.
• The digest moving image itself may be in a format that includes the input image data and the arrangement information, or it may be data composed of one or more files including the input image data and data corresponding to the arrangement information.
• In the latter case, the video image intended by the digest moving image generation unit 103 can be generated by arranging the input image data with reference to the arrangement information.
• The information indicating the spatial arrangement of a scene includes, for each arrangement pattern described in the seventh embodiment, the index of the input image data corresponding to each element scene, the vertical and horizontal size (number of pixels) of the element scene, its position (coordinates) on the screen, and the cut-out position on the input image data. Indirect information from which these can be derived may also be used, for example an index selecting an arrangement pattern together with predetermined sizes and positions.
• The information indicating the temporal arrangement of scenes indicates where each scene falls on the time axis of the final digest moving image, and includes at least the start time and end time (or length) of each scene. Times and lengths may be expressed as frame counts.
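• A sketch of such arrangement information as a data structure (the field names are illustrative, not taken from the text):

```python
from dataclasses import dataclass

@dataclass
class ElementScenePlacement:
    input_index: int    # index of the corresponding input image data
    width_px: int       # horizontal size of the element scene
    height_px: int      # vertical size of the element scene
    screen_x: int       # position (coordinates) on the screen
    screen_y: int
    crop_x: int         # cut-out position on the input image data
    crop_y: int

@dataclass
class SceneTiming:
    start_s: float      # start on the digest's time axis
    end_s: float        # end time (a length or frame count also works)
```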
  • the digest moving image generating unit 103 can store and reuse data used for generating a previous digest moving image including arrangement information in a recording medium or a memory. As a result, even when partially or completely the same digest moving image is generated again, the load can be reduced by avoiding re-execution of the same process.
• FIG. 34(a) shows a case where the editing target scene is a combination scene and the start point coordinates of the flick operation lie on the sub-scene 92, which is arranged so as to overlap the main scene 91.
  • the scene to be deleted is the sub-scene 92.
  • FIG. 34B shows the case where the scene displayed at the starting point coordinates is the main scene 91, and the deletion target scene is the main scene 91.
• Alternatively, the entire editing target scene may be set as the deletion target scene.
  • the digest moving image editing control unit 107 deletes the editing target scene from the spatial and temporal arrangement information so that the editing target scene is not reproduced, and generates a digest moving image again.
• When deleting an element scene, if an element scene that is not placed over another element scene is deleted, an area in which no scene is displayed appears on the screen of the digest moving image.
• In such a case, the screen may be rearranged: as a single scene if only one element scene remains after deletion, or with the screen re-divided if two or more element scenes remain. For example, when one scene is deleted from the parallel arrangement of three element scenes shown in FIG. 5(f), the remaining two element scenes may be rearranged using the parallel arrangement of FIG. 5(b).
• FIGS. 34(c) to 34(e) show examples of drag operations whose start point coordinates lie on the sub-scene 92 of a combination scene.
• FIG. 34(c) shows a case where the drag continues to either the left or right screen edge.
• In this case, the digest moving image editing control unit 107 separates the sub-scene 92 into a single scene and deletes the sub-scene 92 from the editing target scene.
• The newly generated single scene corresponding to the original sub-scene 92 is inserted immediately before the editing target scene if the end point of the drag is the left screen edge, and immediately after it if the end point is the right screen edge.
  • FIG. 35 shows changes in the digest video before and after editing when the end point of the drag is the right screen edge.
• The digest moving image 1100 before editing includes scenes 1100a, 1100b, and 1100c along the way.
• The scene 1100b is a combination scene that includes a main scene S21 and a sub-scene S22 as element scenes.
• In the edited digest moving image 1101, the sub-scene S22 is made independent of the scene 1100b and becomes a single scene 1100b2, while the scene 1100b, with the sub-scene S22 deleted, contains only the main scene S21 and becomes a single scene 1100b1. As a simpler embodiment, the new single scene may be inserted at a predetermined position immediately before or after the editing target scene regardless of the position of the drag end point.
• To realize this, the digest moving image editing control unit 107 first deletes the original editing target scene, and then inserts, at the temporal position where the editing target scene existed, two scenes in the order described above: a new scene in which the sub-scene 92 has been deleted from the original editing target scene, and a single scene corresponding to the sub-scene 92.
• When the editing target scene includes only one element scene other than the sub-scene 92, as in FIG. 34(c), it becomes two single scenes after editing; when it includes two or more other element scenes, it becomes one combination scene and one single scene after editing.
• FIGS. 34(d) and 34(e) show cases where the end point of the drag operation does not reach the screen edge.
  • the drag direction in the figure is an example.
  • FIG. 34 (d) shows a case where a portion other than the boundary portion of the sub scene 92 is dragged.
  • the digest moving image editing control unit 107 moves the sub-scene 92 to another place on the main scene.
• The destination may be an arbitrary position near the end point of the drag or, as indicated by the dotted rectangle in FIG. 34(d), the position closest to the drag end point among a plurality of positions defined by the system.
  • the digest moving image editing control unit 107 rewrites the information of the editing target scene in the arrangement information so as to correspond to the above, and generates and stores the editing target scene again.
  • FIG. 34 (e) shows a case where the boundary portion between the main scene and the sub scene 92 in the sub-screen arrangement is dragged.
  • the digest moving image editing control unit 107 changes the display size (number of vertical and horizontal pixels) of the sub-scene 92.
  • the new size of the sub-scene may be an arbitrary size derived from the end point of the drag, or may be a size closest to the size represented by the end point of the drag among a plurality of sizes determined by the system.
  • the new size and area of the sub-scene 92 may be larger or smaller than the size and area before the operation, but an upper limit and a lower limit may be provided. For example, the upper limit is 1/4 of the area of the entire moving image, and the lower limit is 1/16.
  • FIG. 34F shows a pattern in which the main scene 94 is arranged in the center of the screen with the sub-scene 93 as a background.
• For an arrangement in which the main scene overlaps the sub-scene, the handling is basically the same except that the scene whose size is changed is the main scene.
• In the pattern of FIG. 34(f), the boundary between the main scene 94 and the sub-scene 93 consists only of the left and right vertical sides of the main scene.
• The size of an element scene may be changed while maintaining its original image aspect ratio regardless of the drag direction; changing only the enlargement or reduction rate is easy for the user to understand and simple to operate. On the other hand, allowing the size to change without maintaining the aspect ratio gives more flexibility. In that case, the behavior may depend on the drag start point: if the start point is a corner of the boundary, the vertical and horizontal sizes of the element scene are changed simultaneously according to the drag; if the start point is on one of the four sides excluding the corners, dragging a vertical side changes the horizontal size and dragging a horizontal side changes the vertical size.
• The image aspect ratio specified as a result of such a drag will in many cases differ from the aspect ratio of the input image data corresponding to the element scene.
• In that case, the input image data is either scaled with different magnifications in the vertical and horizontal directions, or trimmed to match the new image aspect ratio.
• For example, let the size of the input image data corresponding to the element scene be ws0:hs0 (horizontal:vertical) and the new size be ws1:hs1. To trim, the left and right sides of the input image data are removed as vertical bands to form an image of (ws1 × hs0 / hs1):hs0 pixels, which is then scaled to ws1:hs1.
• Alternatively, when a specific object such as a person is included, the image may simply be trimmed to the new image aspect ratio centered on that object.
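• A sketch of this trim-then-scale computation (the function name and example values are assumptions):

```python
def trimmed_width_px(hs0, ws1, hs1):
    # Width kept after cutting vertical bands from the left and right
    # so the source (height hs0) matches the new aspect ratio ws1:hs1;
    # the (ws1 * hs0 / hs1) x hs0 result is then scaled to ws1 x hs1.
    return round(ws1 * hs0 / hs1)

# A 1920x1080 source placed into a 540x1080 element scene keeps a
# 540-pixel-wide vertical band before scaling (here already 1:1).
print(trimmed_width_px(1080, 540, 1080))  # 540
```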
• The digest moving image editing control unit 107 changes the arrangement information of the editing target scene to place the element scene resized as described above, and generates the editing target scene again using the input image data corresponding to the element scenes.
• Similarly, when an element scene is deleted, the load of generating the editing target scene can be reduced by deleting that element scene from the arrangement information of the editing target scene.
  • FIG. 34 (g) shows an example in which a twist operation is performed in a combination scene.
• In this case, the digest moving image editing control unit 107 selects another arrangement pattern from the possible arrangement patterns using the input image data corresponding to the element scenes included in the editing target scene, and generates the editing target scene again.
• Besides changing the order of the arrangement as in the examples of FIGS. 33(b) and 33(c), it is also possible to change the arrangement pattern itself to change how the scenes overlap. This allows an arrangement pattern preferable to the user to be selected easily.
• When the twist operation is performed near a boundary between element scenes, the arrangement pattern may be changed so that only the element scenes adjoining that boundary change. Further, when the boundary is one between the main scene and a sub-scene, arrangements in which the assignment of main scene and sub-scene is exchanged may be included among the possible arrangement patterns.
• For example, without changing the sub-screen layout, the original main scene may be used as the new sub-scene and the original sub-scene as the new main scene. Thus, even when the user feels that the scene set as the sub-scene is more important than the main scene, the arrangement pattern can be changed easily.
  • the digest moving image editing control unit 107 changes the arrangement information according to the new arrangement pattern, and again generates the editing target scene based on the arrangement information.
• As described above, the video editing apparatus that stores the input image data and arrangement information used for generating the digest moving image can regenerate the digest moving image based on simple operations during reproduction, allowing it to be modified into a moving image more preferable to the user.
• The image data classification unit 101, scene information generation unit 102, digest moving image generation units 103, 103a, 103b, 103c, and 103d, event selection unit 104, output control unit 105, video display unit 106, digest moving image editing control unit 107, operation unit 108, target image data extraction unit 109, reproduction time candidate derivation unit 110, and reproduction time candidate display unit 111 may be realized by a computer. In that case, the program for realizing these control functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed.
• Here, the “computer system” is a computer system built into the video editing apparatuses 100, 100a, 100b, and includes an OS and hardware such as peripheral devices.
• The “computer-readable recording medium” refers to a portable medium such as a memory card, magneto-optical disk, CD-ROM, or DVD-ROM, or a storage device built into a computer system, such as a hard disk or SSD.
• Furthermore, the “computer-readable recording medium” may include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as a volatile memory inside the computer system serving as the server or client in that case.
• The program may realize a part of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.
• Each video editing apparatus in the embodiments described above may be realized as an integrated circuit such as an LSI (Large Scale Integration) chip.
  • Each functional block of the video editing apparatus may be individually made into a processor, or a part or all of them may be integrated into a processor.
  • the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. Further, in the case where an integrated circuit technology that replaces LSI appears due to progress in semiconductor technology, an integrated circuit based on the technology may be used.
• The video editing apparatus according to one aspect includes a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating the features of each scene, and a digest moving image generation unit that generates a digest moving image of the image data based on the scene information, wherein the digest moving image generation unit determines, based on the scene information, whether to use each scene when generating the digest moving image, whether to arrange multiple scenes in the same frame, and the spatial arrangement pattern of the scenes when multiple scenes are arranged in the same frame.
• In the video editing apparatus according to aspect 1, the digest moving image generation unit may compare the scene information of a plurality of temporally close scenes, determine a main scene and a sub-scene as scene types based on the comparison result, and, based on the relationship of scene types between temporally close scenes, select the spatial arrangement pattern of the plurality of scenes from among a “parallel arrangement” pattern in which at least two main scenes are included in the same frame, a “center arrangement” pattern in which the main scene is arranged in the center area of the screen with the sub-scene located around the main scene area, and a “sub-screen arrangement” pattern in which the main scene is displayed over the entire frame with the sub-scene superimposed on a part of the area.
• Further, when the digest moving image generation unit arranges the main scene in the center area of the screen with the sub-scene located around the main scene area, a spatial filter may be applied to the sub-scene to differentiate it from the main scene area in image sharpness or color tone.
• In the video editing apparatus according to any one of aspects 1 to 3, the digest moving image generation unit may further count the number of times a digest moving image has been generated for each image data group that is the target of digest moving image generation, and change the arrangement pattern used when arranging a plurality of scenes according to that number.
• In the video editing apparatus according to any one of aspects 1 to 4, the scene information generation unit may generate, as part of the scene information and on a per-scene basis, the “number of areas”, information indicating the number of feature areas in the image frame, the “maximum region size”, information indicating the size of the feature area with the largest area, and the “maximum region position”, information indicating the position in the image of the feature area with the largest area; and the digest moving image generation unit, when arranging a plurality of scenes in the same frame, may vary the strength of the spatial filter applied to the image areas cut out as the main scene or sub-scene, based on the information indicated by the scene information.
• The video editing apparatus according to any one of aspects 1 to 5 may generate the digest moving image based on output conditions including the characteristics of the output device that outputs the digest moving image.
• The video editing apparatus according to any one of aspects 1 to 6 may further include an event selection unit that selects image data in units of events based on metadata indicating shooting conditions included in the image data, and the digest moving image generation unit may generate a digest moving image taking the image data group selected by the event selection unit as input.
• The video editing apparatus may further include an output control unit that determines a digest moving image generation policy based on output conditions including the characteristics of the output device that outputs the digest moving image and notifies the digest moving image generation unit of the determined generation policy, and the digest moving image generation unit may determine the spatial arrangement pattern of scenes in the digest moving image based on the generation policy and the scene information.
• In the video editing apparatus according to aspect 3, the scene information generation unit may generate, as part of the scene information and on a per-scene basis, the “number of areas”, information indicating the number of feature areas in the image frame, the “maximum region size”, information indicating the size of the feature area with the largest area, and the “maximum region position”, information indicating the position in the image of the feature area with the largest area; and the digest moving image generation unit, when arranging a plurality of scenes in the same frame, may vary the strength of the spatial filter applied to the sub-scene based on the “number of areas” or “maximum region size” indicated by the scene information.
• The digest moving image generation unit may further count the number of times a digest moving image has been generated for each image data group that is the target of digest moving image generation, and change the arrangement pattern used when arranging a plurality of scenes according to that number.
• In the video editing apparatus according to aspect 8, the digest moving image generation unit may encode the digest moving image based on the generation policy and determine the encoding quality used when encoding the digest moving image.
• The video editing apparatus according to another aspect includes a playback time candidate derivation unit that derives playback time candidates for a digest moving image based on an image data group; a playback time candidate display unit that presents the playback time candidates to the user and sets a designated time based on a user event; a scene information generation unit that divides an image data group including moving images into one or more scenes; and a digest moving image generation unit that generates image clips based on the scenes and generates a digest moving image by temporally combining the image clips, wherein the digest moving image generation unit adjusts the digest moving image so that its playback time becomes the designated time.
• With this configuration, images can be viewed with various display methods and within a time desired by the user.
  • the digest moving image generation unit may shorten the reproduction time of the image clip with less movement as the designated time becomes shorter.
  • the digest moving image generation unit may shorten the reproduction time by thinning out the frames of the image clip.
  • the digest moving image generation unit may lengthen the reproduction time of the image clip with less movement as the designated time increases.
  • the digest moving image generation unit may extend the reproduction time by interpolating the frame of the image clip.
• The digest moving image generation unit classifies each scene either as a single scene, used alone, or as one of multiple scenes, used in combination with others; the shorter the designated time, the higher the ratio of multiple scenes constituting the digest moving image may be.
• The playback time candidate derivation unit makes the playback time candidate shorter than the total playback time of the image data group, and the longer the total playback time of the image data group, the longer the playback time candidate may be.
• The video editing apparatus according to another aspect includes a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating the features of each scene; an output control unit that determines a digest moving image generation policy and notifies the digest moving image generation unit of the determined generation policy; a digest moving image generation unit that generates, based on the scene information and the generation policy, a digest moving image of the image data group including scenes in which a plurality of scenes are spatially arranged on the screen (hereinafter, combination scenes); a video display unit that displays video and operation information; a digest moving image editing control unit that reproduces the digest moving image and outputs it to the video display unit; and an operation unit that detects operation input from the outside, wherein the configuration of the digest moving image is changed by the operation input detected by the operation unit.
• With this configuration, a large amount or a large number of still images and moving images can be checked and viewed in a short time without effort. Furthermore, an image composed so that such still images and moving images can easily be checked and viewed can easily be corrected at reproduction time into a configuration preferable to the user.
• In the video editing apparatus according to aspect 19, a scene designated by the operation input, or some of the scenes constituting a designated combination scene, may be deleted from the digest moving image.
  • the video editing apparatus may change a spatial arrangement pattern of the combination scene designated by the operation input.
  • the video editing apparatus may filter a moving image with respect to an area designated by the operation input.
  • the video editing apparatus may add newly captured image data to the digest moving image by the operation input.
• In the video editing apparatus according to aspect 19, any scene constituting a combination scene designated by the operation input may be extracted from the combination scene as a single scene and inserted into the digest moving image temporally before or after the combination scene.
• The video editing apparatus according to any one of aspects 19 to 24 may store the images used for generating the digest moving image and information indicating their spatial and temporal arrangement, and change the content of the digest moving image using that information in response to an operation input during reproduction of the digest moving image.
  • the video editing apparatus may delete a part of the scenes constituting the combination scene designated by the operation input from the combination scene.
  • the video editing apparatus may change a spatial arrangement pattern of the combination scene by the operation input.
  • the present invention can be suitably applied to a video editing apparatus that generates a so-called digest moving image by inputting a still image or a moving image.

Abstract

In order to address the problem of checking and viewing a large amount or number of still images or moving images in a short amount of time without requiring much effort, the present invention provides a video image editing apparatus (100) provided with: a scene information generation unit (102) which divides an image data group including moving images into one or more scenes as well as generates scene information representing the characteristics for each scene-unit; and a digest moving image generation unit (103) which generates a digest moving image on the basis of the scene information. The digest moving image generation unit (103) determines whether or not to use each scene when generating the digest moving image and whether or not to place a plurality of scenes in the same frame, and determines a spatial placement pattern for the scenes when placing a plurality of scenes within the same frame.

Description

Video editing apparatus
The present invention relates to a video editing apparatus that automatically edits video information such as moving images and still images.
With the widespread use of video equipment having still image and moving image shooting functions, such as digital cameras and smartphones, and the growing capacity of recording media such as memory cards, it has become easy to accumulate large amounts of video information. One means of making use of the video information such users accumulate is the generation of digest moving images. A digest moving image is a relatively short moving image produced by taking many, or long, moving images as input and reconstructing them so that they can be viewed in summary or in part instead of being watched in full.
Patent Document 1 discloses an image display device that displays digest moving images and still images simultaneously and continuously. In Patent Document 1, continuously arranged still images or moving images are assigned to areas laid out like the frames of a movie film, making it possible to view a plurality of images at the same time.
Japanese Patent Application Laid-Open No. 2010-258768 (published: November 11, 2010)
However, conventional digest moving image generation and display devices have the following problems. Hereinafter, unless otherwise noted, the term "image" in this specification means a still image, a moving image, or both. The same applies to "image file" and "image data".
In Patent Document 1, the still images or moving images to be displayed must first be shown on the screen as thumbnail images and then selected from among the displayed images. This poses no particular problem when the number of captured images is small, but when a large number of images have been accumulated without being organized, the user must pick out images one by one from an enormous set of thumbnails. The more images there are, the greater the time and labor required for this selection work, and the greater the burden on the user. Furthermore, while the content of a still image is easy to grasp from its thumbnail, the content of a moving image can be difficult to grasp from a thumbnail. In such cases, the user may end up dissatisfied, having spent considerable effort yet still failed to select appropriate images.
In addition, Patent Document 1 separately displays still images or moving images in display regions fixedly arranged on the display device, so the display is monotonous and quickly becomes tiresome. Moreover, on a device with a small screen, such as a smartphone or a small tablet PC, the display area is narrow, so each separately displayed image is difficult to see.
The present invention has been made in view of the above points, and provides a video editing apparatus and method with which a large number of still images and moving images, whose content would otherwise take a long time or laborious operation to check or view, can be checked and viewed in a short time and without effort.
In order to solve the above-described problems, a video editing apparatus according to the present invention includes: a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating the characteristics of each scene; and a digest moving image generation unit that generates a digest moving image of the image data on the basis of the scene information, wherein the digest moving image generation unit determines, on the basis of the scene information, whether or not to use each scene when generating the digest moving image, whether or not to place a plurality of scenes in the same frame, and the spatial arrangement pattern of the scenes when a plurality of scenes are placed in the same frame.
Further, in order to solve the above-described problems, a video editing apparatus according to the present invention includes: a playback time candidate derivation unit that derives playback time candidates for a digest moving image on the basis of an image data group; a playback time candidate display unit that presents the playback time candidates to a user and sets a designated time on the basis of a user event; a scene information generation unit that divides an image data group including moving images into one or more scenes; and a digest moving image generation unit that generates image clips on the basis of the scenes and generates a digest moving image by temporally combining the image clips, wherein the digest moving image generation unit performs adjustment such that the playback time of the digest moving image equals the designated time.
Further, in order to solve the above-described problems, a video editing apparatus according to the present invention includes: a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating the characteristics of each scene; an output control unit that determines a generation policy for a digest moving image and notifies the digest moving image generation unit of the determined generation policy; a digest moving image generation unit that, on the basis of the scene information and the generation policy, generates a digest moving image of the image data group including scenes in which a plurality of scenes are spatially arranged within a screen (hereinafter referred to as combination scenes); a video display unit that displays video and operation information; a digest moving image editing control unit that reproduces the digest moving image and outputs it to the video display unit; and an operation unit that detects an operation input from outside, wherein the configuration of the digest moving image is changed by the operation input detected by the operation unit.
According to the present invention, a large number of still images and moving images can be checked and viewed in a short time and without effort.
FIG. 1 is a schematic diagram showing the internal configuration of a video editing apparatus according to the present invention.
FIG. 2 is a diagram showing an example of scene information according to the present invention.
FIG. 3 is a conceptual diagram showing the process of generating a digest moving image according to the present invention.
FIG. 4 is a diagram showing an example of scene information and scene types according to the present invention.
FIG. 5 is a diagram showing a display example of a digest moving image according to the present invention.
FIG. 6 is a diagram showing another display example of a digest moving image according to the present invention.
FIG. 7 is a schematic diagram showing the internal configuration of a digest moving image generation unit.
FIG. 8 is a schematic diagram showing the internal configuration of a video editing apparatus and a digest moving image generation unit according to a third embodiment of the present invention.
FIG. 9 is a conceptual diagram of person information in the third embodiment of the present invention.
FIG. 10 is a diagram showing an example of scene information in the third embodiment of the present invention.
FIG. 11 is a diagram showing an example of scene arrangement in the third embodiment of the present invention.
FIG. 12 is a diagram showing another example of scene arrangement in the third embodiment of the present invention.
FIG. 13 is a diagram showing another example of scene arrangement in the third embodiment of the present invention.
FIG. 14 is a diagram showing an example of the relationship between scene information and filter strength in the third embodiment of the present invention.
FIG. 15 is a diagram showing an example of the classification of image data groups in the present invention.
FIG. 16 is a diagram showing an example of a display screen when selecting an image data group to be edited in the present invention.
FIG. 17 is a schematic diagram showing the internal configuration of a video editing apparatus and a digest moving image generation unit according to a fourth embodiment of the present invention.
FIG. 18 is a diagram showing examples of a title screen and image clips generated by the video editing apparatus according to the fourth embodiment of the present invention.
FIG. 19 is a diagram showing an example of scene arrangement for a landscape screen according to the present invention.
FIG. 20 is a diagram showing an example of scene arrangement for a portrait screen according to the present invention.
FIG. 21 is a diagram showing the process of determining the image region used for scene arrangement for a landscape screen according to the present invention.
FIG. 22 is a diagram showing the process of determining the image region used for scene arrangement for a portrait screen according to the present invention.
FIG. 23 is a schematic diagram showing the internal configuration of a video editing apparatus according to a fifth embodiment of the present invention.
FIG. 24 is a diagram showing an example of a user interface for designating the playback time of a digest moving image according to the fifth embodiment of the present invention.
FIG. 25 is a conceptual diagram showing the process of generating a digest moving image according to the fifth embodiment of the present invention.
FIG. 26 is a conceptual diagram for explaining the playback time and the designated time of a digest moving image according to the fifth embodiment of the present invention.
FIG. 27 is a schematic diagram showing the internal configuration of a digest moving image generation unit according to the fifth embodiment of the present invention.
FIG. 28 is a conceptual diagram for explaining processing that shortens the playback time of an image clip.
FIG. 29 is a conceptual diagram for explaining processing that lengthens the playback time of an image clip.
FIG. 30 is a schematic diagram showing the internal configuration of a digest moving image generation unit according to a sixth embodiment of the present invention.
FIG. 31 is a schematic diagram showing the internal configuration of a video editing apparatus according to a seventh embodiment of the present invention.
FIG. 32 is a flowchart showing digest moving image editing processing according to the seventh embodiment of the present invention.
FIG. 33 is a diagram showing an example of a digest moving image editing operation according to the seventh embodiment of the present invention.
FIG. 34 is a diagram showing another example of a digest moving image editing operation according to an eighth embodiment of the present invention.
FIG. 35 is a diagram showing an example of changes in a digest moving image before and after a scene deletion operation according to the eighth embodiment of the present invention.
(First embodiment)
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic diagram showing the configuration of a video editing apparatus according to a first embodiment of the present invention. The video editing apparatus 100 includes an image data classification unit 101, a scene information generation unit 102, a digest moving image generation unit 103, an event selection unit 104, and an output control unit 105. Although not shown, the video editing apparatus 100 may further include an internal data recording unit that stores image data and an internal video display unit that displays images, or it may be configured so that an external data recording device and an external video display device with equivalent functions can be connected.
The image data classification unit 101 classifies image data. Image data is electronic data in which a moving image is recorded, and includes metadata such as the playback time of the moving image, date and time information indicating when it was shot or created, position information indicating the place (position) where it was shot or created, and creator information indicating the user or device that shot or created it. Each piece of image data may be an electronic file stored on a recording medium (not shown), or digital data including image and audio signals input from a shooting device. The image data may also include still images.
The image data classification unit 101 classifies the images into one or more image data groups that match predetermined conditions, based on the metadata included in the image data. For example, image data shot on the same date is classified into one image data group. Position information at the time of shooting may also be referred to: a plurality of pieces of image data shot on the same date and whose position information falls within a predetermined range may be classified into one image data group, or a plurality of pieces of image data whose position information falls within a predetermined range may be classified into one image data group even if their shooting dates differ. As another example, a plurality of pieces of image data whose position information falls within a predetermined range and whose creator information is identical may be classified into one image data group.
FIG. 15 shows an example of image data groups classified by the image data classification unit 101. Assume that image data 11, 12, 13, ... 1n, 21, 22, 23, ... 2n and so on are stored in the data recording unit 30. The image data group 10 includes the image data 11, 12, 13, ... 1n, and the image data group 20 includes the image data 21, 22, 23, ... 2n. The image data 11, 12, 13, ... 1n have in common that, in the metadata 11a, 12a, 13a, ... included in each piece of image data, the date and time information is "January 1, 2014" and the position information is "around home". The image data group 10 is thus an example in which image data 11, 12, 13, ... with identical date and time information (shooting date) and position information are classified into one image data group. For the image data 21, 22, 23, ... 2n, the date and time information in the metadata 21a, 22a, 23a, ... differs within the range from January 2, 2014 to January 5, 2014, but the position information is "Island of Hawaii" in common. The image data group 20 is thus an example in which image data 21, 22, 23, ... with different date and time information (shooting dates) but with position information within a predetermined range are classified into one image data group. The image data classification unit 101 generates image data group identification information 10A and 20A as information indicating the image data groups classified in this way. To identify an image data group, the image data group identification information 10A, 20A includes the name of the image data group and information indicating the image data included in that group. In the example of FIG. 15, the image data classification unit 101 gives the image data group 10 the name "January 1, 2014, around home" and gives the image data group 20 the name "January 2-5, 2014, Island of Hawaii". Although omitted from the figure, the image data group identification information may also be configured to include, in addition to the name of each piece of image data included in the group (file names such as /data/DSC_1001.mov in the figure), the shooting date and time.
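As an illustration of this grouping logic, the following is a minimal Python sketch. The ImageData fields, the 50 km radius, and the rule of grouping by identical date or nearby position are assumptions chosen for the example, not a definitive implementation of the embodiment.

    from dataclasses import dataclass
    from datetime import date
    from math import dist

    @dataclass
    class ImageData:
        path: str                      # e.g. "/data/DSC_1001.mov"
        shot_date: date                # date and time metadata
        location: tuple[float, float]  # (latitude, longitude) metadata
        creator: str                   # creator metadata

    def same_area(a, b, radius_km=50.0):
        # Hypothetical proximity test: two positions count as the same
        # place when roughly within radius_km of each other
        # (1 degree is treated as about 111 km; a crude approximation).
        return dist(a, b) * 111.0 <= radius_km

    def classify(images):
        # Group images shot on the same date, or shot near an existing
        # group's location, into one image data group.
        groups = []
        for img in sorted(images, key=lambda i: i.shot_date):
            for g in groups:
                if (img.shot_date == g[0].shot_date
                        or same_area(img.location, g[0].location)):
                    g.append(img)
                    break
            else:
                groups.append([img])
        return groups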
Returning to FIG. 1, the scene information generation unit 102 analyzes the image data, divides it into one or more scenes characterized by image signals and audio signals, and generates scene information, which is information indicating the characteristics of each scene. The scene information includes, for example, motion information indicating temporal change within the image, person information indicating the number and size of person regions appearing in the image, and conversation information indicating the presence and length of speech segments in the audio signal. Details of the scene information generation unit 102 and the generated scene information will be described later.
The digest moving image generation unit 103 reads the scene information generated by the scene information generation unit 102 in units of the image data groups classified by the image data classification unit 101, and generates a digest moving image following the time series in which the image data was shot or created. When there are a plurality of image data groups, a digest moving image is generated for the image data group selected by the event selection unit 104, described later. When generating a digest moving image, the digest moving image generation unit 103 follows the generation policy notified by the output control unit 105, described later. The video editing apparatus 100 outputs the generated digest moving image to a video display unit built into the video editing apparatus 100 or to an externally connected video display device, or to a built-in data recording unit or an externally connected data recording device. Details of the operation of the digest moving image generation unit 103 will be described later.
(Event selection unit)
The event selection unit 104 determines which of the image data groups classified by the image data classification unit 101 is to be edited. For example, taking the editing date on which the digest moving image is automatically edited as a reference, the image data group shot on the previous day, that is, the image data group whose shooting date is the day before the editing date, is determined as the editing target. Instead of the editing date, an image data group whose shooting date and time falls around a designated date and time specified by the user may be determined as the editing target. The image data group that the event selection unit 104 determines as the editing target may be based not only on date and time information but also on position information or creator information. For example, an image data group including image data whose position information matches a position specified by the user, or falls within a predetermined range including that position, may be determined as the editing target. Alternatively, when there are a plurality of image data groups, for different creators, that include image data with position information within a predetermined range, only the image data group having specific creator information may be determined as the editing target; conversely, the image data groups excluding the one having specific creator information may be determined as the editing target. The number of image data groups that the event selection unit 104 determines as the editing target is not limited to one and may be two or more. The event selection unit 104 may use the change of day as the trigger for determining the image data group to be edited; for example, once midnight has passed, the image data group shot on the previous day may be determined as the editing target. As another example, the event selection unit 104 may make the determination according to a user selection. In that case, the event selection unit 104 displays information indicating one or more image data groups classified by the image data classification unit 101 on a display unit (not shown). The information indicating an image data group may be, for example, a character string representing the shooting date or creator of the group, or an icon or thumbnail image on a map image showing the range of shooting position information included in the group. From the displayed information, the user designates the image data group to be used for digest moving image editing, and the event selection unit 104 determines the designated group as the editing target. The event selection unit 104 notifies the digest moving image generation unit 103 of information (selection information) indicating the determined image data group.
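The "previous day" rule can be sketched as below, reusing the hypothetical group representation from the classification sketch above. Returning a list reflects that more than one group may qualify; the function name is illustrative.

    from datetime import date, timedelta

    def select_event(groups, editing_date=None):
        # Previous-day rule: pick the image data group(s) whose shooting
        # date is the day before the editing date.
        editing_date = editing_date or date.today()
        target = editing_date - timedelta(days=1)
        return [g for g in groups if g[0].shot_date == target]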
FIG. 16 shows an example of the display screen presented by the event selection unit 104 when the user selects and determines the image data group to be edited. From the image data group identification information 10A, 20A, ... indicating the image data groups 10, 20, ... classified by the image data classification unit 101 as shown in FIG. 15, the event selection unit 104 extracts the names of the image data groups and outputs a selection display screen 40 including those names 41, 42, 43, .... Via operation means connected to or built into the video editing apparatus 100 (for example, a touch panel, mouse, or keyboard), the user designates the name of the image data group to be edited (one of 41, 42, 43, and so on). The event selection unit 104 notifies the digest moving image generation unit 103 of the name of the designated image data group ("January 2-5, 2014, Island of Hawaii" in the example of FIG. 16), or of the corresponding image data group identification information (10A, 20A, etc.), as the selection information indicating the image data group.
(Output control unit)
The output control unit 105 determines the output destination of the digest moving image generated by the digest moving image generation unit 103 and the policy for its generation. The output control unit 105 receives capability information indicating the number of display pixels, the audio output specifications, and so on of a video display device (not shown), and determines the digest moving image generation policy based on that capability information. When there are a plurality of output destination video display devices, for example when the video editing apparatus of this embodiment has a built-in video display unit capable of displaying the digest moving image and another video display device is also connected externally, a generation policy is determined for each of the built-in video display unit and the externally connected video display device. Regardless of whether the destination is internal or external, the output control unit 105 also determines whether the generated digest moving image is to be displayed as video, encoded as data and stored on a recording medium, or output externally via a communication medium. The output control unit 105 notifies the digest moving image generation unit 103 of the determined generation policy.
Details of the processing in the output control unit 105 are described below. The output control unit 105 determines the digest moving image generation policy based on information, given by input means (not shown), that constitutes the policy. The generation policy is a kind of parameter set made up of output destination information, an output image specification, an output audio specification, a scene selection criterion, and information indicating whether simultaneous arrangement of a plurality of scenes is allowed. The process by which the output control unit 105 determines the generation policy is described below for each of these parameters.
The output control unit 105 determines, as the output destination information, information indicating whether the output destination video display device is the video display unit built into the video editing apparatus 100 or an externally connected video display device. Whether the destination is internal or external is determined either by electrically detecting whether a display device is connected to the outside of the video editing apparatus 100, or by user designation via input means (not shown). When the video editing apparatus 100 has a built-in video display unit and a video display device is also connected externally, the output control unit 105 may determine the output destination information so that both serve as output destinations.
The output control unit 105 receives, from inside or outside the video display device, information indicating the image display specifications, such as the number of display pixels, of the output destination video display device (that is, the video display unit built into the video editing apparatus or the externally connected video display device), and determines the output image specification. The output image specification includes at least an output horizontal pixel count and an output vertical pixel count; basically, the horizontal and vertical display pixel counts of the output destination video display device are set directly as the output horizontal and vertical pixel counts. However, when it is known that the output destination video display device will not display the image on its full screen, for example when the image is displayed in a window, values smaller than the display pixel counts of the output destination device may be set as the output horizontal and vertical pixel counts.
The output control unit 105 receives, from inside or outside the video display device, information indicating the audio reproduction capability of the output destination audio output device (that is, the audio output unit built into the video editing apparatus or the audio output unit of the externally connected video display device), and determines the output audio specification. The output audio specification includes at least the number of output audio channels: 2 is set if the audio output supports stereo, 1 if it does not, and 0 if there is no audio output function at all. Besides the number of output audio channels, the output audio specification may include the sampling frequency, the number of quantization bits, and so on; each item is set to the value indicated by the audio reproduction capability of the output destination device. Examples of sampling frequencies are 32 kHz, 44.1 kHz, 48 kHz, and 96 kHz, and examples of quantization bit counts are 8, 16, and 24 bits.
The output control unit 105 receives, via input means (not shown), information indicating the user's preferences for digest moving image generation, and determines a scene selection criterion such as "person-centered" or "landscape-centered". The information indicating the user's preferences may be linguistic information such as "person" or "landscape", or information indicating an image itself, obtained for example by selecting from thumbnail images with differing tendencies. The preference information is not limited to "person-centered" or "landscape-centered"; it may specify the tendency of images in more detail, such as information indicating a main subject such as a "specific person", "animal", or "flower" based on face or shape recognition, or information indicating a type of landscape such as "seaside" or "forest" based on analysis of the pixel value distribution. When no preference information is specified, the output control unit 105 may set "person-centered" as the standard scene selection criterion.
Depending on the output destination video display device, the output control unit 105 sets information indicating whether a plurality of scenes may be arranged simultaneously in the same image frame, as "simultaneous multi-scene arrangement". For example, when the video display unit built into the video editing apparatus is active as the output destination, simultaneous multi-scene arrangement is set to "not allowed", and when an externally connected video display device is active as the output destination, it is set to "allowed". This criterion rests on the assumption that the display device built into the video editing apparatus is small (for example, when the video editing apparatus is a smartphone), whereas an external connection implies a large display device. When the size of the output destination display device is known in advance, or when information from which the display size can be calculated (for example, information indicating pixel density: dpi) is available, the display size is calculated, and simultaneous multi-scene arrangement is set to "allowed" if it exceeds a predetermined threshold and to "not allowed" otherwise.
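Collected together, the generation policy amounts to a parameter set along the lines of the following sketch. The field names, default values, and the 20-inch threshold are assumptions for illustration only.

    from dataclasses import dataclass
    from math import hypot

    @dataclass
    class GenerationPolicy:
        output_dest: str        # "internal", "external", or "both"
        out_width: int          # output horizontal pixel count
        out_height: int         # output vertical pixel count
        audio_channels: int     # 0 = none, 1 = mono, 2 = stereo
        scene_criterion: str = "person"  # standard selection criterion
        multi_scene: bool = False        # plural scenes per frame?

    def decide_multi_scene(width_px, height_px, dpi, threshold_inch=20.0):
        # When pixel density (dpi) is available, estimate the display's
        # diagonal size and allow simultaneous arrangement only above a
        # predetermined threshold (20 inches here is an assumed value).
        diagonal_inch = hypot(width_px, height_px) / dpi
        return diagonal_inch > threshold_inch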
(Scene information generation unit)
Next, the scene information generation unit 102 and the generated scene information are described in detail. FIG. 2 shows an example of scene information generated by the scene information generation unit. The scene information 200 shown in FIG. 2 describes information about scenes line by line, with each line 201, 202, 203, ... corresponding to one scene. The information described in each line 201, 202, 203, ... indicates, from left to right, the image file name, shooting date, shooting time, scene start frame number, scene end frame number, person information, motion information, and conversation information. The image file name is a character string indicating the storage location of the still image or moving image data containing the scene. The shooting date and shooting time are basically character strings indicating the date and time at which the image file containing the scene was recorded. The scene start frame number and scene end frame number are information indicating the time range (scene length) of the scene within the corresponding image file. For example, when the scene start frame number is 0 and the scene end frame number is 149, and the corresponding image file is 30 fps moving image data, the scene covers the first 5 seconds of the file. The person information, motion information, and conversation information indicate the characteristics of the image and audio signals of the scene; these three kinds of information are explained next.
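A scene information row can be read into a record like the following sketch, assuming one whitespace-separated row per scene in the field order described above. The class and function names are illustrative, and rows for still-image scenes, whose time-range and motion fields are the symbol *, would need separate handling.

    from dataclasses import dataclass

    @dataclass
    class SceneInfo:
        file: str         # image file name (storage location)
        shot_date: str    # shooting date
        shot_time: str    # shooting time
        first_frame: int  # scene start frame number
        last_frame: int   # scene end frame number
        person: int       # person information index
        motion: int       # motion information index
        speech: int       # conversation information index

    def parse_line(line):
        # One row per scene, fields left to right as described above.
        f, d, t, s, e, p, m, c = line.split()
        return SceneInfo(f, d, t, int(s), int(e), int(p), int(m), int(c))

    def scene_seconds(scene, fps=30.0):
        # Frames 0..149 at 30 fps give the 5-second scene of the example.
        return (scene.last_frame - scene.first_frame + 1) / fps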
Person information is information that includes the presence or absence of persons in a scene. It may further include information indicating the number of persons, personal names, postures, the size of person regions, and the distribution pattern of multiple persons. Motion information is information indicating the presence and type of motion in a scene; it may indicate the motion of individual objects or the motion of each region. Conversation information is information indicating the volume and type of sound in a scene (silence, human voices, music, etc.); it may further include sound source information such as speaker identification or music type. In FIG. 2, these three kinds of information are represented as index numbers corresponding to predefined types.
Person information can take, for example, three values: "no person (0)", "main person (1)", and "other persons (2)". No person (0) means that no person, or almost no person, appears throughout the scene. Main person (1) means that one or two persons appear in the scene and their region is larger than a predetermined size; this corresponds, for example, to a scene in which the photographer deliberately shot a specific person. Other persons (2) means that persons appear in the scene but either there are many of them or the regions in which they appear are smaller than the predetermined size; this corresponds, for example, to a group-photo-like scene including specific persons, or a scene shot so that the movement of people can be seen even if they cannot be identified.
Motion information can take, for example, three values: "no motion (0)", "partial motion (1)", and "whole motion (2)". No motion (0) means that the image hardly changes throughout the scene. Partial motion (1) means that part of the image region moves within the scene; this corresponds, for example, to a scene of a person dancing in front of a fixed camera. Whole motion (2) means that there is motion across the entire image region within the scene; this corresponds, for example, to a scene shot while panning the camera horizontally.
Conversation information can take, for example, three values: "no sound (0)", "conversation (1)", and "other sound (2)". No sound (0) means that no usable sound signal is recorded throughout the scene, for example because the sound level is extremely low. Conversation (1) means that audio including human conversation is recorded in the scene. Other sound (2) means that a sound signal at or above a predetermined level is continuously recorded but is not conversation; this corresponds, for example, to a scene in which music is playing.
The scene information generation unit 102 determines and generates the above scene information by analyzing the image and audio signals in the image data. In doing so, it analyzes the image and audio signals in units of, for example, one second; when there is no change in the three kinds of information indicating the characteristics of the image and audio signals, it generates scene information for a single continuous scene. On the other hand, when any of the three kinds of information changes, that transition is treated as a scene boundary, one piece of moving image data is divided into a plurality of scenes, and scene information is generated for each.
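A minimal sketch of this splitting rule, assuming the per-second feature tuples have already been obtained by signal analysis:

    def split_into_scenes(features):
        # features: one (person, motion, speech) tuple per one-second
        # analysis unit. A new scene starts whenever any of the three
        # feature values changes; contiguous identical units form one
        # continuous scene. Returns (start_sec, end_sec, feature_tuple).
        scenes = []
        start = 0
        for t in range(1, len(features)):
            if features[t] != features[t - 1]:
                scenes.append((start, t - 1, features[start]))
                start = t
        if features:
            scenes.append((start, len(features) - 1, features[start]))
        return scenes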
Furthermore, in the process of generating scene information, the scene information generation unit 102 may generate the scene information so as to exclude scenes unsuitable for inclusion in the digest moving image. For example, for scenes where, in the course of analyzing the image signal, it is judged likely that a viewer could not tell what was captured even by looking at the image, such as a state of abrupt whole-image motion or a state of being out of focus, the scene information generation unit 102 does not generate scene information at all, or generates a digest-unsuitable flag indicating that the scene is not suitable for inclusion in the digest moving image. This makes it possible to exclude scenes that are not useful for the digest moving image, for example when large camera shake or a focus shift occurs just as one starts shooting a moving image with a digital camera or smartphone.
In the scene information 200, entries 209 and 211 are scene information corresponding to scenes that are still images. Since a still image has no time element, the scene start frame number and scene end frame number, which indicate the time range of a scene, do not exist. Likewise, since there is no motion in the image and no audio, neither motion information nor conversation information exists. In FIG. 2, such non-existent information is represented by the symbol *. For person information, on the other hand, the scene information generation unit 102 analyzes the image signal of the still image and assigns one of "no person (0)", "main person (1)", or "other persons (2)", as for a moving image.
Here, the relationship between the scene start and end frame numbers and the shooting time, among the items included in the scene information, is explained. The shooting time of an image file is normally recorded as the time at which the image is recorded as a file, that is, the time at which shooting is completed. When the scene information generation unit 102 does not divide an image file into a plurality of scenes in the process of generating its scene information, the shooting time of the image file corresponds directly to the shooting time of the scene. However, when the scene information generation unit 102 divides one image file into a plurality of scenes, the time corresponding to the shooting time of each scene may not match the time indicated by the shooting time of the original image file. Therefore, when dividing one image file into a plurality of scenes, the scene information generation unit 102 calculates the time EndTime corresponding to the end of each scene from the shooting time ShootTime recorded for the image file, the frame rate fr_rate of the moving image, the total number of frames FALL, and the scene end frame number Send of each divided scene, as EndTime = ShootTime - (FALL - Send) / fr_rate, and records it as the shooting time of each scene when generating the scene information. By doing so, when the scene information is referred to later, the temporal order of the scenes can be determined by comparing only the shooting dates and times, without referring to image file names or frame numbers. Note that the scene information generation unit 102 obtains the temporal length of each scene in the course of analyzing its image or audio signal, and adjusts the recorded shooting date and time appropriately based on a comparison between the obtained scene length and the shooting time of the image data containing the scene. For example, even if the shooting date of an image file is recorded as January 1, 2014 and the shooting time as 0:01:00 a.m., when the entire moving image of that file spans several minutes and the head portion of the file is extracted as a scene, the actual shooting date of the extracted scene is December 31, 2013, different from the file's shooting date, and the shooting time is, for example, 23:59:00. By calculating an appropriate shooting date and time for each scene in this way and recording it as scene information, the scenes included in the image data group to be edited, as determined by the event selection unit 104 described above, are selected appropriately.
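The end-time calculation amounts to the following sketch, which directly implements EndTime = ShootTime - (FALL - Send) / fr_rate; the parameter names follow the symbols in the text, and the example figures are chosen to reproduce the 23:59:00 case above.

    from datetime import datetime, timedelta

    def scene_end_time(shoot_time: datetime, fall: int, send: int,
                       fr_rate: float = 30.0) -> datetime:
        # shoot_time: the file's recorded (shooting-completion) time;
        # fall: total frame count; send: the scene's end frame number.
        return shoot_time - timedelta(seconds=(fall - send) / fr_rate)

    # Example: a file stamped 2014-01-01 00:01:00 whose head scene ends
    # 120 seconds before the end of the footage (at 30 fps):
    # scene_end_time(datetime(2014, 1, 1, 0, 1, 0), fall=7200, send=3600)
    # -> 2013-12-31 23:59:00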
Note that the scene start frame number and scene end frame number generated as scene information may be replaced with other information specifying a temporal position within the image file. For example, instead of the scene start and end frame numbers, a character string indicating the scene start time of each scene and a character string indicating time information corresponding to the scene length (the elapsed time from the scene start time) may be generated. Alternatively, the elapsed time within the image file indicating the scene start and the elapsed time within the image file indicating the scene end may be generated as scene information. The elapsed time within the image file is expressed, for example, in seconds or milliseconds relative to the beginning of the file, or in seconds plus a frame number. The information specifying the temporal position of a scene may be represented as a character string, as described above, or as a numerical value (for example, a value representing the elapsed time relative to a predefined date and time). Information representing the frame rate of the moving image may also be included.
In the scene information 200 of FIG. 2, the person information, motion information, and conversation information are represented by numerical values, but they may instead be represented by character strings expressing the meaning of each value. For example, person information may be represented by strings such as "NO_HUMAN" for "no person (0)", "HERO" for "main person (1)", and "OTHERS" for "other persons (2)". Motion information and conversation information may likewise be represented by character strings.
The scene information 200 of FIG. 2 shows an example in which the scene information items are expressed as numerical values (indexes corresponding to predefined types). Besides numerical indexes, they may be stored as character strings corresponding to predefined types, or as sets of parameters (number of persons, motion vectors, volume per frequency band, etc.) rather than as a single value or string. Furthermore, the data need not be readable text data; it may be binary data.
(Digest moving image generation unit)
Next, the processing in the digest moving image generation unit 103 is described in detail. FIG. 3 is a conceptual diagram showing the process of generating a digest moving image by the video editing apparatus of this embodiment. As illustrated, for the image data group 302 selected from among the image data groups 301, the digest moving image generation unit 103 reads the corresponding scene information 303 and generates a digest moving image in accordance with the digest moving image generation policy 305 determined in advance. The image data group 302 targeted for digest moving image generation is, for example, all the image data shot on a certain day; this group is determined by the image data classification unit 101 and the event selection unit 104 as described above. In this case, the event selection unit 104 notifies the digest moving image generation unit 103 of a parameter meaning "shooting date = such-and-such a date" as the selection information 304 indicating the target image data group. The digest moving image generation unit 103 goes through the scene information generated by the scene information generation unit 102 from the beginning and reads the scene information of the scenes matching the selection information. Next, the digest moving image generation unit 103 refers to the read scene information in ascending order of shooting date and shooting time, and determines the type of each scene, such as scenes to be used alone and scenes to be used in combination with other scenes. Based on the determined scene types, the digest moving image generation unit 103 generates image clips 306a, 306b, 306c, ..., which are image data in which the scenes are spatially arranged, and generates the digest moving image 307 by temporally combining the image clips. In FIG. 3, notations such as S01, S02, and S03 each denote a scene, and the notation "S01+S02" in the image clip 306a indicates that it is an image clip in which both scene S01 and scene S02 are spatially arranged. The image clips 306a, 306b, 306c, and so on are still images or moving images that include at least one scene and have a reasonable length (for example, one second or longer).
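The first two steps of this flow, filtering by the selection information and ordering by shooting date and time, can be sketched as follows, reusing the hypothetical SceneInfo record from the earlier sketch. String comparison for ordering assumes zero-padded date and time formats.

    def select_and_order(scene_infos, shoot_date):
        # Keep only the scenes matching the selection information (here
        # a single shooting date) and arrange them in ascending order of
        # shooting date and time, ready for scene type determination.
        matching = [s for s in scene_infos if s.shot_date == shoot_date]
        return sorted(matching, key=lambda s: (s.shot_date, s.shot_time))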
FIG. 7 shows the internal configuration of the digest moving image generation unit 103 in this embodiment. The digest moving image generation unit 103 includes a target image extraction unit 1031, a scene type determination unit 1032, a scene spatial arrangement unit 1033, a scene temporal arrangement unit 1034, and a digest control unit 1035.
The target image extraction unit 1031 refers to the selection information indicating the target image data group notified by the event selection unit 104, extracts the input images used for generating the digest moving image, and notifies the scene type determination unit 1032 and the scene spatial arrangement unit 1033 of information indicating the extracted image data. The scene type determination unit 1032 refers to the scene information generated by the scene information generation unit 102, reads the scene information of the scenes corresponding to the information indicating the image data extracted by the target image extraction unit 1031, and determines the scene types.
FIG. 4 shows an example of the relationship between scene information and the scene types determined by the scene type determination unit 1032. Like FIG. 2, FIG. 4 shows an example of scene information; each row 401, 402, 403, ... included in the scene information 400 describes the scene information corresponding to one scene. Hereinafter, in the description of FIG. 4, the scene information items 401, 402, 403, ... are, for simplicity, also used to denote the scenes themselves.
The scene type determination unit 1032 refers to the scene information 400 and compares the shooting times of each pair of scenes that are consecutive in shooting-time order. Depending on whether the difference ΔT between the two shooting times is within a predetermined threshold (scene proximity determination threshold THt) or exceeds it, that is, whether or not the scenes are temporally close, each scene is determined to be either a "single scene" used alone or a "combined scene" used in combination. The scene proximity determination threshold THt is, for example, 5 minutes (= 300 seconds). Since the difference ΔT between the shooting times of scenes 401 and 402 is ΔT = 1 minute 41 seconds = 101 seconds < THt, scene 401 and scene 402 are determined to be combined scenes. Similarly, scene 403 and scene 404 are determined to be combined scenes because they are temporally close, and scene 405 and scene 406 are likewise determined to be combined scenes (in FIG. 4, each set of scene information enclosed by a dotted line indicates a combined scene). For each scene determined to be a combined scene, the scene type determination unit 1032 further decides whether it is a main scene or a sub-scene, as follows. The scene type determination unit 1032 refers to the person information, motion information, and conversation information included in the scene information of each scene, and classifies a scene as a main scene if it is judged to be a major scene, and as a sub-scene otherwise. For example, in FIG. 4, scene 401 and scene 402 both have person information "main person (1)", so both are judged to be major scenes and classified as main scenes. Scene 403 has person information "main person (1)", so it is judged to be a major scene and classified as a main scene. Scene 404 has person information "other person (2)", so it is judged not to be a major scene and is classified as a sub-scene. Scene 405 and scene 406 have person information "other person (2)" and "no person (0)", respectively; scene 405 is judged to be relatively more important than scene 406, so scene 405 is classified as a main scene and scene 406 as a sub-scene.
Note that in the example of scenes 405 and 406, since the person information of both scenes is other than "main person", it may instead be determined that neither is a major scene. In that case, it may be decided not to use the less important scene (scene 406 in the above example) in the digest. By making such a decision, some of a group of non-major scenes that are temporally close can be excluded from the digest moving image, reducing the redundancy of the generated digest moving image.
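A minimal sketch of the classification just described, reusing the hypothetical SceneInfo record above. The 300-second threshold and the person-information codes follow the text; the importance ranking is our reading of the example.

    TH_T = 300  # scene proximity determination threshold THt: 5 minutes

    def group_combined(scenes):
        """Walk scenes in shooting-time order and pair consecutive scenes
        whose shooting times differ by no more than TH_T seconds into
        'combined' groups; everything else remains a 'single' scene."""
        groups, i = [], 0
        while i < len(scenes):
            if (i + 1 < len(scenes) and
                    (scenes[i + 1].shot_at - scenes[i].shot_at).total_seconds() <= TH_T):
                groups.append(("combined", [scenes[i], scenes[i + 1]]))
                i += 2
            else:
                groups.append(("single", [scenes[i]]))
                i += 1
        return groups

    def classify_main_sub(a, b):
        """Within a combined pair, 'main person (1)' outranks
        'other person (2)', which outranks 'no person (0)'.
        Equal ranks -> both scenes treated as main scenes."""
        rank = {1: 2, 2: 1, 0: 0}  # assumed importance order
        if rank[a.person] == rank[b.person]:
            return [(a, "main"), (b, "main")]
        main, sub = (a, b) if rank[a.person] > rank[b.person] else (b, a)
        return [(main, "main"), (sub, "sub")]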
Next, the scene space arrangement unit 1033 determines the spatial arrangement of each scene and generates an image clip in which the scenes are spatially arranged. FIG. 5 shows examples of scene arrangements produced by the scene space arrangement unit 1033. The scene space arrangement unit 1033 determines the spatial arrangement (layout) of each scene based on the scene types determined by the scene type determination unit 1032 and the relationship between the scene information of the combined scenes, as described above. For example, since scene 401 and scene 402 in the example of FIG. 4 are both main scenes, the arrangement is determined to be "parallel arrangement", in which the scenes are displayed side by side at the same size (FIG. 5(a)). In this case, since the person information of both scenes 401 and 402 is "main person (1)", a person is likely to appear in the central region of each scene. Therefore, the central region of each scene is cut out and placed in regions 501 and 502, respectively.
In the next example, scene 403 and scene 404 are a main scene and a sub-scene, respectively, so the arrangement is determined to be "central arrangement", in which the sub-scene is displayed over the entire image frame while the central region of the main scene is superimposed on the region 503 in the central portion of the screen so that the main scene attracts attention (FIG. 5(b)). The reason for superimposing the central region of the main scene is that the person information of scene 403, the main scene, is "main person (1)". A scene whose person information is "main person (1)" means that one or two persons of relatively large size appear in the image frame. In such a scene, the photographer most likely intended to shoot a specific person, and therefore the region containing the person is likely to be the central portion of the image frame. Accordingly, the central region of the main scene, where a person is most likely to appear, is cut out and placed in the central region 503 of the screen so that the region containing the person attracts attention. Note that in FIG. 5(b), the entire image frame of the main scene may be reduced and displayed in region 503 instead of the central region of the main scene. As another example of "central arrangement", an arrangement such as that shown in FIG. 5(d) may be selected. In the arrangement of FIG. 5(d), the sub-scene is displayed over the entire image frame as in FIG. 5(b), while the central region of the main scene is cut out larger than in FIG. 5(b) and placed in region 507. Compared with FIG. 5(b), the arrangement of FIG. 5(d) gives the sub-scene a smaller display area. The scene space arrangement unit 1033 selects this arrangement, for example, when the motion information of the sub-scene is "whole movement (2)". Displaying a scene with large motion in the background region 504 of the main scene gives the whole image a sense of dynamism, and since the area of the background region 504 is smaller than in FIG. 5(b), the image can be displayed in a layout that does not hinder the viewer's attention to the main scene region 507.
As another example, scene 405 and scene 406 shown in FIG. 4 are a main scene and a sub-scene, respectively, the same relationship as scenes 403 and 404; however, since the person information of scene 405, the main scene, is "other person (2)", it is unlikely that a specific region such as the central region of the image carries special significance in scene 405. Therefore, the arrangement is determined to be "sub-screen arrangement", in which the main scene is placed over the entire image frame while a reduced image of the sub-scene is superimposed on the main scene as a sub-screen region 506 (FIG. 5(c)). The size of the sub-screen region 506 is determined to be smaller than the main-scene regions (503, 507) of the central arrangement described above. The reason is that the scene that should attract attention is fundamentally the main scene, and the sub-scene should not stand out. For example, the region 503 in which the main scene is placed in the central arrangement of FIG. 5(b) is about 1/4 of the entire image frame (region 507 in FIG. 5(d) has a horizontal pixel count of about 1/2 of the horizontal pixel count of the entire image frame), whereas the region 506 in which the sub-scene is placed in the sub-screen arrangement is about 1/9 of the entire image frame; each scene is cropped from the original image, or the original image is reduced, to match that size. By differentiating the sizes of the regions in which scenes are placed in this way, the region or scene that should attract attention can be made to stand out.
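The layout rules of FIGS. 5(a) to 5(d) could be condensed as below; the function name, the decision order, and the fraction constants are our summary of the examples, not an exhaustive statement of the disclosed logic.

    def choose_layout(main, sub):
        """Condensed layout rule from FIG. 5: no sub-scene (two main
        scenes) -> 'parallel'; main scene with person code 1 ->
        'central' (its centre crop overlaid on the full-frame
        sub-scene); otherwise 'pip' (sub-screen arrangement)."""
        if sub is None:
            return "parallel"
        return "central" if main.person == 1 else "pip"

    # Approximate region sizes relative to the full frame, per the text:
    # central overlay about 1/4 of the frame, sub-screen inset about 1/9.
    REGION_FRACTION = {"central": 1 / 4, "pip": 1 / 9}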
FIG. 5(e) shows another example of the "sub-screen arrangement" shown in FIG. 5(c). In the example of FIG. 5(e), the main scene is placed in region 505 as in FIG. 5(c), but the region 508 in which the sub-scene is placed is moved to a spatial position different from region 506 of FIG. 5(c). The sub-screen arrangements of FIGS. 5(c) and 5(e) are both characterized by placing the sub-scene in a region that does not hinder attention to the main scene. The arrangement of FIG. 5(e) is chosen over that of FIG. 5(c), for example, when the scene analysis performed by the scene information generation unit 102 reveals that a person, or part of a person, appears in region 506 within region 505 in which the main scene is placed. In such a case, the region on which the sub-scene is superimposed is changed from region 506 to region 508 so that the person region within the main scene's region 505 is not hidden by the sub-scene. By changing the scene arrangement in this way, attention to the major image regions shown in the main scene is not obstructed.
Furthermore, a spatial filter may be applied to some scenes to produce an image that emphasizes the difference between the main scene and the sub-scene. For example, if the sharpness of the image is reduced by applying a smoothing filter to region 504 in FIGS. 5(b) and 5(d), the difference between the central region displaying the main scene and the peripheral region displaying the sub-scene becomes apparent at a glance, making the region of interest clearer. Whether to apply such a spatial filter is determined based on, for example, the similarity between the images of the main scene and the sub-scene. For example, the scene space arrangement unit 1033 applies a smoothing filter to the sub-scene when the similarity between the main scene and the sub-scene is high, and does not apply it when the similarity is low. For example, in a central arrangement such as FIG. 5(b) or 5(d), the average pixel value of each color component is computed for the main-scene region 503 or 507 and for the sub-scene region 504; if the difference between the averages is smaller than a predetermined value, that is, if the pixel values of regions 503 or 507 and region 504 are highly similar, it is determined that the spatial filter is applied to region 504. This makes it easier to focus on regions 503 and 507, that is, the main scene, compared with the case where no spatial filter is applied, and yields an image that is easy to view as a whole. The spatial filter is not limited to a smoothing filter, and may be a color conversion filter that changes the color tone of a region. For example, the scene space arrangement unit 1033 may convert the sub-scene to grayscale or a so-called sepia tone. If the image of region 504 is converted to grayscale or sepia tone by color conversion, the main-scene regions 503 and 507 can be made to stand out. Alternatively, instead of applying a spatial filter, the scene space arrangement unit 1033 may emphasize the difference from the main-scene regions 503 and 507 by setting the temporal change of the image in region 504 to zero, that is, by making it a still image.
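The similarity test described here can be sketched as a per-channel mean comparison. The threshold value below is an assumption, since the text only says "a predetermined value".

    import numpy as np

    def should_smooth_sub(main_region: np.ndarray,
                          sub_region: np.ndarray,
                          thresh: float = 10.0) -> bool:
        """Average each colour component over an H x W x 3 region and
        compare main vs. sub; if every channel mean differs by less
        than `thresh`, the regions are judged similar, so the smoothing
        filter is applied to the sub-scene region."""
        main_means = main_region.reshape(-1, main_region.shape[-1]).mean(axis=0)
        sub_means = sub_region.reshape(-1, sub_region.shape[-1]).mean(axis=0)
        return bool(np.all(np.abs(main_means - sub_means) < thresh))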
Note that more than two scenes may be arranged on one screen. FIG. 5(f) shows an example in which three scenes are arranged. The example shown in FIG. 5(f) is an arrangement for the case where the person information of three temporally close scenes is "main person (1)" in every case. In this case, the scene space arrangement unit 1033 determines that the three scenes are combined scenes of one another, and determines all of them to be main scenes. Since all three are main scenes, the central region of each scene is cut out and the regions are placed side by side in regions 509, 510, and 511 at equal sizes. When multiple scenes are placed in the same image clip and their temporal lengths differ, the other scenes are adjusted by truncating part of them so that they match the shortest scene placed in that image clip.
The scene space arrangement unit 1033 outputs the image clips generated by the above method to the scene time arrangement unit 1034.
The scene time arrangement unit 1034 further combines, in the time direction, the image clips in which scenes have been spatially arranged as described above. In FIG. 3, each of the image clips 306a, 306b, 306c, ... corresponds to an image clip composed of a single scene only, or an image clip in which combined scenes are arranged. The scene time arrangement unit 1034 combines the image clips according to the chronological order of the shooting times of the scenes corresponding to each clip. For an image clip composed of combined scenes, that is, an image clip containing multiple scenes, the shooting time of the image clip is taken to be the shooting time information of the scene with the latest shooting time among the scenes included in that clip.
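The timestamping rule for multi-scene clips might look like the following; clips are assumed to be simple lists of the SceneInfo records sketched earlier.

    def clip_time(clip_scenes):
        """Per the text, a clip containing several scenes takes the
        shooting time of the latest scene it contains."""
        return max(s.shot_at for s in clip_scenes)

    def order_clips(clips):
        """Join image clips in chronological order of their clip times."""
        return sorted(clips, key=clip_time)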
The combined scenes described above are scenes whose shooting times differ relatively little from each other, that is, scenes whose shooting times are close compared with the length of the entire event. Scenes shot at close times are likely to capture the same or mutually similar situations. When generating a digest moving image, if highly similar scenes are joined so as to be temporally consecutive, the resulting digest becomes redundant, with similar scenes following one another, and is liable to bore the viewer. By instead arranging highly similar scenes side by side spatially, or including them within parts of the same frame, a large number of captured images can be used effectively and the display layout can be diversified. This makes it possible to generate a digest moving image that does not tire the viewer, increasing user satisfaction.
Here, the handling of audio tracks when generating the digest moving image will be described. The audio tracks of the digest moving image are taken directly from the audio tracks included in the image data corresponding to each scene used in the digest. When the scene in use is a single scene, its audio track is used as-is; in the case of a combined scene, there are multiple audio tracks, so the track to use is determined by the following method. When the arrangement of the combined scene is other than "parallel arrangement", that is, "central arrangement" or "sub-screen arrangement", the audio track of the main scene is used as the audio track of the digest moving image. When the arrangement of the combined scene is "parallel arrangement", the audio tracks of the individual scenes are allocated to the left and right channels of the digest's audio track in accordance with the positional relationship of the arranged scenes. In this way, the scene receiving visual attention matches the audio being heard, and the digest moving image can be viewed without a sense of incongruity.
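The audio policy could be expressed as follows; mono sample sequences and the layout labels from the earlier sketches are assumptions for illustration.

    def digest_audio(layout, main_track, sub_track=None):
        """'central' / 'pip' layouts: use the main scene's track as-is.
        'parallel' layout: pan the left-placed scene to the left channel
        and the right-placed scene to the right channel."""
        if layout != "parallel" or sub_track is None:
            return main_track
        n = min(len(main_track), len(sub_track))
        return [(main_track[i], sub_track[i]) for i in range(n)]  # (L, R) pairs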
(Switching the digest moving image generation method)
Next, switching of the generation method when the video editing apparatus 100 generates a digest moving image will be described. The digest control unit 1035 varies the digest moving image generation method (generation algorithm) according to the digest moving image generation policy determined by the output control unit 105. Specifically, it generates the digest moving image while switching such factors as whether to include a given scene in the digest, the criteria for judging main scenes and sub-scenes, whether multiple scenes are spatially arranged together and in which arrangement pattern, the image coding quality, and the audio coding quality. The variations of the digest moving image generation method are described in detail below.
In the digest moving image generation unit 103, the scene type determination unit 1032 determines whether each scene included in the image data group for which the digest is generated is a major scene. The scene type determination unit 1032 may make this determination based on a scene selection criterion included in the digest moving image generation policy. For example, the foregoing description applies when the scene selection criterion indicates "person-based"; when the scene selection criterion differs from this, the digest control unit 1035 changes the criterion for judging major scenes and notifies the scene type determination unit 1032 of information indicating the judgment criterion, and the scene type determination unit 1032 determines the scene types according to that information. For example, in the case of "landscape-based", scenes other than those capturing people or conversation, that is, scenes dominated by scenery such as nature, are judged to be major scenes. For example, among temporally close combined scenes, scenes whose person information is "no person" or whose conversation information is other than "with conversation" are classified as main scenes, and the other combined scenes are classified as sub-scenes. For single scenes with no temporally close neighbors, only scenes whose person information is "no person" are selected; the other scenes, that is, those in which people appear, are not used in the digest moving image as single scenes. With this configuration, scenes matching the specified characteristics are preferentially selected, making it possible to generate a digest moving image that reflects the user's preferences.
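A sketch of the switchable major-scene test; the has_conversation flag is a hypothetical field standing in for the conversation information of the scene information.

    def is_major_scene(scene, criterion="person"):
        """Major-scene predicate switched by the scene selection
        criterion in the generation policy: 'person' favours scenes
        showing a main person (code 1); 'landscape' favours scenes
        with no people (code 0) or without conversation."""
        if criterion == "person":
            return scene.person == 1
        if criterion == "landscape":
            return scene.person == 0 or not scene.has_conversation
        raise ValueError(f"unknown scene selection criterion: {criterion}")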
Based on the simultaneous multi-scene arrangement setting included in the digest moving image generation policy, the digest control unit 1035 may switch whether multiple temporally close scenes are placed within the same image frame. The digest control unit 1035 decides whether to place multiple scenes in the same image frame and notifies the scene type determination unit 1032 and the scene space arrangement unit 1033. When the simultaneous multi-scene arrangement notified by the digest control unit 1035 is "allowed", the scene space arrangement unit 1033 treats temporally close scenes as combined scenes, as described above, and generates the digest moving image so that they are placed in the same image frame. Conversely, when simultaneous multi-scene arrangement is "not allowed", the scene space arrangement unit 1033 treats each scene as a single scene and generates the digest moving image without placing multiple scenes in the same image frame. As already explained with regard to the output control unit 105, when the screen of the output display device is small, the output control unit 105 sets simultaneous multi-scene arrangement to "not allowed"; this avoids, for example, selecting a sub-screen layout in which a scene would be reduced, so the legibility of the generated digest is not impaired.
The digest control unit 1035 determines whether to encode the images and audio based on the output destination information included in the digest moving image generation policy. When no encoding is performed, the generated digest moving image is output to the built-in video display unit or to an externally connected video display device so that it is displayed and played back as-is. When encoding is performed, the images and audio are encoded according to a predetermined coding scheme when the digest is generated, and the digest moving image is output as coded data. As coding schemes, for example, images follow schemes such as MPEG-2, AVC/H.264, and HEVC/H.265, and audio follows schemes such as MPEG-1, AAC-LC, and HE-AAC. The digest control unit 1035 takes the highest-performance methods as its basic coding schemes, for example HEVC/H.265 for images and HE-AAC for audio, and determines the coding scheme and coding quality actually used based on the output image specification and output audio specification described later. The coding scheme and coding quality are described below.
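The fallback from the highest-performance scheme to what the output device supports can be sketched as a preference-list search; the list ordering follows the text's examples.

    # Preference order per the text: highest coding performance first.
    VIDEO_PREF = ["HEVC/H.265", "AVC/H.264", "MPEG-2"]
    AUDIO_PREF = ["HE-AAC", "AAC-LC", "MPEG-1"]

    def pick_codec(preference, supported):
        """Return the first codec in the preference list that the
        output device reports it can decode."""
        for codec in preference:
            if codec in supported:
                return codec
        raise RuntimeError("no mutually supported coding scheme")

    # Example: a display that decodes only MPEG-2 and AVC/H.264 gets
    # AVC/H.264 rather than HEVC/H.265.
    assert pick_codec(VIDEO_PREF, {"MPEG-2", "AVC/H.264"}) == "AVC/H.264"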
The digest control unit 1035 determines the image coding quality of the generated digest moving image and the arrangement pattern of multiple scenes based on the output image specification included in the digest moving image generation policy. The output image specification includes at least information indicating the display pixel count of the output video display device. The display pixel count consists of the number of pixels in the horizontal direction and the number of pixels in the vertical direction, from which the screen aspect ratio of the display device is also known. When the pixel count and screen aspect ratio of the input images to be edited match the display pixel count and screen aspect ratio of the output destination, the digest control unit 1035 generates the digest moving image so as to preserve the pixel count of the input images. When they do not match, the scene arrangement is determined, and the digest moving image generated, so as to preserve or exploit the pixel count of the input images within a range not exceeding the display pixel count of the output destination. The scene arrangement used when the pixel count and screen aspect ratio of the input images do not match those of the output destination is described later. When the digest moving image is encoded and recorded or transmitted as a file based on the aforementioned output destination information, the digest control unit 1035 determines the image coding rate based on the display pixel count of the output destination. For example, an information table associating several pixel counts with corresponding coding rates is prepared in advance, and the digest control unit 1035 determines the coding rate corresponding to the display pixel count of the output destination by referring to this table. Furthermore, when the output image specification provides the image playback capability of the output destination in addition to the pixel count, the digest control unit 1035 determines the coding scheme for encoding the digest moving image according to that playback capability. For example, when the image coding schemes supported by the externally connected video display device are MPEG-2 and AVC/H.264, the digest control unit 1035, instead of selecting HEVC/H.265, which has higher coding performance, selects AVC/H.264 as the coding scheme in accordance with the image playback capability indicated by the output image specification, and encodes the digest moving image with it.
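The pixel-count-to-rate table might be realized as below; the thresholds and rates are illustrative numbers, since the text only states that such a table is prepared.

    # (minimum pixel count, video coding rate in bit/s) - example values only.
    RATE_TABLE = [
        (1920 * 1080, 8_000_000),  # full HD class and above -> 8 Mbps
        (1280 * 720,  4_000_000),  # 720p class              -> 4 Mbps
        (0,           2_000_000),  # anything smaller        -> 2 Mbps
    ]

    def coding_rate(display_width: int, display_height: int) -> int:
        """Look up the coding rate for the output display's pixel count,
        scanning the table from the largest class downward."""
        pixels = display_width * display_height
        for min_pixels, rate in RATE_TABLE:
            if pixels >= min_pixels:
                return rate
        return RATE_TABLE[-1][1]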
The digest control unit 1035 determines the audio coding quality and audio track configuration of the generated digest moving image based on the output audio specification included in the digest moving image generation policy. The output audio specification includes at least information indicating the audio playback capability of the output audio device, that is, the audio output unit built into the video editing apparatus 100 or the audio output unit of an externally connected video display device, such as the number of output audio channels, the sampling frequency, and the number of quantization bits. The digest control unit 1035 determines whether to use the audio tracks of the scenes included in the digest moving image, and how to allocate them to channels, according to the number of output audio channels in the output audio specification. It also performs audio resampling and bit-depth conversion according to the sampling frequency and quantization bit count of the output audio specification. Furthermore, when the digest moving image is encoded and recorded or transmitted as a file based on the aforementioned output destination information, and information indicating the coding schemes supported by the output audio device is available as part of the output audio specification, the digest control unit 1035 determines the audio coding scheme based on that information. For example, when the audio coding schemes supported by the externally connected video display device are only MPEG-1 and AAC-LC, the digest control unit 1035 selects AAC-LC as the coding scheme instead of HE-AAC, which has higher coding performance, and encodes the audio tracks of the digest moving image with it.
Here, examples of scene arrangements for the case where the pixel count and screen aspect ratio of the input images do not match the display pixel count and screen aspect ratio of the output destination indicated by the output image specification described above will be described. FIG. 6 shows arrangement examples for multiple scenes determined by the digest moving image generation unit 103 when the screen aspect ratio of the input images is landscape and that of the output destination is portrait. FIG. 6(a) is an example in which, as in the "parallel arrangement" example of FIG. 5(a), two temporally close scenes are both main scenes and are displayed side by side at the same size. In this case, the original images are a landscape-sized image comprising regions 602 and 602' and a landscape-sized image comprising regions 603 and 603', so the central region of each scene (602, 603) is cut out and placed to match the screen aspect ratio of the display area 601. FIG. 6(b) is an example in which, as in the "central arrangement" example of FIG. 5(b), the sub-scene is placed over the entire image frame (display area) 601 while the central region of the main scene is superimposed on the region 604 in the central portion of the screen. Instead of the central region of the main scene, the entire image frame of the main scene may be reduced and displayed in region 604. As another example of "central arrangement", an arrangement such as that shown in FIG. 6(d) may be selected. In the arrangement of FIG. 6(d), the sub-scene is displayed over the entire image frame 601 as in FIG. 6(b), while the central region of the main scene is cut out larger than in FIG. 6(b) and placed in region 608. In the figure, 608' denotes the part of the original image discarded by cutting out region 608. FIG. 6(c) is an example in which, as in the "sub-screen arrangement" of FIG. 5(c), a reduced image of the sub-scene is placed as a sub-screen region 606. In this case, however, since the input images are landscape, the main scene is placed in region 604 so that the whole of it is displayed, and the sub-scene is placed in part of the remaining free area of the screen. The size of the sub-screen region 606 is determined to be smaller than the main-scene region 604 so that it can be distinguished from the main scene, which is the scene meant to attract attention. For example, the main-scene region 604 is determined so that its horizontal size equals the horizontal size of the entire image frame 601, and the sub-scene region 606 is determined so that its horizontal size is about 2/3 of the horizontal size of the entire image frame 601.
Furthermore, a spatial filter may be applied to some scenes to produce an image that emphasizes the difference between the main scene and the sub-scene. For example, if the sharpness of the image is reduced by applying a smoothing filter to region 605 in FIGS. 6(b) and 6(d), the difference between the central region displaying the main scene and the peripheral region displaying the sub-scene becomes apparent at a glance, making the region of interest clearer. In region 607 of FIG. 6(c), a main scene or sub-scene to which a spatial filter has been applied may be displayed. By displaying an image over the entire image frame 601, including region 607, the size of the displayed image is made the same as that of image clips with arrangement patterns other than FIG. 6(c); compared with leaving region 607 blank, this gives a sense of spatial breadth when viewing the image, and avoids the incongruity that could arise when viewing it in succession with image clips of other arrangements that may be joined in the time direction.
Note that more than two scenes may be arranged on one screen. FIG. 6(e) shows an example in which three scenes are arranged. The example shown in FIG. 6(e) is, like FIG. 5(f), an arrangement of three temporally close main scenes. Each scene is placed side by side in regions 609, 610, and 611 so as to include its central region.
Here, the display layout of a single scene when the screen aspect ratio is portrait will be described. When arranging a single scene, for example in the arrangement of FIG. 6(b), the single scene is placed in region 604 and also in region 605. In region 605, the sharpness of the image is reduced by applying a smoothing filter as described above. With this configuration, as in the foregoing description of FIG. 6(c), a sense of spatial breadth is obtained when viewing the image, and since the size of the displayed image becomes the same as that of the other image clips joined in the time direction, the incongruity that could arise during continuous viewing can be avoided.
The above spatial filter is not limited to a smoothing filter, and may be a color conversion filter that changes the color tone of a region. For example, if the images in regions 605 and 607 are converted to grayscale or a so-called sepia tone by color conversion, the main-scene regions 604 and 608 can be made to stand out. Alternatively, instead of a spatial filter, the difference from the main-scene regions 604 and 608 may be emphasized by setting the temporal change of the images in regions 605 and 607 to zero, that is, by making them still images.
The foregoing description, with reference to FIG. 6, covered the example of placing images with a landscape screen aspect ratio on a screen with a portrait aspect ratio. Conversely, when placing images with a portrait aspect ratio on a landscape screen, the same approach can be used to determine the size and position of the regions in which scenes are placed, the region cut out from each scene, and whether a spatial filter is applied.
FIG. 19 shows examples of scene arrangements by the scene space arrangement unit 1033 when images with a portrait screen aspect ratio (hereinafter "portrait images") are placed on a screen with a landscape aspect ratio (hereinafter a "landscape screen"). FIG. 19(a) is an example of "parallel arrangement" for a landscape screen, as in FIG. 5(a). The images arranged in FIG. 19(a) are main scene A, a portrait image comprising regions 1902 and 1902', and main scene B, a portrait image comprising regions 1903 and 1903'. The scene space arrangement unit 1033 cuts out the central regions of main scene A and main scene B and places them in regions 1902 and 1903, respectively, so that they are displayed side by side within the display area 1901. Regions 1902' and 1903' are the portions of main scene A and main scene B, respectively, that are not displayed.
FIG. 19(b) is an example of "central arrangement" for a landscape screen, as in FIG. 5(b). The images arranged in FIG. 19(b) are main scene A, a portrait image corresponding to region 1904, and sub-scene B, a portrait image comprising regions 1905 and 1905'. The scene space arrangement unit 1033 places the central portion of sub-scene B so that it is displayed in region 1905, which corresponds to the entire display area 1901, and places main scene A in region 1904, located in the central portion of the display area 1901. Region 1905' is the portion of sub-scene B that is not displayed.
FIG. 19(d) is another example of "central arrangement" for a landscape screen; as with FIG. 5(d) relative to FIG. 5(b), main scene A is placed larger than in FIG. 19(b), and the display area of sub-scene B is correspondingly smaller. The scene space arrangement unit 1033 places the central portion of sub-scene B so that it is displayed in region 1905, which corresponds to the entire display area 1901, and places the central portion of main scene A in region 1906, located in the central portion of the display area 1901. Region 1906' is the portion of main scene A that is not displayed, and region 1905' is the portion of sub-scene B that is not displayed.
FIG. 19(c) is an example of "sub-screen arrangement" for a landscape screen, as in FIG. 5(c). The images arranged in FIG. 19(c) are main scene A, a portrait image comprising regions 1906 and 1906', and sub-scene B, a portrait image corresponding to region 1907. The scene space arrangement unit 1033 places the central portion of main scene A in region 1906, located in the central portion of the display area 1901, and reduces sub-scene B before placing it in region 1907 adjacent to the main-scene region 1906. Region 1906' is the portion of main scene A that is not displayed. The scene space arrangement unit 1033 may also place the central portion of main scene A or of sub-scene B in region 1908 so that it is displayed as the background of regions 1906 and 1907.
FIG. 19(e) is an example in which three scenes are arranged for a landscape screen, as in FIG. 5(f). The images arranged in FIG. 19(e) are main scene A, a portrait image including region 1909; main scene B, a portrait image including region 1910; and main scene C, a portrait image including region 1911. The scene space arrangement unit 1033 cuts out the central regions of main scenes A, B, and C and places them in regions 1909, 1910, and 1911, respectively, so that they are displayed side by side horizontally within the display area 1901.
Next, display layouts in which an image with a portrait screen aspect ratio (portrait image) is placed on a screen that is likewise portrait (hereinafter a "portrait screen") will be described. FIG. 20 shows examples of scene arrangements by the scene space arrangement unit 1033 when portrait images are placed on a portrait screen. FIG. 20(a) is an example of "parallel arrangement" for a portrait screen, in which two scenes are stacked vertically. The images arranged in FIG. 20(a) are main scene A, a portrait image including region 2002, and main scene B, a portrait image including region 2003. The scene space arrangement unit 1033 cuts out the central regions of main scene A and main scene B and places them in regions 2002 and 2003, respectively, so that they are displayed vertically side by side within the display area 2001, which corresponds to the portrait screen. In FIG. 20, the regions that are not displayed as a result of cropping are omitted from the figure; they are described separately with reference to FIGS. 21 and 22.
FIG. 20(b) is an example of "central arrangement" for a portrait screen, in which the sub-scene is placed as a background over the entire display area and the main scene is superimposed on the central portion. The images arranged in FIG. 20(b) are main scene A, a portrait image including region 2004, and sub-scene B, a portrait image including region 2005. The scene space arrangement unit 1033 places sub-scene B in region 2005, which corresponds to the entire display area 2001, cuts out the central region of main scene A, and places it in region 2004 at the vertical center of the display area 2001.
FIG. 20(c) is an example of "sub-screen arrangement" for a portrait screen, in which main scene A is placed in a region corresponding to the entire display area and sub-scene B is superimposed on the main scene as a sub-screen region. The images arranged in FIG. 20(c) are main scene A, a portrait image corresponding to region 2006, and sub-scene B, a portrait image corresponding to region 2007. The scene space arrangement unit 1033 places main scene A in region 2006, which corresponds to the entire display area 2001, and places sub-scene B so that it fits within region 2007, whose size is less than a quarter of the area of the entire display area 2001. The size of region 2007 is, for example, about 1/9 of the area of the entire display area 2001.
FIG. 20(d) is an example in which three scenes are stacked vertically for a portrait screen. The images arranged in FIG. 20(d) are main scene A, a portrait image including region 2008; main scene B, a portrait image including region 2009; and main scene C, a portrait image including region 2010. The scene space arrangement unit 1033 cuts out the central regions of main scenes A, B, and C and places them in regions 2008, 2009, and 2010, respectively, so that they are displayed vertically side by side within the display area 2001.
The arrangements described above with reference to FIGS. 19 and 20 are scene arrangement examples for the case where the main and sub-scenes to be output are all images with a portrait screen aspect ratio (portrait images), regardless of whether the screen aspect ratio of the output video display device is landscape (landscape screen) or portrait (portrait screen). Similarly, the arrangements described with reference to FIGS. 5 and 6 are scene arrangement examples for the case where the main and sub-scenes to be output are all images with a landscape screen aspect ratio (hereinafter "landscape images"). However, the main scene and sub-scene placed on the same screen do not necessarily have the same screen aspect ratio. Next, therefore, a method for determining the image region output for display from each image when the main and sub-scenes mix images with different screen aspect ratios will be described with reference to FIGS. 21 and 22.
When the scene space arrangement unit 1033 determines the arrangement of multiple scenes as "parallel arrangement", "central arrangement", "sub-screen arrangement", and so on, it determines the region of each scene's image to display based on the size and screen aspect ratio of the display area of the output video display device and on the image size and aspect ratio of each scene to be placed. At this time, image processing such as scaling (enlargement/reduction) and cropping is applied to each image so that the image of each scene can be used as effectively as possible for the given arrangement pattern. These image processing steps are described with reference to FIGS. 21 and 22.
FIG. 21 shows examples of the image scaling and cropping performed by the scene space arrangement unit 1033 when outputting images to a landscape screen. Examples of scene arrangements for the landscape-screen case were shown in FIGS. 5 and 19; FIG. 21 explains how the regions 2101' to 2104' extracted for display are determined from the original images 2101 to 2104. In the figure, the hatched portions indicate the display regions extracted from each original image. Ho and Vo denote the horizontal size (pixel count) and vertical size (pixel count) of the output display area, respectively, and H and V denote the horizontal size (pixel count) and vertical size (pixel count) of the original image before scaling and cropping.
FIG. 21(a) shows an example of how the display region 2101' is determined when the landscape image 2101 is used as a main scene in the "parallel arrangement" for a landscape screen shown in FIG. 5(a). The scene space arrangement unit 1033 first scales the whole of the original image 2101 so that its vertical size V matches the vertical size Vo of the output display area (V → Vo). The scene space arrangement unit 1033 then crops the central portion of the scaled original image 2101 so that its horizontal size becomes Ho/2, extracting the display region 2101'. When scaling the original image 2101, enlargement/reduction is performed so as to preserve the screen aspect ratio of the original image, so that the image within the scene is not distorted. In other words, scaling is performed so that the horizontal and vertical size ratios before and after scaling are the same. For example, when an original image of size H × V is scaled, as described above, so that its vertical size V matches the vertical size Vo of the display area, the vertical size V' of the scaled image can be expressed as V' = (Vo / V) × V, and the size ratio before and after scaling is Vo / V. Accordingly, the horizontal size H' of the scaled image is obtained by scaling so that H' = (Vo / V) × H. Hereinafter, all scaling operations in the descriptions of FIGS. 21 and 22 are performed according to this same principle.
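The V → Vo scale followed by the Ho/2 centre crop of FIG. 21(a) could be implemented roughly as follows; Pillow is our choice of library, not one named in the disclosure.

    from PIL import Image  # Pillow; any image library with resize/crop would do

    def scale_and_crop_parallel(img: Image.Image, Ho: int, Vo: int) -> Image.Image:
        """Scale the frame so its height V matches the display height Vo,
        preserving the aspect ratio (H' = (Vo / V) * H), then crop the
        horizontal centre to a width of Ho/2 for the parallel layout."""
        H, V = img.size                   # PIL size is (width, height)
        H_scaled = round(H * Vo / V)      # H' = (Vo / V) * H
        scaled = img.resize((H_scaled, Vo))
        half = Ho // 2
        left = (H_scaled - half) // 2
        return scaled.crop((left, 0, left + half, Vo))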
FIG. 21(b) is an example of how the display region 2102′ is determined when the portrait image 2102 is used as the main scene of a "parallel arrangement" for a landscape screen as shown in FIG. 5(a). The scene space arrangement unit 1033 first scales the entire original image 2102 so that its horizontal size H matches one half of the horizontal size of the output display area (= Ho/2) (H → Ho/2). The scene space arrangement unit 1033 then crops the central portion of the scaled image so that its vertical size becomes Vo, extracting the display region 2102′. The scene space arrangement unit 1033 may also determine the display region 2102′ from the original image 2102 according to FIG. 21(b) when using the portrait image 2102 as the main scene of a "center arrangement" for a landscape screen as shown in FIG. 5(b).
FIG. 21(c) is an example of how the display region 2103′ is determined when the landscape image 2103 is used as the main scene of another "center arrangement" for a landscape screen as shown in FIG. 5(d). The scene space arrangement unit 1033 first scales the entire original image 2103 so that its vertical size V matches the vertical size Vo of the output display area (V → Vo). Next, the scene space arrangement unit 1033 crops the central portion of the scaled image so that its horizontal size becomes half the horizontal size of the output display area (= Ho/2) and its vertical size becomes smaller than the vertical size of the output display area by a predetermined number of pixels Ω (= Vo − Ω), extracting the display region 2103′. The predetermined number of pixels Ω is set, for example, to 5% of the vertical size Vo of the output display area.
FIG. 21(d) is an example of how the display region 2104′ is determined when the portrait image 2104 is used as the main scene of a "sub-screen arrangement" for a landscape screen as shown in FIG. 5(c). The scene space arrangement unit 1033 first scales the entire original image 2104 so that its horizontal size H matches the horizontal size Ho of the output display area (H → Ho). The scene space arrangement unit 1033 then crops the central portion of the scaled image so that its vertical size becomes Vo, extracting the display region 2104′. The scene space arrangement unit 1033 may also determine the display region 2104′ from the original image 2104 according to FIG. 21(d) when using the portrait image 2104 as the sub-scene of a "center arrangement" for a landscape screen.
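Under the sketch given earlier, the four FIG. 21 cases differ only in the crop target passed in. The following hypothetical usage assumes a 1920 × 1080 landscape output display area and the example value Ω = 5% of Vo; the image variables are illustrative:

```python
# Hypothetical usage of scale_and_center_crop for the four FIG. 21 cases.
Ho, Vo = 1920, 1080
omega = round(0.05 * Vo)                                      # FIG. 21(c) margin
region_a = scale_and_center_crop(img_2101, Ho // 2, Vo)           # FIG. 21(a)
region_b = scale_and_center_crop(img_2102, Ho // 2, Vo)           # FIG. 21(b)
region_c = scale_and_center_crop(img_2103, Ho // 2, Vo - omega)   # FIG. 21(c)
region_d = scale_and_center_crop(img_2104, Ho, Vo)                # FIG. 21(d)
```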
FIG. 22 shows an example of scene scaling and cropping performed by the scene space arrangement unit 1033 when images are output to a portrait screen. Examples of scene arrangements for the portrait-screen case were shown in FIGS. 6 and 20; FIG. 22 illustrates how the regions 2201′ to 2203′ to be extracted for display are determined from the original images 2201 to 2203. The symbols in the figure have the same meanings as in FIG. 21 described above, and their description is omitted.
FIG. 22(a) is an example of how the display region 2201′ is determined when the landscape image 2201 is used as the main scene of a "parallel arrangement" for a portrait screen as shown in FIG. 6(a). The scene space arrangement unit 1033 first scales the entire original image 2201 so that its vertical size V matches one half of the vertical size of the output display area (Vo/2) (V → Vo/2). The scene space arrangement unit 1033 then crops the central portion of the scaled image so that its horizontal size becomes Ho, extracting the display region 2201′. The scene space arrangement unit 1033 may also determine the display region 2201′ from the original image 2201 according to FIG. 22(a) when using the landscape image 2201 as the main scene of a "center arrangement" for a portrait screen as shown in FIG. 6(b), and likewise when using the landscape image 2201 as the main scene of a "sub-screen arrangement" for a portrait screen as shown in FIG. 6(c).
FIG. 22(b) is an example of how the display region 2202′ is determined when the portrait image 2202 is used as the main scene of a "parallel arrangement" for a portrait screen as shown in FIG. 20(a). The scene space arrangement unit 1033 first scales the entire original image 2202 so that its horizontal size H matches the horizontal size (Ho) of the output display area (H → Ho). The scene space arrangement unit 1033 then crops the central portion of the scaled image so that its vertical size becomes one half of the vertical size of the output display area (Vo/2), extracting the display region 2202′. The scene space arrangement unit 1033 may also determine the display region 2202′ from the original image 2202 according to FIG. 22(b) when using the portrait image 2202 as the main scene of a "center arrangement" for a portrait screen as shown in FIG. 6(b).
FIG. 22(c) is an example of how the display region 2203′ is determined when the landscape image 2203 is used as the sub-scene of a "center arrangement" for a portrait screen as shown in FIG. 6(b). The scene space arrangement unit 1033 first scales the entire original image 2203 so that its vertical size V matches the vertical size (Vo) of the output display area (V → Vo). The scene space arrangement unit 1033 then crops the central portion of the scaled image so that its horizontal size becomes Ho, extracting the display region 2203′. The scene space arrangement unit 1033 may also determine the display region 2203′ from the original image 2203 according to FIG. 22(c) when using the landscape image 2203 as the main scene or sub-scene of the background portion (region 607) of a "sub-screen arrangement" for a portrait screen as shown in FIG. 6(c).
As described above, when a plurality of images are combined and output from an image group that includes both landscape and portrait images, performing scaling and cropping on each original image in accordance with the screen size and screen aspect ratio of the output video display device makes it possible to output high-quality video without image distortion, while making maximum effective use of the display area of the output screen, even when the screen aspect ratio of an original image differs from that of the output destination, or when a scene arrangement mixes multiple images of different image sizes and screen aspect ratios.
Note that scaling may be performed only when either the number of horizontal pixels (H) or the number of vertical pixels (V) of the original image is larger than the number of pixels in the corresponding direction of the output display area (Ho, Ho/2, Vo, Vo/2, etc.). Doing so reduces how often the original image is enlarged, suppressing the image-quality degradation and the increase in image data volume that accompany enlargement.
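A possible sketch of this downscale-only behavior, reusing the hypothetical scale_and_center_crop above (capping the scale factor at 1.0 is an assumption consistent with avoiding enlargement; the patent does not prescribe this exact formulation):

```python
def fit_without_upscaling(img, target_w, target_h):
    """Scale only downward: the image is never enlarged, and is center-cropped
    to at most the target size (assumes the Pillow Image type used above)."""
    w, h = img.size
    # Cover scale as before, but capped at 1.0 so no enlargement occurs.
    scale = min(1.0, max(target_w / w, target_h / h))
    if scale < 1.0:
        img = img.resize((round(w * scale), round(h * scale)))
        w, h = img.size
    left = max((w - target_w) // 2, 0)
    top = max((h - target_h) // 2, 0)
    return img.crop((left, top, left + min(w, target_w), top + min(h, target_h)))
```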
As described above, by matching the output image specifications and output audio specifications of the generated digest moving image to the specifications and capabilities of the output video display device and audio output device, a digest moving image suited to the output device can be generated. For video in particular, it becomes possible to generate an easy-to-view digest moving image in which a plurality of scenes are effectively arranged according to the size and screen aspect ratio of the display device. Furthermore, when the digest moving image is encoded, high-quality video and audio that make full use of the capabilities of the output device can be output.
(Second Embodiment)
Next, a video editing apparatus according to a second embodiment of the present invention will be described. The video editing apparatus of the second embodiment differs from that of the first embodiment in its digest moving image generation unit 103. Although not illustrated, the digest moving image generation unit in this embodiment further includes a digest moving image generation count unit and a random arrangement pattern determination unit. The description below focuses on the differences from the first embodiment.
In the digest moving image generation unit 103 of this embodiment, the digest moving image generation count unit counts the number of times a digest moving image has been generated, in units of the image data group indicated by the selection information notified from the event selection unit 104, and notifies the random arrangement pattern determination unit of the counted number of generations. The random arrangement pattern determination unit does nothing when the notified generation count is one; when the generation count is two or more, it randomly varies the arrangement pattern, based on random numbers, when determining the spatial arrangement pattern of the plural scenes. As a result, when generating a digest moving image for a selected image data group, the digest moving image generation unit determines the arrangement pattern of plural scenes on the first generation based on the scene types and the scene-information relationships between combined scenes, as described for the scene space arrangement unit 1033 in the first embodiment, but on the second and subsequent generations it varies the arrangement pattern randomly for each combined scene. The combined scenes themselves are determined, as described for the scene type determination unit 1032 in the first embodiment, so that temporally adjacent scenes are selected together.
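For illustration, the count-then-randomize behavior might be sketched as follows (the class, method names, and per-group dictionary are hypothetical; the three pattern names follow the arrangements described above):

```python
import random

LAYOUTS = ["parallel", "center", "sub_screen"]  # the arrangements described above

class ArrangementSelector:
    """Hypothetical sketch of the count unit plus the random pattern unit."""
    def __init__(self):
        self.generation_count = {}  # image data group id -> generation count

    def start_generation(self, group_id):
        # Digest moving image generation count unit: count per image data group.
        n = self.generation_count.get(group_id, 0) + 1
        self.generation_count[group_id] = n
        return n

    def choose_layout(self, n, rule_based_layout):
        # Random arrangement pattern determination unit: do nothing on the
        # first generation, randomize per combined scene afterwards.
        if n == 1:
            return rule_based_layout
        return random.choice(LAYOUTS)
```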
With this configuration, when digest moving images are generated repeatedly from the same image data group, a digest moving image with a different layout is generated each time, while the temporal order of the scenes used in the digest moving image is preserved based on their shooting times. As a result, the user can view the same image data group with a fresh impression each time, and a digest moving image that the user is less likely to tire of can be provided.
(Third Embodiment)
Next, a video editing apparatus according to a third embodiment of the present invention will be described. The video editing apparatus of the third embodiment differs from that of the first embodiment in its scene information generation unit and digest moving image generation unit. FIG. 8(a) shows the internal configuration of the video editing apparatus 100a according to this embodiment. The video editing apparatus 100a includes an image data classification unit 101, a scene information generation unit 102a, a digest moving image generation unit 103a, an event selection unit 104, and an output control unit 105. The description below focuses on the differences from the video editing apparatus 100 of the first embodiment.
(Scene information generation unit 102a)
The scene information generation unit 102a analyzes the image data, classifies it into one or more scenes characterized by image signals or audio signals, and generates scene information, which is information indicating per-scene features. The scene information is configured to include, as information about feature regions in the image, a "person count", a "maximum person size", and a "maximum person position" (hereinafter these three kinds of information are collectively called person information). The "person count" represents the maximum number, per image frame, of person image regions (person regions) appearing in the images of each scene; the "maximum person size" represents the size of the person region with the largest area in each scene; and the "maximum person position" represents the position (coordinates within the image) of the region corresponding to the maximum person size. The scene information generation unit 102a detects face images and whole-body images in the image as the feature regions of each scene; when a face image is detected, it generates scene information from information about the face image region, and when no face image is detected (for example, when a person appears but faces sideways or away), it generates scene information from information about the whole-body image region. As a method of detecting face images, for example, image features can be extracted per region of a predetermined size and face regions detected (identified) with a face-image classifier using Haar-like features. As a method of detecting whole-body images, histograms of oriented gradients (HOG) can be computed per predetermined image region and whole-body regions detected (identified) with a whole-body-image classifier using HOG features. These face and whole-body detection methods are merely examples, and any method that yields the size and position of the detected regions may be used. Furthermore, detection is not limited to face images and whole-body images: upper-body and lower-body image regions may also be detected with classifiers using separately prepared features. For example, when neither a face image nor a whole-body image is detected, scene information may be generated based on information (count, size, position) about upper-body image regions, and when no upper-body image is detected either, scene information may be generated based on information (count, size, position) about lower-body image regions.
FIG. 9 is a diagram illustrating the concept of the person information described above. FIG. 9(a) is an example of a scene in which a person region 701 is located at coordinates 702 (x1, y1) and has size (H1 × V1). When the person count (the number of person regions) in a scene is one, as here, the "maximum person size" and "maximum person position" are uniquely determined as (H1 × V1) and (x1, y1), respectively. FIG. 9(b) is an example of a scene in which two person regions 703 and 704 are located at coordinates 705 (x2, y2) and 706 (x3, y3), with sizes (H2 × V2) and (H3 × V3), respectively. When the person count in a scene is two, as in FIG. 9(b), information indicating the size (H2 × V2) of region 703, the one with the larger area of the two person regions (703, 704), is set as the "maximum person size", and information indicating the coordinates (x2, y2) of that region 703 is set as the "maximum person position". FIG. 9(c) is an example of a scene containing four person regions 707, 708, 709, and 710. In the example of FIG. 9(c), the person region 707 has the largest area of the four. In this case, information indicating the size (H4 × V4) of region 707 is set as the "maximum person size", and information indicating its coordinates (x4, y4) is set as the "maximum person position".
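The derivation of the three person-information fields from a list of detected regions follows directly from these definitions; a minimal sketch (names illustrative):

```python
def person_info(regions):
    """Compute (person count, maximum person size, maximum person position)
    from a list of (x, y, w, h) person regions, per the definitions above."""
    if not regions:
        return 0, None, None  # person count zero: size/position undefined ("*")
    x, y, w, h = max(regions, key=lambda r: r[2] * r[3])  # largest-area region
    return len(regions), (w, h), (x, y)
```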
FIG. 10 shows an example of scene information corresponding to the examples shown in FIG. 9. Like the scene information 200, the scene information 800 describes information about scenes line by line, with each line 801, 802, 803, ... corresponding to one scene. The information described in each line is, from left to right: image file name, shooting date, shooting time, scene start frame number, scene end frame number, person count, maximum person size, maximum person position, motion information, and conversation information. The person count, maximum person size, and maximum person position in the scene information 800 are described below. In the following description of FIG. 10, for simplicity, the reference numeral of each piece of scene information is also used to denote the scene itself.
The scene information of scene 801 corresponds to the example of FIG. 9(a). In FIG. 9(a) there is only one person region (region 701), so the scene information about the person regions of scene 801 has person count = 1, and the maximum person size and maximum person position are the size (H1 × V1) and coordinates (x1, y1) of region 701. In FIG. 10, the values H1 = 400, V1 = 500, x1 = 500, and y1 = 300 are described as the numerical values corresponding to H1, V1, x1, and y1 (scene information of scene 801). The scene information of scene 802, like the example of FIG. 9(a), corresponds to a scene with one person region. The scene information of scene 803 corresponds to the example of FIG. 9(b). In FIG. 9(b) there are two person regions (regions 703 and 704), of which region 703 has the larger area. Therefore, the scene information of scene 803 has person count = 2, and the maximum person size and maximum person position are the size (H2 × V2) and coordinates (x2, y2) of region 703. In FIG. 10, the values H2 = 360, V2 = 480, x2 = 400, and y2 = 500 are described (scene information of scene 803). The scene information of scene 804 corresponds to the example of FIG. 9(c). In FIG. 9(c) there are four person regions (regions 707, 708, 709, and 710), of which region 707 has the largest area. Therefore, the scene information of scene 804 has person count = 4, and the maximum person size and maximum person position are the size (H4 × V4) and coordinates (x4, y4) of region 707. In FIG. 10, the values H4 = 450, V4 = 520, x4 = 100, and y4 = 300 are described (scene information of scene 804). The scene information of scene 805 corresponds to a scene with five person regions. The scene information of scene 806 corresponds to a scene whose person count is zero, that is, a scene in which no person was detected in the image. When the person count is zero, there is no scene information corresponding to the maximum person size or maximum person position; in FIG. 10, such nonexistent information is represented by the symbol "*".
In the description above, the "maximum person size" in the scene information 800 is represented by the numbers of horizontal and vertical pixels of the rectangular region corresponding to the person region, and the "maximum person position" is represented by the coordinates of the upper-left pixel of that rectangular region, with the upper-left pixel of the image as the origin. However, the region corresponding to a face image may be circular rather than rectangular, in which case the "maximum person size" may be represented by the number of pixels corresponding to the diameter of the circle. The coordinates corresponding to the "maximum person position" may also be those of the center pixel of the region rather than its upper-left pixel.
The scene information generation unit 102a generates scene information including the person information described above (person count, maximum person size, maximum person position), and outputs the generated scene information to the digest moving image generation unit 103a.
(Digest moving image generation unit 103a)
The digest moving image generation unit 103a reads the scene information generated by the scene information generation unit 102a, and generates a digest moving image from the image data group classified by the image data classification unit 101 or the image data group selected by the event selection unit 104. FIG. 8(b) shows the internal configuration of the digest moving image generation unit 103a in this embodiment. The digest moving image generation unit 103a includes a target image extraction unit 1031, a scene type determination unit 1032a, a scene space arrangement unit 1033a, a scene time arrangement unit 1034, and a digest control unit 1035. The description below focuses on the differences from the first embodiment.
The target image extraction unit 1031 refers to the selection information indicating the target image data group notified from the event selection unit 104, and extracts the input images used to generate the digest moving image. It notifies the scene type determination unit 1032a and the scene space arrangement unit 1033a of information indicating the extracted image data. The scene type determination unit 1032a refers to the scene information generated by the scene information generation unit 102a, reads the scene information of the scenes corresponding to the information indicating the image data extracted by the target image extraction unit 1031, and determines the scene types.
The process by which the scene type determination unit 1032a determines the scene types is described below with reference to FIG. 10. The scene type determination unit 1032a refers to the scene information 800, compares the shooting times of each pair of scenes that are consecutive in shooting-time order, and determines, depending on whether the difference ΔT between their shooting times is within or exceeds a scene proximity determination threshold THt, that is, whether the scenes are temporally close, whether each scene is a "single scene" to be used alone or a "combined scene" to be used in combination. If the scene proximity determination threshold THt = 300 seconds, the difference ΔT between the shooting times of scenes 801 and 802 is ΔT = 1 minute 41 seconds = 101 seconds < THt, so scenes 801 and 802 are determined to be combined scenes. Similarly, scenes 803 and 804 are temporally close and are therefore determined to be combined scenes, and scenes 805 and 806 are likewise determined to be combined scenes. For each scene determined to be a combined scene, the scene type determination unit 1032a refers to the person information (person count, maximum person size, maximum person position), motion information, and conversation information included in its scene information, and classifies the scene as a main scene if it is judged to be a major scene and as a sub-scene if it is judged not to be. Since scenes 801 and 802 in FIG. 10 both have person count = 1, both are judged to be major and classified as main scenes. Scenes 803 and 804 have person counts of 2 and 4, respectively, so scene 803 is classified as a main scene and scene 804 as a sub-scene. Scenes 805 and 806 have person counts of 5 and 0, respectively, so scene 805 is classified as a main scene and scene 806 as a sub-scene. In this way, within a combined scene pair, the scene type determination unit 1032a judges the scene with the smaller (but nonzero) person count to be major and the scene with the relatively larger person count not to be major. A scene whose person count is zero is judged to be less major than a scene whose person count is nonzero. If the person counts of the combined scenes are the same, both are judged to be major and both are classified as main scenes.
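As an illustration, the proximity test and the person-count comparison above might be sketched as follows (the dictionary keys and the restriction to exactly two scenes are simplifying assumptions):

```python
THT = 300  # scene proximity determination threshold THt, in seconds

def classify_pair(scene_a, scene_b):
    """Classify two scenes consecutive in shooting-time order.
    Each scene is a dict with 'time' (seconds) and 'persons' (person count)."""
    if abs(scene_b["time"] - scene_a["time"]) > THT:
        return "single", "single"  # not temporally close: used alone
    pa, pb = scene_a["persons"], scene_b["persons"]
    if pa == pb:
        return "main", "main"      # equal person counts: both main scenes
    if pa == 0 or pb == 0:
        # a zero-person scene is less major than a nonzero one
        return ("sub", "main") if pa == 0 else ("main", "sub")
    return ("main", "sub") if pa < pb else ("sub", "main")  # fewer persons = main
```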
The scene space arrangement unit 1033a determines the spatial arrangement of each scene, generates image clips in which the scenes are spatially arranged, and outputs them to the scene time arrangement unit 1034. The scene space arrangement unit 1033a determines the spatial arrangement (layout) of each scene based on the scene types determined by the scene type determination unit 1032a and on the scene-information relationships between combined scenes. The method by which the scene space arrangement unit 1033a determines the scene layout is basically the same as that of the scene space arrangement unit 1033 described above, except that the scene space arrangement unit 1033a uses the "person count" included in the scene information for layout determination by mapping it onto the "person information" that the scene space arrangement unit 1033 used as its layout criterion. For example, a scene whose scene information has a "person count" of 1 or 2 is treated the same as a scene whose "person information" is "main person (1)"; a scene with a "person count" of 3 or more is treated the same as a scene whose "person information" is "other persons (2)"; and a scene with a "person count" of 0 is treated the same as a scene whose "person information" is "no person (0)". The remaining differences from the scene space arrangement unit 1033 in layout determination are the control of scene placement positions according to the "maximum person position" indicated by the scene information, and the effect control according to the "maximum person size" and the "person count".
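This mapping from "person count" to the first embodiment's "person information" classes can be written directly from the rules above (a trivial sketch; the function name is illustrative):

```python
def person_info_class(person_count):
    """Map the 'person count' onto the first embodiment's 'person information'
    classes, per the correspondence described above."""
    if person_count == 0:
        return 0  # "no person (0)"
    if person_count <= 2:
        return 1  # "main person (1)"
    return 2      # "other persons (2)"
```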
FIGS. 11 to 13 show processing examples of scene placement position control and effect control by the scene space arrangement unit 1033a. Scenes 901 and 902 in FIG. 11 correspond to scenes 801 and 802 in FIG. 10. As described above, scenes 801 and 802 are combined scenes and both are main scenes, so the scene space arrangement unit 1033a determines the layout of the two scenes to be "parallel arrangement", in which scenes 901 and 902 are displayed side by side at the same size (FIG. 11(c)). At this time, the scene space arrangement unit 1033a determines regions (regions 921 and 922) that each contain, near their centers, the regions indicated by the "maximum person position" in each scene's scene information (regions 911 and 912), cuts these regions 921 and 922 out of the images of scenes 901 and 902, respectively, and places them in regions 931 and 932 of the output image 930.
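A minimal sketch of choosing a cut-out window that keeps the "maximum person position" region near its center (the clamping to the image bounds is an assumption, since the patent does not specify behavior near the edges; the window is assumed to fit within the image):

```python
def crop_around(img_w, img_h, box, crop_w, crop_h):
    """Choose a crop_w x crop_h window that keeps the person region
    box = (x, y, w, h) near its center, clamped to stay inside the image."""
    x, y, w, h = box
    cx, cy = x + w // 2, y + h // 2  # center of the person region
    left = min(max(cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h
```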
As a next example, scenes 903 and 904 in FIG. 12 correspond to scenes 803 and 804 in FIG. 10. As described above, scenes 803 and 804 are combined scenes, being the main scene and the sub-scene, respectively. Furthermore, because the main scene 803 has person count = 2, it is treated the same as a scene whose person information is "main person (1)", so the scene space arrangement unit 1033a determines the layout of the two scenes to be "center arrangement", in which the sub-scene 904 is displayed across the entire area of the output image 940 while the main scene 903 is superimposed on the region 941 in the center of the screen (FIG. 12(c)). At this time, the scene space arrangement unit 1033a determines a region 923 that contains, near its center, the region indicated by the "maximum person position" in the scene information of the main scene 903 (region 913), cuts this region 923 out of the image of scene 903, and places it in the region 941 of the output image 940.
As a next example, scenes 905 and 906 in FIG. 13 correspond to scenes 805 and 806 in FIG. 10. As described above, scenes 805 and 806 are combined scenes, being the main scene and the sub-scene, respectively. Furthermore, because the main scene 805 has person count = 5, it is treated the same as a scene whose person information is "other persons (2)", so the scene space arrangement unit 1033a determines the layout of the two scenes to be "sub-screen arrangement", in which the main scene 905 is displayed across the entire region 951 of the output image 950 while a reduced image of the sub-scene 906 is superimposed on the main scene as a sub-screen region 952 (FIG. 13(c)). At this time, the scene space arrangement unit 1033a determines the position of the sub-screen region 952 superimposed on the output image 950 so that the region indicated by the "maximum person position" in the scene information of the main scene 805 (region 915) is not hidden by the superimposed sub-scene. Specifically, the scene space arrangement unit 1033a determines the position of the sub-screen region 952 by selecting, from the four corners of the screen, the position farthest from the region indicated by the "maximum person position" (region 915). Note that the position of the sub-screen region 952 is not limited to the four corners of the screen and may be determined to be some other position, as long as it does not overlap the "maximum person position" included in the scene information of the main scene.
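The farthest-corner selection might be sketched as follows (measuring distance between region centers is an assumption; the patent only states that the farthest of the four corners is selected):

```python
import math

def pick_subscreen_corner(img_w, img_h, sub_w, sub_h, person_box):
    """Place the sub-screen at whichever of the four corners lies farthest
    from the center of the main scene's maximum-person region."""
    x, y, w, h = person_box
    px, py = x + w / 2, y + h / 2
    corners = [(0, 0), (img_w - sub_w, 0),
               (0, img_h - sub_h), (img_w - sub_w, img_h - sub_h)]
    def dist(corner):
        cx, cy = corner[0] + sub_w / 2, corner[1] + sub_h / 2  # sub-screen center
        return math.hypot(cx - px, cy - py)
    return max(corners, key=dist)
```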
By having the scene space arrangement unit 1033a determine the layout of plural scenes as described above, cases can be avoided in which a large subject in a major scene (for example, a person region likely to attract attention) straddles the boundary with another scene placed in the same screen and does not fit within the screen, or is hidden by another, less major scene; as a result, a digest moving image that is easy to view can be generated.
Next, the effect control performed by the scene space arrangement unit 1033a is described. When determining the layout of plural scenes, the scene space arrangement unit 1033a may additionally apply a spatial filter to some of the scenes to produce an image that emphasizes the difference between the main scene and the sub-scene. For example, by applying a smoothing filter to the region 942 of the "center arrangement" image 940 in FIG. 12, the sharpness of the image is made to differ between the central region 941 displaying the main scene and the peripheral region 942 displaying the sub-scene, making the region that deserves attention clearer. At this time, the scene space arrangement unit 1033a controls the strength of the smoothing filter according to the "maximum person size" included in the scene information. For example, the ratio HSratio (= HSmain/HSsub) of the "maximum person size" HSmain included in the scene information of the main scene to the "maximum person size" HSsub included in the scene information of the sub-scene is defined, and the strength of the smoothing filter is controlled so as to be inversely proportional to the magnitude of HSratio. For example, suppose the scene space arrangement unit 1033a uses three parameters α, β, and γ, in increasing order of smoothing strength, as the parameter Ff that controls the degree of smoothing. Control is then performed so that the parameter γ is selected when HSratio is small and the parameter α is selected when HSratio is large. FIG. 14(a) is a graph showing an example of the relationship between HSratio and the smoothing filter strength Ff. When the difference between the maximum person sizes in the main scene and the sub-scene is small (HSratio small), the sub-scene image tends to interfere with viewing the main scene image when the two are displayed superimposed (the two are easily confused), so the smoothing applied to the region displaying the sub-scene (region 942) is strengthened to increase the difference in sharpness between the main scene and the sub-scene. Conversely, when the difference between the maximum person sizes in the main scene and the sub-scene is large (HSratio large), the sub-scene image is unlikely to interfere with viewing the main scene image when the two are displayed superimposed, so the smoothing applied to the region displaying the sub-scene (region 942) is weakened to reduce the difference in sharpness between the main scene and the sub-scene. When the difference between the maximum person sizes is even larger, smoothing may be omitted altogether (when HSratio > r3 in FIG. 14(a)). The purpose of differentiating the sharpness of the main scene and the sub-scene with the smoothing filter is mainly to raise the degree of attention drawn to the main scene; however, if the sharpness difference is made too large, it becomes hard to tell what the sub-scene shows when viewing the digest moving image, and the benefit of spatially arranging plural scenes is halved. Therefore, for a scene that is unlikely to distract attention from the main scene, the smoothing filter is weakened so that the sharpness difference is small, or the smoothing filter is not applied at all. With this configuration, the degree of attention drawn to major scenes is raised while the variations in display (appearance) when plural scenes are spatially arranged in various layouts are further increased, making it possible to provide moving images that are easier to view and less tiring for the user when watching digest moving images.
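A minimal sketch of a FIG. 14(a)-style mapping from HSratio to the smoothing strength Ff (the breakpoints r1 to r3 and the α/β/γ values are illustrative assumptions, not values from the patent; Ff = 1 denotes "no smoothing", consistent with the parameter convention described further below):

```python
ALPHA, BETA, GAMMA = 3, 5, 9  # e.g. moving-average window sizes, weak -> strong
R1, R2, R3 = 1.5, 3.0, 6.0    # hypothetical HSratio breakpoints

def smoothing_strength(hs_main, hs_sub):
    """Select Ff so that smoothing weakens as HSratio = HSmain/HSsub grows."""
    hs_ratio = hs_main / hs_sub
    if hs_ratio <= R1:
        return GAMMA  # similar person sizes: smooth the sub-scene strongly
    if hs_ratio <= R2:
        return BETA
    if hs_ratio <= R3:
        return ALPHA  # large difference: only weak smoothing
    return 1          # HSratio > r3: no smoothing
```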
FIG. 14(b) shows another example of the control of the smoothing filter strength Ff by the scene space arrangement unit 1033a. FIG. 14(b) is a graph showing an example of the relationship between HNsub and Ff when the smoothing filter strength Ff is determined by the "person count" HNsub included in the scene information of the sub-scene. As shown in the graph, the scene space arrangement unit 1033a selects the strongly smoothing parameter γ when HNsub is small and the weakly smoothing parameter α when HNsub is large. When HNsub is 0, control may be performed so that no smoothing is applied at all (when 0 ≤ HNsub < n1 in FIG. 14(b)). With this method, the smoothing strength can be determined simply from the scene information of the sub-scene alone, without referring to that of the main scene. Since the target of the smoothing is the sub-scene, controlling the smoothing filter strength according to the scene information (person count) of the sub-scene makes it possible to generate a digest moving image that uses the sub-scene image effectively while raising the degree of attention drawn to the main scene. The smoothing filter strength Ff may also be controlled so as to satisfy both the relationship shown in FIG. 14(a) and that shown in FIG. 14(b). For example, rather than only the three values α, β, and γ, a larger set of selectable coefficients may be prepared for Ff; a rough smoothing filter strength Ff is first determined based on the "person count" HNsub of the sub-scene, and Ff is then controlled more finely based on the ratio HSratio (= HSmain/HSsub) of the "maximum person size" HSmain of the main scene to the "maximum person size" HSsub of the sub-scene. With this configuration, the number of applicable filter strengths can be increased, further increasing the variations in display (appearance) when plural scenes are spatially arranged in various layouts.
The smoothing filter strength Ff described above may be, for example, a parameter indicating the number of decimated pixels when simple pixel decimation is used as the smoothing filter. In the example shown in FIG. 14, for instance, α = 2, β = 4, and γ = 8 are set, and the scene space arrangement unit 1033a smooths the image by decimating pixels to a fraction equal to the reciprocal of α, β, or γ, then interpolating pixels back to the original pixel count. For example, if Ff = α = 2, pixels are decimated so that the pixel count of the image to be smoothed is halved in both the horizontal and vertical directions, and the pixel values at the decimated positions are then interpolated by copying the (remaining) post-decimation pixel values. A parameter value of Ff = 1 means no decimation, in which case no smoothing is performed. Alternatively, the smoothing filter strength Ff may be, for example, a parameter indicating the window size, corresponding to the pixel range over which the filter is applied, when a moving-average filter is used as the smoothing filter. In the example shown in FIG. 14, for instance, α = 3, β = 5, and γ = 9 are set, and the scene space arrangement unit 1033a smooths the image by averaging pixel values in units of the window size indicated by α, β, or γ (for example, a 3 × 3-pixel window if Ff = α = 3). As in the preceding example, no smoothing is performed when the parameter value is Ff = 1. The smoothing filter strength Ff is not limited to these examples and may be a parameter indicating a predetermined coefficient set that depends on the smoothing filter method used by the scene space arrangement unit 1033a, such as a Gaussian filter or a weighted filter.
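Both interpretations of Ff can be illustrated in a few lines (a sketch using OpenCV on a NumPy image array; the nearest-neighbor restore stands in for the pixel-value copying described above, and the function name is illustrative):

```python
import cv2

def smooth_region(img, ff, method="decimate"):
    """Apply either Ff interpretation to an image region (a NumPy BGR array).
    ff = 1 means no smoothing, per the parameter convention above."""
    if ff <= 1:
        return img
    h, w = img.shape[:2]
    if method == "decimate":
        # Keep every ff-th pixel, then restore the original size by
        # nearest-neighbor interpolation (copying the remaining values).
        small = img[::ff, ::ff]
        return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    # Moving-average filter with an ff x ff window (e.g. 3x3 when Ff = 3).
    return cv2.blur(img, (ff, ff))
```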
Note that the spatial filter applied by the scene space arrangement unit 1033a is not limited to a smoothing filter and may be a color conversion filter that changes the color tone of each region. For example, instead of smoothing the sub-scene image, the scene space arrangement unit 1033a may change its saturation. For instance, for the pixels in the sub-scene region 942, the saturation of the pixels is changed so as to be proportional to the HSratio or HNsub described above. For example, as shown in FIG. 14(c), a characteristic representing the relationship between HSratio and the saturation S is defined over the range of S from 0 to Smax, and the pixel values in the sub-scene region 942 are converted to match that characteristic. Here, Smax means the maximum saturation within the target sub-scene before the pixel values are converted. With this configuration, when the difference between the maximum person sizes in the main scene and the sub-scene is small (HSratio small), the sub-scene image tends to interfere with viewing the main scene image when the two are displayed superimposed (the two are easily confused); lowering the saturation S of the sub-scene therefore differentiates the saturation of the main scene and the sub-scene, making the main scene region stand out. In doing so, when HSratio is smaller than a predetermined threshold, the saturation of the sub-scene may be set to S = 0 (when HSratio < r0 in FIG. 14(c)), that is, the sub-scene may be turned into a grayscale image, to emphasize the saturation difference from the main scene in particular. When HSratio is larger than a predetermined threshold, the saturation of the sub-scene may be left unchanged (when HSratio > r4 in FIG. 14(c)). Instead of a characteristic representing the relationship between HSratio and the saturation S as in FIG. 14(c), the saturation of the sub-scene may be converted based on a similar characteristic representing the relationship between HNsub and the saturation S. As described above, by converting pixel values so as to lower the saturation of the region in which the sub-scene is placed according to the relationship between the scene information of the main scene and that of the sub-scene, the sub-scene region (for example, region 942) approaches or becomes a grayscale image, and the main scene region placed on the same screen (for example, region 941) can be made to stand out. Alternatively, rather than applying a spatial filter, the scene space arrangement unit 1033a may emphasize the difference from the main scene region 941 by setting the temporal change of the image in the sub-scene region 942 to zero, that is, by turning it into a still image.
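The saturation change can be illustrated by scaling the S channel in HSV space (a sketch using OpenCV and NumPy; the ramp from HSratio to the scale factor uses hypothetical breakpoints r0 and r4 standing in for the FIG. 14(c) characteristic):

```python
import cv2
import numpy as np

def scale_saturation(img_bgr, factor):
    """Scale the saturation of a BGR image by factor in [0, 1]:
    factor = 0 yields a grayscale image, factor = 1 leaves it unchanged."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 1] = np.clip(hsv[:, :, 1] * factor, 0, 255)  # S channel
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

def saturation_factor(hs_ratio, r0=1.0, r4=6.0):
    """FIG. 14(c)-style ramp: S = 0 below r0, unchanged above r4
    (r0 and r4 are hypothetical breakpoint values)."""
    return min(max((hs_ratio - r0) / (r4 - r0), 0.0), 1.0)
```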
As described above, the video editing apparatus 100a according to this embodiment can, when generating a digest moving image in which plural scenes are combined and spatially arranged, provide a digest moving image in which image regions likely to attract attention, such as person regions in major scenes, are placed so as to be easy to view. In addition, by differentiating the images of scenes placed in the same screen, in sharpness, color tone, and the like, according to the difference in features between major and non-major scenes, the degree of attention drawn to major scenes is raised while the variations in display (appearance) when plural scenes are spatially arranged in various layouts are further increased, making it possible to provide moving images that are easier to view and less tiring for the user when watching digest moving images.
In the description of the third embodiment, an example was given in which persons (face images and whole-body images) are detected as the feature regions in the image that characterize each scene; however, other subjects may be detected as the feature regions instead of persons, and information indicating the "region count", "maximum region size", and "maximum region position" corresponding to those feature regions may be included in the scene information instead of the person information. As a method of detecting subjects other than persons as feature regions, classifiers corresponding to the features of the subjects of interest, for example animals (dogs, birds, etc.) or vehicles (automobiles, aircraft, etc.), can be prepared in advance when using the aforementioned Haar-like or HOG features, and the subjects in the image can then be detected (identified) based on those classifiers.
 (Fourth Embodiment)
 Next, a video editing apparatus according to a fourth embodiment of the present invention will be described. The video editing apparatus of the fourth embodiment differs from that of the first embodiment in the target image extraction unit, the scene space arrangement unit, and the scene time arrangement unit included in the digest moving image generation unit. In the fourth embodiment, the video editing apparatus 100b is configured to include a digest moving image generation unit 103b, and the digest moving image generation unit 103b includes a target image extraction unit 1031b, a scene space arrangement unit 1033b, and a scene time arrangement unit 1034b. FIG. 17 shows the internal configuration of the video editing apparatus 100b and the digest moving image generation unit 103b according to the present embodiment.
 The digest moving image generation unit 103b reads the scene information generated by the scene information generation unit 102 and generates a digest moving image for the image data group classified by the image data classification unit 101 or the image data group selected by the event selection unit 104. The digest moving image generation unit 103b includes a target image extraction unit 1031b, a scene type determination unit 1032, a scene space arrangement unit 1033b, a scene time arrangement unit 1034b, and a digest control unit 1035. The following description focuses on the differences from the first embodiment.
 The target image extraction unit 1031b refers to the selection information indicating the target image data group notified from the event selection unit 104 and extracts the input images used to generate the digest moving image. The target image extraction unit 1031b notifies the scene type determination unit 1032 and the scene space arrangement unit 1033b of information indicating the extracted image data. At that time, the target image extraction unit 1031b extracts, from the image data group identification information, the name of the image data group, the image data names, and the shooting dates and times of the image data, and notifies the scene space arrangement unit 1033b of them.
 Like the scene space arrangement unit 1033 described with respect to the first embodiment, the scene space arrangement unit 1033b determines the spatial arrangement of each scene and generates image clips in which the scenes are spatially arranged. The scene space arrangement unit 1033b differs from the first embodiment in that it additionally has a function of superimposing a text image indicating image information when generating an image clip, and a function of generating a title image as an additional image clip.
 FIG. 18 shows examples of image clips generated by the scene space arrangement unit 1033b. FIG. 18(a) is an example of a title screen generated by the scene space arrangement unit 1033b. The title screen 1000 is, for example, an image in which white text 1002 is superimposed on a solid black background 1001, and is a still image lasting, for example, about five seconds. The scene space arrangement unit 1033b generates the title screen 1000 by superimposing the text 1002, which indicates the name of the image data group notified via the target image extraction unit 1031b, on a separately generated background image 1001. FIG. 18(b) is an example of an image clip, generated by the scene space arrangement unit 1033b, that contains text information indicating per-scene image information. The image clip 1003 is an image clip in which a scene 1004 and a scene 1005 are spatially arranged (corresponding to the image 930 in FIG. 11(c)), with text indicating the respective shooting dates and times superimposed on the scenes 1004 and 1005. The scene space arrangement unit 1033b generates the image clip 1003 by superimposing, on each scene, text (1006, 1007) indicating the shooting date and time corresponding to the image data of the scenes 1004 and 1005 (DSC_2001.mov and DSC_2002.mov in FIG. 15) included in the image data group identification information notified via the target image extraction unit 1031b.
 Like the scene time arrangement unit 1034 described with respect to the first embodiment, the scene time arrangement unit 1034b joins the image clips generated by the scene space arrangement unit 1033b in the time direction. At that time, the scene time arrangement unit 1034b joins the image clips in the time direction so that the image clip of the title screen generated by the scene space arrangement unit 1033b is positioned first in time.
 By automatically generating the title screen based on the date/time information and position information of the input image data in this way, the user's effort in generating a digest moving image is reduced, and when the user watches the generated digest moving image, it can be seen at a glance when and where the image data covered by the digest were shot. Therefore, even when generating or watching a digest moving image that includes image data shot many days earlier, the circumstances at the time of shooting are easier to recall, which increases the user's satisfaction with the digest moving image. Furthermore, by superimposing the shooting date and time information on a per-scene basis, the image data become easier to identify when, after watching the digest moving image, the user wants to watch a particular scene at leisure.
 Whether the scene space arrangement unit 1033b superimposes text indicating the shooting date and time on each scene may be determined in advance according to the user's selection. In that case, the digest control unit 1035 notifies the scene space arrangement unit 1033b whether to superimpose the text according to the user's selection, and the scene space arrangement unit 1033b switches, according to that notification, whether to superimpose the text indicating the shooting date and time on each scene. Also, when superimposing the text indicating the shooting date and time on an image clip, instead of superimposing text for every scene, for example only the shooting date and time of the main scene may be superimposed.
 With the configuration described above, the video editing apparatus according to the present embodiment enables a user to check and enjoy a large number of still images and moving images in a short time without effort, to view them with quality and legibility suited to the display device on which they are shown, and to watch the same image data group repeatedly without growing tired of it.
 (Fifth Embodiment)
 Hereinafter, an embodiment of the present invention will be described with reference to the drawings. For convenience of explanation, members having the same functions as those shown in the foregoing embodiments are given the same reference numerals, and their descriptions are omitted.
 FIG. 23 is a schematic diagram showing the configuration of a video editing apparatus according to the fifth embodiment of the present invention.
 The video editing apparatus 100c includes a target image data extraction unit 109, a scene information generation unit 102, a reproduction time candidate derivation unit 110, a reproduction time candidate display unit 111, and a digest moving image generation unit 103c. Although not illustrated, the video editing apparatus 100c may further incorporate a data recording unit that stores image data and a video display unit that displays images, or may be configured so that a data recording device and a video display device having equivalent functions can be connected externally.
 Next, each functional block of the video editing apparatus 100c will be described.
 The target image data extraction unit 109 extracts image data that meet a predetermined condition based on the metadata included in the image data, and groups the extracted image data into an image data group.
 For example, based on the editing date on which the digest moving image is edited, image data shot on the previous day, that is, image data whose shooting date is the day before the editing date, are determined as the editing target. Alternatively, based not on the editing date but on a date and time designated by the user, image data whose shooting dates and times fall around the designated date and time may be determined as the editing target. The image data that the target image data extraction unit 109 determines as the editing target may also be selected based not only on date/time information but also on position information or creator information. For example, image data having position information designated by the user, or position information within a predetermined range including that position, may be determined as the editing target. Alternatively, when there are image data from different creators having position information within a predetermined range, only image data having specific creator information may be determined as the editing target; conversely, image data excluding those having specific creator information may be determined as the editing target. The number of image data that the target image data extraction unit 109 determines as the editing target is not limited to one and may be two or more. The target image data extraction unit 109 may use the changeover of the day as the trigger for determining the image data to be edited; for example, once midnight has passed, the image data shot on the previous day may be determined as the editing target. The target image data extraction unit 109 outputs the image data group to the digest moving image generation unit 103c.
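 The extraction rule above can be pictured with a short sketch. The metadata field names (shot_date, creator) are hypothetical stand-ins for the date/time and creator information carried by the image data.

```python
from datetime import timedelta

def extract_target_images(images, editing_date, creator=None):
    """Collect images shot the day before `editing_date` into a group.

    `images` is assumed to be an iterable of objects carrying
    `shot_date` (datetime.date) and `creator` (str) metadata;
    these field names are illustrative, not defined by the text.
    """
    target_day = editing_date - timedelta(days=1)
    group = [img for img in images if img.shot_date == target_day]
    if creator is not None:  # optional creator-based narrowing
        group = [img for img in group if img.creator == creator]
    return group
```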
 The target image data extraction unit 109 also calculates the total reproduction time by summing the reproduction times of all the extracted image data, and outputs the total reproduction time to the reproduction time candidate derivation unit 110.
 The reproduction time candidate derivation unit 110 derives a reproduction time candidate for the digest moving image based on the total reproduction time input from the target image data extraction unit 109. As the derivation method, the square root of the total reproduction time is calculated as the reproduction time candidate: the square root of the total reproduction time expressed in minutes is computed, and the value rounded down to an integer is taken as the reproduction time candidate. For example, when the total reproduction time is one hour, the candidate is 7, that is, the square root of 60 rounded down. The reproduction time candidate derivation unit 110 outputs the derived reproduction time candidate to the reproduction time candidate display unit 111.
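 The derivation reduces to a one-line computation, sketched below for the one-hour example.

```python
import math

def playback_time_candidate(total_minutes: float) -> int:
    """floor(sqrt(total playback time in minutes)), per the text."""
    return math.floor(math.sqrt(total_minutes))

print(playback_time_candidate(60))  # 1 hour of footage -> 7 minutes
```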
 The reproduction time candidate display unit 111 displays the reproduction time candidate input from the reproduction time candidate derivation unit 110 on a display device (not shown). The display device is assumed to be equipped with user input means such as a touch panel or a mouse. The reproduction time candidate display unit 111 receives a user event via the input means and takes the reproduction time candidate selected by the user event as the designated time. The reproduction time candidate display unit 111 outputs the designated time to the digest moving image generation unit 103c.
 FIG. 24 is an example of a user interface for designating the reproduction time of the digest moving image in the video editing apparatus 100c of the present embodiment. The user can select a desired reproduction time by sliding the button 32 on the bar 31, displayed below the "digest moving image reproduction time" label, to the left or right. Below the bar 31, the minimum and maximum reproduction times that can be designated are displayed. In the case of FIG. 24, the minimum value of 1 minute and the maximum value of 7 minutes, the reproduction time candidate, are displayed. For example, when the user slides the button 32 to the left end of the bar 31, the designated time is 1 minute; when it is slid to the right end, the designated time is 7 minutes; and when the button 32 is slid to the middle of the bar 31, the designated time is 4 minutes, the midpoint between 1 and 7 minutes. In the present embodiment, an example has been described in which the reproduction time is selected by sliding the button 32 on the bar 31, but the reproduction time may instead be selected from a pull-down menu or entered numerically.
 Next, the digest moving image generation processing performed by the digest moving image generation unit 103c will be described. FIG. 25 is a conceptual diagram showing the digest moving image generation process of the video editing apparatus 100c of the present embodiment. As shown in FIG. 25, the video editing apparatus 100c reads, for an image data group 302 that is a set of image data selected from the image data 301, the corresponding scene information 303, and generates a digest moving image according to the designated time input from the reproduction time candidate display unit 111. The image data group 302 targeted for digest moving image generation is, for example, all the image data shot on a certain day; this image data group is determined by the target image data extraction unit 109. The image data group 302 is classified into one or more scenes by the scene information generation unit 102, and scene information, which is information indicating per-scene features, is generated. Next, the digest moving image generation unit 103c refers to the scene information in ascending order of shooting date and shooting time and determines the type of each scene, such as scenes to be used alone and scenes to be used in combination with other scenes. Based on the determined scene types, the digest moving image generation unit 103c generates image clips 306a, 306b, 306c, ..., which are image data in which the scenes are spatially arranged, and generates a digest moving image 307 by joining the image clips temporally. The image clips 306a, 306b, 306c, and so on are moving images containing at least one scene, but they may also contain still images.
 The digest moving image generation unit 103c also adjusts the digest moving image so that the reproduction time of the generated digest moving image becomes the designated time. Here, "the reproduction time becomes the designated time" may mean that the reproduction time exactly matches the designated time, or that there is a slight difference between the reproduction time and the designated time.
 For example, as shown in FIG. 26(A), even when the digest moving image 50A is composed of image clips 51 to 57 and the designated time elapses during the reproduction of the last image clip 57 of the digest moving image 50A, the reproduction time may be regarded as having reached the designated time.
 Also, as shown in FIG. 26(B), when the digest moving image 50B is composed of image clips 51 to 56 and its reproduction time is shorter than the designated time, but joining one more image clip, the image clip 57 in the case of FIG. 26(B), would make the reproduction time of the digest moving image 50B longer than the designated time, the reproduction time may likewise be regarded as having reached the designated time.
 In other words, if the difference between the reproduction time and the designated time is no more than the length of one image clip, the reproduction time may be regarded as having reached the designated time. Alternatively, the tolerance for the difference between the reproduction time and the designated time may be a specific value, such as 30 seconds or 1 minute, or a ratio of the designated time, for example 1% of the designated time.
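 A minimal sketch of this tolerance test, assuming times in seconds and the fixed 30-second allowance mentioned above (the one-clip-length and 1% variants would only change the tolerance argument):

```python
def time_reached(playback_s: float, designated_s: float,
                 tolerance_s: float = 30.0) -> bool:
    """Treat the playback time as having "reached" the designated time
    when the two differ by no more than the tolerance."""
    return abs(playback_s - designated_s) <= tolerance_s
```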
 FIG. 27 shows the internal configuration of the digest moving image generation unit 103c in the present embodiment. The digest moving image generation unit 103c includes a scene type determination unit 1032, a scene space arrangement unit 1033, a scene time arrangement unit 1034, and a digest moving image editing unit 1036. The processing performed by the scene type determination unit 1032, the scene space arrangement unit 1033, and the scene time arrangement unit 1034 is the same as in the first embodiment.
 (Adjusting the Reproduction Time of the Digest Moving Image)
 A method of adjusting the reproduction time when the digest moving image generation unit 103c generates a digest moving image will now be described.
 The digest moving image editing unit 1036 adjusts the reproduction time of the digest moving image by editing the digest moving image output by the scene time arrangement unit 1034.
 When the reproduction time of the digest moving image already equals the designated time, the digest moving image editing unit 1036 outputs the input digest moving image as it is.
 When the reproduction time of the digest moving image does not equal the designated time, the digest moving image editing unit 1036 edits the digest moving image so that its reproduction time becomes the designated time.
 Specifically, when the digest moving image reproduction time is longer than the designated time, the digest moving image editing unit 1036 shortens the image clips included in the digest moving image. First, the digest moving image editing unit 1036 adjusts the reproduction time of the digest moving image by shortening the reproduction time of image clips without motion. Specifically, it refers to the motion information in the scene information of the scenes included in each image clip, in order from the beginning of the digest moving image, and when the motion information of all scenes included in an image clip is "no motion (0)", it shortens the reproduction time by thinning out the frames of that image clip. For example, halving the number of frames by simple decimation halves the reproduction time of the image clip, that is, doubles the reproduction speed. FIG. 28 is a conceptual diagram for explaining the processing by which the digest moving image editing unit 1036 shortens the reproduction time of an image clip. The image clip 60A is composed only of scenes whose motion information is "no motion (0)". Frames 61 to 66 are the frames constituting the image clip 60A, arranged in chronological order from frame 61 to frame 66. When the digest moving image reproduction time is longer than the designated time, the digest moving image editing unit 1036 deletes one of every two frames in the image clip 60A, in the case of FIG. 28, frames 62, 64, 66, ..., turning it into an image clip 60B that has half as many frames as the image clip 60A. Since the frame rate at which the image clip 60B is displayed is the same as that of the image clip 60A, the image clip 60B plays back at twice the speed of the image clip 60A. The digest moving image editing unit 1036 repeats the above processing until the digest moving image reproduction time reaches the designated time.
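 Simple 2:1 decimation as in FIG. 28 amounts to keeping every other frame; a sketch, assuming a clip is given as a list of frames:

```python
def decimate_clip(frames):
    """2:1 decimation of a motionless clip: keep every other frame so
    the clip plays in half the time at the same frame rate."""
    return frames[::2]  # keeps frames 1, 3, 5, ... (drops 2, 4, 6, ...)
```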
 After performing the above processing up to the last image clip of the digest moving image, if the digest moving image reproduction time has still not reached the designated time, the digest moving image editing unit 1036 adjusts the reproduction time of the digest moving image by cutting out parts of the image clips whose scenes do not all have motion information of "no motion (0)". Specifically, letting Td be the digest moving image reproduction time and Ts the designated time, the digest moving image editing unit 1036 cuts out part of each image clip so that its reproduction time Ti becomes Ts/Td times its original length. For example, a duration corresponding to 1 − (Ts/Td) times the reproduction time Ti of the image clip is cut from its beginning. The portion to be cut may also be taken from the end of the image clip instead of the beginning, or from both the beginning and the end.
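 The trimming step can be sketched as follows; the rounding and the minimum of one kept frame are assumptions made to keep the sketch well defined.

```python
def trim_clip(frames, td: float, ts: float, from_head: bool = True):
    """Cut an image clip down to Ts/Td of its length (Td: current
    digest time, Ts: designated time); the cut is taken from the
    head by default, per the text, or from the tail."""
    keep = max(1, round(len(frames) * ts / td))  # frames to keep
    return frames[len(frames) - keep:] if from_head else frames[:keep]
```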
 When the digest moving image reproduction time is shorter than the designated time, the digest moving image editing unit 1036 first adjusts the reproduction time of the digest moving image by lengthening the reproduction time of image clips without motion. Specifically, it refers to the motion information in the scene information of the scenes included in each image clip, in order from the beginning of the digest moving image, and when the motion information of all scenes included in an image clip is "no motion (0)", it lengthens the reproduction time by interpolating frames of that image clip. For example, interpolating one frame between each pair of frames doubles the reproduction time of the image clip, that is, halves the reproduction speed. As another example, interpolating two frames between each pair of frames and then deleting the even-numbered frames multiplies the reproduction time of the image clip by 1.5, that is, multiplies the reproduction speed by 2/3. FIG. 29 is a conceptual diagram for explaining the processing by which the digest moving image editing unit 1036 lengthens the reproduction time of an image clip. The image clip 70A is composed only of scenes whose motion information is "no motion (0)". Frames 71, 74, and 77 are the frames constituting the image clip 70A, arranged in chronological order in the order of frames 71, 74, 77. When the digest moving image reproduction time is shorter than the designated time, the digest moving image editing unit 1036 first interpolates two frames between each pair of frames in the image clip 70A, in the case of FIG. 29, frames 72, 73, 75, 76, ..., by frame interpolation, turning it into an image clip 70B that has three times as many frames as the image clip 70A. Next, the digest moving image editing unit 1036 deletes one of every two frames in the image clip 70B, in the case of FIG. 29, frames 72, 74, 76, ..., turning it into an image clip 70C that has half as many frames as the image clip 70B. Since the number of frames of the image clip 70C is 3/2 times that of the image clip 70A and the frame rate at display time is the same, the image clip 70C plays back at 2/3 the speed of the image clip 70A. The specific method of frame interpolation is not particularly limited; for example, linear interpolation, or a method of estimating inter-frame motion and interpolating based on that motion, may be used. The digest moving image editing unit 1036 repeats the above processing until the digest moving image reproduction time reaches the designated time.
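 The FIG. 29 procedure (interpolate two in-between frames, then drop every second frame of the result) can be sketched with linear interpolation as the in-between method; applied to the three frames 71, 74, 77, it yields exactly the four frames 71, 73, 75, 77 of the figure. Frames are assumed to be numeric arrays so that weighted averaging is defined.

```python
def slow_to_two_thirds(frames):
    """Stretch a motionless clip to 1.5x its length (2/3 playback
    speed): insert two linearly interpolated frames between each
    pair, then keep every other frame of the expanded sequence."""
    expanded = []
    for a, b in zip(frames[:-1], frames[1:]):
        expanded += [a, a * 2 / 3 + b / 3, a / 3 + b * 2 / 3]  # 2 in-betweens
    expanded.append(frames[-1])
    return expanded[::2]  # roughly 3/2 of the original frame count
```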
 After performing the above processing up to the last image clip of the digest moving image, if the digest moving image reproduction time has still not reached the designated time, the digest moving image editing unit 1036 selects an image clip at random from those included in the digest moving image and joins it to the end of the digest moving image. However, so that the same image clip is not reproduced twice in a row, when the randomly selected image clip is identical to the last image clip of the digest moving image, the joining of the selected clip may be skipped. The digest moving image editing unit 1036 repeats the above processing until the digest moving image reproduction time reaches the designated time.
 As another adjustment method, a video effect such as a transition may be used when switching between image clips. While the video effect is being reproduced, the reproduction of the image clips is suspended, so the digest moving image reproduction time can be lengthened. The digest moving image editing unit 1036 lengthens the digest moving image reproduction time Td by inserting video effects in descending order of the difference in shooting time between adjacent image clips. The digest moving image editing unit 1036 repeats the above processing until the digest moving image reproduction time reaches the designated time or video effects have been inserted between all the image clips. If the digest moving image reproduction time still does not reach the designated time even after video effects have been inserted between all the image clips, the method described above of joining a randomly selected image clip to the end of the digest moving image is used. The specific video effect to be inserted is not particularly limited; for example, a crossfade, a dissolve, or a wedge-shaped wipe, each a kind of transition, may be used.
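 Choosing where to insert the effects, largest shooting-time gap first until the shortfall is covered, might look like the following sketch; the one-second effect duration is an assumption, and shooting times are assumed to be given in seconds.

```python
def transition_slots(shoot_times, needed_s: float, effect_s: float = 1.0):
    """Pick clip boundaries for transitions, ordered by descending
    shooting-time gap, until the added effect durations cover the
    shortfall `needed_s` between digest time and designated time."""
    gaps = [(shoot_times[i + 1] - shoot_times[i], i)
            for i in range(len(shoot_times) - 1)]
    slots, added = [], 0.0
    for _, i in sorted(gaps, reverse=True):  # largest gaps first
        if added >= needed_s:
            break
        slots.append(i)        # insert an effect after clip i
        added += effect_s
    return slots
```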
 As described above, for image clips without motion, adjusting the reproduction time through simple decimation or frame interpolation gives the user a digest moving image of the desired reproduction time while causing as little sense of unnaturalness as possible.
 (Sixth Embodiment)
 Next, a video editing apparatus according to a sixth embodiment of the present invention will be described. The video editing apparatus of the sixth embodiment differs from that of the fifth embodiment in that it includes a digest moving image generation unit 103d in place of the digest moving image generation unit 103c.
 In the fifth embodiment, part of an image clip may be cut out when shortening the clip, and in doing so there is a risk of cutting out a portion that the user wants to watch.
 In contrast, the video editing apparatus of the present embodiment provides an adjustment method that brings the reproduction time of the digest moving image to the designated time without cutting out any part of the image clips.
 FIG. 30 shows the internal configuration of the digest moving image generation unit 103d in the present embodiment. The digest moving image generation unit 103d includes a scene type determination unit 1032d, a scene space arrangement unit 1033, and a scene time arrangement unit 1034. The processing performed by the scene space arrangement unit 1033 and the scene time arrangement unit 1034 is the same as in the first embodiment.
 The scene type determination unit 1032d determines the type of each scene based on the scene information and a threshold THt, in the same manner as the scene type determination unit 1032.
 The scene type determination unit 1032d calculates the digest moving image reproduction time Td from the scene information and the scene types. The initial value of the reproduction time Td is 0; for a scene whose type is "single scene", the reproduction time of that scene is added to Td, and for a group of scenes whose type is "multiple scenes", the reproduction time of the shortest scene in the group is added to Td.
 Then, when the calculated digest moving image reproduction time Td does not equal the designated time, the scene type determination unit 1032d adjusts it so that the digest moving image reproduction time Td becomes the designated time.
 As a specific adjustment method, when the digest moving image reproduction time Td is longer than the designated time, the scene type determination unit 1032d changes the threshold THt so that "multiple scenes" is more likely to be selected than "single scene"; for example, the threshold THt is changed from 5 minutes to 10 minutes. Such a threshold change increases the proportion of "multiple scenes" included in the digest moving image, so the reproduction time of the digest moving image can be adjusted to become shorter. Conversely, when the digest moving image reproduction time Td is shorter than the designated time Ts, the scene type determination unit 1032d changes the threshold THt so that "single scene" is more likely to be selected than "multiple scenes"; for example, the threshold THt is changed from 5 minutes to 3 minutes. Such a threshold change increases the proportion of "single scenes" included in the digest moving image, so the reproduction time of the digest moving image can be adjusted to become longer.
 Then, based on the changed threshold THt, the scene type determination unit 1032d determines the scene types and calculates the digest moving image reproduction time Td again. The scene type determination unit 1032d repeats the above processing until the digest moving image reproduction time Td reaches the designated time.
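 The adjustment loop of the scene type determination unit 1032d can be sketched as below. The inner classify function is a hypothetical stand-in for the actual scene type determination (here, scenes at or above THt play alone and shorter ones are grouped); the step size, tolerance, and iteration guard are likewise assumptions.

```python
def adjust_threshold(scenes, ts: float, tht: float = 300.0,
                     step: float = 60.0, tol: float = 30.0) -> float:
    """Retune the scene-type threshold THt until the estimated digest
    time Td reaches the designated time Ts (all values in seconds).
    `scenes` are assumed to expose a `duration` attribute."""

    def classify(scenes, tht):
        # Hypothetical rule: long scenes stand alone, short ones form a group.
        singles = [s for s in scenes if s.duration >= tht]
        group = [s for s in scenes if s.duration < tht]
        return singles, ([group] if group else [])

    for _ in range(100):  # guard against oscillation
        singles, groups = classify(scenes, tht)
        td = (sum(s.duration for s in singles)
              + sum(min(s.duration for s in g) for g in groups))
        if abs(td - ts) <= tol:
            break
        tht += step if td > ts else -step  # Td too long -> favor grouping
    return tht
```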
 According to the method described above, by adjusting the proportion of "single scenes" and "multiple scenes" in the digest moving image, the reproduction time of the digest moving image can be brought to the reproduction time the user desires without deleting any part of the image clips the user wants to watch, so a digest moving image that satisfies the user more fully is generated.
 (Seventh Embodiment)
 Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
 FIG. 31 is a schematic diagram showing the configuration of a video editing apparatus according to the seventh embodiment of the present invention. The video editing apparatus 100e includes an image data classification unit 101, a scene information generation unit 102, a digest moving image generation unit 103, an event selection unit 104, an output control unit 105, a video display unit 106, a digest moving image editing control unit 107, and an operation unit 108. Although not illustrated, the video editing apparatus 100e may further incorporate a data recording unit that stores image data, or may be configured so that a data recording device having the equivalent function is connected externally. The basic processing of the image data classification unit 101, the scene information generation unit 102, the digest moving image generation unit 103, the event selection unit 104, and the output control unit 105 is the same as in the first embodiment.
 The video display unit 106 outputs, to a display device, video including the digest moving image generated by the video editing apparatus 100e and the user interface (UI) used for operation. The display device is either built into the video editing apparatus 100e or connected externally.
 The digest moving image editing control unit 107 reproduces the digest moving image generated by the digest moving image generation unit 103 and outputs it to the video display unit 106 while synchronizing image and audio and adjusting the frame rate. In parallel, it edits the video based on input from the user. The digest moving image to be reproduced may be image data that the digest moving image generation unit 103 has saved to a recording medium, or image data input directly from the digest moving image generation unit 103. It may also be image data in which a digest moving image generated by another video editing apparatus equivalent to the video editing apparatus 100e has been saved on a recording medium. When the format of the digest moving image differs from the display data format directly usable by the video display unit 106, it is converted into display data that the video display unit 106 can use. For example, when the digest moving image is compressed with a coding scheme such as HEVC or AAC, the digest moving image editing control unit 107 decodes the image data and outputs it to the video display unit 106. Furthermore, when the digest moving image data are saved in a format that requires video generation processing at reproduction time, including arrangement, transformation, cropping, and superimposition of video, the digest moving image editing control unit 107 controls the digest moving image generation unit 103 to perform that generation processing, and acquires and reproduces the video. In reproduction, the digest moving image editing control unit 107 desirably supports reproduction control including pause, fast forward, rewind, and movement between scenes.
 The operation unit 108 detects input actions from the user, including designation of positions on the display screen, using, for example, a touch sensor integrated with the video display unit 106 or an externally connected mouse or keyboard. For example, when the video editing apparatus 100e includes a touch sensor as the operation unit, the user can provide input through common operations such as taps, flicks, and pinches. In addition, buttons and keys for recording and reproduction control may be provided.
 Next, each unit will be described in more detail.
 (Output Control Unit)
 In the present embodiment, the output control unit 105 sets the generation policy based on conditions including the image display specifications of the video display unit 106, the audio output specifications of an audio output device (not shown), and information indicating the user's preferences regarding the digest moving image. The information indicating the user's preferences is received via the operation unit 108 or other input means. For example, options such as "person-centered" and "landscape-centered" may be displayed on the video display unit 106 for the user to choose from, but the method is not limited to this. When no information indicating the user's preferences is input, for example, "person-centered" may be set as the default value.
 The output control unit 105 sets information indicating whether multiple scenes may be arranged simultaneously within the same image frame, as "simultaneous multi-scene arrangement", according to the video display device at the output destination. For example, when the resolution or screen size of the display device used by the video display unit 106 is smaller than a certain threshold, simultaneous multi-scene arrangement is set to "not allowed", and when it is larger, simultaneous multi-scene arrangement is set to "allowed".
 (Digest Moving Image Editing Control Unit)
 The digest moving image editing control unit 107 reproduces and edits the digest moving image generated as described above. This may be started by an instruction from the user, or when the generation of the digest moving image is completed.
 The digest moving image editing control unit 107 reproduces the digest moving image while receiving input from the user and applying further edits to the digest moving image being reproduced. FIG. 32 shows the editing process during reproduction of a digest moving image. The reproduction process itself and reproduction control processes such as fast forward and rewind are not shown, but they are executed in parallel with the editing process. Steps S101 to S104 in the figure are described below.
 In step S101, reproduction of the digest moving image is started using the digest moving image editing control unit 107. Furthermore, the process of interpreting and executing input from the operation unit 108 as editing operations is started.
 Step S102 checks whether the moving image is being reproduced. If it is detected that reproduction of the moving image has finished or been interrupted, the editing process ends.
 Step S103 checks for input operations: it is checked whether an operation interpretable as an editing instruction has been input. If no such operation has been input, the process returns to step S102.
 Note that the detection of events related to reproduction or operation performed in steps S102 and S103 can also be realized by periodic or aperiodic interrupts, and therefore these steps need not necessarily be executed in the order of FIG. 32. A waiting period for a change in reproduction state or the occurrence of input may also be inserted before each check step.
 Step S104 executes the editing operation. When an editing operation occurs, processing corresponding to the type of the editing operation is executed with the scene being reproduced as the scene to be edited. Reproduction of the digest moving image is paused at the start of the editing process and resumed using the edited data after the editing process is completed. Several of the editing operations in step S104 are described in more detail below.
 The type of editing operation is distinguished by the input from the operation unit. For example, if the video display unit 106 includes a touch panel, direct designation of on-screen coordinates and editing operations by gestures can be realized. The distinction can also be made with a pointing device such as a mouse. The input device is thus not limited to a touch panel, but here an example is described that uses operation input via a touch panel, which allows the most intuitive operation for the user. Windows, icons, and other GUI components may be displayed on the screen for operations other than editing.
 First, there are various kinds of commonly seen touch panel operations (hereinafter, touch operations). Examples include a tap (striking the screen with a fingertip), a double tap (striking the screen twice with a fingertip), a flick (touching the screen with a fingertip and moving it in a quick flicking motion), a swipe (moving a fingertip in a fixed direction while keeping it in contact with the screen), a drag (moving a fingertip while keeping it in contact with the screen, not necessarily in a fixed direction), a pinch-in (touching the screen with two or more fingertips and bringing them together as if closing), a pinch-out (touching the screen with two or more fingertips and spreading them apart as if opening), and a twist or rotate (touching the screen with two or more fingertips and moving them in a twisting motion). Functions may also be distinguished by the number of fingers used in each operation or by differences in the position, shape, or speed of the fingertip trajectories. The above is a description of general touch operations; not all of them are necessarily used in the editing operations of the video editing apparatus 100e, and other touch operations may be assigned to the editing operations described below.
 FIG. 33 schematically shows examples of operations performed on the digest moving image. The thick frame indicates the entire image area of the digest moving image, and when there is a further rectangular frame inside the thick frame, it indicates that a main scene or a sub-scene of a combination scene is displayed. Dotted frames indicate changes caused by editing. Arrows indicate the approximate trajectory and length of the touch operation. The coordinates where a touch operation starts are called the start-point coordinates, and the coordinates where it ends are called the end-point coordinates.
 FIG. 33(a) shows a flick operation on the screen 81. In the video editing apparatus 100e, the flick operation is associated with scene deletion. Here, the scene being edited becomes the scene to be deleted. The digest moving image editing control unit 107 either deletes the data of the deletion target scene from the digest moving image or marks the deletion target scene so that it is not reproduced. After editing, since the total reproduction time of the digest moving image has been shortened by the deletion, reproduction resumes from the scene following the deleted scene. When the last scene of the digest moving image is deleted, reproduction stops. Presenting a visual effect in which the deletion target scene moves in the flicked direction when the deletion operation is accepted is preferable, as it makes the operation easier for the user to understand. In this way, scenes that the user finds unnecessary during reproduction can easily be deleted.
 FIG. 33(b) is an example of performing a twist operation on a side-by-side combination scene such as that shown in FIG. 5(a). In the video editing apparatus 100e, the twist operation is associated with changing the arrangement pattern. FIG. 33(b) shows an arrangement pattern with two element scenes 82 and 83 on the screen; the twist operation swaps the left and right positions of the two element scenes 82 and 83. When the digest moving image is encoded, the digest moving image editing control unit 107 generates the editing target scene changed to the new arrangement and encodes it again. As in the deletion example, it is preferable to present a visual effect in which the element scenes 82 and 83 swap places when the twist operation is accepted. In this way, even if the user feels during reproduction that the left-right arrangement of the element scenes is unnatural, it can easily be changed to a preferred arrangement.
 FIG. 33(c) is an example of performing a twist operation on a combination scene in which the screen is divided into three equal parts, as shown in FIG. 5(f). In the twist operation in this case, the video editing apparatus 100e selects and changes the spatial arrangement of the element scenes in order from among the possible combinations. For example, in the original digest moving image, the element scenes 84, 85, and 86 are arranged in this order from the left of the screen. Denoting the three element scenes A, B, and C, the digest moving image editing control unit 107 cycles through the arrangements {A,B,C} → {A,C,B} → {B,A,C} → {B,C,A} → {C,A,B} → {C,B,A} → {A,B,C} each time the twist operation is executed.
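 Since the cycle quoted above is exactly lexicographic order, it can be generated directly; a sketch:

```python
from itertools import permutations

def next_arrangement(current):
    """Advance to the next spatial arrangement in lexicographic order,
    wrapping around, matching the cycle {A,B,C} -> {A,C,B} -> ...
    `current` is a tuple such as ('A', 'B', 'C')."""
    order = list(permutations(sorted(current)))  # lexicographic order
    i = order.index(tuple(current))
    return order[(i + 1) % len(order)]

print(next_arrangement(('C', 'B', 'A')))  # -> ('A', 'B', 'C'), wraps around
```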
 The twist operation in the examples of FIGS. 33(b) and 33(c) may be performed anywhere on the screen, but a twist operation performed near a boundary of the combination scene may swap only the element scenes touching that boundary. In this way, even in a combination scene containing three or more element scenes, the arrangement the user prefers can be selected quickly.
 FIG. 33(d) shows a pinch-out operation performed on a center-arranged combination scene such as the one in FIG. 5(b), in which a reduced main scene 88 is placed at the center of a sub-scene 87 that fills the whole screen. In the video editing apparatus 100e, the pinch-out operation is associated with enlarging an element scene. The enlargement ratio of the element scene is determined by the distance between the start and end coordinates of the pinch-out operation; the minimum is the size of the main scene 88 before the operation, and the maximum is the size of the whole screen, i.e. of the sub-scene 87. The digest moving image editing control unit 107 extracts the region of the main scene 88 from the scene being edited, enlarges it according to the enlargement ratio, generates an image in which it is placed back over the scene being edited, and encodes the result again. The position of the main scene 88 is kept at the center so that the pre-editing main scene 88 is completely hidden by the enlarged main scene 88. Enlarging the main scene 88 in this way makes its content and the people in it more conspicuous.
 FIG. 33(e) also shows a pinch-out operation in a center arrangement, similar to the example of FIG. 33(d), but on a combination scene such as the one in FIG. 5(d), in which the main scene 89 is cut out and placed at the center of the screen. In this case the upper limit of the scene enlargement ratio is w0/w1, where w0 is the horizontal pixel count of the digest moving image, i.e. of the sub-scene 87, and w1 is the horizontal pixel count of the region of the main scene 89. If, as a result of the enlargement, the vertical pixel count of the main scene 89 (denoted h1) exceeds the vertical pixel count of the whole digest moving image (denoted h0), the top and bottom of the main scene 89 are each trimmed by {h0×(w0/w1-1)/2} pixels so that the image aspect ratio of the main scene 89 matches that of the whole digest moving image. The digest moving image editing control unit 107 generates and encodes an image in which the main scene 89, enlarged and trimmed in this way, is placed back over the scene being edited. As in the example of FIG. 33(d), this makes the content and people of the main scene 89 more conspicuous.
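 As a concrete illustration, the following sketch computes the capped enlargement ratio and the per-side trim from the geometry: the trim is the scaled height in excess of h0, split between top and bottom, which reduces to the {h0×(w0/w1-1)/2} figure stated above when the cut-out height equals h0 and the scale is at its maximum w0/w1. The function name is an assumption for illustration:

```python
def pinch_out_main_scene(w0, h0, w1, h1, requested_scale):
    """Clamp the pinch-out scale to the upper limit w0/w1 and trim the
    enlarged main scene to the digest image's aspect ratio.
    Returns (scale, per-side trim in pixels)."""
    scale = min(requested_scale, w0 / w1)    # upper limit of the enlargement
    scaled_h = h1 * scale                    # height after enlargement
    trim = max(0.0, (scaled_h - h0) / 2)     # removed from top and from bottom
    return scale, trim

# pinch_out_main_scene(1920, 1080, 960, 600, 2.0) -> (2.0, 60.0)
```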
 The examples of FIGS. 33(b) to 33(e) assume that the scene being edited is a combination scene. If the digest moving image being played carries no information on whether the scene being edited is a combination scene, or if scene information for the digest moving image cannot be obtained, whether the scene is a combination scene can be determined by per-frame analysis such as generating pixel-value histograms, extracting contours (straight-line segments in particular), and detecting motion per region.
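 One rough way to implement the straight-line test mentioned above is to accumulate horizontal-gradient energy per pixel column across the frames of a scene; a column whose energy towers over the average suggests a persistent vertical seam between element scenes. A sketch with NumPy (the threshold and border margin are illustrative assumptions):

```python
import numpy as np

def looks_like_combination_scene(frames, rel_threshold=8.0):
    """frames: iterable of 2-D grayscale arrays from one scene.
    Returns True if some interior column shows a persistent, strong
    vertical edge, as a side-by-side combination scene would."""
    acc = None
    for f in frames:
        grad = np.abs(np.diff(f.astype(np.float32), axis=1)).sum(axis=0)
        acc = grad if acc is None else acc + grad
    interior = acc[5:-5]                     # ignore the image borders
    return bool(interior.max() > rel_threshold * interior.mean())
```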
 FIG. 33(f) shows a drag operation with a complex trajectory. In the video editing apparatus 100e, such a drag operation is associated with applying a filter effect to the region around the trajectory. A trajectory like the one in FIG. 33(f) is recognized, for example, as one whose point coordinates are spread broadly and evenly in the horizontal direction while their vertical distribution is concentrated at the maximum and minimum values. When such a drag operation is input, the digest moving image editing control unit 107 sets a region containing the start and end points of the drag as the filter target region: for example, the set of pixels within a fixed distance of the trajectory's line segments, or a region containing the rectangle whose diagonal is the straight line from the start point to the end point. The digest moving image editing control unit 107 filters the pixels of the filter target region in every frame of the scene being edited and updates the digest moving image, re-encoding the filtered result if necessary. If the drag trajectory suggests erasing or scribbling over a region, the filter should be one that makes the target region less conspicuous, such as an unsharpening filter or a simple fill with a fixed pixel value. If the trajectory can be interpreted as marking attention, such as circling a region, the filter should be one that makes the target region stand out, such as a sharpening filter or a brightness-raising filter. Such operations make it possible to de-emphasize regions of the scene that are unnecessary or should not be made public, and to sharpen important regions. Note that when the subject or the camera is moving, the filter target region must change from frame to frame; the digest moving image editing control unit 107 therefore preferably performs motion detection on the filter target region within the scene being edited and adjusts the region's position, shape and size. Furthermore, if the trajectory pattern is recognized in more detail, several filters with similar purposes can be switched automatically: for example, a trajectory that sweeps up and down as in FIG. 33(f) can trigger an unsharpening filter, while a trajectory that sweeps left and right as in FIG. 33(g) can trigger a fill filter.
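 A minimal sketch of the trajectory test and the rectangular filter region described above (the spread and clustering thresholds are illustrative assumptions, not values from the text):

```python
import numpy as np

def is_vertical_scribble(points, frame_w, spread=0.5, edge_frac=0.25):
    """points: (N, 2) array of (x, y) drag samples.  True when x spreads
    widely while y clusters near its own extremes, i.e. the up-and-down
    scribble of FIG. 33(f)."""
    x, y = points[:, 0], points[:, 1]
    wide_x = (x.max() - x.min()) > spread * frame_w
    band = edge_frac * (y.max() - y.min() + 1e-6)
    near_extremes = (y < y.min() + band) | (y > y.max() - band)
    return bool(wide_x and near_extremes.mean() > 0.8)

def filter_rect(points, margin=10):
    """Rectangle whose diagonal joins the drag's start and end points,
    padded by `margin` pixels: (left, top, right, bottom)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return (min(x0, x1) - margin, min(y0, y1) - margin,
            max(x0, x1) + margin, max(y0, y1) + margin)
```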
 FIG. 33(h) shows an example in which a button 90 indicating the start of video capture is displayed on the screen, making it possible to add a scene. When the button 90 is tapped, and if a camera (not shown) is built into or externally connected to the video editing apparatus 100e, the digest moving image editing control unit 107 stops playback of the digest moving image, switches the display to the input image from the camera, and starts capturing. Capture is ended by a user operation, and the captured image data V is saved to the recording medium. The digest moving image editing control unit then applies to the image data V the same processing as the digest moving image generation already described, except that the input is the image data V alone. The digest moving image produced from the image data V is appended to the end of the digest moving image that was being played, and the result is saved as a new digest moving image. More simply, the captured image data V may be appended to the end of the digest moving image as-is.
 All of the editing operations above are ideally executed immediately upon operation input, with the result available at once; saving the edited result may, however, require heavy processing such as re-encoding of video and audio. If the video editing apparatus 100e is under heavy load during playback, or if executing the edit would disrupt playback or operation, it is better to avoid immediate execution: store instruction information comprising the type of edit, the input from the operation unit 108 and the scene being edited, and later, when the video editing apparatus 100e reaches a low-load state, carry out the edit based on the instruction information and update the digest moving image. Even then, at the time of the operation input, it is desirable to indicate the scene being edited to the user with an animation, an icon, a provisional result image produced by low-load processing, or the like.
 As described above, by matching the output image specification and output audio specification of the generated digest moving image to the specifications and capabilities of the destination video display device and audio output device, a high-quality digest moving image suited to the output device can be generated, and in addition the composition of the digest moving image can be edited during playback with simple operations.
  (Eighth embodiment)
 Next, a video editing apparatus according to an eighth embodiment of the present invention will be described. The video editing apparatus of the eighth embodiment has the same configuration as that of the seventh embodiment, but differs in that the information used in generating the digest moving image, including information indicating the spatial and temporal arrangement of the scenes in the digest moving image (hereinafter called arrangement information), and the input image data are saved to a recording medium or memory so that they can also be used during playback. Alternatively, the digest moving image itself may take a format that contains the input image data and the arrangement information; for example, the digest moving image may be data consisting of one or more files containing the input image data and data corresponding to the arrangement information. At playback time, arranging the input image data by reference to the arrangement information reproduces the video intended by the digest moving image generation unit 103.
 Within the arrangement information, the information indicating the spatial arrangement of a scene comprises, for each arrangement pattern described in the seventh embodiment, the index of the input image data corresponding to each element scene, the element scene's horizontal and vertical size (pixel counts), its position (coordinates) on the screen, and the cut-out position within the input image data. It may instead be indirect information from which these can be derived, for example an index selecting an arrangement pattern, a predefined size, or a predefined position. The information indicating the temporal arrangement of scenes indicates where each scene falls on the time axis of the final digest moving image and contains at least each scene's start time and end time (or length); times and lengths may be expressed as frame counts.
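 A minimal sketch of what the arrangement information could look like as a data structure; the field names are illustrative assumptions, since the text only prescribes what the fields must convey:

```python
from dataclasses import dataclass, field

@dataclass
class ElementPlacement:
    source_index: int          # index of the corresponding input image data
    width: int                 # horizontal size in pixels
    height: int                # vertical size in pixels
    x: int                     # position on the screen
    y: int
    crop_x: int = 0            # cut-out position within the source
    crop_y: int = 0

@dataclass
class ScenePlacement:
    start_frame: int           # temporal position in the final digest
    end_frame: int             # (times may equally be stored as seconds)
    elements: list[ElementPlacement] = field(default_factory=list)
```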
 The digest moving image generation unit 103 of this embodiment can save to a recording medium or memory, and reuse, the data used to generate a previous digest moving image, including the arrangement information. Thus even when a partially or entirely identical digest moving image is generated again, the load can be reduced by avoiding re-execution of the same processing.
 FIGS. 34(a) and 34(b) show flick operations on the screen. As in the seventh embodiment, the flick operation is associated with scene deletion. FIG. 34(a) shows the case where the scene being edited is a combination scene and the start coordinates of the flick lie on the sub-scene 92 placed over the main scene 91; the deletion target scene is then the sub-scene 92. In FIG. 34(b) the scene displayed at the start coordinates is the main scene 91, and the deletion target scene is the main scene 91. Alternatively, the entire scene being edited may be made the deletion target. If only vertical flicks are treated as deletion, they are easily distinguished from the horizontal drag operations described next, reducing operation errors. The digest moving image editing control unit 107 removes the deletion target scene from the spatial and temporal arrangement information so that it is not reproduced, and generates the digest moving image again. When deleting an element scene that is not placed over another element scene, a region showing no scene at all would appear on the screen of the digest moving image. In such a case, if only one element scene remains after deletion it should be re-arranged as a single scene, and if two or more remain the screen should be re-divided. For example, if one of three element scenes arranged in parallel as in FIG. 5(f) is deleted, the remaining two element scenes may be re-arranged using the parallel arrangement of FIG. 5(b).
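 A sketch of the re-division rule applied after a deletion (a hypothetical helper; the mapping of the remaining scenes onto concrete arrangement patterns follows the FIG. 5 example in the text):

```python
def relayout_after_delete(remaining_elements):
    """Decide the new arrangement of a combination scene after one of
    its element scenes has been deleted."""
    if not remaining_elements:
        return None                            # scene disappears entirely
    if len(remaining_elements) == 1:
        return ("single", remaining_elements)  # re-arrange as a single scene
    return ("parallel", remaining_elements)    # re-divide the screen
```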
 FIGS. 34(c) to 34(e) show examples of drag operations whose start coordinates lie on the sub-scene 92 of a combination scene.
 FIG. 34(c) shows the case where the drag continues to the left or right edge of the screen. The digest moving image editing control unit 107 then turns the sub-scene 92 into a single scene and deletes the sub-scene 92 from the scene being edited. The newly generated single scene corresponding to the original sub-scene 92 is inserted immediately before the scene being edited if the drag ends at the left screen edge, and immediately after it if the drag ends at the right screen edge. FIG. 35 shows how the digest moving image changes before and after editing when the drag ends at the right screen edge. The digest moving image 1100 before editing contains scenes 1100a, 1100b and 1100c. Scene 1100b is a combination scene whose element scenes are the main scene S21 and the sub-scene S22. When scene 1100b is the scene being edited and a drag started on the sub-scene S22 reaches the right edge of the screen, the sub-scene S22 becomes independent of scene 1100b and, in the edited digest moving image 1101, becomes the single scene 1100b2. Scene 1100b, with the sub-scene S22 removed, consists only of the main scene S21 and becomes the single scene 1100b1. As a simpler embodiment, the new scene may be inserted at a predetermined position, either immediately before or immediately after the scene being edited, regardless of where the drag ends. The digest moving image editing control unit 107 first deletes the original scene being edited and inserts, at the temporal position it occupied, two scenes in the order just described: a new scene obtained by removing the sub-scene 92 from the original scene being edited, and a single scene corresponding to the sub-scene 92. When, as in FIG. 34(c), the scene being edited contains only one element scene besides the sub-scene 92, it becomes two single scenes after editing; when it contains three or more element scenes, it becomes one combination scene and one single scene after editing. FIGS. 34(d) and 34(e) show cases where the end point of the drag does not reach a screen edge; the drag directions in the figures are examples.
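 A sketch of the timeline rewrite for the edge-drag case; `without` and `as_single_scene` are hypothetical helpers standing in for the arrangement-information updates described above:

```python
def split_out_subscene(timeline, i, subscene, drag_to_right_edge):
    """Replace timeline[i] (a combination scene) with the scene minus
    `subscene`, plus a new single scene made from `subscene`, ordered
    by which screen edge the drag reached."""
    remainder = timeline[i].without(subscene)       # hypothetical helper
    single = subscene.as_single_scene()             # hypothetical helper
    pair = [remainder, single] if drag_to_right_edge else [single, remainder]
    return timeline[:i] + pair + timeline[i + 1:]
```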
 FIG. 34(d) shows the case where a part of the sub-scene 92 other than its boundary is dragged. The digest moving image editing control unit 107 then moves the sub-scene 92 to another place on the main scene. Thus even when the sub-scene 92 covers a person or object of interest in the combination scene, the sub-scene 92 can be moved to give a video the user prefers. The destination may be an arbitrary position near the end point of the drag or, as indicated by the dotted rectangles in FIG. 34(d), the one of several system-defined positions closest to the end point of the drag. The digest moving image editing control unit 107 rewrites the information on the scene being edited in the arrangement information accordingly, then generates the scene again and saves it.
 FIG. 34(e) shows the case where the boundary between the main scene and the sub-scene 92 in a picture-in-picture arrangement is dragged. The digest moving image editing control unit 107 then changes the display size (horizontal and vertical pixel counts) of the sub-scene 92. The sub-scene's new size may be an arbitrary size derived from the end point of the drag, or the one of several system-defined sizes closest to the size indicated by the end point of the drag. The new size and area of the sub-scene 92 may be larger or smaller than before the operation, but it is advisable to impose upper and lower limits, for example one quarter of the area of the whole moving image as the upper limit and one sixteenth as the lower limit. This allows the size of the sub-scene 92 to be adjusted within a range that does not obstruct viewing of the main scene, letting the user set a more agreeable balance for the overlay. FIG. 34(f) shows a pattern in which the main scene 94 is placed at the center of the screen against the sub-scene 93 as background. An arrangement in which the main scene overlaps the sub-scene is handled in basically the same way, except that the scene whose size is changed is the main scene; in the case of FIG. 34(f), the boundary between the main scene 94 and the sub-scene 93 consists only of the main scene's left and right vertical sides.
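 A sketch of the area limits described above, using the example bounds of one quarter and one sixteenth of the screen area; the uniform rescaling that preserves the aspect ratio is an assumption for illustration:

```python
def clamp_subscene_size(w, h, screen_w, screen_h,
                        max_frac=1 / 4, min_frac=1 / 16):
    """Scale (w, h) uniformly so that its area stays within
    [min_frac, max_frac] of the screen area."""
    area, screen = w * h, screen_w * screen_h
    if area > max_frac * screen:
        s = (max_frac * screen / area) ** 0.5
    elif area < min_frac * screen:
        s = (min_frac * screen / area) ** 0.5
    else:
        s = 1.0
    return round(w * s), round(h * s)
```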
 In both FIG. 34(e) and FIG. 34(f), if resizing an element scene simply changes its enlargement or reduction ratio while preserving the element scene's original image aspect ratio regardless of the drag direction, the operation is easy for the user to understand and perform. If, on the other hand, the size can be changed without preserving the aspect ratio, more flexible resizing becomes possible. In that case the operation should depend on the start point of the drag: if the drag starts at a corner of the boundary, the vertical and horizontal sizes of the element scene are changed simultaneously according to the drag; if it starts on one of the four sides excluding the corners, dragging a vertical side changes the horizontal size and dragging a horizontal side changes the vertical size. The image aspect ratio specified by such a drag will in many cases differ from the aspect ratio of the input image data corresponding to the element scene. In that case, either the input image data is scaled by different factors vertically and horizontally, or the input image data is trimmed to fit the new image aspect ratio. In the latter case, the largest possible region with the new aspect ratio can be cut out as follows. Below, let the size of the input image data corresponding to the element scene be ws0:hs0 (horizontal:vertical) and the new size be ws1:hs1.
 If ws0/hs0 < ws1/hs1, the top and bottom of the input image data corresponding to the element scene are trimmed away as horizontal bands to give an image of ws0:(ws0×hs1/ws1) pixels, which is then reduced to ws1:hs1.
 If ws0/hs0 > ws1/hs1, the left and right of the input image data corresponding to the element scene are trimmed away as vertical bands to give an image of (hs0×ws1/hs1):hs0 pixels, which is then reduced to ws1:hs1. When the element scene contains an important object such as a main person, the image may instead simply be trimmed to the new image aspect ratio centered on that object.
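 A sketch of the largest-crop computation for the two cases above, derived directly from the aspect-ratio condition (the helper name is illustrative):

```python
def largest_crop(ws0, hs0, ws1, hs1):
    """Largest region of a ws0 x hs0 source whose aspect ratio is
    ws1:hs1.  Returns (crop_w, crop_h); the crop is then reduced
    to ws1 x hs1."""
    if ws0 * hs1 < ws1 * hs0:        # ws0/hs0 < ws1/hs1: trim top and bottom
        return ws0, round(ws0 * hs1 / ws1)
    if ws0 * hs1 > ws1 * hs0:        # ws0/hs0 > ws1/hs1: trim left and right
        return round(hs0 * ws1 / hs1), hs0
    return ws0, hs0                  # aspect ratios already match

# largest_crop(1920, 1080, 800, 600) -> (1440, 1080)
```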
 The digest moving image editing control unit 107 changes the arrangement information for the scene being edited so as to place the element scene resized as above, and generates the scene again using the input image data corresponding to its element scenes. If the resize leaves some element scene no longer visible on the screen, deleting that element scene from the arrangement information of the scene being edited reduces the load of generating the scene.
 FIG. 34(g) shows a twist operation on a combination scene. Each time a twist operation is performed, the digest moving image editing control unit 107 selects a different arrangement pattern from among those possible, using the input image data corresponding to the element scenes of the scene being edited, and generates the scene again. Since all the input image data is stored in this embodiment, it is possible not only to permute the order of the arrangement as in the examples of FIGS. 33(b) and 33(c), but also to change to an arrangement pattern with a different overlap. This lets the user easily select a preferred arrangement pattern.
 In the example of FIG. 34(g), too, when the twist operation is performed near a boundary between element scenes, the arrangement pattern should be changed so that only the element scenes near that boundary are affected. Furthermore, when the boundary is the boundary between the main scene and the sub-scene, arrangements that swap the assignment of main and sub-scene should also be included among the possible arrangement patterns: in the example of FIG. 34(g), the picture-in-picture arrangement is kept as-is while the original main scene becomes the new sub-scene and the original sub-scene becomes the new main scene. Thus even when the user feels that the scene assigned as the sub-scene is more important than the main scene, the arrangement pattern can easily be changed.
 The digest moving image editing control unit 107 changes the arrangement information to match the new arrangement pattern and generates the scene being edited again based on that arrangement information. As described above, a video editing apparatus that stores the input image data and the arrangement information used to generate the digest moving image can generate the digest moving image again in response to simple operations during playback and revise it into a digest moving image the user prefers.
 Part of the video editing apparatus 100, 100a, 100b, 100c, 100e of the embodiments described above, for example the image data classification unit 101, the scene information generation unit 102, the digest moving image generation units 103, 103a, 103b, 103c, 103d, the event selection unit 104, the output control unit 105, the video display unit 106, the digest moving image editing control unit 107, the operation unit 108, the target image data extraction unit 109, the reproduction time candidate derivation unit 110 and the reproduction time candidate display unit 111, may be realized by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. The "computer system" here means a computer system built into the video editing apparatus 100, 100a, 100b, and includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" means a portable medium such as a memory card, magneto-optical disk, CD-ROM or DVD-ROM, or a storage device such as a hard disk or SSD built into the computer system. The "computer-readable recording medium" may further include media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication channel such as a telephone line, and media that hold the program for a fixed time, such as volatile memory inside the computer system serving as server or client in that case. The program may be one that realizes part of the functions described above, or one that realizes the functions described above in combination with a program already recorded in the computer system.
 Part or all of each video editing apparatus in the embodiments described above may be realized as an integrated circuit such as an LSI (Large Scale Integration). The functional blocks of the video editing apparatus may be made into individual processors, or some or all of them may be integrated into a single processor. The method of circuit integration is not limited to LSI and may be realized by dedicated circuits or a general-purpose processor. If progress in semiconductor technology yields a circuit-integration technology that supersedes LSI, an integrated circuit based on that technology may be used.
  (Summary)
 A video editing apparatus according to aspect 1 of the present invention comprises a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating per-scene features, and a digest moving image generation unit that generates a digest moving image of the image data based on the scene information, wherein the digest moving image generation unit determines, based on the scene information, whether each scene is used when generating the digest moving image, whether multiple scenes are arranged within the same frame, and the spatial arrangement pattern of the scenes when multiple scenes are arranged within the same frame.
 With the above configuration, a large volume of still images and moving images can be reviewed and enjoyed in a short time without effort. Moreover, the images can be viewed comfortably, without tiring, in a form suited to the size and shape of the display screen.
 In the video editing apparatus according to aspect 2 of the present invention, in aspect 1, the digest moving image generation unit may compare the scene information of temporally adjacent scenes, determine from the comparison the scene types, main scene and sub-scene, and further, based on the scene-type relationship between temporally adjacent scenes, select the spatial arrangement pattern of the multiple scenes from at least: a "parallel arrangement" pattern that arranges two or more main scenes within the same frame; a "center arrangement" pattern that arranges the main scene in the central region of the screen with the sub-scene located around the main scene's region; and a "picture-in-picture arrangement" pattern that displays the main scene across the whole frame with the sub-scene superimposed on part of it.
 In the video editing apparatus according to aspect 3 of the present invention, in aspect 2, when the "center arrangement" pattern, which arranges the main scene in the central region of the screen with the sub-scene located around the main scene's region, is selected, the digest moving image generation unit may apply a spatial filter to the sub-scene to differentiate it from the main scene's region in image sharpness or color tone.
 In the video editing apparatus according to aspect 4 of the present invention, in any one of aspects 1 to 3, the digest moving image generation unit may further count the number of times a digest moving image has been generated, per image data group targeted for digest moving image generation, and vary the arrangement pattern used when placing multiple scenes according to that count.
 In the video editing apparatus according to aspect 5 of the present invention, in any one of aspects 1 to 4, the scene information generation unit may generate, per scene and as part of the scene information, a "region count" indicating the number of feature regions within an image frame, a "maximum region size" indicating the size of the feature region with the largest area, and a "maximum region position" indicating the position within the image of the feature region with the largest area; and the digest moving image generation unit may, when arranging multiple scenes in the same frame, vary the image region cut out as the main scene and the strength of the spatial filter applied to the sub-scene based on the items indicated in the scene information.
 In the video editing apparatus according to aspect 6 of the present invention, in any one of aspects 1 to 5, the video editing apparatus may further comprise an output control unit that determines a digest moving image generation policy based on output conditions including characteristics of the output device that outputs the digest moving image and notifies the digest moving image generation unit of the determined generation policy, and the digest moving image generation unit may generate the digest moving image based on the generation policy and the scene information.
 In the video editing apparatus according to aspect 7 of the present invention, in any one of aspects 1 to 6, the video editing apparatus may further comprise an image data classification unit that classifies image data into events based on metadata indicating the shooting conditions of the image data, and an event selection unit that selects, from the image data classified into events, an image data group consisting of image data whose metadata satisfies a predetermined condition as the target of digest moving image generation; and the digest moving image generation unit may generate the digest moving image taking the image data group selected by the event selection unit as input.
 In the video editing apparatus according to aspect 8 of the present invention, in aspect 1, the apparatus may further comprise an output control unit that determines a digest moving image generation policy based on output conditions including characteristics of the output device that outputs the digest moving image and notifies the digest moving image generation unit of the determined generation policy, and the digest moving image generation unit may determine the spatial arrangement pattern of the scenes in the digest moving image based on the generation policy and the scene information.
 In the video editing apparatus according to aspect 9 of the present invention, in aspect 3, the scene information generation unit may generate, per scene and as part of the scene information, a "region count" indicating the number of feature regions within an image frame, a "maximum region size" indicating the size of the feature region with the largest area, and a "maximum region position" indicating the position within the image of the feature region with the largest area; and the digest moving image generation unit may, when arranging multiple scenes in the same frame, vary the strength of the spatial filter applied to the sub-scene based on the magnitude of the "region count" or "maximum region size" indicated in the scene information.
 In the video editing apparatus according to aspect 10 of the present invention, in aspect 2 or 3, the digest moving image generation unit may further count the number of times a digest moving image has been generated, per image data group targeted for digest moving image generation, and vary the arrangement pattern used when placing multiple scenes according to that count.
 In the video editing apparatus according to aspect 11 of the present invention, in aspect 8, the digest moving image generation unit may determine, based on the generation policy, whether to encode the digest moving image and the encoding quality to use when encoding.
 A video editing apparatus according to aspect 12 of the present invention comprises a reproduction time candidate derivation unit that derives reproduction time candidates for a digest moving image based on an image data group, a reproduction time candidate display unit that presents the reproduction time candidates to the user and sets a designated time based on a user event, a scene information generation unit that divides an image data group including moving images into one or more scenes, and a digest moving image generation unit that generates image clips based on the scenes and generates a digest moving image by joining the image clips in time, wherein the digest moving image generation unit performs adjustment so that the reproduction time of the digest moving image equals the designated time.
 With the above configuration, a large volume of still images and moving images can be reviewed and enjoyed in a short time without effort. Moreover, the images can be viewed with a variety of display methods and within the time the user desires.
 In the video editing apparatus according to aspect 13 of the present invention, in aspect 12, the digest moving image generation unit may shorten the reproduction time of image clips with little motion as the designated time becomes shorter.
 In the video editing apparatus according to aspect 14 of the present invention, in aspect 13, the digest moving image generation unit may shorten the reproduction time by thinning out frames of the image clip.
 In the video editing apparatus according to aspect 15 of the present invention, in aspect 12, the digest moving image generation unit may lengthen the reproduction time of image clips with little motion as the designated time becomes longer.
 In the video editing apparatus according to aspect 16 of the present invention, in aspect 15, the digest moving image generation unit may lengthen the reproduction time by interpolating frames of the image clip.
 In the video editing apparatus according to aspect 17 of the present invention, in aspect 12, the digest moving image generation unit may classify each scene as either a single scene used alone or a multiple scene used in combination with other scenes, and the proportion of multiple scenes making up the digest moving image may increase as the designated time becomes shorter.
 In the video editing apparatus according to aspect 18 of the present invention, in aspect 12, the reproduction time candidate derivation unit may make the reproduction time candidates shorter than the total reproduction time of the image data group, while making the reproduction time candidates longer as the total reproduction time of the image data group becomes longer.
 A video editing apparatus according to aspect 19 of the present invention comprises a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating per-scene features, an output control unit that determines a digest moving image generation policy and notifies the digest moving image generation unit of the determined generation policy, a digest moving image generation unit that generates, based on the scene information and the generation policy, a digest moving image of the image data group including scenes in which multiple scenes are spatially arranged within the screen (hereinafter, combination scenes), a video display unit that displays video and operation information, a digest moving image editing control unit that plays the digest moving image and outputs it to the video display unit, and an operation unit that detects operation input from outside, wherein the composition of the digest moving image is changed by the operation input detected by the operation unit.
 With the above configuration, a large volume of still images and moving images can be reviewed and enjoyed in a short time without effort. Moreover, an image composed so that a large volume of still images and moving images can be reviewed and enjoyed easily can be simply revised during playback into a composition the user prefers.
 In the video editing apparatus according to aspect 20 of the present invention, in aspect 19, the video editing apparatus may delete from the digest moving image a scene designated by the operation input, or some of the scenes making up a combination scene.
 In the video editing apparatus according to aspect 21 of the present invention, in aspect 19, the video editing apparatus may change the spatial arrangement pattern of a combination scene designated by the operation input.
 In the video editing apparatus according to aspect 22 of the present invention, in aspect 19, the video editing apparatus may apply a filter to the moving image in a region designated by the operation input.
 In the video editing apparatus according to aspect 23 of the present invention, in aspect 19, the video editing apparatus may add newly captured image data to the digest moving image in response to the operation input.
 In the video editing apparatus according to aspect 24 of the present invention, in aspect 19, the video editing apparatus may take, from a combination scene designated by the operation input, one of the scenes making up the combination scene and insert it into the digest moving image as a single scene temporally before or after the combination scene.
 In the video editing apparatus according to aspect 25 of the present invention, in any one of aspects 19 to 24, the video editing apparatus may change the content of the digest moving image in response to operation input during playback of the digest moving image, using the images used to generate the digest moving image and information indicating the spatial and temporal arrangement of those images.
 In the video editing apparatus according to aspect 26 of the present invention, in aspect 19, some of the scenes making up a combination scene, designated by the operation input, may be deleted from that combination scene.
 In the video editing apparatus according to aspect 27 of the present invention, in aspect 25, the spatial arrangement pattern of a combination scene may be changed by the operation input.
 In the video editing apparatus according to aspect 26 of the present invention, in aspect 25, one of the scenes making up a combination scene designated by the operation input may be inserted into the digest moving image as a single scene temporally before or after the combination scene.
 Several embodiments of this invention have been described in detail above with reference to the drawings, but the specific configurations are not limited to those described, and various design changes and the like are possible without departing from the gist of this invention.
 The present invention can be suitably applied to a video editing apparatus that takes still images and moving images as input and generates a so-called digest moving image.
100, 100a, 100b, 100c … video editing apparatus
101 … image data classification unit
102, 102a … scene information generation unit
103, 103a, 103b, 103c, 103d … digest moving image generation unit
104 … event selection unit
105 … output control unit
106 … video display unit
107 … digest moving image editing control unit
108 … operation unit
109 … target image data extraction unit
110 … reproduction time candidate derivation unit
111 … reproduction time candidate display unit
301, 302 … image data group
200, 303, 400, 800 … scene information
304 … selection information
305 … digest moving image generation policy
307 … digest moving image
1031, 1031b … target image extraction unit
1032, 1032a, 1032d … scene type determination unit
1033, 1033a, 1033b … scene spatial arrangement unit
1034, 1034b … scene temporal arrangement unit
1035 … digest generation control unit
1036 … digest moving image editing unit

Claims (20)

1.  A video editing apparatus comprising:
    a scene information generation unit that divides an image data group including moving images into one or more scenes and generates scene information indicating per-scene features; and
    a digest moving image generation unit that generates a digest moving image of the image data group based on the scene information,
    wherein the digest moving image generation unit determines, based on the scene information, whether each scene is used when generating the digest moving image, whether multiple scenes are arranged within the same frame, and the spatial arrangement pattern of the scenes when multiple scenes are arranged within the same frame.
2.  The video editing apparatus according to claim 1, wherein the digest moving image generation unit compares the scene information of temporally adjacent scenes, determines from the comparison the scene types, main scene and sub-scene, and further, based on the scene-type relationship between temporally adjacent scenes, selects the spatial arrangement pattern of the multiple scenes from at least:
    a "parallel arrangement" pattern that arranges two or more main scenes within the same frame;
    a "center arrangement" pattern that arranges the main scene in the central region of the screen with the sub-scene located around the main scene's region; and
    a "picture-in-picture arrangement" pattern that displays the main scene across the whole frame with the sub-scene superimposed on part of it.
  3.  The video editing apparatus according to claim 2, wherein, when the "center arrangement" pattern is selected, the digest moving image generation unit applies a spatial filter to the sub-scene so as to differentiate it from the main scene region in image sharpness or color tone.
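One plausible reading of the sharpness variant is a blur applied only to the sub-scene region. A minimal sketch with NumPy follows; the region format and the repeated box filter (as a crude stand-in for a proper blur) are assumptions:

```python
import numpy as np

def soften_sub_scene(frame, sub_region, strength=3):
    """Blur the sub-scene region with a crude box filter so the main scene
    reads as sharper (claim 3). `sub_region` is (top, left, height, width);
    a real implementation would use a proper separable or Gaussian blur."""
    t, l, h, w = sub_region
    patch = frame[t:t + h, l:l + w].astype(float)
    k = strength
    blurred = patch
    for _ in range(2):  # repeated box filtering approximates a Gaussian
        padded = np.pad(blurred, k, mode="edge")
        blurred = np.mean(
            [padded[dy:dy + h, dx:dx + w]
             for dy in range(2 * k + 1) for dx in range(2 * k + 1)], axis=0)
    out = frame.copy()
    out[t:t + h, l:l + w] = blurred.astype(frame.dtype)
    return out

frame = np.random.randint(0, 256, (72, 128), dtype=np.uint8)
print(soften_sub_scene(frame, (0, 0, 36, 64)).shape)  # (72, 128)
```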
  4.  The video editing apparatus according to any one of claims 1 to 3, wherein the digest moving image generation unit further counts, for each image data group subject to digest generation, the number of times a digest moving image has been generated, and varies the arrangement pattern used when arranging a plurality of scenes according to that count.
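For instance (the rotation policy below is one assumption among many the claim would cover), the pattern could simply cycle on each regeneration of a digest for the same image data group:

```python
# Rotate through the claim-2 patterns on each regeneration for a group.
_patterns = ["parallel", "center", "picture-in-picture"]
_counts = {}

def pattern_for(group_id):
    """Return an arrangement pattern that varies with the generation count."""
    n = _counts.get(group_id, 0)
    _counts[group_id] = n + 1
    return _patterns[n % len(_patterns)]

print([pattern_for("trip-2014") for _ in range(4)])
# ['parallel', 'center', 'picture-in-picture', 'parallel']
```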
  5.  The video editing apparatus according to any one of claims 1 to 4, wherein the scene information generation unit generates, per scene and as part of the scene information, a "region count" indicating the number of feature regions in an image frame, a "maximum region size" indicating the size of the feature region with the largest area, and a "maximum region position" indicating the position within the image of the feature region with the largest area, and
     wherein, when arranging a plurality of scenes within the same frame, the digest moving image generation unit varies, based on these items of the scene information, the image region cut out as the main scene and the strength of the spatial filter applied to the sub-scene.
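The three per-scene items can be computed from a binary feature-region mask with plain connected-component labelling; the sketch below uses BFS and reports the largest region's centroid as its "position" (the centroid choice is an assumption):

```python
import numpy as np
from collections import deque

def region_features(mask):
    """Compute the claim-5 items from a binary feature-region mask: region
    count, maximum region size (pixels), and the largest region's centroid."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                q, pixels = deque([(y, x)]), []
                seen[y, x] = True
                while q:  # BFS over 4-connected neighbours
                    cy, cx = q.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                regions.append(pixels)
    if not regions:
        return {"region_count": 0, "max_region_size": 0, "max_region_position": None}
    largest = max(regions, key=len)
    cy = sum(p[0] for p in largest) / len(largest)
    cx = sum(p[1] for p in largest) / len(largest)
    return {"region_count": len(regions),
            "max_region_size": len(largest),
            "max_region_position": (cy, cx)}

mask = np.zeros((8, 8), dtype=bool)
mask[1:3, 1:3] = True   # small feature region
mask[4:8, 4:8] = True   # largest feature region
print(region_features(mask))
```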
  6.  The video editing apparatus according to any one of claims 1 to 5, further comprising an output control unit that determines a digest moving image generation policy based on output conditions, including the characteristics of the output device to which the digest moving image is output, and notifies the digest moving image generation unit of the determined policy,
     wherein the digest moving image generation unit generates the digest moving image based on the generation policy and the scene information.
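A toy mapping from device characteristics to a generation policy might look as follows; the device fields and policy keys are hypothetical, since the claim leaves the policy contents open:

```python
def generation_policy(device):
    """Map output-device characteristics to a digest generation policy,
    in the spirit of claim 6. All names here are illustrative assumptions."""
    policy = {}
    # Small screens leave little room for combined multi-scene frames.
    policy["allow_combination"] = device.get("width", 0) >= 1280
    # Cap the digest length for devices on metered connections.
    policy["max_duration_s"] = 30 if device.get("metered") else 120
    return policy

print(generation_policy({"width": 1920, "metered": False}))
print(generation_policy({"width": 640, "metered": True}))
```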
  7.  The video editing apparatus according to any one of claims 1 to 6, further comprising:
     an image data classification unit that classifies image data into events based on metadata indicating the shooting conditions of the image data; and
     an event selection unit that selects, from the image data classified into events, an image data group consisting of image data whose metadata satisfies a predetermined condition as the target of digest moving image generation,
     wherein the digest moving image generation unit generates the digest moving image taking the image data group selected by the event selection unit as its input.
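A common shooting-condition metadatum is the capture timestamp; one illustrative classifier (the metadata layout and the two-hour gap are assumptions) groups shots into events wherever a large time gap occurs:

```python
from datetime import datetime, timedelta

def classify_into_events(items, gap=timedelta(hours=2)):
    """Group image data into events by shooting time (claim 7): a new event
    starts whenever the gap to the previous shot exceeds `gap`."""
    items = sorted(items, key=lambda m: m["shot_at"])
    events, current = [], []
    for m in items:
        if current and m["shot_at"] - current[-1]["shot_at"] > gap:
            events.append(current)
            current = []
        current.append(m)
    if current:
        events.append(current)
    return events

shots = [{"file": f, "shot_at": datetime(2014, 2, 20, h)}
         for f, h in [("a.mp4", 9), ("b.jpg", 10), ("c.mp4", 15)]]
print([len(e) for e in classify_into_events(shots)])  # [2, 1]
```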
  8.  A video editing apparatus comprising:
     a playback time candidate derivation unit that derives playback time candidates for a digest moving image based on an image data group;
     a playback time candidate display unit that presents the playback time candidates to a user and sets a designated time based on a user event;
     a scene information generation unit that divides an image data group including a moving image into one or more scenes; and
     a digest moving image generation unit that generates image clips based on the scenes and generates a digest moving image by temporally concatenating the image clips,
     wherein the digest moving image generation unit adjusts the digest moving image so that its playback time equals the designated time.
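The simplest adjustment consistent with claim 8 is to rescale all clip durations to hit the designated time; claims 9 to 13 then refine this with motion-dependent strategies. A sketch of the uniform case:

```python
def fit_to_designated_time(clip_durations, designated):
    """Scale image-clip playback times so the digest totals the designated
    time (claim 8). Uniform scaling is an illustrative baseline only."""
    total = sum(clip_durations)
    if total == 0:
        return clip_durations
    scale = designated / total
    return [d * scale for d in clip_durations]

print(fit_to_designated_time([4.0, 6.0, 10.0], designated=10.0))
# [2.0, 3.0, 5.0] -> sums to the designated 10 s
```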
  9.  The video editing apparatus according to claim 8, wherein the digest moving image generation unit shortens the playback time of image clips with little motion as the designated time becomes shorter.
  10.  The video editing apparatus according to claim 9, wherein the digest moving image generation unit shortens the playback time by thinning out frames of the image clip.
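Frame thinning at a fixed display rate shortens playback time in direct proportion to the frames dropped, e.g.:

```python
def thin_frames(frames, keep_every=2):
    """Shorten a clip's playback time by dropping frames (claim 10). At a
    fixed display rate, keeping every 2nd frame halves the duration."""
    return frames[::keep_every]

clip = list(range(10))          # stand-in for 10 decoded frames
print(thin_frames(clip))        # [0, 2, 4, 6, 8] -> half the playback time
```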
  11.  The video editing apparatus according to claim 8, wherein the digest moving image generation unit lengthens the playback time of image clips with little motion as the designated time becomes longer.
  12.  The video editing apparatus according to claim 11, wherein the digest moving image generation unit lengthens the playback time by interpolating frames of the image clip.
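Conversely, inserting an in-between frame midway between each pair roughly doubles the frame count and hence the playback time at a fixed display rate. The sketch below averages scalar stand-in "frames"; real interpolation would typically be motion-compensated:

```python
def interpolate_frames(frames):
    """Lengthen a clip's playback time by inserting an in-between frame
    between each pair (claim 12). Simple averaging, for illustration."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out += [a, (a + b) / 2]
    out.append(frames[-1])
    return out

print(interpolate_frames([0, 10, 20]))  # [0, 5.0, 10, 15.0, 20] -> ~2x frames
```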
  13.  The video editing apparatus according to claim 8, wherein the digest moving image generation unit classifies each scene as either a single scene, used on its own, or a multi-scene, used in combination with other scenes, and wherein the proportion of multi-scenes constituting the digest moving image increases as the designated time becomes shorter.
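One way to realize this monotone relationship (the linear form is an assumption; the claim only requires that the proportion grow as the designated time shrinks):

```python
def multi_scene_ratio(designated, full_length):
    """Illustrative claim-13 mapping: the shorter the designated time is
    relative to the source material, the larger the share of combined
    multi-scenes in the digest."""
    shortage = max(0.0, 1.0 - designated / full_length)
    return round(shortage, 2)  # 0.0 = all single scenes; towards 1.0 = mostly multi

print(multi_scene_ratio(30, 600))   # 0.95 -> heavy use of combined scenes
print(multi_scene_ratio(300, 600))  # 0.5
```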
  14.  A video editing apparatus comprising:
     a scene information generation unit that divides an image data group including a moving image into one or more scenes and generates scene information indicating the features of each scene;
     an output control unit that determines a digest moving image generation policy and notifies the digest moving image generation unit of the determined policy;
     a digest moving image generation unit that, based on the scene information and the generation policy, generates a digest moving image of the image data group including scenes in which a plurality of scenes are spatially arranged within one screen (hereinafter, "combination scenes");
     a video display unit that displays video and operation information;
     a digest moving image editing control unit that plays back the digest moving image and outputs it to the video display unit; and
     an operation unit that detects operation input from outside,
     wherein the configuration of the digest moving image is changed according to the operation input detected by the operation unit.
  15.  The video editing apparatus according to claim 14, wherein the video editing apparatus deletes from the digest moving image a scene designated by the operation input, or a constituent scene of a designated combination scene.
  16.  The video editing apparatus according to claim 14, wherein the video editing apparatus changes the spatial arrangement pattern of a combination scene designated by the operation input.
  17.  The video editing apparatus according to claim 14, wherein the video editing apparatus applies a filter to the moving image in a region designated by the operation input.
  18.  The video editing apparatus according to claim 14, wherein the video editing apparatus adds newly captured image data to the digest moving image according to the operation input.
  19.  The video editing apparatus according to claim 14, wherein the video editing apparatus extracts one of the scenes constituting a combination scene designated by the operation input and inserts it into the digest moving image as a single scene, temporally before or after the combination scene.
  20.  The video editing apparatus according to any one of claims 14 to 19, wherein the video editing apparatus changes the content of the digest moving image in response to operation input during playback of the digest moving image, using the images used to generate the digest moving image and information indicating the spatial and temporal arrangement of those images.
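The editing operations of claims 15 to 19 amount to a small command set applied to the digest's scene list. A minimal dispatch sketch follows; the operation names and the scene-dict representation are hypothetical:

```python
def apply_edit(digest, op):
    """Dispatch a user operation on a digest represented as a list of scene
    dicts, in the spirit of claims 15, 16, and 18. Names are illustrative."""
    kind = op["kind"]
    if kind == "delete_scene":                       # claim 15
        digest = [s for s in digest if s["id"] != op["id"]]
    elif kind == "change_arrangement":               # claim 16
        next(s for s in digest if s["id"] == op["id"])["pattern"] = op["pattern"]
    elif kind == "append_new_footage":               # claim 18
        digest.append({"id": op["id"], "pattern": "single"})
    return digest

digest = [{"id": "s1", "pattern": "parallel"}, {"id": "s2", "pattern": "single"}]
digest = apply_edit(digest, {"kind": "change_arrangement", "id": "s1", "pattern": "center"})
print(digest)
```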
PCT/JP2015/054406 2014-02-20 2015-02-18 Video image editing apparatus WO2015125815A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016504128A JPWO2015125815A1 (en) 2014-02-20 2015-02-18 Video editing device

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
JP2014030430 2014-02-20
JP2014-030430 2014-02-20
JP2014061382 2014-03-25
JP2014-061382 2014-03-25
JP2014-063798 2014-03-26
JP2014063798 2014-03-26
JP2014065062 2014-03-27
JP2014-065062 2014-03-27
JP2014181027 2014-09-05
JP2014-181027 2014-09-05

Publications (1)

Publication Number Publication Date
WO2015125815A1 true WO2015125815A1 (en) 2015-08-27

Family

ID=53878315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/054406 WO2015125815A1 (en) 2014-02-20 2015-02-18 Video image editing apparatus

Country Status (2)

Country Link
JP (1) JPWO2015125815A1 (en)
WO (1) WO2015125815A1 (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10112835A (en) * 1996-10-04 1998-04-28 Matsushita Electric Ind Co Ltd Video image summarizing method and video image display method
JP2000115690A (en) * 1998-10-06 2000-04-21 Nec Corp Structured display system for video image and structured display method therefor
JP2000253351A * 1999-03-01 2000-09-14 Mitsubishi Electric Corp Animation summarizing device, computer-readable recording medium recording animation summarizing program, animation reproducing device and computer-readable recording medium recording animation reproducing program
JP2007228604A (en) * 1999-03-12 2007-09-06 Fuji Xerox Co Ltd Method summarizing video content
JP2002262228A (en) * 2001-03-02 2002-09-13 Sharp Corp Digest producing device
JP2005086218A (en) * 2003-09-04 2005-03-31 Ntt Comware Corp Method, apparatus and program for processing animation
JP2007228334A (en) * 2006-02-24 2007-09-06 Fujifilm Corp Moving picture control apparatus and method, and program
JP2008236729A (en) * 2007-02-19 2008-10-02 Victor Co Of Japan Ltd Method and apparatus for generating digest
JP2010245856A (en) * 2009-04-07 2010-10-28 Panasonic Corp Video editing device
JP2010258768A (en) * 2009-04-24 2010-11-11 Canon Inc Image display device and control method thereof, program and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017130816A (en) * 2016-01-20 2017-07-27 ヤフー株式会社 Information display program, information display device, information display method, and distribution device
JP1568793S (en) * 2016-04-27 2017-02-06
JP2019184946A (en) * 2018-04-16 2019-10-24 株式会社デンソーテン Deposit removal system and deposit removal method
US11130450B2 (en) 2018-04-16 2021-09-28 Denso Ten Limited Deposit removal system, and deposit removal method
JP7134082B2 (en) 2018-12-10 2022-09-09 株式会社ソニー・インタラクティブエンタテインメント Information processing device and content editing method
JP2020096235A (en) * 2018-12-10 2020-06-18 株式会社ソニー・インタラクティブエンタテインメント Information processing apparatus and content editing method
US11727959B2 (en) 2018-12-10 2023-08-15 Sony Interactive Entertainment Inc. Information processing device and content editing method
JP2021044779A (en) * 2019-09-13 2021-03-18 株式会社デンソーテン Image display device, image display method, and image display system
WO2021162019A1 (en) * 2020-02-14 2021-08-19 ソニーグループ株式会社 Content processing device, content processing method, and content processing program
JP2021132328A (en) * 2020-02-20 2021-09-09 株式会社エクサウィザーズ Information processing method, information processing device, and computer program
JP2022127469A (en) * 2021-02-19 2022-08-31 株式会社Gravitas Video editing device, video editing method, and computer program
JP7118379B1 (en) 2021-02-19 2022-08-16 株式会社Gravitas VIDEO EDITING DEVICE, VIDEO EDITING METHOD, AND COMPUTER PROGRAM
US11942115B2 (en) 2021-02-19 2024-03-26 Genevis Inc. Video editing device, video editing method, and computer program

Also Published As

Publication number Publication date
JPWO2015125815A1 (en) 2017-03-30

Similar Documents

Publication Publication Date Title
WO2015125815A1 (en) Video image editing apparatus
WO2007126096A1 (en) Image processing device and image processing method
US8839110B2 (en) Rate conform operation for a media-editing application
JP5817400B2 (en) Information processing apparatus, information processing method, and program
US8416332B2 (en) Information processing apparatus, information processing method, and program
US8782563B2 (en) Information processing apparatus and method, and program
JP5768126B2 (en) Determining key video snippets using selection criteria
JP5552769B2 (en) Image editing apparatus, image editing method and program
WO2020107297A1 (en) Video clipping control method, terminal device, system
US8004594B2 (en) Apparatus, method, and program for controlling display of moving and still images
WO2007126097A1 (en) Image processing device and image processing method
US20060114327A1 (en) Photo movie creating apparatus and program
JP2016537744A (en) Interactive graphical user interface based on gestures for video editing on smartphone / camera with touchscreen
JPWO2007111206A1 (en) Image processing apparatus and image processing method
JP2007079641A (en) Information processor and processing method, program, and storage medium
WO2013136792A1 (en) Content processing device, content processing method, and program
JP2009529726A (en) Content access tree
US11792504B2 (en) Personalized videos
JP2011182118A (en) Display controlling apparatus and control method for the same
JP2009004999A (en) Video data management device
CN104205795B (en) Color grading preview method and apparatus
US20150348588A1 (en) Method and apparatus for video segment cropping
JP2006101076A (en) Method and device for moving picture editing and program
KR102066857B1 (en) object image tracking streaming system and method using the same
JP3523784B2 (en) Interactive image operation display apparatus and method, and program storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15751630

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016504128

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15751630

Country of ref document: EP

Kind code of ref document: A1