WO2006016590A1 - Information signal processing method, information signal processing apparatus, and computer program recording medium - Google Patents
- Publication number
- WO2006016590A1 (PCT/JP2005/014597; JP2005014597W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- predetermined
- data
- audio
- signal
- image
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/147—Scene change detection
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/78—Television signal recording using magnetic recording
- H04N5/782—Television signal recording using magnetic recording on tape
- H04N5/783—Adaptations for reproducing at a rate different from the recording rate
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
Definitions
- The present invention relates to an information signal processing method, an information signal processing apparatus, and a computer program recording medium for performing special reproduction operations, such as a predetermined summary reproduction (digest reproduction) process, in a recording/reproducing apparatus that applies predetermined band compression processing such as MPEG (Moving Picture Experts Group) coding to video and audio data, for example the video signal and audio signal of a broadcast program, and records and reproduces the data on a recording medium such as a magneto-optical disk, a hard disk drive (HDD), or a semiconductor memory.
- Predetermined feature data is extracted based on features appearing in the image/audio data (image/audio information signal, image/audio signal, image/audio information data) of the broadcast program to be recorded, and key frame sections (sections considered to contain important frames) are detected using that feature data. Predetermined key frame sections are then selected and played back sequentially according to rules determined in advance, so that summary playback (digest playback) can be performed within a predetermined time shorter than the recording time.
- Position information data indicating reproduction positions can be generated automatically at fixed time intervals, for example every 3, 5, or 10 minutes, or manually at positions desired by the user. Using this position information data (so-called chapter data), skip playback, editing operations, and thumbnail image display can be performed.
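The fixed-interval variant of chapter data generation described above can be sketched as follows (a minimal illustration; the function name and parameters are our own, not from the patent):

```python
def fixed_interval_chapters(duration_s: float, interval_min: float = 5.0) -> list[float]:
    """Generate chapter positions (in seconds) every `interval_min` minutes,
    as in conventional fixed-interval chapter processing."""
    step = interval_min * 60.0
    chapters, t = [], step
    while t < duration_s:
        chapters.append(t)
        t += step
    return chapters

# A 32-minute recording with 10-minute chapters:
print(fixed_interval_chapters(32 * 60, 10.0))  # [600.0, 1200.0, 1800.0]
```

Skip playback then simply jumps among the returned positions; as the description notes later, such fixed points do not necessarily coincide with key frames.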
- As for the feature data described above, feature data for a plurality of types of features can be extracted for each of the image signal and the audio signal. Each kind of feature data is extracted, for example, when the image/audio data is recorded, and is recorded on the recording medium together with the image/audio data.
- the recorded feature data is read and signal processing is performed to determine the section for summary playback (digest playback) by a predetermined rule process.
- If a separate file is provided for each kind of feature data, however, the number of files increases, and handling of the files during signal processing becomes complicated and inefficient.
- An object of the present invention is therefore to provide an information signal processing method, an information signal processing apparatus, and a computer program recording medium that process feature data efficiently, perform an effective summary playback (digest playback) operation or chapter processing using feature data, and efficiently perform various operations using summary playback (digest playback) and chapter data.
- In the information signal processing method according to the present invention, an audio level or a predetermined audio characteristic is detected for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing; the audio signal is divided into predetermined segment sections according to the detection result and a predetermined set value; predetermined characteristic data is extracted for each predetermined section of the image signal from the image/audio information signal; predetermined image feature data indicating the image features of each section is generated from that characteristic data; the image/audio information signal is segmented according to the image feature data, the result of the audio segment processing, and predetermined time-length or section-length setting data; and the predetermined audio feature data extracted from the audio signal based on the segments of the image/audio information signal, together with the image feature data, is recorded on a predetermined recording medium or in a predetermined data memory.
- In another information signal processing method, the audio level or predetermined audio characteristics are detected for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing; the audio signal is divided into predetermined segment sections according to the detection result and a predetermined set value; predetermined characteristic data is extracted for each predetermined section of the image signal from the image/audio information signal; predetermined image feature data indicating the image features of each section is generated from that characteristic data; the image/audio information signal is segmented according to the image feature data, the result of the audio segment processing, and predetermined time-length or section-length setting data; and predetermined audio feature data is extracted from the audio signal based on the segments of the image/audio information signal.
- The data or data file recorded on a predetermined recording medium or in a predetermined data memory is then read, and predetermined data corresponding to a predetermined reproduction section determination process or a predetermined reproduction time point setting process is generated.
- In a further information signal processing method, an audio level or a predetermined audio characteristic is detected for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing; the audio signal is divided into predetermined segment sections according to the detection result and a predetermined set value; predetermined characteristic data is extracted for each predetermined section of the image signal from the image/audio information signal; predetermined image feature data indicating the image features of each section is generated from that characteristic data; the image/audio information signal is segmented according to the image feature data, the result of the audio segment processing, and predetermined time-length or section-length setting data; and the predetermined audio feature data extracted from the audio signal based on the segments of the image/audio information signal, and the image feature data, are used.
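The audio segmentation step shared by these methods can be sketched as follows: per-section audio levels are compared with a set value, consecutive sections at or above it form segment sections, and sections below it (silence) mark boundaries. This is an illustrative sketch only, not the patent's implementation; the function and variable names are our own.

```python
def audio_segments(levels, threshold):
    """Group consecutive sections whose audio level is at or above
    `threshold` into segment sections, returned as (start, end) index
    pairs with `end` exclusive; below-threshold sections end a segment."""
    segments, start = [], None
    for i, lv in enumerate(levels):
        if lv >= threshold:
            if start is None:
                start = i                  # a segment section begins
        elif start is not None:
            segments.append((start, i))    # silence ends the segment
            start = None
    if start is not None:                  # signal ended inside a segment
        segments.append((start, len(levels)))
    return segments

levels = [0.1, 0.8, 0.9, 0.05, 0.02, 0.7, 0.6, 0.1]  # per-section levels
print(audio_segments(levels, 0.5))  # [(1, 3), (5, 7)]
```

Image feature data (scene changes and the like) and the time-length setting data would then refine these boundaries into the final segments of the image/audio information signal.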
- An information signal processing apparatus according to the present invention comprises: an audio signal processing unit that detects an audio level or a predetermined audio characteristic for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing, and divides the audio signal into predetermined segment sections according to the detection result and a predetermined set value; an image feature data processing unit that extracts predetermined characteristic data for each predetermined section of the image signal from the image/audio information signal and generates, from that characteristic data, predetermined image feature data indicating the image features of each section; an information signal segment processing unit that segments the signal from the audio signal processing unit or the image/audio information signal according to the signal from the image feature data processing unit, the signal from the audio signal processing unit, and predetermined time-length or section-length setting data; and a unit that extracts the predetermined audio feature data from the audio signal based on the signal from the information signal segment processing unit.
- Another information signal processing apparatus comprises: an audio signal processing unit that detects an audio level or a predetermined audio characteristic for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing, and divides the audio signal into predetermined segment sections according to the detection result and a predetermined set value; an image feature data processing unit that extracts predetermined characteristic data for each predetermined section of the image signal from the image/audio information signal and generates predetermined image feature data indicating the image features of each section from that characteristic data; an information signal segment processing unit that segments the signal from the audio signal processing unit or the image/audio information signal according to the signal from the image feature data processing unit, the signal from the audio signal processing unit, and predetermined time-length or section-length setting data; and a data generation unit that, based on the signal from the information signal segment processing unit, generates predetermined data according to a predetermined reproduction section determination process or a predetermined reproduction time point setting process.
- A further information signal processing apparatus comprises: an audio signal processing unit that detects an audio level or a predetermined audio characteristic for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing, and divides the audio signal into predetermined segment sections according to the detection result and a predetermined set value; an image feature data processing unit that extracts predetermined characteristic data for each predetermined section of the image signal from the image/audio information signal and generates predetermined image feature data indicating the image features of each section from that characteristic data; an information signal segment processing unit that segments the signal from the audio signal processing unit or the image/audio information signal according to the signal from the image feature data processing unit, the signal from the audio signal processing unit, and predetermined time-length or section-length setting data; and a unit that uses the predetermined audio feature data and image feature data extracted from the audio signal based on the signal from the information signal segment processing unit, or reads those data from a predetermined recording medium on which they are recorded.
- A program recording medium according to the present invention records, so that it can be read and executed by a computer, a control program that: detects an audio level or a predetermined audio characteristic for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing; divides the audio signal into predetermined segment sections according to the detection result and a predetermined set value; extracts predetermined characteristic data for each predetermined section of the image signal from the image/audio information signal; generates, from that characteristic data, predetermined image feature data indicating the image features of each section; segments the image/audio information signal according to the image feature data, the result of the audio segment processing, and predetermined time-length or section-length setting data; and extracts predetermined audio feature data from the audio signal based on the segments of the image/audio information signal.
- Another program recording medium records, so that it can be read and executed by a computer, a control program that: detects an audio level or a predetermined audio characteristic for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing; divides the audio signal into predetermined segment sections according to the detection result and a predetermined set value; extracts predetermined characteristic data for each predetermined section of the image signal from the image/audio information signal; generates predetermined image feature data indicating the image features of each section from that characteristic data; segments the image/audio information signal according to the image feature data, the result of the audio segment processing, and predetermined time-length or section-length setting data; extracts predetermined audio feature data from the audio signal based on the segments of the image/audio information signal; and generates predetermined data corresponding to a predetermined reproduction section determination process or a predetermined reproduction time point setting process.
- A further program recording medium records, so that it can be read and executed by a computer, a control program that: detects the audio level or a predetermined audio characteristic for each predetermined section of the audio signal from a predetermined image/audio information signal, or from an image/audio information signal obtained by subjecting that signal to predetermined band compression processing; divides the audio signal into predetermined segment sections according to the detection result and a predetermined set value; extracts predetermined characteristic data for each predetermined section of the image signal from the image/audio information signal; generates predetermined image feature data indicating the image features of each section from that characteristic data; segments the image/audio information signal according to the image feature data, the result of the audio segment processing, and predetermined time-length or section-length setting data; reads the predetermined audio feature data and image feature data extracted from the audio signal based on the segments of the image/audio information signal, or the data or data file from a predetermined recording medium on which those data are recorded; uses the read data to set a plurality of predetermined playback sections in the image/audio information signal according to a predetermined playback section determination process and generates predetermined data corresponding to that process or to a predetermined reproduction time point setting process; and, using the generated data, or the data read from a predetermined recording medium or predetermined data memory on which the generated data is recorded, plays back predetermined sections or displays predetermined time points.
- A plurality of kinds of feature data, such as camera features, telop features, scene features, and color features as image features, and silence features and sound-quality features (for example, whether speech is present) as audio features, can thus be processed efficiently as a data file in a predetermined format, and can be recorded on a predetermined recording medium together with the image/audio data for efficient file management and efficient file handling during signal processing.
- The recording capacity occupied by the file can also be reduced compared with the case where a separate file is provided on the recording medium for each kind of feature data.
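The single-file idea can be illustrated as follows: all feature streams are written into one file in a simple container format instead of one file per feature. The JSON layout, key names, and frame numbers here are our own illustrative assumptions, not the patent's predetermined format.

```python
import io
import json

def write_feature_file(fp, features: dict) -> None:
    """Write all feature streams (camera, telop, scene, silence, ...) into
    a single file in one common format, instead of one file per feature."""
    json.dump(features, fp)

# Illustrative feature streams keyed by feature name:
buf = io.StringIO()
write_feature_file(buf, {
    "scene_change": [0, 120, 340],   # frame numbers of detected scene changes
    "silence": [[300, 330]],         # [start, end] frame pairs of silent runs
    "telop": [[100, 180]],           # [start, end] frame pairs containing telops
})
buf.seek(0)
restored = json.load(buf)
print(sorted(restored))  # ['scene_change', 'silence', 'telop']
```

With one file, the rule processing stage reads a single object instead of opening and tracking many small per-feature files.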
- Furthermore, even after a user has purchased a recording/playback device that lacks this function, the purchased device itself can easily be made to perform the function when the user later desires it.
- FIG. 1A to FIG. 1G are diagrams showing operations of summary reproduction and chapter processing in a recording / reproducing apparatus to which the present invention is applied.
- FIG. 2 is a diagram showing an example of display by chapter processing.
- FIG. 3 is a block diagram showing an example of a processing process in the recording / reproducing apparatus.
- FIG. 4 is a block diagram showing rule processing in the recording / reproducing apparatus.
- FIG. 5A is a diagram showing an example of the relationship between the semanticizing process and the feature data in the recording / reproducing apparatus.
- FIG. 5B is a diagram showing an example of the relationship between the semanticizing process and the feature data in the recording / reproducing apparatus.
- FIG. 6A to FIG. 6C are diagrams showing an example of a rule file format in the recording / reproducing apparatus.
- FIG. 7 is a diagram showing an example of an evaluation value calculation processing method in the recording / reproducing apparatus.
- FIG. 8A to FIG. 8I are graphs showing an example of a time correction function in the recording / reproducing apparatus.
- FIG. 9 is a graph showing an example of a general type of time correction function in the recording / reproducing apparatus.
- FIG. 10 is a diagram showing an example of the structure of video data in the recording / reproducing apparatus.
- FIG. 11 is a diagram of an example of a connection relationship between playback units in the recording / playback apparatus.
- FIGS. 12A and 12B are diagrams illustrating an example of a meaning assignment process between playback units in the recording / playback apparatus.
- FIG. 13A and FIG. 13B are diagrams showing an example of rule 2 processing in the recording / reproducing apparatus.
- FIG. 14 is a graph showing an example of a time correction function in the recording / reproducing apparatus.
- FIG. 15A and FIG. 15B are explanatory diagrams of an example of the configuration of a rule file in the recording / reproducing apparatus.
- FIG. 16A to FIG. 16D are diagrams showing an example of the processing process of the present invention in the recording / reproducing apparatus.
- FIG. 17 is a block circuit diagram showing a configuration example of a recording / reproducing apparatus to which the present invention is applied.
- FIG. 18 is a diagram showing an example of various predetermined data recording states in the recording / reproducing apparatus.
- FIG. 19 is a diagram showing an example of display on the recording / reproducing apparatus.
- FIG. 20 is a block circuit diagram showing another configuration example of the recording / reproducing apparatus to which the present invention is applied.
- FIG. 21 is a block circuit diagram showing an example of the configuration of an audio feature extraction processing system in the recording / reproducing apparatus.
- FIG. 22 is a block circuit diagram showing another example of the configuration of the audio feature extraction processing system in the recording / reproducing apparatus.
- FIG. 23 is a block circuit diagram showing an example of a configuration of a video system feature extraction processing system in the recording / reproducing apparatus.
- FIG. 24 is a diagram showing scene change processing in the recording / reproducing apparatus.
- FIG. 25 is a diagram showing an example of a telop and color feature detection area in the recording / reproducing apparatus.
- FIG. 26 is a diagram showing an example of similar image characteristics in the recording / reproducing apparatus.
- FIG. 27 is a diagram showing an example of a person feature detection area in the recording / reproducing apparatus.
- FIG. 28 is a diagram showing an example of a person detection process in the recording / reproducing apparatus.
- FIG. 29 is a diagram showing an example of person detection (number of persons determination) processing in the recording / reproducing apparatus.
- FIG. 30 is a diagram showing an example of the number of people detection process in the recording / reproducing apparatus.
- FIG. 31 is a diagram showing an example of the number of people detection process in the recording / reproducing apparatus.
- FIG. 32 is a diagram showing an example of the number of people detection process in the recording / reproducing apparatus.
- FIG. 33 is a diagram showing an example of the number of people detection process in the recording / reproducing apparatus.
- FIG. 34A to FIG. 34E are diagrams showing an example of playback unit processing in the recording / playback apparatus.
- FIG. 35A and FIG. 35B are diagrams showing an example of playback unit processing in the recording / playback apparatus.
- FIG. 36 is a diagram showing an example of CM (commercial) detection processing in the recording / reproducing apparatus.
- FIG. 37 is a block diagram showing a configuration example of a playback unit processing system in the recording / playback apparatus.
- FIG. 38 is a diagram showing an example of the structure of a feature data file in the recording / reproducing apparatus.
- FIG. 39 is a diagram showing an example of a configuration of a feature data file in the recording / reproducing device.
- FIG. 40 is an explanatory diagram showing an example of the structure of a feature data file in the recording / reproducing apparatus.
- FIG. 41 is a diagram showing an example of a hierarchical structure of reproduction unit data in the recording / reproduction device.
- FIG. 42 is a diagram showing an example of a hierarchical structure of reproduction unit data in the recording / reproduction device.
- FIG. 43 is a diagram showing an example of a configuration of playback unit video feature data in the recording / playback apparatus.
- FIG. 44A and FIG. 44B are diagrams showing an example of playlist (summary) data in the recording / reproducing apparatus.
- FIG. 45 is a flowchart showing an example of the operation of the recording / reproducing apparatus.
- FIG. 46 is a diagram showing an example of the relationship between recording time and selectable summary playback time in the recording / playback apparatus.
- FIG. 47 is a diagram showing an example of a recording time and the number of automatically set chapters in the recording / reproducing apparatus.
- FIG. 48 is a flowchart showing an example of a recording operation of the recording / reproducing apparatus.
- FIG. 49 is a flowchart showing an example of a reproducing operation of the recording / reproducing apparatus.
- FIG. 50 is a flowchart showing another example of the reproducing operation of the recording / reproducing apparatus.
- In the following description, playlist data and chapter data may be generated and processed together even where this is not specifically stated.
- Figure 1 is an explanatory diagram of summary playback (digest playback) and chapter processing using feature data.
- This image/audio data series includes broadcast programs, movie software, and the like. It is assumed that recording and playback are performed on a predetermined recording medium such as a hard disk (HDD), a magneto-optical disk, or a large-capacity semiconductor memory, using predetermined band compression signal processing such as MPEG (Moving Picture Experts Group) coding.
- FIG. 1B is a conceptual diagram of predetermined sections in which predetermined meanings are set in the image/audio data series, dividing it into a predetermined video structure (semantic video structure) according to scene changes, audio segments, and the like.
- A predetermined evaluation value is set for each of these predetermined sections (all the predetermined sections). Each section for which this evaluation value is set is defined as a predetermined evaluation value section (evaluation data section).
- Here, “all sections recorded within a predetermined time” refers to all sections of the image/audio data when image/audio data exists for a predetermined time, without being limited by the frame of a program.
- the “predetermined program section” indicates the entire section of the program frame when there is video / audio data of a certain program.
- A higher evaluation value (evaluation data) is set for a predetermined section the more closely it corresponds to a key frame section (an important frame section, an important image/audio section) among all the predetermined sections.
- Fig. 1C shows the outline of the predetermined evaluation value interval.
- In the example shown, the evaluation values of the f1 to f2, f4 to f5, and f7 to f8 sections are at or above the threshold value Th, and these sections are selected and played back in the predetermined summary playback (digest playback).
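The threshold selection just described can be sketched as follows (an illustrative sketch with our own names; section times are in seconds):

```python
def digest_sections(sections, th):
    """Select the evaluation value sections whose evaluation value is at or
    above the threshold Th, for summary playback (digest playback).
    `sections` is a list of (start_s, end_s, evaluation) tuples."""
    return [(s, e) for s, e, ev in sections if ev >= th]

def digest_length(selected):
    """Total playback time of the selected sections, in seconds."""
    return sum(e - s for s, e in selected)

# Evaluation value sections in time order (start, end, evaluation):
sections = [(0, 60, 0.2), (60, 150, 0.9), (150, 300, 0.3), (300, 420, 0.8)]
sel = digest_sections(sections, th=0.7)
print(sel, digest_length(sel))  # [(60, 150), (300, 420)] 210
```

Playing only the selected sections in order yields a digest shorter than the full recording time, as in FIG. 1D.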
- FIG. 1E is a conceptual diagram of chapter point setting. As described above, a chapter point is set at or near the beginning of a predetermined key frame section (important frame section), and at or near the beginning of the section that follows the end of that key frame section and is not a key frame section.
- FF playback: fast-forward playback.
- REW playback: fast reverse (rewind) playback.
- Chapter point setting processing: predetermined time point setting processing or predetermined position setting processing.
- Because chapter points can be set automatically at or near the beginning of a key frame section and at or near the beginning of the non-key-frame section that follows its end, chapter setting is more effective than conventional chapter processing, and effective editing operations (editing processing), FF playback, and REW playback can be performed using this chapter processing.
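The chapter-point rule above can be sketched as follows (times in seconds; the function and the example section boundaries are our own illustration):

```python
def chapter_points(keyframe_sections, total_end):
    """Set chapter points at the beginning of each key frame section and at
    the beginning of the following non-key-frame section (i.e. at the end
    of each key frame section, unless the recording ends there)."""
    points = []
    for start, end in keyframe_sections:
        points.append(start)      # start of key frame section (f1, f4, f7)
        if end < total_end:
            points.append(end)    # start of following section (f3, f6, f9)
    return points

A = [(100, 250), (400, 520), (700, 820)]  # key frame sections A1, A2, A3
print(chapter_points(A, total_end=900))   # [100, 250, 400, 520, 700, 820]
```

Editing then amounts to keeping the ranges between paired points, and skip playback jumps directly to the section starts.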
- FIG. 2 shows a conceptual diagram in a case where the automatically set chapter points shown in FIG. 1F are displayed on a predetermined image monitor as thumbnail images of a predetermined size.
- Here, f1, f4, and f7 are points at or near the beginning of the predetermined key frame sections A1, A2, and A3, respectively, and f3, f6, and f9 are points at or near the beginning of the sections B1, B2, and B3 that follow A1, A2, and A3, respectively, and are not key frame sections. Viewing a display screen such as that shown in FIG. 2 for a broadcast program recorded on the hard disk serving as the recording medium of the recording/reproducing apparatus, the user can, for example, cut out the key frame sections A1, A2, and A3 shown in FIG. 1D and record them on a disc recording medium such as a DVD (Digital Versatile Disc), or perform operations such as skip playback to the time points f1, f4, and f7.
- FIG. 1G shows an example of conventional preset time points (chapter points, preset position set points), which are set at regular intervals such as every 5 or 10 minutes. As a comparison of FIG. 1C and FIG. 1G shows, such points are not necessarily set at key frames (important frames).
- predetermined chapter points: predetermined set points or predetermined break points
- In the present invention, such effective segment processing is performed automatically using the feature data.
- FIG. 3 shows an example of the processing process in the present invention.
- The processing shown in FIG. 3 includes a feature extraction process (2) that extracts image and audio feature data from MPEG image/audio stream data.
- The MPEG stream (1) (MPEG data) is assumed to be recorded on a predetermined recording medium, or to be in the process of being recorded to the predetermined recording medium. The present invention can be similarly applied to image/audio data transmitted over a predetermined transmission system (wired or wireless).
- The feature extraction process (2) can be performed simultaneously with the recording process; when the image/audio data is already recorded on the recording medium, it can also be performed by reproducing from that recording medium.
- Next, the rule processing will be described.
- This rule processing is performed using a rule file or rule data in which rules are described in a predetermined format.
- In the rule file, for example, rules based on feature data according to the program genre are described.
- A predetermined calculation is performed between this rule file and a PU feature data file (playback unit feature data file), in which the feature data of each predetermined section is described, and as a result of this calculation a predetermined playlist file is generated. Here, (*) is assumed to be a predetermined operator that uses the data of the predetermined files.
- The rule file Rf(n) is described in a predetermined format, for example as described below, and contains data of predetermined parameters such as a predetermined time correction function, meanings, and the weighting coefficients (evaluation values, importance) of those meanings.
- Next, PU processing (3) (playback unit processing), which is one of the features of the present invention, is performed.
- At the separator (4), each piece of feature data is recorded (stored) on a predetermined recording medium or in buffer memory as predetermined data (a PU feature data file) in units of PUs (playback units).
- The PU feature data file is then subjected to PU semantic processing by the predetermined rule 1 processing (5).
- An outline of rule 1 processing (5), which will be explained in detail later, is as follows.
- The PU (6) to which a meaning has been assigned is subjected to predetermined evaluation value processing in the predetermined rule 2 processing (7).
- In the rule 2 processing (7), the importance of the following (processing 1) and (processing 2) is set, and the evaluation values are then processed: a predetermined evaluation value is given to a PU alone, or to a PU group in which several PUs are connected.
- The rule switching processing system 900 holds, as rule processing data corresponding to a plurality of program genres, genre A rule data, genre B rule data, genre C rule data, and so on; according to the program genre information data input to the system controller system 20, it switches the rule processing data used for the rule 1 processing (5) and the rule 2 processing (7), or for either one of them.
- In addition, a number of rule processing data sets are provided for individual (per-user) switching.
- Personal 1 rule processing data, personal 2 rule processing data, personal 3 rule processing data, and so on are set by a predetermined user input to the system controller; one of these is selected via the system controller system 20, and predetermined rule processing is performed based on the selected rule processing data.
- Individual playback operations such as normal playback or special playback are performed by each individual, and operation information such as the playback state and playback position is stored in predetermined memory means so that it can be reflected in the predetermined individual rule processing; this information data is turned into individual rule processing data by predetermined learning processing at predetermined timing.
- The rule switching processing system 901 switches between the individual rule processing data sets, switching the data used for the rule 1 processing (5) and the rule 2 processing (7), or for either one of them.
- The semantically processed PU is described in association with predetermined image/audio feature data by setting characters and meanings such as the following, assuming, for example, a certain broadcast program.
- The meanings of the characters are selected and described as scenes that can be assumed to be key frames (important frames, important scenes) in the broadcast program, or as predetermined recording/playback sections assumed to be effective for summary playback, chapter setting, and the like.
- A scene desired by the user can also be described.
- A rule desired by the user can be described in a predetermined adjustment mode.
- Table 1 shows an example for a news (report) program.
- A rule for extracting an announcer's scene is described by the character a.
- Since it is unlikely that all the assumed announcer scenes can be extracted by a single rule, it is preferable to describe them as several rules.
- The definition character can also be set using @, as shown in Table 3 below.
- Next, the rule 1 processing for the definition characters (set characters, meaning characters) set as described above will be described concretely, taking a news program as an example.
- For example, for the announcer scene it can be assumed that the voice feature attribute is speaker voice, that a predetermined color is detected in color feature detection area 2 or detection area 3, that the first or second most frequent similar-image information is detected, that a person feature is detected in detection area 1, detection area 2, or detection area 5, and that the camera feature is stationary.
- Each definition character and each piece of feature data are described according to a predetermined format in order to perform the predetermined processing, that is, the rule 1 processing and the rule 2 processing.
- FIG. 6A shows an example of this, in which they are assumed to be vector components.
- Each piece of feature data shown in FIGS. 5A and 5B is defined, for example, as follows: for the voice feature, A1 when the attribute is speaker voice, A2 when the attribute is music, and A3 when the attribute is other.
- For the color feature of the video features, area 1 is B1 and area 2 is B2.
- (A1) represents the case where the voice feature attribute is speaker voice.
- The 1.0 in 1.0(A1)100 is the weighting coefficient for (A1); here, for convenience, a range of 0 to 1.0 is assumed.
- Since the weighting coefficient is merely a convenient coefficient for performing a predetermined calculation, it may also be set (described) in the range 0 to 100 or 0 to 10.
- (Detection ratio coefficient) The 100 in 1.0(A1)100 is the detection ratio coefficient for (A1); 1.0(A1)100 satisfies the condition when (A1) is detected 100% in the playback unit section.
- 1.0(A1)50 satisfies the condition when (A1) is detected 50% in the playback unit section.
- The detection ratio coefficient assumes a range of 0 to 100.
- Since the detection ratio coefficient is also merely a convenient coefficient for performing a predetermined calculation, it may instead be set (described) in the range 0 to 1 or 0 to 10.
- Here, the detection ratio coefficient can be the ratio at which the characteristic is detected in the playback unit section.
- Here, a predetermined processing unit called a playback unit (PU), set according to the audio segment feature and the scene change feature, is used; this is a processing concept introduced for setting sections.
- The ratio of each predetermined characteristic described above is calculated as the ratio at which the predetermined feature data is detected with respect to the entire PU section.
- For example, the detection ratio F of feature data P in this case can be calculated by the following equation (3).
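The patent does not reproduce equation (3) in this extract, but the idea it describes, the ratio at which feature data P is detected over the whole PU section, can be sketched as follows (function and variable names are illustrative assumptions, not the patent's notation):

```python
def detection_ratio(detected_spans, pu_start, pu_end):
    """Ratio (0-100) of the PU section in which a feature was detected.

    detected_spans: non-overlapping (start, end) times where the feature
    was detected, clipped here to the PU section [pu_start, pu_end].
    """
    pu_len = pu_end - pu_start
    if pu_len <= 0:
        return 0.0
    detected = 0.0
    for s, e in detected_spans:
        s, e = max(s, pu_start), min(e, pu_end)  # clip to the PU section
        if e > s:
            detected += e - s
    return 100.0 * detected / pu_len

# A speaker-voice feature detected for 6 s of a 10 s playback unit:
ratio = detection_ratio([(2.0, 8.0)], 0.0, 10.0)  # 60.0
```

The resulting percentage is then compared against the detection ratio coefficient described above.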
- For example, the ideal value and the detection result are processed as follows.
- The evaluation value processing described above is only an example of a processing method; any other processing method can be used, as long as the correspondence between the detected feature data, or the ratio at which it is detected in the playback unit section, and the set "meaning" has a predetermined validity.
- For example, in the evaluation value calculation, the detection ratio of each feature is set as shown in Table 4 below, and the detection ratio coefficients and weighting coefficients are shown together.
- The value averaged over the types of feature data can also be used as the evaluation value; in that case, the evaluation value is as shown in equation (7) below.
- For example, when there are five types of feature data, the averaging process is performed over those five types, and the result can also be used as the evaluation value.
- In the above, feature data having different attributes is combined with the logical product operator (*), but the logical sum operator (+) may also be used.
- In that case, the processing is performed based on the concept of the logical-sum coefficient w, as described in evaluation value calculation method (3) above.
- The evaluation value processing is a concept introduced for evaluating the value of an expression that combines the set meaning with the feature data, the various coefficients, and so on; the ranges and values of the coefficients in the above evaluation expressions are therefore not limited to the cases described, and may be set smaller or larger.
- The evaluation value of each playback unit section described in the rule file is determined by a calculation such as the following; for example, in the summary playback mode, PU sections with large evaluation values are selected according to the summary playback time, and sections with smaller evaluation values are then selected step by step so that the total comes as close as possible to the summary time.
- Each term of the evaluation formula can be expressed with the detection ratio det, the weighting coefficient w, and the detection ratio coefficient k of each predetermined feature data item, for example as w(M)*k.
- Let w(n) be the weighting coefficient of feature data n, P the arithmetic function, and * the operator.
- In another variant, the intermediate processing value d(n) is processed to 100 or 0 according to the detection ratio det(n) and the set detection ratio coefficient k(n).
- Compared with the case where the processed value is a difference value, this is effective for cases where the feature data is markedly characteristic, and the result can be used as an evaluation value.
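As a sketch of the thresholded variant just described (all names, and the exact weighted combination, are assumptions rather than the patent's notation): the intermediate value d(n) becomes 100 when the measured detection ratio det(n) reaches the set detection ratio coefficient k(n), and 0 otherwise, and the evaluation value is a weighted average of the d(n):

```python
def evaluation_value(features):
    """Thresholded evaluation value for one playback unit.

    features: list of (w, k, det) tuples, where w is the weighting
    coefficient, k the set detection ratio coefficient (0-100), and
    det the measured detection ratio (0-100).
    """
    total_w = sum(w for w, _, _ in features)
    if total_w == 0:
        return 0.0
    acc = 0.0
    for w, k, det in features:
        d = 100.0 if det >= k else 0.0  # intermediate value: 100 or 0
        acc += w * d
    return acc / total_w

# speaker voice fully detected, a second feature below its set ratio:
val = evaluation_value([(1.0, 100.0, 100.0), (0.5, 80.0, 40.0)])
```

Because each d(n) is driven to 100 or 0, a markedly present feature dominates the result, which is the effect the text attributes to this variant.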
- There are several evaluation value processing methods; the method is not limited to the one described here.
- In rule 1, this is an example of the method of expressing the appearance pattern (meaning) of the data to be described: the meanings are a, b, c, and so on; A, B, C, ... can be used as negations, and * can be used as a wildcard.
- In the rule 2 processing, the processing is performed in consideration of the connection between the meanings of the playback units, that is, the predetermined sections, implied in the rule 1 processing.
- Temporal correction using the time correction function, that is, temporal weighting processing, is also performed.
- For example, in the above rule 1, if the evaluation value of meaning a is 70 and the evaluation value of meaning b is 80, the evaluation value g of (ab) can be set, for example, to their average, (70 + 80) / 2 = 75.
- As for the weighting by the time correction function: if, for example, the above (ab) is detected at a certain time t, its evaluation value is g, and the time correction coefficient (weighting coefficient) at t is w, then the time-corrected evaluation value is the value of g weighted by w.
- The time correction function is described in the rule file, at the specified description location of rule 2, as change points (information data in a change-point coordinate system), according to the specified description rules.
- This time correction function can be used to perform summary time correction for a given program genre in the rule file.
- For example, depending on the broadcast program, the first half or the second half of the broadcast time may be the part that is mainly reproduced.
- In such cases, the predetermined playback sections used for summary playback (digest playback) can be weighted in time using this function.
- FIG. 8A to FIG. 8I show examples of time correction functions for performing the time weighting described above.
- Fig. 8A shows a flat characteristic in which time is not weighted for a given summary playback section.
- FIG. 8B shows a case where weighting is performed to increase the weight of playback as the importance in summary playback by comparing the first half with the second half within a predetermined section.
- FIG. 8C shows a case where weighting is performed to increase the weight of reproduction as the importance in summary reproduction by comparing the latter half with the first half within a predetermined section.
- FIG. 8D shows a case where the first half and the latter half of the predetermined section are weighted more heavily than the middle, increasing their weight of reproduction as importance in summary playback; FIG. 8E shows the converse case, in which the middle is weighted more heavily than the first half and the latter half.
- FIG. 8F is like a connection of two correction functions of the shape shown in FIG. 8D: the first half, the transition from the first half to the middle, the middle, the transition from the middle to the second half, and the second half are each weighted, with the weights made different from one another.
- FIG. 8G is like a connection of two correction functions of the shape shown in FIG. 8E, weighted in the same way with differing weights.
- FIG. 8H shows a combination of the functions shown in FIGS. 8C and 8D.
- FIG. 8I shows a combination of the functions shown in FIGS. 8D and 8B.
- FIG. 9 shows the general time correction function: the coordinates of the start point, change points, and end point are P0(ts, s3), P1(t1, s3), ..., Pe(te, s0), respectively.
- The y component of each coordinate represents the weighting; here, for convenience, the maximum value is assumed to be 100 and the minimum 0, with values taken between 0 and 100; the x coordinate is position information.
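A time correction function given by change-point coordinates, as in FIG. 9, can be evaluated by piecewise-linear interpolation between the points. The sketch below assumes weights in the 0 to 100 range and ascending time coordinates; the function name is illustrative:

```python
def time_correction(points, t):
    """Piecewise-linear time correction weight at position t.

    points: change-point coordinates [(t0, w0), (t1, w1), ..., (te, we)]
    with times ascending and weights in 0-100, as in FIG. 9.
    """
    if t <= points[0][0]:
        return points[0][1]
    for (t0, w0), (t1, w1) in zip(points, points[1:]):
        if t <= t1:
            if t1 == t0:
                return w1
            return w0 + (w1 - w0) * (t - t0) / (t1 - t0)  # interpolate
    return points[-1][1]

# A front-weighted function like FIG. 8B: weight falls from 100 to 0.
fn = [(0.0, 100.0), (60.0, 0.0)]
w_mid = time_correction(fn, 30.0)  # 50.0
```

An evaluation value g detected at time t would then be corrected as `time_correction(fn, t) / 100.0 * g`, in line with the weighting described above.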
- the meaning of the reproduction unit (PU) can be set from feature data obtained by a predetermined feature extraction process.
- Each segment (shot) is composed of individual frames.
- a scene break is a scene change.
- A segment may be a group of similar images, or a group of scenes with similar image (video) characteristics for each scene.
- Segments and scenes each have their own meanings, and together can be regarded as the video structure that composes the program.
- For example, in a baseball program, it is possible to capture the connection between image scenes that each have a meaning, such as a "pitcher image scene" in which the pitcher pitches, a "batter image scene" in which the batter hits, and a "batter strikeout image scene".
- image feature data and audio feature data are processed for each PU described above in a predetermined program (program), and the meaning of the PU is set according to the feature data.
- For example, in the case of a scene in which the caster (announcer) reads out a news item (news program headline), the scene (image) is characterized by one or two person features, a telop feature, and a voice feature whose attribute is speaker voice.
- Since a news program contains several scenes in which news is read out, there are several scenes similar to the reading scene; the similar-image feature, that is, a specific scene ID, therefore appears with high frequency.
- Thus, the meaning of the PU can be set according to the person feature, voice feature, telop feature, similar-image feature, and other predetermined feature data.
- A predetermined connection of PUs with predetermined meanings can also be assumed, as in the baseball program example described above; that is, a predetermined connection between PUs having predetermined feature data or characteristic data.
- FIG. 11 shows the connection relationship of PUs having the above-mentioned predetermined meanings, that is, PUs for which predetermined meanings have been set.
- In a certain program, predetermined meanings a to d are set; in the sections PU(n) to PU(n+2), the connection relationship in which PU(n) has meaning a, PU(n+1) has meaning b, and PU(n+2) has meaning c is the most natural connection.
- This connection relationship can be expressed as the character sequence abc that defines the meanings. If this abc sequence corresponds to key frames, abc can be searched for in the program, and the setting processing can be performed using the first and last positions of the found sequence, or their vicinity, as predetermined set points.
- For example, suppose the playback units in a certain section are determined to have the meanings "throw", "hit", "no meaning", and "score". Excluding the PU determined to have "no meaning", the PUs determined to have the three meanings "throw", "hit", and "score" can be combined into one, so that a predetermined set of PUs, "throw, hit, score", can be assumed.
- The PU determined to have "no meaning" can also be included; in that case the above four PUs are grouped into one, giving the predetermined PU set "throw, hit, no meaning, score".
- "No meaning" is taken as an example here because, in the rule 1 processing described above, meanings are assigned by predetermined evaluation processing from the predetermined feature data, and a case can be assumed in which no definite meaning can be given by the predetermined signal processing.
- For example, in the case of a news program, the connection aabb, that is, the connection "announcer scene", "announcer scene", "site scene", "site scene", is reasonable and likely.
- FIG. 12B shows the case of the sumo program described above.
- FIGS. 13A and 13B show the case where the above-described program genre is a news program. As shown in FIG. 13A, the reference pattern (reference character sequence) is the "aabb" described above.
- As shown in FIG. 13B, the section "aabb" is searched for in the predetermined program recording section, and the sections A1 and A2 are found to match "aabb".
- The first positions p1, p3 and the last positions p2, p4 of the found "aabb" sections are set as predetermined set positions and used as the playlist (chapter) data (position information data) described later; predetermined processing is then performed, for example, in the summary playback mode, playback control is performed so that the set sections p1 to p2 and p3 to p4 are played.
- For predetermined time-point setting (predetermined position setting), predetermined processing is performed with each of the time points p1, p2, p3, and p4, or a position in the vicinity of each point, as the set position.
- In this way, the meaning of a predetermined PU is determined and set from the predetermined feature data, a connection relationship between PUs with predetermined meanings is assumed, and processing can be performed assuming a predetermined number of PU connections, or a set of a predetermined number of PUs, according to a predetermined meaning.
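The search for a reference character sequence such as "aabb" over the per-PU meaning characters can be sketched as a simple pattern scan. The following assumes one lower-case meaning character per PU and supports only the * wildcard (the upper-case negation described for rule 2 is omitted for brevity); names are illustrative:

```python
def find_pattern_sections(meanings, pattern):
    """Find sections matching a reference character sequence such as "aabb".

    meanings: string of per-PU meaning characters (e.g. "caabbxaabb").
    pattern: lower-case letters match that meaning; '*' matches any PU.
    Returns (first, last) PU index pairs, analogous to (p1, p2), (p3, p4).
    """
    n, m = len(meanings), len(pattern)
    hits = []
    for i in range(n - m + 1):
        if all(p == '*' or p == c for p, c in zip(pattern, meanings[i:i + m])):
            hits.append((i, i + m - 1))
    return hits

# Two "aabb" sections found, like A1 and A2 in FIG. 13B:
sections = find_pattern_sections("caabbxaabb", "aabb")  # [(1, 4), (6, 9)]
```

The first and last index of each hit (or its vicinity) would then serve as the predetermined set positions for chapter or playlist data.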
- In the rule 2 processing shown in FIG. 6B, as an example of the method of expressing the appearance pattern (meaning) of the data to be described, the meanings are a, b, c, and so on; A, B, C, ... can be used as negations, and * as a wildcard.
- For example, A represents anything other than the "announcer scene", and b represents the "site scene".
- With this, in addition to the "announcer scene", two "site scenes" are detected.
- The following processing is an example of the evaluation value calculation method.
- For example, when the playback unit group is (abc), the detection ratios (values) of a, b, and c and the weighting coefficients may be as shown in Table 5 below, according to equation (1) above.
- The factor of 100 by which the result is multiplied takes the percentage (%) into account; the scale of the evaluation value may be determined freely, provided it causes no problem in the predetermined evaluation processing and the predetermined calculation processing.
- A group of meanings such as the above (aabb) may be written as Ga1, and a connection such as (Ga1Ga1) may then be used.
- For the evaluation value of Ga1, processing similar to rule 1 is performed.
- As an evaluation value calculation method in this case, for example, the average of the sum of the evaluation values of the playback units of each meaning, or the average of their product, can be used; for example, the evaluation value of Ga1 is obtained from the sum of the evaluation values of its member playback units.
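A sketch of the group evaluation just described, with Ga1's evaluation value taken as the average of either the sum or the product of its member playback units' evaluation values (the names and the exact averaging rule are assumptions read from the text):

```python
def group_evaluation(pu_values, mode="sum"):
    """Evaluation value of a meaning group such as Ga1 = (aabb).

    pu_values: evaluation values of the group's member playback units.
    mode: "sum" averages their sum; "prod" averages their product.
    """
    n = len(pu_values)
    if n == 0:
        return 0.0
    if mode == "sum":
        return sum(pu_values) / n  # average of the sum
    prod = 1.0
    for v in pu_values:
        prod *= v
    return prod / n  # average of the product

g = group_evaluation([70.0, 80.0])  # average of the sum: 75.0
```

A connection such as (Ga1Ga1) could then be evaluated by applying the same function to the group evaluation values themselves.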
- Rule 3 processing: normally, processing up to rule 2 suffices, as shown in FIG. 15A; however, when feature data is provided for a plurality of programs, for example when time weighting processing is performed for each program, rule 3 processing is further provided as rule processing, as shown in FIG. 15B.
- FIG. 6C shows an example of weighting and time correction for news programs (news) and sports programs (sports).
- The news program is weighted 100%, with start point Ps(ts, s4), change point P1(t1, s4), and end point Pe(te, s3) as its time correction function.
- The sports program is weighted 70%, with start point Ps(ts, s4), change point P1(t1, s4), and end point Pe(te, s3) as its time correction function.
- Each scene is subjected to some meaning processing based on the various predetermined feature data by the rule 1 processing, and an evaluation value is given to each scene by the rule 2 processing, as shown in FIG. 16B.
- In the summary playback mode, for a desired summary playback time t1, the scene (image) with the highest evaluation value is selected first, and then scenes with high evaluation values are selected so that the total comes as close to t1 as possible; position information for playing the selected sections is set accordingly.
- The set position information is stored in a predetermined data memory; when playback control is performed, the position information is read and the predetermined sections are played.
- By sequentially playing each section (skip playback), a predetermined summary playback (digest playback) is performed.
- That is, PU sections with evaluation values as large as possible are selected so that the selected predetermined PU sections come as close as possible to the predetermined playback time; in this way, a predetermined PU section is selected for playback within the desired time.
- Since a predetermined position (chapter) can be set at the beginning (or its vicinity) and at the end (or its vicinity) of a section with a high evaluation value, it can also be used to perform predetermined operations such as editing processing of that section, playback pause processing, and repeated playback processing.
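The selection rule described above, taking PU sections in descending order of evaluation value until the summary playback time is approached, can be sketched greedily as follows (illustrative names; the patent does not specify this exact algorithm):

```python
def select_summary_sections(sections, target_time):
    """Pick PU sections for summary (digest) playback.

    sections: list of (evaluation_value, start, end); sections with the
    highest evaluation values are taken first, as long as the total
    stays within the target summary time.
    Returns the chosen (start, end) sections in temporal order,
    ready for skip playback.
    """
    chosen, total = [], 0.0
    for val, start, end in sorted(sections, key=lambda s: -s[0]):
        length = end - start
        if total + length <= target_time:
            chosen.append((start, end))
            total += length
    return sorted(chosen)

# Three PUs, a 100 s summary: the two highest-valued sections fit.
plan = select_summary_sections(
    [(90, 0, 60), (40, 60, 120), (80, 120, 160)], 100)
```

Playing the returned sections one after another (skipping the gaps between them) yields the summary playback described in the text.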
- Here, the recorded video and audio data is broadcast program data, and is subjected to predetermined band compression processing by MPEG (Moving Picture Experts Group).
- MPEG: Moving Picture Experts Group
- Wavelet transform, fractal analysis signal processing, and the like may be used as other band compression signal processing.
- For example, the DCT coefficients of image data correspond, in the case of the wavelet transform, to the analysis coefficients of the multi-resolution analysis, and the same signal processing can be performed.
- FIG. 17 shows an example of the overall block configuration of the recording / reproducing apparatus 30 to which the present invention is applied.
- A predetermined broadcast program is received by the receiving antenna system 1 and the receiving system 2; the audio signal is subjected to predetermined A/D conversion signal processing at a predetermined sampling frequency and a predetermined number of quantization bits in the audio A/D conversion processing system 3, and is then input to the audio encoder processing system 4.
- In the audio encoder processing system 4, signal processing is performed by a predetermined band compression method such as MPEG audio or AC3 audio (Dolby AC3, or Audio Code number 3).
- Similarly, the video signal is subjected to predetermined A/D conversion signal processing at a predetermined sampling frequency and a predetermined number of quantization bits in the video A/D conversion processing system 8, and is then input to the image encoder processing system 9.
- The image encoder processing system 9 performs signal processing using a predetermined band compression method such as MPEG video or wavelet transform.
- Audio data and image data processed by the audio encoder processing system 4 and the image encoder processing system 9 are input to the recording processing system 6 via the multiplexing processing system 5.
- A part of the signal input to the audio encoder processing system 4, or a part of the signal in the signal processing process of the predetermined encoder signal processing, is input to the feature extraction processing system 10.
- Here, the signal is input to the feature extraction processing system 10 from the audio encoder processing system 4, but it may instead be input to the feature extraction processing system 10 at the same time as it is input to the audio encoder processing system 4.
- Similarly, a part of the signal input to the video encoder processing system 9, or a part of the signal in the signal processing process of the predetermined encoder signal processing, is input to the feature extraction processing system 10.
- Here, the signal is input to the feature extraction processing system 10 from the video encoder processing system 9, but it may instead be input to the feature extraction processing system 10 at the same time as it is input to the video encoder processing system 9.
- The feature data is sequentially detected for each predetermined section and recorded in a predetermined recording area of the predetermined recording medium 7, together with the image/audio data subjected to the predetermined encoder processing.
- Using the recorded feature data, the playlist/chapter generation processing system 19 performs predetermined signal processing.
- the generation of playlist data and chapter data can be performed by the following signal processing process (processing a or processing b).
- In processing a, the playlist data generation processing determines where the key frames corresponding to the predetermined summary playback time td lie within the recorded time length t; that is, the feature data processed over the time length t is accumulated (stored or recorded) in a predetermined memory area of the memory system or the system controller system.
- When the playlist data generation processing is completed, the apparatus is ready to perform a predetermined summary playback operation, and a predetermined summary playback (digest playback) can be performed using this playlist data.
- Since the playlist data has already been generated from the predetermined feature data, signal processing may be performed so that the feature data, no longer needed, is deleted; however, if the playlist data may be generated again, for example after correction, the feature data can also be recorded and left as is.
- In processing b, after the feature data of the predetermined sections has been accumulated via the system controller system 20, the playlist/chapter generation processing system 19 generates the playlist data for predetermined summary playback (digest playback).
- The generated playlist data is subjected to predetermined recording processing in the recording processing system 6, and is then recorded in a predetermined recording area of the recording medium 7.
- The playlist data is composed of pairs of playback start point information and playback end point information for each predetermined playback section to be skip-played within the predetermined recorded section; for example, it consists of data such as the playback start frame number and playback end frame number of each section.
- Since the playlist data is used for the processing of skipping between the predetermined required sections of the recorded program to perform summary playback (digest playback), it may, in addition to the frame data described above, be time code data or time stamp data such as MPEG PTS (Presentation Time Stamp) or DTS (Decode Time Stamp).
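A minimal sketch of playlist data as pairs of playback start and end point information, here as frame numbers, with the sections played back in temporal order for skip playback (the types and names are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class PlaylistEntry:
    """One playback section of the playlist data: a pair of playback
    start point and playback end point information. Frame numbers are
    used here; PTS/DTS time stamps would work the same way."""
    start_frame: int
    end_frame: int

def skip_playback_order(playlist):
    """Yield (start, end) frame pairs in the order they are skip-played."""
    for entry in sorted(playlist, key=lambda e: e.start_frame):
        yield entry.start_frame, entry.end_frame

playlist = [PlaylistEntry(3000, 4800), PlaylistEntry(300, 1500)]
order = list(skip_playback_order(playlist))  # [(300, 1500), (3000, 4800)]
```

The player would seek to each start point, play until the paired end point, and skip the material in between, which is the digest playback the text describes.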
- Playlist data is generated in the recording mode in which image/audio information data such as a broadcast program is recorded, as described above, but it can also be generated in the playback mode described later.
- In FIG. 17, for example, when image or audio data that has already been subjected to predetermined encoding processing such as MPEG is recorded, it need not pass through the audio encoder processing system 4 or the image encoder processing system 9.
- Whether digital image/audio data is directly input and recorded, or an analog signal is input by the receiving system 2 and recorded after predetermined encoding processing, is detected by the system controller system 20; the analog input system or the digital input system can also be set by a user's predetermined operation via the user input I/F system 21.
- It can also be detected automatically by the system controller system 20, by directly detecting the signals from the audio encoder processing system 4 or the audio A/D conversion processing system 3, from the video encoder processing system 9 or the image A/D conversion processing system 8, and the digital image/audio data subjected to the predetermined encoding processing.
- If predetermined encoded digital data is detected while no data is detected by the audio encoder processing system 4 or the audio A/D conversion processing system 3, or by the video encoder processing system 9 or the image A/D conversion processing system 8, it can be determined that digital image/audio data subjected to the predetermined encoding processing is being input.
- Conversely, if data is detected by the system controller system 20 at the audio or video encoder or A/D conversion processing systems without the predetermined encoded digital data being detected, the input can be judged to be analog.
- Alternatively, the predetermined recording processing may be performed with the analog input signal from the receiving system 2 as the initial setting (default setting).
- In the case of the analog input described above, when the predetermined encoding processing is performed, the DCT processing performed for the normal recording processing can also be used as the feature extraction processing.
- Likewise, for audio, the subband processing performed for the normal recording processing can also be used as the feature extraction processing.
- The feature extraction processing may also be performed after recording is completed, as necessary; even in the case of the analog input described above, it may be performed automatically when the predetermined recording is completed, depending on the load on the signal processing system.
- The feature extraction processing can also be performed by software; depending on the performance of the system controller system, it may therefore not be possible to perform it simultaneously with each predetermined signal processing of the recording mode, in which case it is performed after recording is completed.
- The system controller system 20 can be composed of a CPU, a DSP (digital signal processor), and various other processors; the higher the performance, the more expensive it becomes, so whether the feature extraction processing is performed simultaneously with the recording processing or after its completion may be determined according to the processing capability, as described above.
- the predetermined feature extraction processing is performed after the end of a predetermined timer recording operation or at night when it can be assumed that the user is not normally operating the device.
- The times at which the apparatus is operating are stored by a predetermined memory means in the system controller system 20, and predetermined learning processing is performed on them.
- A time for the feature extraction processing may then be set automatically, as appropriate, by that learning processing.
- The predetermined feature extraction processing can thus be performed during times when the device is not being operated.
- Alternatively, the point reached partway through the processing is stored by a predetermined memory means in the system controller system 20; when it is detected that the device is not performing normal operation such as recording or playback, and it is determined that there is enough time for processing, the predetermined signal processing can be resumed from that point.
- Predetermined data recorded together with predetermined video/audio data, feature data, etc. is reproduced from the recording medium 7, and predetermined reproduction processing is performed by the reproduction processing system 12.
- The reproduced data is separated into its predetermined components by the reproduction data separation processing system 13; the audio data is input to the audio decoding processing system 14, subjected to decoding corresponding to the band-compression signal processing method used at the time of recording, then input to the audio D/A processing system 15 for D/A conversion, and output as an audio signal.
- The image (video) data subjected to the predetermined separation processing is subjected, in the video decoding processing system 16, to a predetermined decoding process corresponding to the band-compression signal processing method used at the time of recording, is then input to the video D/A processing system 17, where D/A conversion processing is performed, and is output as a video signal.
- the signal processing method differs depending on whether or not feature data and playlist data are recorded on the recording medium together with the image and sound data.
- First, the case of FIGS. 18A and 18B, in which the playlist data (playlist data file) and chapter data are recorded on the data recording medium, will be described.
- An explanation will be given of the case where the recorded images can be played back in the summary playback mode, or the predetermined chapter images can be displayed as thumbnails in the chapter display mode.
- When the data is separated by the playback data separation processing system 13 and feature data, parameter data, playlist data, chapter data, etc. are recorded, the separated predetermined feature data, parameter data, playlist data, chapter data, etc. are input to the system controller system 20.
- When the playback data separation processing system 13 cannot separate the feature data, parameter data, playlist data, and chapter data, these data are not input to the system controller system 20; the playback data separation processing system 13 and the system controller system 20 therefore perform processing for determining whether the feature data, playlist data, predetermined chapter data, parameter data, etc. are recorded on the predetermined recording medium 7.
- the playlist data is composed of reproduction start information data and reproduction end information data of some predetermined reproduction sections in order to perform predetermined summary reproduction.
- The chapter data consists of position information at or near the beginning of a predetermined feature section, at or near the end of that feature section, at or near the beginning of a section other than a feature section that is connected to the feature section, or at or near the end of such a section.
- the system controller system 20 performs summary playback (digest playback) by performing skip playback according to the skip playback start data information and skip playback end data information of the play list data detected for playback.
- Using the designated chapter data, the image at or near each chapter point is subjected to predetermined display processing as a thumbnail image by the display processing system 27, and a predetermined image display is performed.
- Next, the case shown in FIGS. 18C and 18D, in which playlist data (playlist data file) and chapter data cannot be reproduced, that is, in which playlist data and chapter data are not recorded (stored) on the recording medium or storage medium, will be described.
- A series of chapter-related processing, such as displaying thumbnails at the predetermined chapter time points in the predetermined chapter mode and playing back chapters, will be described.
- This applies, for example, to the case where the recording medium 25 is DVD software, and the recording medium processing system 26 and the playback processing system 12 are used to reproduce image/audio data recorded on another recording medium.
- The processing described here is applicable to such cases.
- When playlist data or chapter data has not been generated and cannot be detected on playback, or when it is desired to regenerate the playlist data or chapter data detected on playback, playlist data for summary playback and chapter data for the predetermined chapter-related modes can be generated from the predetermined feature data and parameter data detected on playback.
- In that case, the display processing system 27 may perform a predetermined display indicating that there is no playlist data, as shown in FIG.
- the generated playlist data is input to the system controller system 20.
- the system controller system 20 controls the playback control system 18 so that a predetermined playback section is sequentially played back (skip playback) based on the playlist data in accordance with a predetermined summary playback time input by the user.
- the reproduction of the recording medium 7 is controlled by this control.
- the generated chapter data is input to the system controller system 20.
- The system controller system 20 responds to a predetermined chapter-related operation mode selected by user input. Based on the above chapter data, the playback control system 18 is controlled so that operations related to the predetermined chapters can be performed, such as thumbnail display of the images at the predetermined chapter points, editing processing such as cutting and connecting at the chapter points, and skip playback of chapter points selected by the user; playback of the recording medium 7 is controlled accordingly, and the display processing system 27 is also controlled through the system controller system 20.
- In the case of an external recording medium such as a DVD (recording medium 25), the reproduction control system 18 controls the recording medium processing system 26 to perform the predetermined summary reproduction processing and the predetermined signal processing as described above.
- The case where playlist data and chapter data are generated from the feature data has been described above.
- Next, the case where the feature data cannot be reproduced will be described.
- In that case, when the image/audio data is played back from the recording medium 7 in the summary playback mode,
- the data reproduced by the playback processing system 12 is input to the playback data separation processing system 13, and the separated image data and audio data, which were processed with the predetermined band-compression method at the time of recording, are input to the feature extraction processing system 10,
- where various predetermined characteristic data detection processes, such as detection of the DCT DC coefficients, AC coefficients, and motion vectors that constitute the image characteristic data, are performed.
- Furthermore, using the above various image/audio characteristic data and predetermined parameter data,
- various feature extraction processes are performed for predetermined telop feature data (telop section determination data), person feature data and other image feature data (image feature section determination data), speaker voice feature data (speaker voice determination data), applause/cheering feature data, and so on.
- The above various image feature data and audio feature data are input to the system controller system 20, and when the predetermined feature extraction processing has been completed for the whole of the predetermined program or the predetermined image/audio section, it is determined that the feature extraction processing is complete.
- In that case, a signal indicating that the predetermined signal processing is complete is input from the system controller system 20 to the display processing system 27, and a predetermined display as shown in FIG. may be performed.
- The above feature data is stored in the memory system 11 for each predetermined feature extraction section, and when the processing of all the above predetermined feature data is completed, it is input to the playlist/chapter generation processing system 19, where predetermined playlist data or chapter data is generated.
- The feature extraction processing data of a predetermined section may also be sequentially input from the feature extraction processing system 10 directly to the playlist/chapter generation processing system 19, or, as described above, may be input after all the predetermined sections or the predetermined broadcast program have been processed.
- the playlist / chapter generation processing system 19 may perform the predetermined playlist data or chapter data generation processing as described above by a predetermined signal from the system controller system 20.
- Alternatively, signal processing may be performed so that the feature data processed by the feature extraction processing system is input to the playlist/chapter generation processing system 19 via the system controller system 20.
- When the predetermined processing is completed, a signal indicating its completion is input to the system controller system 20, so that summary playback at the desired summary time, or a predetermined chapter-related operation using the predetermined chapter data, can then be performed.
- At that time, as shown in FIG., a predetermined display indicating that playlist data or chapter data has been generated, or a display for the summary playback mode, the chapter-related predetermined operation modes, and so on, is performed by the display processing system 27.
- Since the summary playback time desired by the user varies (for example, playback in 30 minutes or in 20 minutes), it is conceivable to generate playlist data corresponding to several summary times according to the total time length of all the feature sections extracted from the image/audio data of the recorded broadcast program or the like.
- For example, playlist data is generated for summary playback of 40 minutes, 30 minutes, and 20 minutes each, so that a summary playback operation corresponding to the selected summary time can be performed immediately.
- When the recording medium 25 is loaded, the recording medium processing system 26 detects the recording medium 25, the reproduction processing system 12 processes the reproduced signal, and the reproduction data separation processing system 13 separates the predetermined image/audio data. Subsequent signal processing is the same as in the case of the recording medium 7 described above and is therefore omitted.
- The control program for executing the series of processing described above can be executed by a computer incorporated in dedicated hardware, or can be installed from a recording medium onto, for example, a general-purpose personal computer capable of executing various functions by installing various programs.
- This recording medium is not only the hard disk on which the control program is recorded; it also consists of package media distributed separately from the computer in order to provide the program to the user, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory on which the program is recorded.
- Next, the recording/reproducing apparatus 30A shown in FIG. 20 will be described as another example of the recording/reproducing apparatus 30.
- the recording / reproducing apparatus 30A shown in FIG. 20 differs from the recording / reproducing apparatus 30 described above in that a series of signal processing for performing feature extraction processing in the recording mode is performed in software in the system controller system 20.
- In the recording/reproducing apparatus 30A, predetermined software is downloaded via the network system 24, and the feature extraction processing and the playlist processing (chapter generation processing, that is, generation of reproduction section and reproduction time position information) are performed by software processing, as described below.
- For example, if development of the software according to the present invention is not ready in time, the design and manufacturing side can first provide the user with a simply configured system to which the present invention is not applied, and later with a system to which the present invention is applied.
- the present invention can be applied by software processing, so that there is an advantage that functions can be added later.
- To install the present invention by downloading software, the user connects to a predetermined Internet site via the network system 24 using a predetermined operation system (such as the remote control 22), and downloads the software of the present invention by a predetermined operation.
- The downloaded software of the present invention is subjected to predetermined decompression processing, installation processing, and the like in the system controller system 20, so that the apparatus is equipped with the processing functions of the present invention described later, including the feature extraction processing, playlist processing, and chapter processing.
- the predetermined feature extraction process described above can be performed simultaneously with the predetermined recording process.
- the memory system 11 described above can also use a predetermined data storage memory provided in the system controller system 20.
- Band compression of predetermined image/audio data is performed as the predetermined recording processing; for this, an MPU, CPU, or DSP (digital signal processor) having the predetermined performance described above can be used, and the same MPU, CPU, or DSP performing this band compression processing can also perform the predetermined feature extraction processing, playlist generation processing, and the like.
- The recording/reproducing apparatus 30A shown in FIG. 20 also differs from the recording/reproducing apparatus 30 described above in that, when feature data cannot be detected and the feature extraction processing is performed in the reproduction mode, the series of signal processing is performed in software by the system controller system 20 (a microprocessor, MPU or CPU).
- In the audio system feature extraction processing system, as shown in FIG. 21, the MPEG image/audio stream data is input to the stream separation system 100, and the separated audio data is input to the audio data decoding system 101, where predetermined decoding processing is performed.
- the decoded audio data (audio signal) is input to the level processing system 102, the data counter system 103, and the data buffer system 104.
- the level processing system 102 calculates the average power (or average level) Pav of a predetermined section of the audio data.
- For this, the data is converted into absolute values, and the audio data integration processing system 105 performs the integration processing until the data counter system 103 has measured a predetermined number of sample data.
- the average power Pav can be obtained by the calculation of the following equation (32) with the value (level) of the audio data as Ad (n).
- As the predetermined interval for calculating the average level, for example, 0.01 sec (10 msec) to 1 sec can be considered.
- For example, at a sampling frequency Fs = 48 kHz, the integration calculation is performed over 480 to 48000 samples, and averaging over the sample count Sm yields the average level (average power) Pav.
- The data Pav output from the audio data integration processing system 105 is input to the determination processing system 106, where it is compared with the predetermined threshold value Ath set in the threshold value setting system 107, and silence determination processing is performed.
- The threshold Ath can be set as a fixed value Ath0; in addition, it is also possible to set a variable threshold Athm that fluctuates according to the average level of the preceding predetermined audio intervals.
- For example, with n as the interval currently being processed, the threshold Athm can be set from the average levels Pav(n−k) of the preceding intervals (n−k), as shown in equation (33).
- For example, using the two immediately preceding intervals:
  Athm = (Pav(n−1) + Pav(n−2)) / m   (34)
- m is set in the range of about 2-20.
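The silence determination described above (equations (32) and (34)) can be sketched as follows; this is a minimal illustration, assuming normalized sample values and illustrative values for the interval length Sm, divisor m, and fixed threshold Ath0:

```python
# Sketch of the silence determination of equations (32) and (34).
# Assumptions: audio is a list of sample values Ad(n); sm, m, and ath0
# are illustrative parameter values, not values fixed by the patent.

def average_power(samples):
    """Equation (32): mean of absolute sample values over one interval."""
    return sum(abs(s) for s in samples) / len(samples)

def silence_flags(audio, sm=480, m=4, ath0=0.05):
    """Return a per-interval silence flag using the variable threshold
    Athm = (Pav(n-1) + Pav(n-2)) / m of equation (34); the first two
    intervals fall back to the fixed threshold Ath0."""
    pavs, flags = [], []
    for i in range(0, len(audio) - sm + 1, sm):
        pav = average_power(audio[i:i + sm])
        if len(pavs) >= 2:
            athm = (pavs[-1] + pavs[-2]) / m
        else:
            athm = ath0
        flags.append(pav < athm)
        pavs.append(pav)
    return flags
```

A loud passage followed by a near-silent one yields silence flags only for the quiet intervals, since the variable threshold tracks the recent average level.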
- the predetermined audio data accumulated in the data buffer system 104 is input to the frequency analysis processing system 108, and a predetermined frequency analysis process is performed.
- For example, FFT (Fast Fourier Transform) is used.
- The number of analysis sample data taken from the data buffer system 104 is, for example, 512, 1024, or 2048, and the predetermined analysis processing is performed with that number of samples.
- a signal (data) from the frequency analysis processing system 108 is input to the determination processing system 109, and a predetermined determination process is performed.
- For example, the music (musical sound) discrimination processing can be performed based on the continuity of spectrum peaks in predetermined frequency bands.
- Japanese Unexamined Patent Application Publication No. 2002-116784 discloses such techniques.
- In the case of speaker voice, the waveform shows certain steep rising or falling sections, and those predetermined rising or falling sections can be detected.
- predetermined signal processing can be performed.
- When the above predetermined determination processing is performed in the baseband domain, a method of performing signal analysis and determination processing in the time domain can be used.
- FIG. 22 shows a configuration example of an audio system feature extraction processing system in the case where signal attribute analysis is performed in the compression band without decoding audio signals (audio data).
- A data stream subjected to predetermined band-compression signal processing, for example image/audio data such as MPEG, is input to the stream separation system 100 and separated into image data and audio data.
- the audio data is input to the stream data analysis system 110, and signal analysis processing such as a predetermined sampling frequency and the number of quantization bits is performed.
- the predetermined audio data is input to the subband analysis processing system 111.
- Predetermined subband analysis processing is performed in the subband analysis processing system 111, and predetermined signal processing similar to that described in the above equations (32) to (34) is performed on data in the predetermined subband band.
- The data is input to the audio data integration processing system 105, where a predetermined integration process is performed until a predetermined number of sampling data is counted by the data counter system 103; then, based on the predetermined threshold set by the threshold setting system 107, the determination processing system 106 performs a predetermined silence determination process.
- In this silence determination process, considering the spectrum of voice data, a predetermined data band of approximately 3 kHz or less, where much of the energy is concentrated, can be used as the subband band.
- The image data subjected to the predetermined separation processing in the stream separation system is input to the stream data analysis system 200, where predetermined data analysis such as rate detection and pixel-count detection is performed, and the DCT coefficient processing system 201 performs DCT calculation processing (inverse DCT calculation processing) such as DCT DC coefficient detection and AC coefficient detection.
- Furthermore, motion vector detection processing is performed.
- In the scene change detection processing system 202, for example, the screen is divided into predetermined areas, and for each area the average values of Y (luminance data), Cb, and Cr (color difference data) of the DCT DC coefficient data are calculated; difference calculations between frames or between fields are performed for each area, the results are compared with a predetermined threshold value, and a predetermined scene change is thereby detected.
- When there is no scene change, the difference data between frames (or fields) in each region is smaller than the predetermined threshold value; when a scene change occurs, larger difference data can be detected.
- the screen division area is, for example, an area that divides the effective screen into 16 as shown in FIG.
- The method of screen division is not limited to the case of FIG. 24; the number of divisions can be increased or decreased, but if it is too small the scene change detection becomes insensitive, and if it is too large the detection is considered too sharp, so an appropriate predetermined number of divisions is set within a range of about 256 (16 × 16) or less.
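The region-wise inter-frame difference test can be sketched as follows; frames are represented here directly as lists of per-region average values (e.g. the Y component of the DCT DC coefficients), and the threshold is an illustrative value:

```python
def scene_change(prev_regions, cur_regions, threshold=30.0):
    """Compare per-region average values (e.g. Y of the DCT DC coefficients)
    between consecutive frames; flag a scene change when the mean absolute
    inter-frame difference over all regions exceeds the threshold."""
    diffs = [abs(a - b) for a, b in zip(prev_regions, cur_regions)]
    return sum(diffs) / len(diffs) > threshold

# Example with a 16-region division as in FIG. 24:
frame_a = [100.0] * 16   # uniform bright scene
frame_b = [102.0] * 16   # same scene, slight change
frame_c = [10.0] * 16    # cut to a dark scene
```

Small lighting drift within a scene stays below the threshold, while a cut produces a large region-wise difference and is flagged.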
- the color feature detection processing system 203 can also detect a color feature based on the average value of Y, Cb, and Cr data of a DCT DC coefficient in a predetermined region.
- the predetermined area for example, an area as shown in FIG. 25 can be used.
- the effective screen is divided into four in the horizontal direction to provide detection areas 1 to 4, and four in the vertical direction to provide detection areas 5 to 8.
- Each detection area is given an area ID, and the data of each detection area is identified by the area ID.
- detection areas 1 to 4 only in the horizontal direction or detection areas 5 to 8 only in the vertical direction are provided.
- a grid-like division method such as 5 ⁇ 5 or 6 ⁇ 6 can be used.
- When this color feature is combined with, for example, a voice attribute feature, the probability of a "scene where a bout starts" is high for "sumo-ring scene" + "speech attribute or other (or speaker voice)", so such a scene section can be set as a key-frame section.
- In that case, it can be detected that the audio level increases due to the cheering of the audience at the bout-start scene, or that data in an audio frequency band different from the normal state appears.
- The similar image detection processing system 204 performs processing that assigns (adds) a predetermined ID (identification number or identification symbol) to each image (scene) for each similar scene (similar image, similar video); similar images (scenes) are assigned the same ID.
- Japanese Patent Application Laid-Open No. 2002-344872 discloses the technique.
- This ID assignment processing records each ID in one-to-one correspondence with the position information (frame number, PTS, recording time, etc.) of the image (scene).
- Since the position information and ID of an image (scene) correspond one-to-one, and the image (scene) itself naturally corresponds one-to-one with its position information, various predetermined operations using IDs can be performed, such as similar-image classification, for example displaying images with the same ID, or skip playback of image scenes with the same ID.
- Detection appearance ranks, such as the first and second highest detection frequencies, can then be determined.
- In this processing, for example, the screen is divided into a plurality of parts (for example, 25 parts), the average DCT DC coefficient of the region corresponding to each divided screen area is calculated, and, with the calculated average DC coefficients as a vector, an image (scene) whose vector distance from a previously registered image (scene) is smaller than a predetermined value is defined as a similar image (similar scene); the same predetermined ID (scene ID) is assigned to such similar images (similar scenes).
- For an image (scene) whose vector distance is larger than the predetermined value, the maximum ID value in use plus 1 is used as a new ID and assigned to that image (scene).
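The ID-assignment rule just described can be sketched as follows; it assumes each scene is summarized by a vector of per-region average DC coefficients (25 values for a 25-way division), with an illustrative distance threshold:

```python
import math

def assign_scene_ids(scene_vectors, dist_th=10.0):
    """Assign the same ID to scenes whose feature vectors (e.g. 25 average
    DCT DC coefficients) lie within dist_th of an already-registered scene;
    otherwise register a new ID (max ID in use + 1)."""
    registered = []   # list of (vector, id) for first-seen scenes
    ids = []
    next_id = 0
    for vec in scene_vectors:
        for rvec, rid in registered:
            if math.dist(vec, rvec) < dist_th:
                ids.append(rid)
                break
        else:
            ids.append(next_id)
            registered.append((vec, next_id))
            next_id += 1   # new ID = max ID in use + 1
    return ids
```

Scenes close in feature space share an ID; a visually distinct scene opens a new ID.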
- As an application example of this feature data, a processing method can be used that calculates the appearance frequency of each ID in a predetermined section and detects the IDs with the first and second highest frequencies.
- The IDs with the first and second highest appearance frequencies are considered to have a high probability of detecting announcer scenes, which can be assumed to appear frequently.
- Fig. 26 shows the outline for explaining the calculation method of the appearance frequency of ID.
- For example, the same ID, ID1, is detected in four sections, f1 to f2, f3 to f4, f5 to f6, and f7 to f8; that is, similar scenes appear in these sections.
- Here, consecutive frames with the same ID within a given section are counted as one occurrence, and the number of such occurrences is calculated.
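The counting rule of FIG. 26, where consecutive frames sharing an ID count as a single occurrence, can be sketched as:

```python
from itertools import groupby
from collections import Counter

def id_appearance_counts(frame_ids):
    """Collapse runs of identical IDs into single occurrences (one 'section'
    per run, as in FIG. 26), then count occurrences per ID."""
    runs = [key for key, _ in groupby(frame_ids)]
    return Counter(runs)

# e.g. ID 1 appearing in four separate sections, as in the FIG. 26 example:
timeline = [1, 1, 2, 1, 3, 1, 1, 2, 1]
```

`Counter.most_common` then gives the first- and second-highest appearance frequencies used for the announcer-scene heuristic.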
- the person detection processing system 205 can determine whether a person appears on the screen by dividing the screen area as shown in FIG. 27 and detecting a predetermined specific color in each area.
- For example, in FIG. 27, the effective screen is divided into four regions, regions 1 to 4, and a region 5 is provided near the center of the screen.
- For example, the probability that an announcer's face appears in region 5 is considered high.
- Alternatively, when a flip or telop and the announcer's face appear together, the announcer's face may appear in region 1 or region 2; in that case, it can be assumed that the flip or telop appears in region 2 or region 1.
- For example, the screen size is set to 720 × 480.
- Based on the detection conditions from the luminance signal AC coefficients (the contour detection conditions for persons, faces, etc.) and the determination conditions shown in equations (37) and (38), data in the x and y directions are detected.
- Covariance processing is then performed on the detected data.
- The hatched portion is a detection point, for example, as follows.
- the data is larger than the predetermined threshold number Lth.
- A detection condition concerning the regularity of the shape is also applied:
- data whose difference or ratio is within a predetermined range (0 to Dth, or eth1 to eth2) are detected.
- For example, the calculation is performed on xl(0) and yl(0).
- The shape of a person's face is considered, and the aspect ratio is calculated on the assumption that the face is approximated by a quadrilateral.
- The object in the region of xl(0) and yl(0) in FIG. 28 can then be judged to have a high probability of being a human face.
- In addition, the continuity of the detected data can be determined as in the following (Process 5).
- That is, the temporal continuity (stability of detection) of the detection values of (Process 1) to (Process 4) is determined.
- From equation (48), the detection value S(N) in picture N is set.
- an I picture can be used as a picture to be detected.
- Then, using any one or some of the detection values of (Process 1) to (Process 3) as the detection data for picture N, it is determined whether detection continues through pictures N+1, N+2, and N+3.
- condition determination may be made by calculating an average value of the detected data of N to (N + 2) pictures.
- the average value of the detected three picture data is AvCol
- condition determination may be made by calculating an average value of the detected data of N to (N + 2) pictures.
- the average value of the detected three picture data is Avxh and Avyh
- condition determination may be made by calculating an average value of the detected data of N to (N + 2) pictures.
- the average value of the detected three picture data is Avxl and Avyl
- For example, the data density Δ1, that is, the number of data per unit data point, is compared with a predetermined threshold value Mth.
- region (1) and region (2) satisfy the condition from equations (81) and (85), and it is determined that the probability that a person has been detected is high.
- one xl (0) is detected in the X direction and one yl (0) is detected in the y direction.
- Here, the average value of yh(y) is yhav with m data points, and the corresponding average value in the x direction is xhav with n data points.
- FIG. 36 has a larger data variance value.
- a predetermined threshold value Bth and a threshold value dl, d2 corresponding to the number of detected objects are set for the dispersion value, and the number of detected objects can be detected by determining the following conditions.
- the threshold can be set and determined.
- two xl (0) and xl (l) are detected in the X direction, and two yl (0) and yl (l) are detected in the y direction.
- Since Δa is smaller than the predetermined value, it is not judged that two persons are detected in the region specified by xl(0) and (yl(0) + yl(1)); from equation (109), it can be determined that one person is detected.
- Likewise, the probability that two persons are detected in the region Rc is low, and from equations (109) and (115) to (117), one person is eventually detected.
- The number of persons can thus be detected by the determination processing described above, for the region specified by xl(0) and yl(0) and the region specified by xl(1) and yl(1).
- person detection can be performed by sequentially determining whether or not a predetermined threshold condition is satisfied in the X direction (0 to 44) and the y direction (0 to 29).
- In another method, the size and the position of the detected object are determined simultaneously.
- A person is approximated by a quadrilateral, and the size of the quadrilateral is changed sequentially to determine whether the data in the quadrilateral area satisfies a predetermined condition; the person can thereby be detected.
- For example, quadrilateral areas of (2 × 2), (3 × 3), and (4 × 4) are set; the quadrilateral areas of different sizes are moved one data point at a time, in order from the smaller quadrilaterals, and it is determined whether the data in each area satisfies the condition; the same processing is performed for each size of quadrilateral.
- The telop detection determination processing system 206 detects the average values of the DCT AC coefficients in the screen areas as shown in FIG. 25.
- Since a telop containing character information of a predetermined size within a predetermined screen area has a relatively clear outline, an AC coefficient larger than a predetermined threshold can be detected when a telop image appears in any of the areas of FIG. 25, and telop detection can thereby be performed.
- Besides the method using DCT AC coefficients, an edge detection method can be used in the baseband domain (time-domain signal); for example, edges are detected from the inter-frame difference of the luminance data of the image.
- Alternatively, multi-resolution analysis is performed by wavelet transform, and the average value of the areas corresponding to FIG. 25 is calculated using the data of a predetermined multi-resolution region containing the predetermined high-frequency component data.
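The AC-coefficient test for telop detection can be sketched as follows; each detection area of FIG. 25 is represented here by the mean magnitude of its AC coefficients, and the threshold is a hypothetical value:

```python
def detect_telop(region_ac_means, ac_th=50.0):
    """Return the indices of detection areas whose average AC-coefficient
    magnitude exceeds the threshold; sharp character outlines raise the
    high-frequency (AC) energy of the blocks they cover."""
    return [i for i, v in enumerate(region_ac_means) if v > ac_th]

# Eight detection areas as in FIG. 25 (areas 1-4 horizontal, 5-8 vertical);
# the elevated value at index 3 stands in for a caption strip.
areas = [12.0, 9.5, 11.0, 80.0, 10.0, 8.0, 9.0, 14.0]
```

Only the area containing the sharp-edged caption exceeds the AC threshold and is reported.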
- A telop is not limited to the light-colored area of a flip; it is, for example, character information that appears at the bottom of a news video.
- The appearance area generally depends on the program genre; it may be the bottom of the screen, the top, the left side, the right side, and so on.
- The camera feature determination processing system 209 determines features relating to camera operations, such as zooming and panning.
- Motion vectors (motion vector data) can be used for the determination.
- Japanese Unexamined Patent Publication No. 2002-535894 discloses a technique relating to camera characteristics.
- summary playback selects, by predetermined signal processing using the audio system feature data and the video system feature data, several important playback sections (key frame sections) within a predetermined section, and sequentially plays back each of those sections by skipping between them.
- in skip playback, when a skip occurs, for example, in the middle of a speaker's voice section, there may be no sense of incongruity on the screen, yet some users may find the interrupted audio jarring; for this reason, sections below a predetermined audio level (volume) are set as silent sections, and predetermined time points within those sections are taken as skip time candidates.
- a scene change in video is considered to be a topical break point in broadcast programs, movies, and other video content, so a scene change point or its vicinity can also be used as a skip point candidate.
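The two kinds of skip point candidates above can be combined as in this sketch, which assumes per-interval average audio levels and a list of already-detected scene change positions:

```python
def skip_point_candidates(levels, threshold, scene_changes):
    """levels: average audio level per interval; runs of intervals below
    the threshold form silent sections, whose midpoints become skip point
    candidates alongside the detected scene change positions."""
    candidates, start = [], None
    for i, lv in enumerate(levels):
        if lv < threshold and start is None:
            start = i                              # silent run begins
        elif lv >= threshold and start is not None:
            candidates.append((start + i - 1) // 2)  # midpoint of silent run
            start = None
    if start is not None:                          # run reaches the end
        candidates.append((start + len(levels) - 1) // 2)
    return sorted(set(candidates) | set(scene_changes))
```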
- for convenience, processing is performed by setting a predetermined playback unit (hereinafter referred to as a playback unit or play unit (PU)).
- predetermined image system feature data and predetermined audio system feature data are processed in the playback units (PU) set in this way, and predetermined summary (digest) playback is performed according to the video and audio feature data and the summary playback time.
- predetermined playback sections are set and skip playback is performed in the predetermined summary playback mode, so that the predetermined summary playback is executed.
- the chapter point can be displayed as a thumbnail by predetermined signal processing, and the user can perform operations such as editing while viewing the thumbnail display.
- a break near that point is set as the break of the playback unit.
- the scene change detection point closest to 15 seconds is set as the break point.
- the playback unit is broken at that point.
- if the playback unit reaches 20 seconds with neither an audio segment break nor a scene change, it is broken at that point.
- when a CM (commercial) is detected, the CM detection point is set as a break point of the playback unit.
- the CM section of a broadcast program has a predetermined time length (usually 15 seconds, 30 seconds, or 60 seconds), and there is a scene change at the CM break points (start and end points), so the CM can be detected as shown in the figure.
- if 20 seconds have elapsed from the start of the playback unit, it is broken at that point regardless of scene change detection points.
- the initial value of the start point of the playback unit is the recording start time of the program (broadcast program).
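The playback unit break rules above (audio segment break near the 15-second target, otherwise the scene change closest to 15 seconds, a forced break at 20 seconds, and CM detection points always breaking) can be sketched as follows; the function and parameter names are illustrative assumptions:

```python
def playback_unit_breaks(duration, audio_breaks, scene_changes, cm_points,
                         target=15.0, limit=20.0):
    """Segment [0, duration) into playback units: prefer an audio segment
    break near the 15 s target, else the scene change closest to 15 s,
    else force a break at the 20 s limit; CM detection points always break."""
    breaks, start = [], 0.0
    cms = sorted(set(cm_points))
    while start < duration:
        window_end = min(start + limit, duration)
        cm = next((t for t in cms if start < t <= window_end), None)
        if cm is not None:
            cut = cm                               # CM point always breaks
        else:
            audio = [t for t in audio_breaks if start < t <= window_end]
            scene = [t for t in scene_changes if start < t <= window_end]
            if audio:                              # nearest to start + 15 s
                cut = min(audio, key=lambda t: abs(t - start - target))
            elif scene:
                cut = min(scene, key=lambda t: abs(t - start - target))
            else:
                cut = window_end                   # forced 20 s break
        breaks.append(cut)
        start = cut
    return breaks
```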
- a predetermined playback unit corresponding to a predetermined audio feature and a predetermined video feature can be played back.
- FIG. 37 shows a block configuration example of the processing system that generates the playback units described above and of the unitized feature data processing system, described later, that inserts feature data into the playback units.
- predetermined time points such as those for summary playback and chapter setting are set at the start and end points of playback units, so these processes are performed in association with the feature data of each playback unit described above.
- that is, processing is performed to reflect, for each playback unit section, each of the predetermined audio feature data and video feature data extracted for each predetermined section.
- the silence determination information data are input to the time measurement system 301, where predetermined intervals (time lengths) based on the playback unit processing described above are measured, and the processing output is input to the playback unit processing system 302.
- the playback unit processing system 302 also receives the scene change determination information data and the CM detection determination information data, performs the signal processing described in the explanation of each playback unit processing method, and generates the predetermined playback units.
- the CM detection system 304 receives the silence feature detection information data, the scene change feature information data, and channel information for determining whether the channel is one on which CMs are broadcast, and performs CM detection processing by the predetermined signal processing method described with reference to the figure.
- the playback unit feature data processing system 303 receives each feature data, such as the voice feature data (voice attribute information and silence information) and the scene change feature, color feature, similar image feature, person feature, and telop feature, and inserts the feature data into the playback units as described later.
- PU feature data files include audio feature data and video (image) feature data.
- in this feature data processing, the audio system and video system feature data extracted as described above are inserted into the playback units, and the resulting data (data file), containing the various feature data for each playback unit, are recorded on a predetermined recording medium.
- here, each feature data detected in each predetermined detection section is first recorded on a predetermined recording medium, and processing according to the predetermined sections of the playback units described above is then performed on the feature data.
- feature data are obtained by extracting predetermined characteristic data (characteristic signals) from the audio signal (audio data) and the image (video) signal (image (video) data), and subjecting the extracted signals (data) to predetermined processing.
- for the video (image) signal, the luminance signal (Y signal) of I pictures, the DC coefficients of the DCT of the color signals (color difference signals: Cb, Cr signals), the motion vector data of B or P pictures, and the DCT AC coefficients are extracted from the MPEG stream as characteristic data, and from these the scene change feature (sen feature), camera operation feature (camera feature) (cam feature), similar image feature (similar scene feature or scene ID feature) (sid feature), telop feature (tip feature), color feature (col feature), person feature (person feature), and so on are detected.
- for the audio signal, as characteristic data processing, the average level is calculated approximately every 20 ms, and from the calculated data and predetermined thresholds, the attributes (types) and average power (average level) of the audio signal in predetermined sections are detected as audio features (seg features).
- speech attributes such as speaker speech, music (musical sound), and cheers in sports programs are assumed.
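The per-20 ms average level computation can be sketched as follows, assuming raw PCM samples; the seg-feature attribute decision itself (speaker speech vs. music vs. cheers) is omitted:

```python
def frame_levels(samples, fs=48000, frame_ms=20):
    """Average absolute level of the audio signal per ~20 ms frame,
    as used for the seg-feature (attribute / average power) detection."""
    n = max(1, int(fs * frame_ms / 1000))          # samples per frame
    return [sum(abs(s) for s in samples[i:i + n]) / len(samples[i:i + n])
            for i in range(0, len(samples), n)]
```

Comparing these frame levels against predetermined thresholds over a predetermined section then yields the average power and attribute decisions described in the text.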
- configuration example 1 of the feature data file, shown in Fig. 38, treats the audio system feature data and each of the video system feature data, such as the scene change feature (sen feature), camera feature (cam feature), similar scene feature (sid feature), telop feature (tip feature), color feature (col feature), and person feature (person feature), as separate feature data files.
- Each feature data file is written in text format data or binary format data.
- it is also conceivable that these characteristic data are temporarily stored (recorded) on a predetermined recording medium (such as a semiconductor memory) as ordinary data and are later read out for predetermined processing, described below, such as the generation of summary list data and the generation of predetermined set points (chapter points). The same applies to FIGS. 39 and 40 described below.
- Example 2 shown in Fig. 39 is an example in which all the audio system feature data described above are collected as a single file in text or binary format, and all the video system feature data described above are collected as another single file in text or binary format.
- Example 3 shown in FIG. 40 is an example in which all the audio system feature data described above and all the video system feature data described above are collected as one file in text format or binary format.
- here, the feature data file is assumed to be stored as in the case of Example 3 shown in Fig. 40.
- Example 3 shown in FIG. 40 corresponds to Example 2 shown in FIG. 39 with all the audio feature data described in binary format and all the video feature data described in binary format.
- accordingly, the audio feature data processing method (description method) in the following description of the feature data file can also be applied to the audio feature data of Example 2 in FIG. 39, and the video feature data processing method (description method) can be applied to the video feature data of Example 2 in FIG. 39.
- Figure 41 shows the hierarchical structure of feature data in units of playback units.
- predetermined feature data processing is performed in predetermined processing units (playback units).
- the feature data consist of feature data header information, program 1 feature data, program 2 feature data, and so on.
- the feature data header information is composed of predetermined data such as the total recording time over all programs (program 1, program 2, ...), the recording start and end times, the number of programs, and other information.
- the program feature data will be described using the program 1 feature data as an example.
- the program 1 feature data includes program 1 information, playback unit 1 information, playback unit 2 information, and the like.
- the program 1 information is composed of predetermined data such as a program recording time, a program start time, an end time, a program genre (program genre), and other information.
- the playback unit 1 information is composed of audio feature data and video feature data.
- the audio system feature data is composed of sequence number information, start / end position information, audio attribute information, feature data, and other information data.
- video feature data is composed of predetermined feature information data such as scene change features, color features, similar image features, person features, telop features, camera features, and the like.
- each feature data, such as the scene change feature, color feature, similar image feature, person feature, telop feature, and camera feature, is recorded (written) on a predetermined recording medium for each predetermined section.
- for example, predetermined data processing may be performed so that the data are recorded (written) on the predetermined recording medium only when the detected feature data exceed a predetermined threshold.
- in that case, feature data smaller than the predetermined threshold are not written.
- when feature data are recorded (written) selectively in this way, which detection from the beginning a given feature data corresponds to can be found from the sequence number information described below.
- as shown in FIG. 43, the scene change feature consists of sequence number information, start/end position information, feature data, and other data.
- the sequence number information is information (0, 1, 2, 3, ...) indicating the order in which scene changes occur from the beginning of the program.
- the start/end position information is information data indicating the start/end positions of the above-described scene changes; information data such as a frame (field) number, PTS, DTS, or time can be used.
- as shown in FIG. 43, the color feature consists of sequence number information, information data for identifying the detection area, start/end position information data, feature data, and other data.
- the sequence number information is information (0, 1, 2, 3, ...) indicating the order of color feature detection from the beginning of the program.
- the start/end position information is information data indicating the start/end positions at which the feature detection of each area was performed in the color feature detection in each of the above orders; information data such as a frame (field) number, PTS, DTS, or time can be used.
- the feature data includes, for example, data such as RGB, Y, Cb, and Cr.
- as shown in FIG. 43, the similar image feature consists of sequence number information, frequency information, start/end position information, feature data, and other data.
- the sequence number information is information (0, 1, 2, 3, ...) indicating the order of similar image feature detection from the beginning of the program.
- the feature data includes, for example, the average DC coefficients of the DCT of each divided area obtained by dividing the effective screen into a predetermined number of areas (for example, 25 divisions).
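Computing such a per-area DC average vector can be sketched as follows, assuming a 2-D map of per-block DCT DC coefficients; the 5 x 5 (25-area) division follows the example above:

```python
def region_dc_averages(dc_map, regions=5):
    """Average the per-block DCT DC coefficients over a regions x regions
    grid (e.g. 5 x 5 = 25 areas of the effective screen); the resulting
    vector serves as the similar-image (scene ID) feature."""
    rows, cols = len(dc_map), len(dc_map[0])
    rh, cw = rows // regions, cols // regions
    feats = []
    for gr in range(regions):
        for gc in range(regions):
            cells = [dc_map[r][c]
                     for r in range(gr * rh, (gr + 1) * rh)
                     for c in range(gc * cw, (gc + 1) * cw)]
            feats.append(sum(cells) / len(cells))
    return feats
```

Two frames whose 25-element vectors are close (e.g. small summed absolute difference) can then be treated as similar images.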
- as shown in FIG. 43, the person feature consists of sequence number information, information data for identifying the detection area, start/end position information data, feature data, and other data.
- the sequence number information is information (0, 1, 2, 3, ...) indicating the order of person feature detection from the beginning of the program.
- as shown in FIG. 43, the telop feature consists of sequence number information, information data for identifying the detection area, start/end position information data, feature data, and other data.
- the sequence number information is information (0, 1, 2, 3, ...) indicating the order of telop feature detection from the beginning of the program.
- as shown in FIG. 43, the camera feature consists of sequence number information, information data for identifying the detection area, start/end position information data, feature data, and other data.
- the sequence number information is information (0, 1, 2, 3, ...) indicating the order of camera feature detection from the beginning of the program.
- the PUs and feature data can be handled in the same way as for program 1 described above when other programs 2, 3, and so on are recorded.
- desired summary playback is performed by performing skip playback processing on the above-described predetermined playback section in units of PUs.
- this file is data in which predetermined information, indicating which of the PUs or PU joints (PU aggregates or PU concatenations) specified according to the above feature data are to be selected and subjected to playback processing, is described in a predetermined format.
- instead of being recorded as a file, it is also conceivable that the data are temporarily stored in predetermined memory means.
- examples of playlist files are shown in FIGS. 44A and 44B.
- the vertical data series (a) of Example 1 shown in Fig. 44A is the start position information data of the playback sections, given as predetermined information data such as a frame number, time, or the PTS (Presentation Time Stamp) or DTS (Decoding Time Stamp) of the stream (compressed image/audio data).
- the vertical data series (b) of Example 1 shown in FIG. 44A is the end position information data of the playback sections, corresponding to the data of (a), and is likewise given as predetermined information data such as a frame number, time, or the PTS (Presentation Time Stamp) or DTS (Decoding Time Stamp).
- the vertical data series (c) in Example 1 shown in FIG. 44A indicates the importance of the PU (reproduction unit) or reproduction unit group (PU group).
- Example 1 shown in FIG. 44A (d) vertical data series is character data having a meaning defined or set by the summary rule.
- Example 2 shown in Fig. 44B is an example in which semantic characters and evaluation values (importance) are described for all PU sections, and identification data "1" and "0" are provided to indicate predetermined time points such as playback sections and chapter settings.
- in Example 2 shown in FIG. 44B, the first start point 0 and end point 229 are continuous with the next start point 230.
- the vertical data series (e) of Example 2 shown in Fig. 44B is flag information data indicating whether or not summary playback is to be performed: when "1", the section is played back, and when "0", it is not.
- the first time point of each "1" and the first time point of each "0" can be regarded as predetermined set points (chapter points).
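A playlist of the Example 2 form can be interpreted as in this sketch, assuming each row holds (start, end, evaluation value, flag); the chapter rule follows the first-"1" / first-"0" convention above:

```python
def parse_playlist(rows):
    """rows: (start, end, importance, flag) per PU section, flag being the
    1/0 summary-playback identification data of Example 2. Returns the
    playback sections and the chapter points (each first 1 and first 0)."""
    sections, chapters = [], []
    prev = None
    for start, end, importance, flag in rows:
        if flag == 1:
            sections.append((start, end, importance))
        if prev is None or flag != prev:   # first "1" or first "0" of a run
            chapters.append(start)
        prev = flag
    return sections, chapters
```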
- FIG. 45 shows an example of an operation flowchart of the present invention, which will now be described.
- first, in step S1, it is determined whether the mode is the recording mode or the playback mode.
- in the case of the playback mode, it is determined in step S2 whether the mode is the summary playback (digest playback) mode or the normal playback mode; if it is the normal playback mode, the process proceeds to normal playback processing (P).
- in the case of the summary playback mode, determination processing is performed in step S3 to detect whether predetermined feature data have been recorded on the predetermined recording medium, that is, whether they have been recorded as predetermined file data in a predetermined recording area of the recording medium.
- it is then determined in step S4 whether predetermined playlist data (a data file) have been recorded in a predetermined recording area of the predetermined recording medium; if the playlist data (playlist file) are detected, the predetermined playlist data are read out in step S5.
- if it is determined in step S3 that the predetermined feature data are not detected, the image/audio data (the program or broadcast program) to be summarized and played back are read in step S8 and predetermined feature extraction processing is performed; in step S9 it is determined whether this processing has been completed, and if not, the process returns to step S8 and is repeated until it is completed.
- step S9 If it is determined in step S9 that the predetermined feature extraction process has been completed, the process proceeds to step S6, where a predetermined playlist data generation process is performed.
- if it is determined in step S4 that the predetermined playlist data (file) are not detected, the predetermined feature data recorded or stored in the predetermined recording area of the predetermined recording medium are read in step S6, the predetermined playlist data (file) are generated and written to the predetermined area of the predetermined recording medium either sequentially or after the processing is completed, and in step S7 it is determined whether all playlist generation processing has been completed. If not, the process returns to step S6 and repeats; if it is determined in step S7 that all the predetermined playlist data have been generated, the written playlist data are read in step S5.
- the sequentially generated playlist data may be sequentially recorded in a predetermined recording area on the same recording medium in which image / audio information data such as the broadcast program is recorded, Alternatively, the information may be written to a recording medium different from the one where the image / audio data is recorded, for example, a predetermined memory means that can be attached and detached.
- the sequentially generated playlist data may be written (stored) sequentially in this way, or all of the playlist data may be recorded (stored) together after the generation processing of all the predetermined playlist data has been completed.
- a plurality of playlist data corresponding to the recording time can be generated so that the user can select from a plurality of summary playback times.
- the summary playback time is adjusted according to the evaluation values.
- in step S10, the playback time selection mode is set, and in step S11 it is determined whether the user has selected a playback time, either immediately or within a predetermined time tmod after the summary playback mode was selected and the playlist data detection processing ended. If no selection has been made, it is determined in step S12 whether playback stop has been selected by the user; if so, the process ends, and if not, the process returns to step S10 and the predetermined processing is repeated.
- when the user selects a playback time immediately in step S11, or when no playback time is selected within the predetermined time tmod, the process proceeds to the summary playback operation processing in step S13.
- if the predetermined time tmod elapses without the user selecting a playback time, a predetermined default playback time tpb0 is set.
- the user may be allowed to select the summary playback time arbitrarily, or may select from preset playback times based on the recorded program length and the playlist data.
- the default summary playback time can also be set according to the recording time, as shown for example in FIG. 46.
- in this example, the summary playback mode can be set only when the recording time is longer than a predetermined recording time (Trecmin).
- with this predetermined recording time Trecmin, if the recording time Trec is less than 10 minutes, the time is short, so summary playback is not set and only normal playback is set.
- as an example, from FIG. 46, if the recording time Trec is 60 minutes, the summary playback times selectable by the user are 10, 15, 30, and 40 minutes, and the default setting time is 30 minutes.
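The FIG. 46 style table lookup can be sketched as follows. Only the 10-minute lower bound and the 60-minute row (selectable times 10/15/30/40 minutes, default 30 minutes) come from the text; the other table entries are illustrative assumptions:

```python
TRECMIN_MIN = 10          # minutes; below this, only normal playback

TABLE = [                 # (max recording time [min], choices [min], default)
    (30,  [5, 10],          10),   # assumed row for short recordings
    (60,  [10, 15, 30, 40], 30),   # row given in the text
]

def summary_times(trec_min):
    """Return (selectable summary playback times, default) for a recording
    time, or None when the recording is too short for summary playback."""
    if trec_min < TRECMIN_MIN:
        return None
    for limit, choices, default in TABLE:
        if trec_min <= limit:
            return choices, default
    return TABLE[-1][1], TABLE[-1][2]   # longest-row fallback
```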
- in summary playback by skip playback processing, as the total number of skipped sections increases, information is lost accordingly and it may become impossible to grasp the playback content; therefore, for short recordings the number of selectable times is reduced so that an appropriate summary time can be chosen.
- when the recording time is long, the amount of information is large, so the number of selectable times is increased so that the user can operate effectively.
- Information such as a list of summary playback times that can be selected by the user and default playback times is stored in a predetermined display means in the recording / playback apparatus to which the present invention is applied, or in a predetermined display means connected to the apparatus, or It is conceivable to display on a predetermined display screen such as liquid crystal on the remote control of the device.
- the chapter setting processing can be performed simultaneously with the playlist generation processing; as shown in FIG. 44, predetermined chapter setting processing is automatically performed according to the recording time and the number of settable chapters.
- predetermined signal processing is performed so that 5 to 40 chapters are set.
- in step S13, the summary playback operation is performed.
- in this case, the PU sections with the highest evaluation values are selected with priority, and sections with successively smaller evaluation values are then selected so that the total is as close as possible to the selected summary playback time.
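The evaluation-value-first selection can be sketched as a simple greedy pass; the section tuple layout and tie-breaking are assumptions:

```python
def select_sections(sections, target):
    """Greedy selection for summary playback: sections are (start, end,
    evaluation) tuples; pick the highest-evaluated sections first while
    the accumulated duration stays within the target playback time."""
    chosen, total = [], 0.0
    for start, end, score in sorted(sections, key=lambda s: -s[2]):
        dur = end - start
        if total + dur <= target:
            chosen.append((start, end, score))
            total += dur
    return sorted(chosen)        # play back in time order
```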
- in step S14, it is determined whether the playback operation has finished; if so, the process ends. If not, it is determined in step S15 whether the predetermined program being played has ended; if it has, the process ends, and if not, the process proceeds to step S16 to determine whether the playback time is to be changed.
- if the playback time is changed in step S16, the process returns to step S10 and the above processing is repeated; if it is not changed, the process returns to step S13 and the summary playback operation is repeated.
- an example of an operation flowchart in the recording mode is shown in FIG. 48.
- when the recording mode is selected in step S1 of the flowchart shown in FIG. 45, it is determined in step R1 of the flowchart shown in FIG. 48 whether the timer recording mode or the normal recording mode is selected.
- the normal recording operation is performed.
- the process proceeds to predetermined recording signal processing in step R9, and in step R10, predetermined feature extraction processing is performed from the image/audio data undergoing predetermined encoding processing such as MPEG, or from the encoded image/audio data.
- the recording signal processing and the feature extraction signal processing can be performed simultaneously.
- for the image/audio data to be subjected to the predetermined encoding processing, the predetermined feature extraction processing can be performed using the image/audio data in the course of that encoding processing.
- for example, the DC coefficient data and AC coefficient data of the DCT signal processing can be extracted, and by performing predetermined signal processing using these data, each of the predetermined feature extraction signal processings described above, such as scene change feature detection (cut point feature detection) and telop feature detection, is performed.
- for audio, signal processing such as speaker voice and music (musical tone) detection can be performed using data in predetermined subbands obtained in the predetermined subband signal processing of the predetermined band compression signal processing.
- the music determination processing can be performed, for example, by judging the continuity of the data in the predetermined subbands.
- baseband image/audio data can also be used; for example, using the baseband signal of the image, scene change detection processing by frame (or field) difference signal processing, telop feature signal processing by edge detection on the difference signal, and other predetermined feature extraction signal processing can be performed.
- the feature data of each image and audio subjected to the feature extraction signal processing are recorded on the same predetermined recording medium on which the image/audio data are recorded, or in predetermined data storage means (data recording means) such as a predetermined buffer memory.
- in step R11, it is determined whether the normal recording mode has ended; if not, the process returns to step R9 and the above operations are repeated, and if it has ended, the process proceeds to step R12 and moves to the playlist data generation processing (or chapter data generation processing).
- if the timer recording mode is set in step R1, the recording start and end times are set in step R2, the predetermined operation time is checked in step R3, and if it is not yet the predetermined time, the operation waits in step R7; it is then determined in step R8 whether the user has cancelled the timer operation, and if the timer operation is continued, the process returns to step R3 and the above operations are repeated.
- if the timer operation is cancelled in step R8, the process returns to step S1 in FIG. 45 and the initial operation mode selection processing is performed.
- step R3 If it is determined in step R3 that the predetermined recording operation time has come, the recording operation is started, and the same operations as in steps R9 to R11 described above are performed in steps R4 to R6.
- the feature data (feature extraction data) subjected to the image and audio feature extraction signal processing are recorded on the same predetermined recording medium on which the image/audio data are recorded, or in predetermined data storage means (data recording means) such as a predetermined buffer memory; if it is determined in step R6 that the recording end time has been reached, the process proceeds to step R12 to perform the playlist data generation processing or chapter data generation processing.
- in step R12, the feature data subjected to the various predetermined feature extraction processings (including data obtained by applying predetermined processing, such as predetermined signal processing and predetermined determination processing, to the extracted feature data) are read from the predetermined recording medium, and the predetermined playlist data (file) generation processing and chapter data generation processing are performed.
- the generated playlist data and chapter data are recorded on a predetermined recording medium, and in step R13 it is determined whether the generation processing has been completed; if not, the process returns to step R12 and the above processing operations are repeated, and if it is determined in step R13 that the processing has been completed, the operation ends.
- the playlist data and the chapter data are sequentially recorded on a predetermined recording medium simultaneously with the data generation process, and in addition to the predetermined broadcast program, program, or predetermined recording section to be processed. After all the generation processes of the predetermined playlist data and chapter data are completed, they may be recorded together on a predetermined recording medium.
- the playlist data (chapter data) generation processing may also be performed in parallel (simultaneously) with the feature extraction processing, which itself is performed simultaneously with the recording processing of image/audio information data such as predetermined broadcast programs and programs.
- FIG. 49 shows an example of an operation flowchart for performing the predetermined signal processing from the audio segment detection points and scene change detection points in the PU signal processing described above.
- in step P1, the audio data, together with the predetermined number of samples of image data used for the scene change detection processing described later, are read out from the predetermined recording medium on which the image/audio information data are recorded.
- in step P2, the read data are stored in a data buffer, which is predetermined recording means such as a memory (write processing, recording processing).
- step P3 If it is determined in step P3 that data of a predetermined number of samples has been recorded in the buffer, the process proceeds to step P4. If it is determined that predetermined sample data has not yet been recorded, the process returns to step P2 and the operation is repeated.
- the predetermined number of sample data in step P2 corresponds to about 0.1 second to 1 second.
- buffer processing is performed with the number of data corresponding to this predetermined interval: for example, if the sampling frequency is 48 kHz, 48000 samples are recorded per second, so for 0.1 second, 4800 samples are recorded in the buffer.
- in step P4, the audio data are read from the buffer.
- in step P5, the audio level of the predetermined section is calculated as described above.
- in step P6, the calculated audio level is compared with a predetermined level, and silence detection (silence determination) processing is performed by determining whether the level is lower than the predetermined level.
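Steps P2 through P6 can be sketched as follows, assuming raw PCM samples buffered in ~0.1-second chunks (4800 samples at 48 kHz); the threshold value is illustrative:

```python
def silent_chunks(samples, fs=48000, chunk_s=0.1, threshold=0.05):
    """Sketch of steps P2-P7: buffer the audio in ~0.1 s chunks, compute
    the average level of each chunk, and mark chunks below the
    predetermined level as silent."""
    n = int(fs * chunk_s)                 # 4800 samples at 48 kHz / 0.1 s
    flags = []
    for i in range(0, len(samples) - n + 1, n):
        chunk = samples[i:i + n]
        level = sum(abs(s) for s in chunk) / n
        flags.append(level < threshold)   # True = silent chunk
    return flags
```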
- if it is determined in step P6 that the section is a silent section, that information is stored (recorded) in a predetermined memory (buffer) in step P7; if it is determined that the section is not silent but voiced, the process proceeds to step P8, where it is determined whether the audio buffer processing of the data read in step P1 has been completed. If it has not been completed, the process returns to step P2 and the above processing is repeated; if it has been completed, the process proceeds to step P9.
- in step P9, the audio segment information data processed in step P8 are read, and in step P10, the segment processing described above, dividing the data into short silent sections, voiced sections, long silent sections, and voiced sections, is performed.
- In step P11, the DCT-processed data of a predetermined number of image data samples is recorded in a predetermined buffer memory (predetermined data recording means). In step P12, it is determined whether writing of the predetermined amount of data has been completed. If it has not, the process returns to step P11 and the writing to the buffer memory system is repeated; if it is determined in step P12 that the predetermined amount of data has been written, the process proceeds to step P13.
- In step P13, the predetermined DCT data recorded (written) in the predetermined buffer memory system is read out.
- In step P14, predetermined signal processing such as inter-frame difference processing is performed, and the predetermined scene change detection processing is carried out.
- In step P15, it is determined whether a predetermined scene change has occurred. If it is determined that there has been a scene change, the position information data is stored (written) in predetermined memory means (data recording means, data buffer means, etc.) in step P16, and the process proceeds to step P17; if it is determined in step P15 that there is no scene change, the process proceeds directly to step P17.
- In step P17, it is determined whether the scene change detection processing for the predetermined amount of data in the predetermined data buffer has been completed. If it has not, the process returns to step P11 and the signal processing is repeated; if it is determined in step P17 that the processing has been completed, the process proceeds to step P18.
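The detection loop of steps P14 to P16 can be sketched as follows (the mean-absolute-difference measure and the threshold are illustrative assumptions; the text names only "predetermined signal processing such as inter-frame difference"):

```python
import numpy as np

def detect_scene_changes(dct_frames: np.ndarray, threshold: float) -> list:
    """Steps P14-P16 sketch: compute an inter-frame difference on DCT
    coefficient data and flag frame positions whose difference exceeds a
    predetermined threshold as scene change positions."""
    positions = []
    for k in range(1, len(dct_frames)):
        # mean absolute difference of DCT coefficients between frames
        diff = np.mean(np.abs(dct_frames[k] - dct_frames[k - 1]))
        if diff > threshold:
            positions.append(k)  # store the position information (step P16)
    return positions

# Three near-identical frames of 8x8 DCT data, then an abrupt change at index 3.
frames = np.zeros((5, 8, 8))
frames[3:] = 10.0
print(detect_scene_changes(frames, threshold=1.0))  # [3]
```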
- In step P18, the scene change position information recorded (stored) in the predetermined buffer memory means is read out, and in step P19, section correction processing is performed such that a section that is too short, e.g., shorter than a predetermined section length, is joined with the sections before and after it.
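The correction in step P19 can be sketched with a simple greedy pass over the detected positions (an assumed simplification of the "section correction processing"; `min_len` stands in for the predetermined section length):

```python
def merge_short_sections(boundaries, min_len):
    """Step P19 sketch: treat sorted scene change positions as section
    boundaries and drop any boundary that would create a section shorter
    than min_len, effectively joining the too-short section with the
    preceding one."""
    corrected = []
    prev = 0  # start position of the current section
    for b in boundaries:
        if b - prev >= min_len:
            corrected.append(b)
            prev = b
        # else: skip b, merging the short section with its neighbor
    return corrected

print(merge_short_sections([3, 4, 10, 11, 20], min_len=3))  # [3, 10, 20]
```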
- In step P20, the audio segment position information data and the scene change position information data generated for the predetermined sections are read out.
- In step P21, predetermined PU information data, such as the position information and section information of each predetermined PU, is generated from the predetermined information data such as the audio segment positions, audio segment section lengths, scene change positions, and scene change section lengths.
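One way step P21 could combine the two kinds of position data is sketched below (the snapping rule and the `max_shift` parameter are assumptions about how the audio segment positions and scene change positions are reconciled into PU positions):

```python
def generate_pu_positions(segment_breaks, scene_changes, max_shift):
    """Step P21 sketch: for each audio segment break, snap to the nearest
    scene change position within max_shift to form a PU boundary;
    otherwise keep the audio break position itself."""
    pu_positions = []
    for brk in segment_breaks:
        nearest = min(scene_changes, key=lambda sc: abs(sc - brk), default=None)
        if nearest is not None and abs(nearest - brk) <= max_shift:
            pu_positions.append(nearest)  # aligned with a scene change
        else:
            pu_positions.append(brk)      # no nearby scene change
    return sorted(set(pu_positions))

print(generate_pu_positions([5, 14, 30], [6, 15, 40], max_shift=2))  # [6, 15, 30]
```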
- In step P22, from the PU information processed in step P21, the feature data corresponding to each PU section (or feature extraction data, or a signal obtained by performing predetermined signal processing on the feature data) is written to a predetermined recording medium or a predetermined data buffer.
- As these recording media, besides a predetermined recording area on the same predetermined recording medium on which the image/audio information data of the predetermined section, such as the broadcast program to be processed, is recorded, a separate predetermined recording medium may also be used for the recording (storing, writing) processing.
- In step P23, it is determined whether the series of signal processing described above, such as the audio segment processing, scene change processing, and PU processing, has been completed for the predetermined amount of data. If it is determined that the processing has ended, the processing ends; if it is determined that it has not ended, the process returns to step P1 and the above processing is repeated.
- In the example described above, the audio data segment processing is performed sequentially for each predetermined section of the image/audio data of the recorded predetermined broadcast program or the like, and the image scene change detection processing is then performed.
- However, instead of processing section by section, it is also possible to perform all the scene change detection processing after the audio segment processing for all the predetermined sections of the broadcast program currently being processed has been completed, and to perform the predetermined PU processing after all the scene change detection processing has been completed.
- FIG. 50 shows another example of an operation flowchart for performing the predetermined signal processing based on the audio segment detection points and the scene change detection points in the PU signal processing described above.
- First, in step T1, the predetermined audio segment processing described in steps P1 to P9 of the flowchart shown in FIG. 49 is performed.
- Here, the audio data is processed by sequentially reading a predetermined amount of data samples into a predetermined buffer memory.
- In step T2, the segment position information data obtained by the audio segment processing is recorded in predetermined memory means (data storage means, data recording means).
- In step T3, it is determined whether the predetermined segment processing has been completed for the audio data of all the predetermined sections of the broadcast program or the like currently being processed. If it is determined that it has not been completed, the process returns to step T1 and the above processing is repeated; if it is determined that it has been completed, the process proceeds to step T4.
- In step T4, the predetermined scene change processing described in steps P11 to P18 of the flowchart of FIG. 49 is performed.
- Here, the DCT data of the images is processed by sequentially reading a predetermined amount of data samples into a predetermined buffer memory.
- In step T5, the scene change position information data obtained by the predetermined scene change processing is recorded in predetermined memory means (data storage means, data recording means).
- In step T6, it is determined whether the predetermined scene change processing has been completed for the DCT data of all the images in the predetermined sections of the broadcast program or the like currently being processed. If it is determined that it has not ended, the process returns to step T4 and the above processing is repeated; if it is determined that it has ended, the process proceeds to step T7.
- In step T7, the predetermined audio segment position information data and the predetermined scene change position information data are read out from the predetermined memory means, and in step T8, the predetermined PU processing is performed.
- In step T9, it is determined whether the predetermined PU processing has been completed over all the predetermined sections of the broadcast program or the like currently being processed. If it is determined that the processing has ended, the process is terminated; if it is determined that it has not ended, the process returns to step T7 and the above operation is repeated.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020077003225A KR101385087B1 (ko) | 2004-08-10 | 2005-08-09 | 정보 신호 처리 방법, 정보 신호 처리 장치 및 컴퓨터프로그램 기록 매체 |
US11/659,792 US8634699B2 (en) | 2004-08-10 | 2005-08-09 | Information signal processing method and apparatus, and computer program product |
CN200580030347XA CN101053252B (zh) | 2004-08-10 | 2005-08-09 | 信息信号处理方法和设备 |
EP05770540A EP1784012A4 (en) | 2004-08-10 | 2005-08-09 | INFORMATION SIGNAL PROCESSING METHOD, INFORMATION SIGNAL PROCESSING DEVICE AND COMPUTER PROGRAM RECORDING MEDIUM |
JP2006531663A JP4935355B2 (ja) | 2004-08-10 | 2005-08-09 | 情報信号処理方法、情報信号処理装置及びコンピュータプログラム記録媒体 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-233943 | 2004-08-10 | ||
JP2004233943 | 2004-08-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006016590A1 true WO2006016590A1 (ja) | 2006-02-16 |
Family
ID=35839359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/014597 WO2006016590A1 (ja) | 2004-08-10 | 2005-08-09 | 情報信号処理方法、情報信号処理装置及びコンピュータプログラム記録媒体 |
Country Status (6)
Country | Link |
---|---|
US (1) | US8634699B2 (ja) |
EP (1) | EP1784012A4 (ja) |
JP (1) | JP4935355B2 (ja) |
KR (2) | KR20120068050A (ja) |
CN (1) | CN101053252B (ja) |
WO (1) | WO2006016590A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1898392A2 (en) * | 2006-09-07 | 2008-03-12 | Sony Corporation | Reproduction apparatus, reproduction method and reproduction program |
CN101766024A (zh) * | 2007-07-27 | 2010-06-30 | 思科技术公司 | 数字视频记录器合作和相似媒体段确定 |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4698453B2 (ja) * | 2006-02-28 | 2011-06-08 | 三洋電機株式会社 | コマーシャル検出装置、映像再生装置 |
JP4428424B2 (ja) * | 2007-08-20 | 2010-03-10 | ソニー株式会社 | 情報処理装置、情報処理方法、プログラムおよび記録媒体 |
KR101435140B1 (ko) * | 2007-10-16 | 2014-09-02 | 삼성전자 주식회사 | 영상 표시 장치 및 방법 |
JP4577412B2 (ja) * | 2008-06-20 | 2010-11-10 | ソニー株式会社 | 情報処理装置、情報処理方法、情報処理プログラム |
US8345750B2 (en) * | 2009-09-02 | 2013-01-01 | Sony Computer Entertainment Inc. | Scene change detection |
KR20110110434A (ko) * | 2010-04-01 | 2011-10-07 | 삼성전자주식회사 | 저전력 오디오 재생장치 및 방법 |
JP5634111B2 (ja) * | 2010-04-28 | 2014-12-03 | キヤノン株式会社 | 映像編集装置、映像編集方法及びプログラム |
JP5714297B2 (ja) * | 2010-10-29 | 2015-05-07 | 株式会社キーエンス | 画像処理装置、画像処理方法および画像処理プログラム |
US9558165B1 (en) * | 2011-08-19 | 2017-01-31 | Emicen Corp. | Method and system for data mining of short message streams |
CN102999621B (zh) * | 2012-11-29 | 2016-01-27 | 广东欧珀移动通信有限公司 | 一种外观主题的设置方法及装置 |
CN103594103B (zh) * | 2013-11-15 | 2017-04-05 | 腾讯科技(成都)有限公司 | 音频处理方法及相关装置 |
CN104185066B (zh) * | 2014-03-04 | 2017-05-31 | 无锡天脉聚源传媒科技有限公司 | 一种自动校验电子节目菜单的方法及装置 |
US10002641B1 (en) * | 2016-10-17 | 2018-06-19 | Gopro, Inc. | Systems and methods for determining highlight segment sets |
CN108174138B (zh) * | 2018-01-02 | 2021-02-19 | 上海闻泰电子科技有限公司 | 视频拍摄方法、语音采集设备及视频拍摄系统 |
KR102650138B1 (ko) * | 2018-12-14 | 2024-03-22 | 삼성전자주식회사 | 디스플레이장치, 그 제어방법 및 기록매체 |
CN112231464B (zh) * | 2020-11-17 | 2023-12-22 | 安徽鸿程光电有限公司 | 信息处理方法、装置、设备及存储介质 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003283993A (ja) * | 2002-03-27 | 2003-10-03 | Sanyo Electric Co Ltd | 映像情報記録再生装置及び映像情報記録再生方法 |
JP2005269510A (ja) * | 2004-03-22 | 2005-09-29 | Seiko Epson Corp | ダイジェスト画像データの生成 |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6160950A (en) * | 1996-07-18 | 2000-12-12 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for automatically generating a digest of a program |
US5956026A (en) * | 1997-12-19 | 1999-09-21 | Sharp Laboratories Of America, Inc. | Method for hierarchical summarization and browsing of digital video |
JP2002535894A (ja) | 1999-01-12 | 2002-10-22 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | カメラ動きパラメータを推定する方法 |
JP4165851B2 (ja) * | 2000-06-07 | 2008-10-15 | キヤノン株式会社 | 記録装置及び記録制御方法 |
JP2002116784A (ja) | 2000-10-06 | 2002-04-19 | Sony Corp | 情報信号処理装置、情報信号処理方法、情報信号記録再生装置及び情報信号記録媒体 |
JP3631430B2 (ja) | 2000-11-08 | 2005-03-23 | 株式会社東芝 | 自動チャプタ作成機能付き記録再生装置 |
JP4913288B2 (ja) | 2001-05-14 | 2012-04-11 | ソニー株式会社 | 情報信号処理装置及び情報信号処理方法 |
US7143354B2 (en) * | 2001-06-04 | 2006-11-28 | Sharp Laboratories Of America, Inc. | Summarization of baseball video content |
JP4546682B2 (ja) * | 2001-06-26 | 2010-09-15 | パイオニア株式会社 | 映像情報要約装置、映像情報要約方法および映像情報要約処理プログラム |
US7203620B2 (en) * | 2001-07-03 | 2007-04-10 | Sharp Laboratories Of America, Inc. | Summarization of video content |
US6931201B2 (en) | 2001-07-31 | 2005-08-16 | Hewlett-Packard Development Company, L.P. | Video indexing using high quality sound |
US20030108334A1 (en) * | 2001-12-06 | 2003-06-12 | Koninklijke Philips Elecronics N.V. | Adaptive environment system and method of providing an adaptive environment |
JP2003298981A (ja) | 2002-04-03 | 2003-10-17 | Oojisu Soken:Kk | 要約画像作成装置、要約画像作成方法、要約画像作成プログラム、及び要約画像作成プログラムを記憶したコンピュータ読取可能な記憶媒体 |
US7286749B2 (en) * | 2002-04-16 | 2007-10-23 | Canon Kabushiki Kaisha | Moving image playback apparatus, moving image playback method, and computer program thereof with determining of first voice period which represents a human utterance period and second voice period other than the first voice period |
US20040088723A1 (en) * | 2002-11-01 | 2004-05-06 | Yu-Fei Ma | Systems and methods for generating a video summary |
US7274741B2 (en) * | 2002-11-01 | 2007-09-25 | Microsoft Corporation | Systems and methods for generating a comprehensive user attention model |
WO2005001715A1 (en) * | 2003-06-30 | 2005-01-06 | Koninklijke Philips Electronics, N.V. | System and method for generating a multimedia summary of multimedia streams |
US8250058B2 (en) * | 2005-10-18 | 2012-08-21 | Fish Robert D | Table for storing parameterized product/services information using variable field columns |
- 2005-08-09 KR KR1020127014701A patent/KR20120068050A/ko not_active Application Discontinuation
- 2005-08-09 KR KR1020077003225A patent/KR101385087B1/ko not_active IP Right Cessation
- 2005-08-09 WO PCT/JP2005/014597 patent/WO2006016590A1/ja active Application Filing
- 2005-08-09 JP JP2006531663A patent/JP4935355B2/ja not_active Expired - Fee Related
- 2005-08-09 US US11/659,792 patent/US8634699B2/en not_active Expired - Fee Related
- 2005-08-09 EP EP05770540A patent/EP1784012A4/en not_active Ceased
- 2005-08-09 CN CN200580030347XA patent/CN101053252B/zh not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
See also references of EP1784012A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1898392A2 (en) * | 2006-09-07 | 2008-03-12 | Sony Corporation | Reproduction apparatus, reproduction method and reproduction program |
EP1898392A3 (en) * | 2006-09-07 | 2009-06-17 | Sony Corporation | Reproduction apparatus, reproduction method and reproduction program |
US8588945B2 (en) | 2006-09-07 | 2013-11-19 | Sony Corporation | Reproduction apparatus, reproduction method and reproduction program |
CN101766024A (zh) * | 2007-07-27 | 2010-06-30 | 思科技术公司 | 数字视频记录器合作和相似媒体段确定 |
US8526784B2 (en) * | 2007-07-27 | 2013-09-03 | Cisco Technology, Inc. | Digital video recorder collaboration and similar media segment determination |
Also Published As
Publication number | Publication date |
---|---|
KR20120068050A (ko) | 2012-06-26 |
CN101053252A (zh) | 2007-10-10 |
EP1784012A4 (en) | 2011-10-26 |
JPWO2006016590A1 (ja) | 2008-07-31 |
US20070286579A1 (en) | 2007-12-13 |
JP4935355B2 (ja) | 2012-05-23 |
CN101053252B (zh) | 2011-05-25 |
EP1784012A1 (en) | 2007-05-09 |
KR101385087B1 (ko) | 2014-04-14 |
US8634699B2 (en) | 2014-01-21 |
KR20070047776A (ko) | 2007-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006016590A1 (ja) | 情報信号処理方法、情報信号処理装置及びコンピュータプログラム記録媒体 | |
JP4882746B2 (ja) | 情報信号処理方法、情報信号処理装置及びコンピュータプログラム記録媒体 | |
TWI259719B (en) | Apparatus and method for reproducing summary | |
US20080044085A1 (en) | Method and apparatus for playing back video, and computer program product | |
US20030063130A1 (en) | Reproducing apparatus providing a colored slider bar | |
US8532800B2 (en) | Uniform program indexing method with simple and robust audio feature enhancing methods | |
JP3891111B2 (ja) | 音響信号処理装置及び方法、信号記録装置及び方法、並びにプログラム | |
JP2005210234A (ja) | 映像内容認識装置、録画装置、映像内容認識方法、録画方法、映像内容認識プログラム、および録画プログラム | |
US20060285818A1 (en) | Information processing apparatus, method, and program | |
JPWO2010073355A1 (ja) | 番組データ処理装置、方法、およびプログラム | |
KR100612874B1 (ko) | 스포츠 동영상의 요약 방법 및 장치 | |
JP4835439B2 (ja) | 情報信号処理方法、情報信号処理装置及びコンピュータプログラム記録媒体 | |
JP4432823B2 (ja) | 特定条件区間検出装置および特定条件区間検出方法 | |
JP4341503B2 (ja) | 情報信号処理方法、情報信号処理装置及びプログラム記録媒体 | |
JP2006054622A (ja) | 情報信号処理方法、情報信号処理装置及びプログラム記録媒体 | |
JP2006270233A (ja) | 信号処理方法及び信号記録再生装置 | |
JP4470638B2 (ja) | 情報信号処理方法、情報信号処理装置及びプログラム記録媒体 | |
JP2010081531A (ja) | 映像処理装置及びその方法 | |
JP4683277B2 (ja) | 再生装置および方法、並びにプログラム | |
JP2006054621A (ja) | 情報信号処理方法、情報信号処理装置及びプログラム記録媒体 | |
JP5056687B2 (ja) | 再生装置及びコンテンツ再生プログラム | |
JP2006303868A (ja) | 信号属性判定装置、信号属性判定方法、情報信号記録装置、情報信号記録方法、情報信号再生装置、情報信号再生方法、情報信号記録再生装置および情報信号記録再生方法並びに記録媒体 | |
JP2006333279A (ja) | 記録装置および方法、並びにプログラム | |
JP2006352631A (ja) | 情報処理装置および方法、並びにプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006531663 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11659792 Country of ref document: US Ref document number: 1020077003225 Country of ref document: KR |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2005770540 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 200580030347.X Country of ref document: CN |
|
WWP | Wipo information: published in national office |
Ref document number: 2005770540 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 11659792 Country of ref document: US |