WO2004002144A1 - Metadata creation device, creation method thereof, and search device - Google Patents
Metadata creation device, creation method thereof, and search device
- Publication number
- WO2004002144A1 (PCT/JP2003/007908)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- metadata
- unit
- file
- voice
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/235—Processing of additional data, e.g. scrambling of additional data or processing content descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/433—Content storage operation, e.g. storage operation in response to a pause request, caching operations
- H04N21/4334—Recording operations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8106—Monomedia components thereof involving special audio data, e.g. different tracks for different languages
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
Definitions
- Metadata creation device, creation method thereof, and retrieval device
- The present invention relates to a metadata creation device and a metadata creation method for creating metadata related to content such as produced video and audio. The present invention also relates to a search device that searches for content by using the created metadata. Background Art
- The present invention has been made to solve the above-mentioned problems, and is intended to provide a metadata creation device and a metadata creation method. It is another object of the present invention to provide a search device that can easily search content by using the metadata created as described above.
- The metadata creation device of the present invention includes: a content reproduction unit that reproduces and outputs content; a voice input unit; a voice recognition unit that recognizes a voice signal input from the voice input unit; a metadata generation unit that converts the information recognized by the voice recognition unit into metadata; and an identification information assigning unit that obtains, from the reproduced content supplied from the content reproduction unit, identification information for identifying each part in the content, and associates the generated metadata with each part in the content.
- In the metadata creation method of the present invention, information related to the content is input by voice, the input voice signal is recognized by a voice recognition device, the recognized information is converted into metadata, and identification information given to the content for identifying each part of the content is added to the metadata, whereby the generated metadata is associated with each part in the content.
- The metadata search device of the present invention comprises: a content database that reproduces and outputs content; an audio input unit that converts an input keyword audio signal into data with a clock synchronized with a reproduction synchronization signal of the content; a voice recognition unit that recognizes a keyword from the voice signal data converted by the audio input unit; a file processing unit that creates a metadata file by combining the keyword output from the voice recognition unit with a time code indicating the time position of an image signal included in the content; a content information file processing unit that generates a control file for managing the relationship between the recording position of the content file and the metadata file; a recording unit that records the content file, the metadata file, and the control file; and a search unit that identifies the metadata file containing an input search keyword and, referring to the control file, extracts the recording position in the content file corresponding to the keyword.
- Here, the recording position of the content file means the recording position in the recording unit.
- FIG. 1 is a block diagram showing a configuration of a metadata creation device according to Embodiment 1 of the present invention.
- FIG. 2 is a diagram showing an example of the time-code-assigned metadata according to Embodiment 1 of the present invention.
- FIG. 3 is a block diagram showing a configuration of a metadata creation device according to Embodiment 2 of the present invention.
- FIG. 4 is a diagram showing an example of a still image content / metadata display unit in the same device.
- FIG. 5 is a block diagram illustrating another configuration of the metadata creation device according to the second embodiment of the present invention.
- FIG. 6 is a block diagram showing a configuration of a metadata creation device according to Embodiment 3 of the present invention.
- FIG. 7 is a configuration diagram showing an example of the dictionary DB in the device according to the embodiment.
- FIG. 8 is a diagram showing a recipe as an example of a content scenario applied to the apparatus of the embodiment.
- FIG. 9 is a TEXT format data diagram showing an example of a metadata file created by the apparatus according to the embodiment.
- FIG. 10 is a block diagram showing a configuration of a metadata creation device according to Embodiment 4 of the present invention.
- FIG. 11 is a configuration diagram showing an example of an information file created by the device of the embodiment.
- FIG. 12 is a block diagram showing a configuration of a metadata search device according to Embodiment 5 of the present invention.
- FIG. 13 is a block diagram showing a configuration of a metadata creation device according to Embodiment 6 of the present invention.
- In the present invention, metadata or tags are created by voice input using voice recognition, and the created metadata or tags are associated with the time or scene of the content. This makes it possible to create, by voice input, metadata that was previously created by keyboard input.
- Metadata means a set of tags, and in the present invention the term "metadata" also includes the meaning of a tag itself.
- content is used to mean anything including what is generally called content, such as produced video, audio content, still image content, database-based video, and audio content.
- The metadata creation device of the present invention preferably further comprises a dictionary relating to the content, and the voice recognition unit recognizes the voice signal input from the voice input unit with reference to the dictionary. With this configuration, keywords extracted in advance from the scenario or the like of the produced content are input as audio signals, and by setting the dictionary field and prioritizing keywords based on the scenario, metadata can be generated efficiently and accurately by the voice recognition means.
- The device may further include an information processing unit including a keyboard, and the metadata can be modified via the information processing unit in response to input from the keyboard.
- Time code information given to the content can be used as the identification information.
- a content address, number, or frame number assigned to the content may be used as the identification information.
- the content is a still image content, and each address of the still image content can be used as the identification information.
- Further, the following metadata creation device can be configured. That is, the content reproduction unit is configured as a content database, and the audio input unit converts the input audio signal of a keyword into data using a clock synchronized with a synchronization signal supplied from the content database, and supplies the data to the voice recognition unit.
- the voice recognition unit is configured to recognize a keyword from voice signal data digitized by the voice input unit.
- The metadata generation unit uses a time code indicating the time position of an image signal included in the content as the identification information, and is configured as a file processing unit that creates a metadata file by combining a keyword output from the voice recognition unit with the time code.
- this configuration further includes a recording unit that records the content supplied from the content database as a content file together with the metadata file.
- Preferably, the configuration further includes a content information file processing unit that generates a control file for managing the relationship between the recording position at which the content file is to be recorded and the metadata file, and the recording unit records the content file, the metadata file, and the control file.
- a dictionary database is further provided, and the speech recognition unit is configured to be able to select a dictionary of a genre suitable for the content from a plurality of genre-specific dictionaries. More preferably, a keyword associated with a content can be supplied to the speech recognition unit, and the speech recognition unit is configured to preferentially recognize the keyword.
- information related to the content is input by voice while the content is reproduced and displayed on a monitor.
- a dictionary associated with the content is used, and the input voice signal is recognized by the voice recognition device in association with the dictionary.
- time code information given to the content is used as the identification information.
- a still image content can be used as the content, and each address of the still image content can be used as the identification information.
- According to the metadata search device of the present invention, by using the control file indicating the recording position of the content and the metadata file indicating the time code and the like, a desired portion of the content can be searched at high speed based on the metadata.
- The control file output from the content information file processing unit is provided with a table for specifying the content recording position in the recording unit according to the recording time, so that the recording position of the content can be searched from the time code.
- Preferably, the apparatus further comprises a dictionary database and a keyword supply unit that supplies keywords associated with the content to the voice recognition unit; the voice recognition unit can select a dictionary of a genre suitable for the content from a plurality of genre-specific dictionaries, and recognizes the supplied keywords preferentially.
- Alternatively, the apparatus further comprises a dictionary database, the voice recognition unit can select a dictionary of a genre suitable for the content from a plurality of genre-specific dictionaries, and the search unit is configured to perform a search using a keyword selected from the same dictionary as that used by the voice recognition unit.
- FIG. 1 is a block diagram showing a configuration of a metadata creation device according to Embodiment 1 of the present invention.
- The content reproduction unit 1 is an element for confirming the produced content when creating metadata.
- The output of the content reproduction unit 1 is supplied to a video monitor 2, an audio monitor 3, and a time code adding unit 7.
- Microphone 4 is provided as a voice input unit for creating metadata.
- the voice input from the microphone 4 is input to the voice recognition unit 5.
- a speech recognition dictionary 8 is connected to the speech recognition unit 5 so that the dictionary can be referred to.
- the recognition output of the voice recognition unit 5 is supplied to a metadata generation unit 6, and the created metadata is supplied to a time code addition unit 7 and can be output from the time code addition unit 7 to the outside.
- The content reproduction unit 1 includes, for example, a video/audio signal reproducing device such as a VTR, a hard disk device, or an optical disk device, a video/audio signal reproducing device using memory means such as a semiconductor memory as a recording medium, or a video/audio playback device that reproduces a video/audio signal supplied by transmission or broadcast.
- the reproduced video signal is supplied to the video monitor 2 from the video signal output terminal 1 a of the content reproduction unit 1.
- the reproduced audio signal is supplied to the audio monitor 3 from the audio signal output terminal 1 b.
- the time code output terminal 1 c supplies the reproduced time code to the time code adding unit 7.
- the video monitor 2 and the audio monitor 3 are not indispensable as elements of the metadata creation device, but may be connected and used as needed.
- The operator confirms the content on one or both of the video monitor 2 and the audio monitor 3, refers as needed to the scenario, the narration manuscript, or the like, and inputs words related to the content with the microphone 4.
- the voice signal output from the microphone 4 is supplied to the voice recognition unit 5.
- the data of the dictionary 8 for voice recognition is referred to by the voice recognition unit 5 as needed.
- the speech data recognized by the speech recognition unit 5 is supplied to the metadata generation unit 6 and is converted into metadata.
- In order to provide information associating the metadata with the time of each part of the content or with the corresponding scene, the metadata generated in this manner is given, in the time code adding unit 7, the time code information obtained from the reproduced content.
- For example, when the operator says "one spoonful of salt", the voice recognition unit 5 recognizes it with reference to the dictionary 8, and the metadata generation unit 6 converts the result into the tags "salt" and "one spoonful", respectively.
- The configuration of the voice recognition unit 5 is not particularly limited; it is sufficient that voice recognition is performed using commonly used voice recognition means and that items such as "salt" and "one spoonful" are recognized as data. Note that, in general, "metadata" means an aggregate of such tags.
- metadata 9 a is output from the voice recognition unit 5 and supplied to the time code adding unit 7.
- The time code adding unit 7 generates packet data consisting of time-code-assigned metadata 10, in which a time code is added to the metadata based on the time code signal 9b supplied from the content reproduction unit 1.
- The time-code-assigned metadata generated in this manner may be output as it is, or may be stored in a recording medium such as a hard disk.
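- For illustration, the following is a minimal sketch (not the patented implementation) of how a unit like the time code adding unit 7 could packetize recognized tags with the reproduction time code, in the spirit of FIG. 2; the class, function names, and packet layout are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TimecodedMetadata:
    timecode: str   # e.g. "00:01:05:12", taken from the content reproduction unit
    tag: str        # keyword recognized by the voice recognition unit

def attach_timecode(recognized_tags, current_timecode):
    """Emulate the time code adding unit 7: wrap each recognized tag
    in a packet carrying the reproduction time code."""
    return [TimecodedMetadata(current_timecode, t) for t in recognized_tags]

# The narration "one spoonful of salt" recognized while 00:01:05:12 is playing:
packets = attach_timecode(["salt", "one spoonful"], "00:01:05:12")
print(packets)
```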
- FIG. 3 is a block diagram showing a configuration of a metadata creation device according to Embodiment 2 of the present invention.
- This embodiment is an example in which metadata is created for still image content.
- In this case, a configuration is used in which the generated metadata is associated with the still image content using the address of the content, which plays the role of the time code used for moving images.
- camera 11 is an element for producing still image content.
- the output of the camera 11 is recorded with address information added thereto by the still image content recording unit 12.
- The still image content and the address information recorded here are supplied to the still image content / metadata recording unit 13 for creating metadata.
- the address information is further supplied to a metadata address assigning section 19.
- the microphone 16 is used for voice input of information related to a still image, and its output is input to the voice recognition unit 17.
- the speech recognition unit 17 is connected to a dictionary 20 for speech recognition, and can refer to the data.
- the recognition output of the voice recognition unit 17 is supplied to a metadata generation unit 18, and the created metadata is supplied to a metadata address assignment unit 19.
- The still image content and metadata recorded in the still image content / metadata recording unit 13 are reproduced by the still image content / metadata reproduction unit 14 and displayed on the still image content / metadata display unit 15.
- the operation of the metadata creating apparatus having the above configuration will be described in more detail.
- the still image content shot by the camera 11 is recorded on a recording medium (not shown) by the still image content recording unit 12 and is provided with address information.
- the address information is also recorded on the recording medium.
- The recording medium is generally a semiconductor memory, but is not limited to this; for example, various recording media such as a magnetic memory, an optical recording medium, and a magneto-optical recording medium can be used.
- The recorded still image content is supplied to the still image content / metadata recording unit 13 via the output terminal 12a and the input terminal 13a, and the address information is supplied via the output terminal 12b and the input terminal 13b.
- the address information is further supplied to the metadata address assigning section 19 via the output terminal 12b and the input terminal 19b.
- information related to a still image captured by the camera 11 is input to the voice recognition unit 17 via the microphone 16.
- Information related to a still image includes, for example, the title, the date and time of shooting, the photographer, the shooting location (where), the subject (who), and the subject matter (what).
- The voice recognition unit 17 is also supplied with data of the dictionary 20 for voice recognition as needed.
- The voice data recognized by the voice recognition unit 17 is supplied to the metadata generation unit 18 and converted into metadata or tags.
- Here, metadata means a collection of tags of information related to the content, such as the title, the date and time of shooting, the photographer, the shooting location (where), the subject (who), and the subject matter (what).
- The metadata or tags generated in this way are supplied to the metadata address assigning unit 19 in order to add information associating them with the still image content or the corresponding scene.
- In the metadata address assigning unit 19, the address information supplied from the still image content recording unit 12 is added to the metadata.
- The address-assigned metadata generated in this manner is supplied to the still image content / metadata recording unit 13 via the output terminal 19c and the input terminal 13c.
- In the still image content / metadata recording unit 13, still image content is recorded in association with the metadata having the same address.
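- As an illustration of this address-based association, the following sketch keys both the still image and its voice-derived metadata by a shared address so that either side can be retrieved; the stores, addresses, and field names are hypothetical.

```python
# address -> recorded still image; address -> voice-derived tags (both assumed)
still_images: dict[str, str] = {}
metadata_store: dict[str, dict] = {}

def record_still(address: str, image_path: str) -> None:
    still_images[address] = image_path

def assign_metadata(address: str, tags: dict) -> None:
    # tags recognized by voice: title, date/time, photographer, location, ...
    metadata_store[address] = tags

record_still("IMG0001", "/media/card/IMG0001.jpg")
assign_metadata("IMG0001", {"title": "Harbor", "photographer": "Sato",
                            "location": "Yokohama"})
# Reproduction: both sides of the association are found via the shared address.
image, meta = still_images["IMG0001"], metadata_store["IMG0001"]
```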
- FIG. 4 shows an example of the result of reproduction and display by the still image content / metadata display unit 15.
- The screen of the still image content / metadata display unit 15 shown in FIG. 4 is one example; it is composed of a still image content display section 21, an address display section 22, and a metadata display area 23.
- The metadata display area 23 consists of, for example, 1) a title description section 23a, 2) a date/time description section 23b, 3) a photographer description section 23c, and 4) a shooting location description section 23d.
- The above description applies to the case where metadata is generated before, at almost the same time as, or immediately after shooting the still image content, and reproduction for confirming the still image content is not necessarily required.
- In contrast, FIG. 5 shows another configuration, in which a still image content / address reproduction unit 24 is provided between the still image content recording unit 12 and the still image content / metadata recording unit 13, together with a monitor 25 to which the output of the still image content / address reproduction unit 24 is supplied.
- the still image content shot by the camera 11 and supplied to the still image content recording unit 12 is recorded on a recording medium (not shown), is assigned an address, and the address is also recorded on the recording medium.
- In this configuration, the recording medium is supplied to the still image content / address reproduction unit 24. Therefore, a metadata creation device used in this way, which reproduces the produced still image content and generates metadata while the content is monitored, does not require the camera 11 or the still image content recording unit 12.
- The still image content reproduced by the still image content / address reproduction unit 24 is supplied to the monitor 25. Similarly, the reproduced address information is supplied to the metadata address assigning unit 19 via the output terminal 24b and the input terminal 19b.
- the person in charge of generating the metadata checks the still image content displayed on the monitor 25 and inputs words necessary for generating the metadata through the microphone 16.
- information related to the still image captured by the camera 11 is input to the voice recognition unit 17 via the microphone 16.
- The information related to the still image includes, for example, the title, the date and time of shooting, the photographer, the shooting location (where), the subject (who), and the subject matter (what). Subsequent operations are the same as those described for the configuration in FIG. 3. (Embodiment 3)
- FIG. 6 is a block diagram showing a configuration of a metadata creation device according to Embodiment 3 of the present invention.
- This embodiment is an example in which metadata is created for general digital data content. In order to identify the digital data content, the configuration associates the generated metadata with the content using the address or number of the content.
- reference numeral 31 denotes a content database (hereinafter, referred to as a content DB).
- The output reproduced from the content DB 31 is supplied to an audio input unit 32, a file processing unit 35, and a recording unit 37.
- the output of the voice input unit 32 is supplied to the voice recognition unit 33.
- the data of the dictionary database (hereinafter, referred to as dictionary DB) 34 can also be supplied to the voice recognition unit 33.
- Metadata is output from the voice recognition unit 33 and input to the file processing unit 35.
- the file processing unit 35 uses the time code value supplied from the content DB 31 to perform a file conversion process in a format in which predetermined data is added to the metadata output from the voice recognition unit 33.
- the metadata file output from the file processing unit 35 is supplied to the recording unit 37, and is recorded together with the content output from the content DB 31.
- The voice input unit 32 has a voice input terminal 39.
- the dictionary DB 34 has a dictionary field selection input terminal 40.
- the playback output from the content DB 31 and the playback output from the recording unit 37 can be displayed on the video monitor 41.
- The content DB 31 comprises, for example, a video/audio signal reproducing device such as a VTR, a hard disk device, or an optical disk device, a video/audio signal reproducing device using memory means such as a semiconductor memory as a recording medium, or a device that records and reproduces a transmitted or broadcast video/audio signal, and has a function of reproducing the produced content while generating a time code matched to the content.
- the operation of the metadata creation device will be described below.
- The video signal with the time code reproduced from the content DB 31 is supplied to the video monitor 41 and displayed.
- the voice signal is input to the voice input section 32 via the voice input terminal 39.
- The operator checks the content and time code displayed on the video monitor 41, and preferably utters keywords for content management extracted based on the scenario, the narration manuscript, or the content itself.
- By using, as the input voice signal, keywords limited in advance based on the scenario or the like, it is possible to improve the recognition rate in the subsequent voice recognition unit 33.
- The audio input unit 32 converts the audio signal input from the audio input terminal 39 into data using a clock synchronized with the vertical synchronization signal output from the content DB 31.
- the voice signal data converted into data by the voice input unit 32 is input to the voice recognition unit 33, and at the same time, a dictionary necessary for voice recognition is supplied from the dictionary DB 34.
- the dictionary for speech recognition used in the dictionary DB 34 can be set from the dictionary field selection input terminal 40.
- the field to be used is set from the dictionary field selection input terminal 40 (for example, a keyboard terminal capable of key input).
- For example, fields such as cooking, Japanese cooking, cooking method, and stir-fried vegetables can be set for the dictionary DB 34 from the terminal 40.
- Keywords extracted from the scenario, the narration manuscript, or the substance of the content can also be input from the dictionary field selection terminal 40 in FIG. 6.
- The dictionary DB 34 specifies recognition priorities for the recipe words input from the terminal 40 so that they are given priority in voice recognition. For example, suppose the dictionary contains both "persimmon" and "oyster", which are homophones in Japanese (both pronounced "kaki"); if the recipe words input from the terminal 40 include only "oyster", priority is given to "oyster". Accordingly, when the voice recognition unit 33 hears the sound "kaki", it recognizes it as "oyster", the word assigned priority 1 in the dictionary DB 34.
- In this way, the dictionary DB 34 limits the words to the field input from the terminal 40 and, further, by inputting the scenario from the terminal 40 and specifying word priorities, the recognition rate of the voice recognition unit 33 can be improved.
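- The following sketch illustrates, under assumed data structures, how a genre dictionary combined with scenario-supplied priorities could disambiguate homophones in the manner described for the dictionary DB 34; it is not the actual recognition engine.

```python
# Candidate words per pronunciation in the selected dictionary field (assumed data).
dictionary = {"kaki": ["persimmon", "oyster"]}
# Recipe words entered from terminal 40; these get recognition priority 1.
scenario_keywords = {"oyster"}

def recognize(pronunciation: str) -> str:
    candidates = dictionary.get(pronunciation, [])
    prioritized = [w for w in candidates if w in scenario_keywords]
    return (prioritized or candidates)[0] if candidates else ""

assert recognize("kaki") == "oyster"  # the homophone is resolved by priority
```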
- The voice recognition unit 33 in FIG. 6 recognizes the voice signal data input from the voice input unit 32 according to the dictionary supplied from the dictionary DB 34, and generates metadata.
- the metadata output from the voice recognition unit 33 is input to the file processing unit 35.
- As described above, the audio input unit 32 converts the audio signal into data in synchronization with the vertical synchronization signal reproduced from the content DB 31.
- The file processing unit 35 uses the synchronization information from the audio input unit 32 and the time code value supplied from the content DB 31 to output, in the case of the cooking program described above, a TEXT-format metadata file as shown in FIG. 9. That is, the file processing unit 35 creates a file in a format in which TM-ENT (seconds), the reference time in units of one second, TM-OFFSET, the frame offset from that reference time, and the time code are added to the metadata output from the voice recognition unit 33.
- the recording unit 37 records the metadata file output from the file processing unit 35 and the content output from the content DB 31.
- the recording unit 37 includes an HDD, a memory, an optical disk, and the like, and also records the content output from the content DB 31 in a file format.
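- As a rough illustration of the TEXT-format file described above, the sketch below writes one metadata entry with TM-ENT, TM-OFFSET, and time code fields; the exact layout of FIG. 9 is not reproduced, and the field syntax and file name here are assumptions.

```python
def write_metadata_entry(f, keyword, frame_index, timecode, fps=30):
    tm_ent = frame_index // fps    # TM-ENT: reference time in whole seconds
    tm_offset = frame_index % fps  # TM-OFFSET: frames past that second
    f.write(f"TM-ENT={tm_ent} TM-OFFSET={tm_offset} "
            f"TIMECODE={timecode} METADATA={keyword}\n")

with open("recipe_metadata.txt", "w", encoding="utf-8") as f:
    # keyword "oyster" recognized at frame 1835 (time code 00:01:01:05 at 30 fps)
    write_metadata_entry(f, "oyster", frame_index=1835, timecode="00:01:01:05")
```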
- FIG. 10 is a block diagram showing a configuration of a metadata creation device according to Embodiment 4 of the present invention.
- a content information file processing unit 36 is added to the configuration of the third embodiment.
- the content information file processing unit 36 generates a control file indicating the recording position relationship of the content recorded in the recording unit 37, and records the control file in the recording unit 37.
- The content information file processing unit 36 generates time axis information of the content and information indicating the address relationship of the content recorded in the recording unit 37, converts them into data, and outputs the result as a control file.
- As shown in FIG. 11, TM-ENT #j, which indicates the time axis reference of the content, points at equal time axis intervals to the recording media addresses indicating the recording positions of the content. For example, TM-ENT #j points to a recording media address every second (every 30 frames for an NTSC signal).
- As described above, the metadata file records, in TEXT format, TM-ENT (seconds), the reference time every second from the start of the file, TM-OFFSET, the frame offset from that reference time, the time code, and the metadata. Therefore, when metadata in the metadata file is specified, the corresponding time code, reference time, and frame offset value become known, and the recording position in the recording unit 37 is then known immediately from the control file shown in FIG. 11.
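- The lookup described above can be pictured with the following sketch, in which a control-file table maps TM-ENT #j to recording media addresses at one-second intervals; the table values and the fixed bytes-per-frame step are assumptions made purely for illustration.

```python
# Control-file table: TM-ENT #j -> recording media address, one entry per second.
control_table = {0: 0, 1: 35840, 2: 71680}   # example values only

def recording_position(tm_ent: int, tm_offset: int,
                       bytes_per_frame: int = 1200) -> int:
    """Start from the per-second anchor address, then step tm_offset frames
    forward (a fixed frame size is assumed purely for this sketch)."""
    return control_table[tm_ent] + tm_offset * bytes_per_frame

pos = recording_position(tm_ent=1, tm_offset=5)  # scene at 00:00:01 + 5 frames
```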
- The equal time axis interval of TM-ENT #j is not limited to one point per second as described above, and may be described in accordance with GOP units used in MPEG-2 compression or the like.
- When the content is an NTSC signal, the vertical sync signal is 60/1.001 Hz, so a drop-frame time code matched to the vertical sync signal (60/1.001 Hz) can be used. In this case, the non-drop time is represented by TM-ENT #j, and the time code corresponding to the drop frame is represented by TC-ENT #j.
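- For reference, the standard SMPTE drop-frame numbering (a well-known scheme, not taken from the patent text) skips frame numbers 0 and 1 at the start of every minute except each tenth minute, which keeps 29.97 fps time code aligned with the 60/1.001 Hz vertical sync:

```python
def frames_to_dropframe(frame: int) -> str:
    fpm, fp10 = 30 * 60 - 2, 30 * 600 - 2 * 9   # 1798 and 17982 frames
    d, m = divmod(frame, fp10)
    # Re-insert the dropped numbers to get the displayed time code value.
    frame += 2 * 9 * d + (2 * ((m - 2) // fpm) if m > 2 else 0)
    ff, rest = frame % 30, frame // 30
    ss, rest = rest % 60, rest // 60
    mm, hh = rest % 60, rest // 60
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"  # ';' marks drop-frame

assert frames_to_dropframe(1800) == "00:01:00;02"  # ;00 and ;01 are skipped
```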
- The control file can be written using an existing language such as SMIL 2. If the functions of SMIL 2 are used, the file names of the related content and of the metadata file can also be stored together in the control file.
- FIG. 11 shows a configuration in which the recording address in the recording unit is described directly; instead, the data size from the beginning of the content file to the time code may be described, and the recording address of the time code in the recording unit may then be calculated and detected from this data size and the recording address of the file system.
- In the above description, the correspondence table between TM-ENT #j and the time code is stored in the metadata file; the same effect is obtained by storing this correspondence table in the control file instead.
- FIG. 12 is a block diagram showing a configuration of a metadata search device according to Embodiment 5 of the present invention.
- a search unit 38 is added to the configuration of the fourth embodiment.
- The search unit 38 selects and sets the keyword of the scene to be searched from the same dictionary DB 34 that was used when the metadata was created by voice recognition.
- the search unit 38 searches for a metadata item of the metadata file, and displays a list of title names and content scene positions (time codes) that match the keyword.
- The recording media address in the control file is automatically detected from the reference time TM-ENT (seconds) and the frame offset number TM-OFFSET of the metadata file.
- The content scene recorded at that recording media address is then reproduced from the recording unit 37 and displayed on the monitor 41.
- If a thumbnail file linked to the content is prepared, a representative thumbnail image of the content can also be reproduced and displayed when the list of titles matching the keyword described above is displayed.
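- Putting the pieces together, a search along the lines of Embodiment 5 might look like the following sketch: scan parsed metadata entries for the keyword, then resolve each hit through the control-file table to a recording position; all structures and values shown are assumed.

```python
control_table = {60: 2150400, 61: 2186240}   # TM-ENT #j -> media address (example)
BYTES_PER_FRAME = 1200                        # assumed fixed rate, as above

def search_scenes(metadata_entries, keyword):
    hits = [e for e in metadata_entries if e["metadata"] == keyword]
    return [(e["timecode"],
             control_table[e["tm_ent"]] + e["tm_offset"] * BYTES_PER_FRAME)
            for e in hits]

entries = [{"metadata": "oyster", "tm_ent": 61, "tm_offset": 5,
            "timecode": "00:01:01:05"}]
print(search_scenes(entries, "oyster"))  # [(time code, recording media address)]
```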
- FIG. 13 is a block diagram showing a configuration of a metadata creation device according to Embodiment 6 of the present invention.
- the imaging output of the camera 51 is recorded in the content DB 54 as video content.
- the GPS 52 detects the location where the camera is shooting, and the position information (latitude and longitude values) is converted into an audio signal by the audio synthesizing unit 53, and is recorded as the position information on the audio channel of the content DB 54.
- the camera 51, GPS 52, voice synthesizing unit 53, and content DB 54 can be integrally configured as a camera 50 with a recording unit.
- The content DB 54 inputs the position information of the audio signal recorded on the audio channel to the voice recognition unit 56.
- the dictionary data is supplied from the dictionary DB 55 to the speech recognition unit 56.
- the dictionary DB 55 can be configured to select and restrict a region name, a landmark, and the like by a keyboard input or the like from a terminal 59 and output the selected name to the voice recognition unit 56.
- The voice recognition unit 56 detects a region name and a landmark from the recognized latitude and longitude values using the data of the dictionary DB 55, and outputs them to the file processing unit 57.
- In the file processing unit 57, the time code output from the content DB 54 and the region name and landmark output from the voice recognition unit 56 are converted into TEXT as metadata, and a metadata file is generated.
- the metadata file is supplied to the recording unit 58, and the recording unit 58 records the metadata file and the content data output from the content DB 54.
- In the above embodiments, a configuration has been described in which the keywords recognized by the voice recognition unit are filed in the metadata file together with the time code; related general attribute keywords may also be added to the file. For example, if "Yodogawa" is recognized by voice, general attribute keywords such as "topography" and "river" are added to the file as well. In this way, keywords such as the added "topography" and "river" can be used at search time, so searchability can be improved.
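- A minimal sketch of this attribute-keyword expansion, with a hypothetical attribute map:

```python
attribute_map = {"Yodogawa": ["river", "topography"]}  # hypothetical mapping

def expand_keywords(recognized: str) -> list[str]:
    # File the recognized word together with its general attribute keywords.
    return [recognized, *attribute_map.get(recognized, [])]

print(expand_keywords("Yodogawa"))  # ['Yodogawa', 'river', 'topography']
```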
- The voice recognition unit employs a word recognition method that recognizes speech word by word; by limiting the number of words input by voice and the number of words in the recognition dictionary used, the voice recognition rate can be improved.
- Each of the above embodiments may be provided with an information processing unit such as a computer including a keyboard, configured so that created metadata or tags can be corrected by keyboard operation in the case of erroneous recognition. Industrial Applicability
- According to the metadata creation device of the present invention, metadata or tags related to content are created by voice input using voice recognition and associated with predetermined portions of the content. This makes it possible to create metadata and tags more efficiently than with conventional keyboard input.
Description
Claims
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03733537A EP1536638A4 (en) | 2002-06-24 | 2003-06-23 | METADATA PRODUCTION DEVICE, CREATION METHOD THEREFOR AND RETRIEVAL DEVICE |
MXPA04012865A MXPA04012865A (es) | 2002-06-24 | 2003-06-23 | Metadata preparation device, preparation method for the same, and retrieval device. |
US10/519,089 US20050228665A1 (en) | 2002-06-24 | 2003-06-23 | Metadata preparing device, preparing method therefor and retrieving device |
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002-182506 | 2002-06-24 | ||
JP2002182506 | 2002-06-24 | ||
JP2002319757A JP2004153765A (ja) | 2002-11-01 | 2002-11-01 | Metadata production device and production method
JP2002-319757 | 2002-11-01 | ||
JP2002319756A JP3781715B2 (ja) | 2002-11-01 | 2002-11-01 | Metadata production device and search device
JP2002-319756 | 2002-11-01 | ||
JP2002-334831 | 2002-11-19 | ||
JP2002334831A JP2004086124A (ja) | 2002-06-24 | 2002-11-19 | Metadata production device and production method
Publications (2)
Publication Number | Publication Date |
---|---|
WO2004002144A1 true WO2004002144A1 (ja) | 2003-12-31 |
WO2004002144B1 WO2004002144B1 (ja) | 2004-04-08 |
Family
ID=30003905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2003/007908 WO2004002144A1 (ja) | 2002-06-24 | 2003-06-23 | Metadata creation device, creation method thereof, and search device
Country Status (5)
Country | Link |
---|---|
US (1) | US20050228665A1 (ja) |
EP (1) | EP1536638A4 (ja) |
CN (1) | CN1663249A (ja) |
MX (1) | MXPA04012865A (ja) |
WO (1) | WO2004002144A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018041183A (ja) * | 2016-09-06 | 2018-03-15 | Hitachi Building Systems Co., Ltd. | Maintenance work management system and maintenance work management device |
Families Citing this family (155)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
JP4127668B2 (ja) * | 2003-08-15 | 2008-07-30 | Toshiba Corporation | Information processing apparatus, information processing method, and program |
US20060080286A1 (en) * | 2004-08-31 | 2006-04-13 | Flashpoint Technology, Inc. | System and method for storing and accessing images based on position data associated therewith |
US7818350B2 (en) | 2005-02-28 | 2010-10-19 | Yahoo! Inc. | System and method for creating a collaborative playlist |
JP2006311462A (ja) * | 2005-05-02 | 2006-11-09 | Toshiba Corp | Content search device and method thereof |
US7467147B2 (en) * | 2005-06-01 | 2008-12-16 | Groundspeak, Inc. | System and method for facilitating ad hoc compilation of geospatial data for on-line collaboration |
JP4659681B2 (ja) * | 2005-06-13 | 2011-03-30 | Panasonic Corporation | Content tagging support device and content tagging support method |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US7844820B2 (en) * | 2005-10-10 | 2010-11-30 | Yahoo! Inc. | Set of metadata for association with a composite media item and tool for creating such set of metadata |
CN103000210A (zh) | 2005-10-21 | 2013-03-27 | Methods and apparatus for metering portable media players |
US7822746B2 (en) * | 2005-11-18 | 2010-10-26 | Qurio Holdings, Inc. | System and method for tagging images based on positional information |
EP1998554A4 (en) * | 2006-03-23 | 2009-11-25 | Panasonic Corp | CONTENT IMAGING APPARATUS |
KR101583268B1 (ko) | 2006-03-27 | 2016-01-08 | Method and system for metering media content presented on a wireless communication device |
EP2011017A4 (en) * | 2006-03-30 | 2010-07-07 | Stanford Res Inst Int | METHOD AND APPARATUS FOR ANNOTATING MULTIMEDIA STREAMS |
JP4175390B2 (ja) * | 2006-06-09 | 2008-11-05 | Sony Corporation | Information processing apparatus, information processing method, and computer program |
KR100856407B1 (ko) * | 2006-07-06 | 2008-09-04 | 삼성전자주식회사 | 메타 데이터를 생성하는 데이터 기록 및 재생 장치 및 방법 |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
JP2008118232A (ja) * | 2006-11-01 | 2008-05-22 | Hitachi Ltd | Video playback device |
US8643745B2 (en) * | 2007-03-12 | 2014-02-04 | Panasonic Corporation | Content shooting apparatus |
US8204359B2 (en) * | 2007-03-20 | 2012-06-19 | At&T Intellectual Property I, L.P. | Systems and methods of providing modified media content |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8793256B2 (en) | 2008-03-26 | 2014-07-29 | Tout Industries, Inc. | Method and apparatus for selecting related content for display in conjunction with a media |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US8364721B2 (en) * | 2008-06-12 | 2013-01-29 | Groundspeak, Inc. | System and method for providing a guided user interface to process waymark records |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
KR101479079B1 (ko) * | 2008-09-10 | 2015-01-08 | Samsung Electronics Co., Ltd. | Broadcast receiving apparatus for displaying explanations of terms included in digital captions, and digital caption processing method applied thereto |
US8712776B2 (en) * | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
KR20100061078A (ko) * | 2008-11-28 | 2010-06-07 | Samsung Electronics Co., Ltd. | Method and apparatus for consuming content using metadata |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US20120311585A1 (en) | 2011-06-03 | 2012-12-06 | Apple Inc. | Organizing task items that represent tasks to perform |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US8935204B2 (en) * | 2009-08-14 | 2015-01-13 | Aframe Media Services Limited | Metadata tagging of moving and still image content |
GB2472650A (en) * | 2009-08-14 | 2011-02-16 | All In The Technology Ltd | Metadata tagging of moving and still image content |
JP5257330B2 (ja) * | 2009-11-06 | 2013-08-07 | Ricoh Co., Ltd. | Speech recording device, speech recording method, program, and recording medium |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
KR20120045582A (ko) * | 2010-10-29 | 2012-05-09 | Electronics and Telecommunications Research Institute | Apparatus and method for generating an acoustic model |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
BR112015018905B1 (pt) | 2013-02-07 | 2022-02-22 | Method of operating a voice activation feature, computer-readable storage medium, and electronic device |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9325381B2 (en) | 2013-03-15 | 2016-04-26 | The Nielsen Company (Us), Llc | Methods, apparatus and articles of manufacture to monitor mobile devices |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
KR101759009B1 (ko) | 2013-03-15 | 2017-07-17 | Apple Inc. | Training an at least partial voice command system |
US9559651B2 (en) | 2013-03-29 | 2017-01-31 | Apple Inc. | Metadata for loudness and dynamic range control |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
CN105264524B (zh) | 2013-06-09 | 2019-08-02 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
CN105265005B (zh) | 2013-06-13 | 2019-09-17 | Apple Inc. | System and method for emergency calls initiated by voice command |
JP6163266B2 (ja) | 2013-08-06 | 2017-07-12 | Apple Inc. | Automatic activation of smart responses based on activation from remote devices |
US9942396B2 (en) * | 2013-11-01 | 2018-04-10 | Adobe Systems Incorporated | Document distribution and interaction |
US9544149B2 (en) | 2013-12-16 | 2017-01-10 | Adobe Systems Incorporated | Automatic E-signatures in response to conditions and/or events |
US10182280B2 (en) | 2014-04-23 | 2019-01-15 | Panasonic Intellectual Property Management Co., Ltd. | Sound processing apparatus, sound processing system and sound processing method |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
EP3149728B1 (en) | 2014-05-30 | 2019-01-16 | Apple Inc. | Multi-command single utterance input method |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9703982B2 (en) | 2014-11-06 | 2017-07-11 | Adobe Systems Incorporated | Document distribution and interaction |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
CN106409295B (zh) * | 2015-07-31 | 2020-06-16 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for recognizing time information from natural speech information
US9935777B2 (en) | 2015-08-31 | 2018-04-03 | Adobe Systems Incorporated | Electronic signature framework with enhanced security |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US9626653B2 (en) | 2015-09-21 | 2017-04-18 | Adobe Systems Incorporated | Document distribution and interaction with delegation of signature authority |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
CN105389350B (zh) * | 2015-10-28 | 2019-02-15 | Inspur (Beijing) Electronic Information Industry Co., Ltd. | Method for acquiring metadata information in a distributed file system
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US10347215B2 (en) | 2016-05-27 | 2019-07-09 | Adobe Inc. | Multi-device electronic signature framework |
KR102465227B1 (ko) | 2016-05-30 | 2022-11-10 | Sony Group Corporation | Video and audio processing apparatus and method, and computer-readable recording medium storing a program
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179588B1 (en) | 2016-06-09 | 2019-02-22 | Apple Inc. | INTELLIGENT AUTOMATED ASSISTANT IN A HOME ENVIRONMENT |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10503919B2 (en) | 2017-04-10 | 2019-12-10 | Adobe Inc. | Electronic signature framework with keystroke biometric authentication |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK179549B1 (en) | 2017-05-16 | 2019-02-12 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
US11652656B2 (en) * | 2019-06-26 | 2023-05-16 | International Business Machines Corporation | Web conference replay association upon meeting completion |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5546145A (en) * | 1994-08-30 | 1996-08-13 | Eastman Kodak Company | Camera on-board voice recognition |
US5835667A (en) * | 1994-10-14 | 1998-11-10 | Carnegie Mellon University | Method and apparatus for creating a searchable digital video library and a system and method of using such a library |
DE19645716A1 (de) * | 1995-11-06 | 1997-05-07 | Ricoh Kk | Digital still video camera
US6336093B2 (en) * | 1998-01-16 | 2002-01-01 | Avid Technology, Inc. | Apparatus and method using speech recognition and scripts to capture author and playback synchronized audio and video |
JP2000069442A (ja) * | 1998-08-24 | 2000-03-03 | Sharp Corp | Moving image system
GB2354105A (en) * | 1999-09-08 | 2001-03-14 | Sony Uk Ltd | System and method for navigating source content |
GB2359918A (en) * | 2000-03-01 | 2001-09-05 | Sony Uk Ltd | Audio and/or video generation apparatus having a metadata generator |
US7051048B2 (en) * | 2000-09-29 | 2006-05-23 | Canon Kabushiki Kaisha | Data management system, data management method, and program |
JP2002157112A (ja) * | 2000-11-20 | 2002-05-31 | Teac Corp | Voice information conversion device
2003
- 2003-06-23 WO PCT/JP2003/007908 patent/WO2004002144A1/ja active Application Filing
- 2003-06-23 US US10/519,089 patent/US20050228665A1/en not_active Abandoned
- 2003-06-23 EP EP03733537A patent/EP1536638A4/en not_active Withdrawn
- 2003-06-23 CN CN038149028A patent/CN1663249A/zh active Pending
- 2003-06-23 MX MXPA04012865A patent/MXPA04012865A/es unknown
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07184160A (ja) * | 1993-12-24 | 1995-07-21 | Canon Inc | Device for processing image data and audio data
JPH09130736A (ja) * | 1995-11-02 | 1997-05-16 | Sony Corp | Imaging device and editing device
JPH09149365A (ja) * | 1995-11-20 | 1997-06-06 | Ricoh Co Ltd | Digital still video camera
JP2000078530A (ja) * | 1998-08-28 | 2000-03-14 | Nec Corp | Information recording device, information recording method, and recording medium
JP2000306365A (ja) * | 1999-04-16 | 2000-11-02 | Sony Corp | Editing support system and control device for editing support system
JP2002171481A (ja) * | 2000-12-04 | 2002-06-14 | Ricoh Co Ltd | Video processing device
JP2002207753A (ja) * | 2001-01-10 | 2002-07-26 | Teijin Seiki Co Ltd | Multimedia information recording, creation, and provision system
JP2002374494A (ja) * | 2001-06-14 | 2002-12-26 | Fuji Electric Co Ltd | Video content file generation system and video content file retrieval method
JP2003018505A (ja) * | 2001-06-29 | 2003-01-17 | Toshiba Corp | Information reproduction device and conversation scene detection method
JP2003111009A (ja) * | 2001-09-28 | 2003-04-11 | Fuji Photo Film Co Ltd | Electronic album editing device
JP2003032625A (ja) * | 2002-04-26 | 2003-01-31 | Canon Inc | Data processing device and data processing method
Non-Patent Citations (1)
Title |
---|
See also references of EP1536638A4 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018041183A (ja) * | 2016-09-06 | 2018-03-15 | Hitachi Building Systems Co., Ltd. | Maintenance work management system and maintenance work management device
Also Published As
Publication number | Publication date |
---|---|
EP1536638A4 (en) | 2005-11-09 |
US20050228665A1 (en) | 2005-10-13 |
MXPA04012865A (es) | 2005-03-31 |
WO2004002144B1 (ja) | 2004-04-08 |
CN1663249A (zh) | 2005-08-31 |
EP1536638A1 (en) | 2005-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2004002144A1 (ja) | Metadata creation device, creation method thereof, and search device | |
US7831598B2 (en) | Data recording and reproducing apparatus and method of generating metadata | |
JP4175390B2 (ja) | Information processing device, information processing method, and computer program | |
JP4794740B2 (ja) | Audio/video signal generation device and audio/video signal generation method | |
CN101202864B (zh) | Moving image reproduction device | |
JP2007082088A (ja) | Device for recording and reproducing content and metadata, content processing device, and program | |
WO2007000949A1 (ja) | Content reproduction method and device with reproduction start position control | |
JP2001028722A (ja) | Moving image management device and moving image management system | |
US8255395B2 (en) | Multimedia data recording method and apparatus for automatically generating/updating metadata | |
JP3781715B2 (ja) | Metadata production device and search device | |
JP4192703B2 (ja) | Content processing device, content processing method, and program | |
JP2007052626A (ja) | Metadata input device and content processing device | |
JP2004023661A (ja) | Recorded information processing method, recording medium, and recorded information processing device | |
JP3166725B2 (ja) | Information recording device, information recording method, and recording medium | |
US7444068B2 (en) | System and method of manual indexing of image data | |
JP2009283020A (ja) | Recording device, reproduction device, and program | |
US7873637B2 (en) | Automatically imparting an index by using various kinds of control signals | |
JP2000222381A (ja) | Album creation method, information processing device, and information output device | |
JP2014235301A (ja) | Command input identification system using gestures | |
JP2006101324A (ja) | Recording/reproduction device and recording/reproduction method | |
JP2002324071A (ja) | Content search system and content search method | |
JP2000333125A (ja) | Editing device and recording device | |
CN115699723A (zh) | Video editing device, video editing method, and computer program | |
JP2004153765A (ja) | Metadata production device and production method | |
JP2001136482A (ja) | Video and audio recording/reproduction device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CN MX US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
B | Later publication of amended claims |
Effective date: 20031230 |
|
WWE | Wipo information: entry into national phase |
Ref document number: PA/a/2004/012865 Country of ref document: MX |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10519089 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 20038149028 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2003733537 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2003733537 Country of ref document: EP |