WO2012049945A1 - Program search device and program search method - Google Patents
- Publication number
- WO2012049945A1 (PCT/JP2011/071091)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- program
- data
- morphemes
- morpheme
- unit
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/432—Content retrieval operation from a local storage medium, e.g. hard-disk
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/102—Programmed access in sequence to addressed parts of tracks of operating record carriers
- G11B27/105—Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34—Indicating arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/432—Content retrieval operation from a local storage medium, e.g. hard-disk
- H04N21/4325—Content retrieval operation from a local storage medium, e.g. hard-disk by playing back content from the storage medium
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
- H04N21/4828—End-user interface for program selection for searching program descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8126—Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
- H04N21/8133—Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/82—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
- H04N9/8205—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal
- H04N9/8233—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal the additional signal being a character code signal
Definitions
- The present invention relates to a program search apparatus and a program search method that process text data according to an arbitrary procedure and use it as search index data.
- A known technique (for example, Patent Document 1) stores an arbitrary character string and a sentence explaining it in a database device in association with each other, based on dictionary data or the like. When subtitle data of a program is acquired, the subtitle data is divided into morphemes, each morpheme is compared with the character strings held in the database device, and only the matching morphemes are highlighted. The user can then refer to the explanatory text stored in the database device simply by selecting a highlighted morpheme.
- Caption data and program information are often broadcast together with a program as a program stream, and the above-described technique is expected to be used more effectively by treating caption data and program information as index data for search.
- However, caption data is not always added to a program. For example, for programs such as news and live broadcasts whose content cannot be provided in advance, the caption data either carries no such information or, even when present, is limited to information such as a title.
- As a result, programs that include subtitle data (added) and programs that do not (not added) are mixed, which means that programs with index data and programs without index data are mixed. For the latter, that is, programs to which no caption data is added, the desired program or a predetermined scene in the program may not be extracted appropriately.
- An object of the present invention is to provide a program search device and a program search method capable of appropriately searching for a program and a predetermined scene in the program regardless of whether subtitle data is present.
- the present invention provides the following program search apparatus and program search method.
- (1) A program search apparatus comprising: a table holding unit that holds a permitted word table in which a plurality of morphemes are associated with their numbers of appearances; a program stream acquisition unit that acquires a program stream generated in accordance with the broadcast ethics regulations; a table update unit that, when the acquired program stream includes subtitle data or program information that is first text data related to the content of the program, extracts the subtitle data or the program information from the program stream, divides it into morphemes, registers a divided morpheme in the permitted word table if it is not in the table, and updates the number of appearances corresponding to the morpheme if it is; a program holding unit that holds a program included in the acquired program stream; a data acquisition unit that acquires second text data related to the held program and associates acquisition date and time information with it; a data processing unit that divides the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered but its number of appearances is less than a predetermined first threshold, replaces the morpheme with a predetermined symbol and recombines the result as third text data; an index assigning unit that assigns to the held program, as index data, a set of the recombined third text data and the acquisition date and time information associated with the corresponding second text data; and a program extraction unit that extracts a program held in the program holding unit, or a predetermined scene in the program, based on a keyword input for search and the index data.
- (2) A program search apparatus comprising: a table holding unit that holds a permitted word table in which a plurality of morphemes are associated with their numbers of appearances; a program information acquisition unit that acquires program information, which is first text data related to the content of a program and is generated in accordance with the broadcast ethics regulations; a table update unit that divides the program information into morphemes, registers a divided morpheme in the permitted word table if it is not in the table, and updates the number of appearances corresponding to the morpheme if it is; a program holding unit that holds a program included in the acquired program stream; a data acquisition unit that acquires second text data relating to the held program and associates acquisition date and time information with it; a data processing unit that divides the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered but its number of appearances is less than a predetermined first threshold, replaces the morpheme with a predetermined symbol and recombines the result as third text data; an index assigning unit that assigns to the held program, as index data, a set of the recombined third text data and the acquisition date and time information associated with the corresponding second text data; and a program extraction unit that extracts a program held in the program holding unit, or a predetermined scene in the program, based on a keyword input for search and the index data.
- (3) The program search device according to (1) or (2), wherein the index assigning unit assigns the subtitle data to the program as index data if subtitle data is added to the program, and assigns the recombined third text data to the program as index data if subtitle data is not added.
- (4) The program search device according to (3) above, wherein the index assigning unit may determine that no caption data is added to the program if the caption rate, which is the number of caption data items per second, is less than a predetermined second threshold.
- (5) A program search method comprising: acquiring a program stream generated in accordance with the broadcast ethics regulations; when the acquired program stream includes subtitle data or program information that is first text data related to the program content, extracting the subtitle data or the program information from the program stream and dividing it into morphemes; registering a divided morpheme in a permitted word table, in which a plurality of morphemes are associated with their numbers of appearances, if the morpheme is not in the table, and updating the number of appearances corresponding to the morpheme if it is; holding the program included in the acquired program stream in a program holding unit; acquiring second text data related to the held program and associating acquisition date and time information with it; dividing the second text data into morphemes; if a divided morpheme is not registered in the permitted word table, or is registered but its number of appearances is less than a predetermined first threshold, replacing the morpheme with a predetermined symbol and recombining the result as third text data; assigning to the held program, as index data, a set of the recombined third text data and the acquisition date and time information associated with the corresponding second text data; and extracting a predetermined program or a predetermined scene in the program.
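- (6) A program search method comprising: acquiring program information, which is first text data related to the content of a program and is generated in accordance with the broadcast ethics regulations; dividing the program information into morphemes; registering a divided morpheme in a permitted word table if it is not in the table, and updating the number of appearances corresponding to the morpheme if it is; holding the program included in the acquired program stream in a program holding unit; acquiring second text data related to the held program and associating acquisition date and time information with it; dividing the second text data into morphemes; if a divided morpheme is not registered in the permitted word table, or is registered but its number of appearances is less than a predetermined first threshold, replacing the morpheme with a predetermined symbol and recombining the result as third text data; assigning to the held program, as index data, a set of the recombined third text data and the acquisition date and time information associated with the corresponding second text data; and extracting a program held in the program holding unit or a predetermined scene in the program based on a keyword input for search and the index data.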
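The index-assignment rule and the caption-rate test described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the representation of index data as `(text, datetime)` pairs, and the substring match used for keyword search are all assumptions made for illustration.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical representation: one index entry is (text, date and time).
IndexEntry = Tuple[str, datetime]


def caption_rate(caption_count: int, duration_seconds: float) -> float:
    """Number of caption data items per second of program time."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return caption_count / duration_seconds


def choose_index_data(
    captions: List[IndexEntry],
    filtered_posts: List[IndexEntry],
    duration_seconds: float,
    second_threshold: float,
) -> List[IndexEntry]:
    """Use captions as index data when the caption rate reaches the second
    threshold; otherwise fall back to the filtered (third) text data."""
    if caption_rate(len(captions), duration_seconds) >= second_threshold:
        return captions
    return filtered_posts


def find_scenes(index_data: List[IndexEntry], keyword: str) -> List[datetime]:
    """Return the date/time of every index entry whose text contains the keyword
    (plain substring match, an assumption of this sketch)."""
    return [when for text, when in index_data if keyword in text]
```

With this layout, a program whose captions consist only of a title falls below the second threshold, so its scenes are located through the time-stamped third text data instead.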
- In the following, a filtering apparatus and a filtering method for appropriately filtering arbitrary text data are described as a first embodiment, and a program search apparatus and a program search method that use the filtering technique of the first embodiment to appropriately search for a program and a predetermined scene in it are described as a second embodiment. The two embodiments share at least the filtering technique.
- Filtering often relies on a prohibited word table, which tabulates words contrary to public order and morals (prohibited words) that should not be used in a service.
- The service provider refers to the prohibited word table and performs filtering, such as removing words that correspond to prohibited words from the post data posted on an electronic bulletin board.
- However, filtering that removes prohibited words is easy to evade: a poster can replace a prohibited word with different kanji (characters), or insert a space or symbol between its characters, adding "fluctuation" to the word so that it no longer matches any prohibited word.
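The evasion problem can be seen in a minimal Python sketch of prohibited-word filtering. The function name and the placeholder word `badword` are hypothetical and stand in for any entry in a prohibited word table.

```python
from typing import Set


def remove_prohibited(text: str, prohibited: Set[str]) -> str:
    """Naive prohibited-word filtering: delete every exact occurrence."""
    for word in prohibited:
        text = text.replace(word, "")
    return text


prohibited_words = {"badword"}  # hypothetical prohibited word table

plain_post = "this is badword here"
evasive_post = "this is b-a-d-w-o-r-d here"  # symbols inserted between characters

cleaned_plain = remove_prohibited(plain_post, prohibited_words)
cleaned_evasive = remove_prohibited(evasive_post, prohibited_words)
```

The exact match removes the word from `plain_post`, but `evasive_post` passes through unchanged because the inserted symbols prevent any match.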
- Instead of a prohibited word table in which prohibited words are tabulated, a permitted word table in which permitted words are tabulated could be used, leaving only words and sentences that do not conflict with public order and morals. However, new words such as the names of persons and buildings appear every day, and to prevent such permitted words from being excluded by filtering, the permitted word table must be updated frequently.
- The number of words required in a permitted word table is also significantly larger than in a prohibited word table. For example, the number of prohibited words generated in one month is about 4,000, whereas the number of permitted words is about 4,000,000, so distributing and updating the word table would require enormous cost. For this reason, using a permitted word table has not been realistic.
- This embodiment therefore describes a filtering device and a filtering method that automatically build a permitted word table for filtering by using a program providing system such as television broadcasting.
- FIG. 1 is an explanatory diagram showing a schematic connection relationship of the program providing system 100 according to the first embodiment.
- the program providing system 100 includes a program providing device 110, a filtering device 120, a display device 130, and a service providing server 140.
- the program providing apparatus 110 includes a broadcasting station 112 and a program providing server 114, and distributes a program stream.
- the program stream includes various information related to the program as additional data.
- The filtering device 120 receives program streams of various programs, such as cable television broadcasting, IP broadcasting, and video on demand, transmitted from the broadcasting station 112 (as the program providing device 110) through the antenna 122 and from the program providing server 114 (as the program providing device 110) through the communication network 124 such as the Internet. The filtering device 120 then produces …
- The display device 130 is configured by a liquid crystal display, an organic EL (Electro Luminescence) display, a cinema screen, a projector, or the like, and displays the programs received by the filtering device 120 and the filtered text data.
- The service providing server 140 is a server operated by a service provider. It provides various services, such as an electronic bulletin board on which third parties post data, to the information terminals owned by those third parties, the filtering device 120, and the like.
- the filtering device 120 constituting the program providing system 100 of the present embodiment is intended to appropriately filter text data.
- Below, each functional unit of the filtering device 120 is described first, followed by a detailed explanation of the filtering method that uses the filtering device 120.
- FIG. 2 is a functional block diagram illustrating a schematic configuration of the filtering device 120.
- the filtering apparatus 120 includes an operation unit 150, a tuner unit 152, a communication unit 154, a DEMUX (DEMUltipleXer) unit 156, an AV decoding unit 158, a table holding unit 160, and a central control unit 162.
- the tuner unit 152, the communication unit 154, and the DEMUX unit 156 function as a program stream acquisition unit that acquires a program stream.
- In FIG. 2, the flow of data is represented by solid arrows and the flow of control signals by broken arrows.
- the operation unit 150 includes operation keys, a cross key, a joystick, a jog dial, a touch panel, and the like, and accepts user operation inputs.
- the tuner unit 152 receives a broadcast signal from the broadcast station 112 via the antenna 122, demodulates the broadcast signal according to the channel number set through the operation unit 150, and generates a program stream.
- The communication unit 154 establishes communication with the program providing server 114 via the communication network 124. Using an Internet protocol such as HTTP (HyperText Transfer Protocol), it acquires, in packet units, the IP stream that the program providing server 114 distributes and that corresponds to a broadcast signal. Like the tuner unit 152, it then generates a program stream, in this case by restoring the IP stream according to its time stamps.
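Restoring a stream from packets according to their time stamps can be sketched as follows. This is a simplified illustration only: the `Packet` type and `restore_stream` function are hypothetical, and a real receiver would also have to handle lost or duplicated packets, which the sketch ignores.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Packet:
    timestamp: int   # time stamp carried by the packet (assumed integer ticks)
    payload: bytes   # fragment of the program stream


def restore_stream(packets: Iterable[Packet]) -> bytes:
    """Reassemble the program stream by ordering packets by time stamp."""
    ordered = sorted(packets, key=lambda p: p.timestamp)
    return b"".join(p.payload for p in ordered)
```

Packets that arrive out of order, for example with time stamps 2, 0, 1, are concatenated in the order 0, 1, 2.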
- the communication unit 154 can also establish communication with the service providing server 140.
- The DEMUX unit 156 divides the program stream into multiple kinds of data, such as video data (an MPEG (Moving Picture Experts Group) video stream), audio data (an MPEG audio stream), caption data, time data, and program information.
- the AV decoding unit 158 acquires video data and audio data from the DEMUX unit 156, decodes the video data and audio data, and outputs the decoded video signal to the display device 130.
- the audio signal is output to an audio output device such as a speaker (not shown).
- The table holding unit 160 is configured by a storage medium such as a flash memory or an HDD (Hard Disk Drive), and holds a permitted word table in which a plurality of morphemes are associated with their numbers of appearances.
- Strictly speaking, an HDD is a device, but in this description it is treated as synonymous with other storage media for convenience of explanation.
- the central control unit 162 manages and controls the entire filtering device 120 by a semiconductor integrated circuit including a central processing unit (CPU), a ROM storing programs, a RAM as a work area, and the like.
- the central control unit 162 also functions as a table update unit 180, a data acquisition unit 182, a data processing unit 184, and a display control unit 186.
- The table update unit 180 extracts one or both of subtitle data and program information from the program stream and divides them into morphemes. If a divided morpheme is not in the permitted word table described later, the table update unit 180 registers the morpheme; if it is already in the permitted word table, the table update unit 180 updates the number of appearances corresponding to the morpheme.
- subtitle data refers to text data for displaying information such as a title, cast, commentary, and conversation using characters in video media such as movies and television.
- The program information contains various information about the content of the program, including the channel number, service ID, event ID, program start time, program end time, program name, program commentary information, information on the program's performers and staff, information on the theme song, and the program genre.
- Hereinafter, caption data and program information are collectively abbreviated as program additional data. The term program additional data may also refer to either caption data or program information alone.
- The table update unit 180 determines whether program additional data is included in the program stream acquired via the tuner unit 152 or the communication unit 154 and, if it is, divides the program additional data into one or more morphemes using a morpheme dictionary.
- The morpheme dictionary is compiled in advance by aggregating statistics over a large number of sentences, and records, in dictionary form, each morpheme together with its probability of connecting to the morphemes before and after it.
- By using the morpheme dictionary, the table updating unit 180 can divide text in a natural language that is written without word breaks, such as Japanese, into morpheme units.
- The table update unit 180 may also divide text into morphemes at character-type boundaries, such as those between kanji, alphanumeric characters, hiragana, and katakana.
- As the morphological analysis engine that divides text into morphemes, it is also possible to use a technique that estimates the word segmentation of natural language by a statistical method and divides the text into morpheme units. The details of algorithms that divide text into morphemes using a morpheme dictionary are well known, so their description is omitted.
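The simplest of the splitting approaches above, division at character-type boundaries, can be sketched in Python. This is an illustration under stated assumptions, not the analyzer the patent uses: the function names are hypothetical, the Unicode ranges cover only the basic kanji, hiragana, and katakana blocks, and the output is a rough approximation of morphemes rather than a dictionary-based analysis.

```python
from typing import List


def char_type(ch: str) -> str:
    """Classify one character by script (basic Unicode blocks only)."""
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if ch.isascii() and ch.isalnum():
        return "alnum"
    return "other"


def split_by_char_type(text: str) -> List[str]:
    """Split text wherever the character type changes; whitespace also splits."""
    tokens: List[str] = []
    current = ""
    current_type = None
    for ch in text:
        if ch.isspace():
            if current:
                tokens.append(current)
                current = ""
            current_type = None
            continue
        t = char_type(ch)
        if current and t != current_type:
            tokens.append(current)
            current = ""
        current += ch
        current_type = t
    if current:
        tokens.append(current)
    return tokens
```

For example, `split_by_char_type("東京タワーへ行く2011")` yields `["東京", "タワー", "へ", "行", "く", "2011"]`. The verb 行く is split in two, which shows why a morpheme dictionary gives better results than character types alone.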
- the table updating unit 180 registers each divided morpheme in the permitted word table, or updates the number of appearances of the registered morpheme.
- FIG. 3 is an explanatory diagram for explaining the permitted word table 200.
- the permitted word table 200 has a table structure in which the previous concatenated morpheme pword, the main morpheme word, and the number of appearances wnum are uniquely associated.
- the previous concatenated morpheme pword is a morpheme located in front of the main morpheme word in the divided morpheme string. If the main morpheme word is the first morpheme of the sentence, it becomes a null value (NULL).
- the main morpheme word is a morpheme that is a main keyword, and a null value is not allowed.
- For example, when the sentence is “received the prime minister's sentence” (in the original Japanese word order, the morpheme for “prime minister” comes first and the morpheme for “receive” comes last), the table updating unit 180 generates the record 202, in which “prime minister” is the main morpheme word and the previous concatenated morpheme pword is NULL. It does not generate a record in which “receive” is the previous concatenated morpheme pword and the main morpheme word is NULL.
- The number of appearances wnum is the number of times the combination of the previous concatenated morpheme pword and the main morpheme word has appeared in the program additional data, and is represented by an integer of 1 or more.
- For each combination of two adjacent morphemes, the table updating unit 180 registers the combination in the permitted word table 200 if it is not yet in the table, and increments (+1) the number of appearances corresponding to the combination if it is. Therefore, in the permitted word table 200, each combination of the previous concatenated morpheme pword and the main morpheme word is unique.
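The register-or-increment rule for the permitted word table 200 can be sketched as below. The sketch assumes an in-memory `Counter` keyed by `(pword, word)` with `None` standing in for NULL; the function name and this storage choice are illustrative assumptions, not the table format used in the patent.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Key: (previous concatenated morpheme pword, main morpheme word).
# pword is None (NULL) for the first morpheme of a sentence; word is never None.
PairKey = Tuple[Optional[str], str]


def update_permitted_table(table: "Counter[PairKey]", morphemes: List[str]) -> None:
    """Register each (pword, word) pair, or increment wnum if already present."""
    pword: Optional[str] = None
    for word in morphemes:
        table[(pword, word)] += 1  # a missing key starts at 0, so this registers with wnum = 1
        pword = word
```

Because the pair itself is the key, each combination of pword and word occurs at most once in the table, and no entry is ever created with the last morpheme as pword and a NULL word.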
- SQL (Structured Query Language)
- Because the permitted word table 200 is generated from the program additional data included in the program stream, the following effects are obtained. The program and its program additional data are generated in accordance with the broadcasting ethics regulations.
- The broadcasting ethics regulations include, for example, the provision in the Basic Code of Broadcast Ethics: “Use proper language and try to express quality expressions at the same time.” Program additional data generated in accordance with these regulations therefore does not contain words or sentences contrary to public order and morals. Consequently, by generating the permitted word table 200 from the program additional data included in the program stream, permitted words can be accumulated easily without judging whether each word qualifies as a permitted word.
- In addition, without constructing a new system for distributing the permitted word table 200, which has a large data capacity, to each user's information terminal, the permitted word table 200 can be updated at any time within the filtering device 120 simply by extracting the program additional data included in the program stream. It is therefore possible to construct a system that can update the permitted word table 200 at any time with minimal maintenance cost.
- Furthermore, if the permitted word table 200 were distributed to information terminals, the risk would remain that a third party falsifies the table during distribution. Because the permitted word table 200 is instead updated in the closed space of the filtering device 120, the risk of such tampering can be minimized.
- For these purposes, this embodiment mainly employs program additional data included in the program stream acquired through the tuner unit 152. However, program additional data of a program stream acquired from the program providing server 114, which performs broadcasting, video on demand, or the like, can also be employed.
- The program information described above can also be acquired directly from a server (not shown) managed by a service provider. If that program information conforms to the broadcasting ethics regulations, it can be adopted in this embodiment.
- In this case, the communication unit 154 functions as a program information acquisition unit that acquires program information, and the table update unit 180 divides the program information so acquired into morphemes and reflects them in the permitted word table 200. Thus, in addition to the program additional data (that is, caption data and program information) extracted from the program stream, program information acquired through the communication unit 154 can also be reflected in the permitted word table 200 of this embodiment.
- The data acquisition unit 182 acquires arbitrary text data (second text data) from the service providing server 140 through the communication unit 154, and associates with that text data acquisition date and time information indicating when it was generated, posted, or acquired. For example, if a service providing server 140 publishes, as an electronic bulletin board, post data relating to programs broadcast on an arbitrary broadcasting station 112, the data acquisition unit 182 acquires the post data from the electronic bulletin board and associates with it, as the acquisition date and time information, the date and time at which each post was made.
- an unspecified number of contributors can relay a series of programs broadcast on a specific broadcasting station 112 via a communication network 124. Posting data is posted almost in real time.
- the data acquisition unit 182 acquires post data from an electronic bulletin board provided exclusively for such an arbitrary broadcasting station 112.
- the data acquisition unit 182 may specify the title of a thread related to an arbitrary broadcast station 112 on the post-dedicated site and acquire the post data.
- the data acquisition unit 182 may acquire post data through such a site.
- the posted data acquired by the data acquisition unit 182 is displayed on the display device 130 together with the program to which it refers, that is, the program of the program stream acquired by the program stream acquisition unit.
- the user can view opinions and explanations about the program almost in real time.
- post data may be acquired in the same manner as described above for a program of a program stream transmitted from the program providing server 114.
- the program transmitted by the program providing server 114 is limited to a program that is retransmitted at almost the same time as the program broadcast from the broadcasting station 112 by terrestrial digital broadcasting, BS/CS digital broadcasting, cable television broadcasting, or the like.
- the data processing unit 184 filters the text data (second text data) acquired by the data acquisition unit 182 to generate new text data (third text data). For example, as described above, when the data acquisition unit 182 acquires post data from the service providing server 140, the data processing unit 184 generates new post data by filtering the post data.
- the data processing unit 184 first divides the text data (second text data) acquired by the data acquisition unit 182 into morphemes using the morpheme dictionary described above. Then, the data processing unit 184 determines whether or not each divided morpheme (precisely, a combination of two morphemes) is registered in the permitted word table 200 and, for morphemes that are registered, whether the number of appearances is greater than or equal to a predetermined first threshold value ⁇.
- the data processing unit 184 replaces any morpheme that is not registered, or whose number of appearances is below the first threshold value, with one or more predetermined symbols, and recombines the divided morphemes as text data (third text data). Therefore, only the morphemes registered in the permitted word table 200 remain in the newly generated text data.
- the display control unit 186 renders the text data processed by the data processing unit 184 into a text subtitle-like image, and causes the display device 130 to display the rendered image.
- FIG. 4 is an explanatory diagram showing an example of rendering post data.
- the data acquisition unit 182 acquires post data (second text data) from the service providing server 140
- when the post data (third text data) filtered by the data processing unit 184 is displayed in the posted data area 212 provided below the program display area 210 on the display device 130, the user can view the posted data in parallel with the program.
- the posted data being browsed at this time has been filtered by the data processing unit 184 and therefore does not include words or sentences that are contrary to public order and morals. Therefore, even a minor can view the posted data without any problem.
- FIG. 5 is a flowchart illustrating the processing flow of the filtering method.
- FIG. 5 illustrates processing for generating the permitted word table 200 in the filtering method.
- the table updating unit 180 acquires the text body of the program additional data from the DEMUX unit 156 (S302), performs lexical analysis of the text body, and replaces each run of one or more punctuation marks, line feeds, symbols, and external characters (characters other than predetermined kanji, alphanumeric characters, kana, and katakana) in the text body with a special symbol (e.g., “ ⁇ ”) (S304).
- the table update unit 180 performs lexical analysis and replaces punctuation marks with special symbols, so that wasteful registration of morphemes in the permitted word table 200 caused by symbols and blanks used for layout specific to the program additional data can be avoided, and only morphemes necessary for the search are accumulated.
- the table updating unit 180 divides the text body with the punctuation marks and the like replaced into morphemes using the morpheme dictionary (S306).
- the replaced special symbol is used as a delimiter between morphemes.
- FIG. 6 is an explanatory diagram for explaining processing of the table update unit 180.
- a line feed character in the text body is represented by (new line) and a space character is represented by (blank).
- the table update unit 180 collectively replaces “>>”, “,”, “ ”, (line feed), (blank), and the like with the special symbol “ ⁇ ” and further breaks the text down into morphemes to form a morpheme string as shown in FIG.
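The symbol replacement of step S304 can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the stand-in character `*` for the special symbol, the function name `replace_symbols`, and the Unicode ranges chosen for "predetermined" kanji and kana are all assumptions.

```python
import re

SPECIAL = "*"  # assumed stand-in for the special symbol used as a delimiter

# Assumed set of ordinary characters: ASCII alphanumerics, hiragana,
# katakana, and common kanji. Everything else counts as a symbol.
_ORDINARY = r"0-9A-Za-z\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff"
_RUN_OF_OTHERS = re.compile(f"[^{_ORDINARY}]+")


def replace_symbols(text: str) -> str:
    """Collapse each run of punctuation, line feeds, blanks, and other
    non-ordinary characters into a single SPECIAL symbol (cf. S304)."""
    return _RUN_OF_OTHERS.sub(SPECIAL, text)


print(replace_symbols(">>hello, world\n ok"))
```

Because a whole run collapses into one symbol, layout-only sequences such as a comma followed by a blank yield a single delimiter between morphemes.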
- a symbol [/] is inserted between morphemes, but it is not treated as a symbol that actually exists.
- the table updating unit 180 initializes the pre-connected morpheme variable PREV (substitutes the null value NULL) (S308), and determines whether or not there remains a morpheme (morpheme string) for which registration in the permitted word table 200 has not yet been determined (S310). If none remains (NO in S310), the process of generating the permitted word table 200 is terminated. If such a morpheme remains (YES in S310), the table update unit 180 extracts the one morpheme at the head of the remaining morpheme string, substitutes it into the morpheme variable WORD, and deletes that morpheme from the morpheme string (S312).
- the table updating unit 180 determines whether or not the morpheme variable WORD is the special symbol “ ⁇ ” (S314). If the morpheme variable WORD is the special symbol (YES in S314), the table updating unit 180 repeats from the pre-connected morpheme variable initialization step S308.
- the table updating unit 180 determines whether or not the combination of the pre-connected morpheme variable PREV and the morpheme variable WORD exists in the permitted word table 200 as a combination of a pre-connected morpheme pword and a main morpheme word (S316). If it exists (YES in S316), the number of appearances wnum corresponding to that pre-connected morpheme pword and main morpheme word is incremented (S318). If it does not exist (NO in S316), the combination of the pre-connected morpheme variable PREV and the morpheme variable WORD is added to the permitted word table 200 as a new record of pre-connected morpheme pword and main morpheme word, and the corresponding number of appearances wnum is set to 1 (S320).
- the table updating unit 180 substitutes the value of the morpheme variable WORD for the previous connected morpheme variable PREV (S322), and repeats from the remaining morpheme determination step S310.
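The loop of steps S308 to S322 can be sketched in a few lines. This is a hedged illustration under stated assumptions: the permitted word table is modeled as an in-memory `Counter` keyed by `(pword, word)` pairs, `*` stands in for the special symbol, and `update_permitted_table` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable

SPECIAL = "*"  # assumed stand-in for the special symbol


def update_permitted_table(table: Counter, morphemes: Iterable[str]) -> Counter:
    """Count (pre-connected morpheme, main morpheme) pairs.

    table maps (pword, word) -> wnum; pword is None at the start of a run.
    """
    prev = None                    # S308: PREV := NULL
    for word in morphemes:         # S310/S312: take the head morpheme
        if word == SPECIAL:        # S314: a delimiter restarts the chain
            prev = None
            continue
        table[(prev, word)] += 1   # S316-S320: increment, or create with 1
        prev = word                # S322: PREV := WORD
    return table


table = update_permitted_table(Counter(), ["A", "B", SPECIAL, "A", "B", "C"])
print(table)
```

Pairs are never formed across a special symbol, so morphemes that were only adjacent because of layout or punctuation are not recorded as connected.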
- the permitted word table 200 shown in FIG. 3 is generated based on the morpheme string shown in FIG.
- even if the divided morphemes are not included in the morpheme dictionary, they can be registered in the permitted word table 200 and their number of appearances can be counted.
- the permission word table 200 generated as described above stores the manner of connection between two morphemes included in the program additional data and the number of times each connection appears.
- the manner of such connection closely reflects the generation characteristics of the program additional data at the broadcasting station 112 in the area where the user lives and at the broadcasting station 112 that the user exclusively views, and therefore accords with regional characteristics and the user's taste.
- the connection of the two morphemes, the pre-connected morpheme pword and the main morpheme word, is examined in order to exclude character strings that become contrary to public order and morals only when morphemes that are individually unobjectionable are joined. For example, a character string formed by directly joining “base” and “outside” may simply mean “outside the base”, yet it is contrary to public order and morals depending on how it is read. If the data processing unit 184 judged “base” and “outside” independently, this character string might not be excluded. Under the broadcasting ethics regulations, the directly joined form is not used; the form in which the particle “no” is placed between the two words is used instead. Therefore, only connected morphemes such as “base” “no” and “no” “outside” are registered in the permission word table 200, and it becomes possible to exclude the directly joined character string.
- the registration determination of the permitted word table 200 may be performed with some symbols included in the text body remaining without being replaced.
- the object of the present embodiment is to extract the combinations of morphemes and their numbers of appearances from text data different from the source text data of the morpheme dictionary. Therefore, the table updating unit 180 may extract morphemes not only from the text body of the program additional data (caption data and program information) included in the program stream, but also from other information that can be included in the program stream.
- the filtering apparatus 120 may be provided with a plurality of combinations of the tuner unit 152 and the DEMUX unit 156 so as to receive program streams from a plurality of broadcast stations 112 in parallel and collect more morphemes at high speed. Further, the filtering device 120 may operate the function unit for generating the permitted word table 200 independently of the function unit for viewing the program, for example by receiving the program stream continuously for 24 hours to generate the permitted word table 200.
- FIG. 7 is a flowchart illustrating the processing flow of the filtering method.
- FIG. 7 illustrates a process of filtering text data using the permitted word table 200 generated in FIG. 5 among the filtering methods.
- the data acquisition unit 182 acquires time data included in the program stream of the program being viewed (S350), sets a value obtained by subtracting a predetermined number of seconds (for example, 10 seconds) from the acquired time data in the start time variable STIME, and sets the time data itself in the end time variable ETIME (S352). Then, the data acquisition unit 182 acquires, from the service providing server 140 via the communication unit 154, the group of post data posted in the time range from the start time variable STIME to the end time variable ETIME (S354), and initializes the output buffer provided in the RAM of the central control unit 162 (S356).
- FIG. 8 is an explanatory diagram illustrating a post data group.
- for example, when the data acquisition unit 182 acquires the time data “September 30, 2009, 17:45:40” from the DEMUX unit 156, the time range (STIME, ETIME) becomes (“September 30, 2009, 17:45:30”, “September 30, 2009, 17:45:40”), and the applicable post data are the post data whose time data is “September 30, 2009, 17:45:31” and the post data whose time data is “September 30, 2009, 17:45:38”.
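The time-range selection of steps S352 and S354 can be illustrated with the following sketch. It assumes that posts are already available locally as `(posting time, text)` tuples and that both ends of the range are inclusive; the names `time_window` and `posts_in_window` are hypothetical, and the actual request to the service providing server 140 is not modeled.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Post = Tuple[datetime, str]  # (posting date/time, text body)


def time_window(time_data: datetime, lookback_seconds: int = 10) -> Tuple[datetime, datetime]:
    """Return (STIME, ETIME) for the given program time data (cf. S352)."""
    return time_data - timedelta(seconds=lookback_seconds), time_data


def posts_in_window(posts: List[Post], stime: datetime, etime: datetime) -> List[Post]:
    """Keep posts whose posting time lies in [STIME, ETIME] (assumed inclusive)."""
    return [p for p in posts if stime <= p[0] <= etime]


stime, etime = time_window(datetime(2009, 9, 30, 17, 45, 40))
posts = [
    (datetime(2009, 9, 30, 17, 45, 20), "too early"),
    (datetime(2009, 9, 30, 17, 45, 31), "first"),
    (datetime(2009, 9, 30, 17, 45, 38), "second"),
]
print(posts_in_window(posts, stime, etime))
```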
- the data processing unit 184 determines whether or not post data that has not been subjected to filtering processing remains (S358). If it is determined that none remains (NO in S358), the display control unit 186 displays the filtered post data accumulated in the output buffer on the display device 130 (S360), and the processing is terminated.
- a statement for forming the table structure of the output buffer can be described as follows using SQL.
    create table output_buffer (
      post timestamp not null,
      wlist text list,
      UNIQUE (post)
    );
  Such an output buffer has a table structure that combines the posting date/time post (acquisition date/time information) of the posting data with the morpheme string wlist.
- the posting date and time post is the date and time when the posting was made, and the morpheme string wlist is a filtered morpheme string.
- the output buffer is set to be unique with respect to the posting date and time post.
- if post data that has not been subjected to filtering processing remains (YES in S358), the one post data at the head of the remaining post data group is taken out, its posting date/time post is substituted into the post date/time variable POSTTIME, the text body of the post data is substituted into the text variable TEXT, and the target post data is deleted from the post data group (S362).
- the data processing unit 184 replaces each run of two or more punctuation marks in the text variable TEXT with a single punctuation mark (“.”, “,”, etc.), and also processes line breaks, symbols, and blanks.
- the data processing unit 184 initializes the pre-connected morpheme variable PREV (substitutes the null value NULL) (S368) and determines whether or not any morpheme remains in the target post data (S370). If it is determined that none remains (NO in S370), the process repeats from the post data remaining determination step S358 in order to examine new post data.
- the data processing unit 184 extracts one morpheme from the head of the morpheme string in the text body of the post data and assigns it to the morpheme variable WORD (S372). Then, the data processing unit 184 determines whether or not the morpheme variable WORD is a punctuation mark or a blank (S374), and if it is a punctuation mark or a blank (YES in S374), the process proceeds to the time determination step S382.
- the data processing unit 184 determines whether or not the permitted word table 200 contains a record whose pre-connected morpheme pword is equal to the value of the pre-connected morpheme variable PREV and whose main morpheme word is equal to the value of the morpheme variable WORD, and, if such a record exists, whether or not its number of appearances wnum is equal to or greater than the first threshold value ⁇ (S376).
- when there is no matching morpheme combination, or when the number of appearances wnum is less than the first threshold value ⁇ (NO in S376), the data processing unit 184 initializes the pre-connected morpheme variable PREV and replaces the morpheme variable WORD with the special symbol “ ⁇ ”, which denotes a masked character (S378).
- the data processing unit 184 replaces a combination of morphemes whose number of appearances wnum is less than the first threshold value ⁇ with the special symbol because such a combination cannot be said to have appeared sufficiently and is therefore not suitable as a permission word combining those morphemes.
- FIG. 9 is an explanatory diagram for explaining the processing of the data processing unit 184.
- the data processing unit 184 replaces “D”, which corresponds to the morpheme variable WORD among the morphemes, with the special symbol “ ⁇ ”.
- the symbol [/] is inserted between morphemes, but it is not treated as a symbol that actually exists.
- the data processing unit 184 substitutes the value of the morpheme variable WORD into the pre-connected morpheme variable PREV (S380). Then, the data processing unit 184 determines whether or not there is a record in the output buffer whose posting date/time post matches the post date/time variable POSTTIME (S382).
- the data processing unit 184 adds a new record in which the posting date/time post and the morpheme string wlist are set to the post date/time variable POSTTIME and the morpheme variable WORD, respectively (S386), and repeats from the morpheme remaining determination step S370.
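The per-post filtering loop (S368 to S380) can be sketched as below. This is an illustrative sketch under assumptions, not the embodiment's code: the permitted word table is a plain dictionary keyed by `(pword, word)`, `*` stands in for the masking symbol, the punctuation set is a small example, and the output buffer handling of S382 to S386 is reduced to returning a list.

```python
from typing import Dict, List, Optional, Tuple

MASK = "*"  # assumed stand-in for the special masking symbol
PUNCTUATION = frozenset({".", ",", " "})  # assumed example set for S374

Table = Dict[Tuple[Optional[str], str], int]


def filter_morphemes(morphemes: List[str], table: Table, threshold: int = 1) -> List[str]:
    """Mask every morpheme whose (PREV, WORD) pair is not permitted."""
    out: List[str] = []
    prev: Optional[str] = None                       # S368
    for word in morphemes:                           # S370/S372
        if word in PUNCTUATION:                      # S374: output as is,
            out.append(word)                         # PREV is left unchanged
            continue
        if table.get((prev, word), 0) >= threshold:  # S376
            out.append(word)
            prev = word                              # S380
        else:
            out.append(MASK)                         # S378
            prev = None
    return out


table = {(None, "base"): 1, ("base", "no"): 1, ("no", "outside"): 1}
print(filter_morphemes(["base", "outside"], table))
print(filter_morphemes(["base", "no", "outside"], table))
```

With this table, the directly joined pair is masked at its second morpheme, while the form with the particle between the two words passes unchanged, which mirrors the “base” / “outside” example in the description.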
- the first threshold value ⁇ is set to 1 for easy understanding. However, it goes without saying that the first threshold value ⁇ can be appropriately changed depending on the application.
- the presence determination step S376 may be executed using the appearance probability obtained by the following Formula 1 instead of the number of appearances wnum itself.
  $$\text{appearance probability} = \frac{\text{wnum of the corresponding record}}{\text{total of wnum over all records}} \quad \text{(Formula 1)}$$
  With this configuration, the data processing unit 184 can execute the presence determination step S376 based on the ratio relative to the population of the permitted word table 200. Therefore, even if an arbitrary morpheme is a permission word while the population is small, if its number of appearances is not updated after that, its appearance probability decreases as the population grows, and it may come to be excluded from the permission words. In this way, morphemes whose appearance frequency has decreased can be excluded automatically.
- by using the permission word table 200, which is different from the morpheme dictionary, together with the combinations and numbers of appearances of morphemes acquired from the program additional data included in the program stream, the filtering device 120 can appropriately change post data that includes words contrary to public order and morals into post data that does not include such words.
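The effect of using a ratio instead of a raw count can be seen in a small sketch. The dictionary representation of the table and the function name `appearance_probability` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

Table = Dict[Tuple[Optional[str], str], int]


def appearance_probability(table: Table, pair: Tuple[Optional[str], str]) -> float:
    """wnum of the given record divided by the total wnum of all records."""
    total = sum(table.values())
    return table.get(pair, 0) / total if total else 0.0


table: Table = {("a", "b"): 3, ("b", "c"): 1}
print(appearance_probability(table, ("a", "b")))  # 3 of 4 appearances

# The population grows while ("a", "b") is never seen again,
# so its share of the table shrinks.
table[("c", "d")] = 96
print(appearance_probability(table, ("a", "b")))  # 3 of 100 appearances
```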
- the permission word table 200 strongly reflects the generation characteristics of the program additional data in the broadcasting station 112 in the area where the user lives and the broadcasting station 112 that the user exclusively views. For this reason, the permitted word table 200 is in accordance with regional characteristics and user preferences, and as a result, the filtered post data easily retains words according to regional characteristics and user preferences.
- the filtering target is not limited to post data; various other text data, such as data displayed on a web browser or data stored in a storage medium, can also be filtered.
- FIG. 10 is an explanatory diagram showing a schematic connection relationship of the program providing system 400 according to the second embodiment.
- the program providing system 400 includes a program providing device 110, a program search device 420, a display device 130, and a service providing server 140.
- the program providing apparatus 110, the display apparatus 130, and the service providing server 140 are substantially the same in operation as the program providing apparatus 110, the display apparatus 130, and the service providing server 140 described in the first embodiment, so their description is omitted.
- the program search device 420 receives, from the broadcasting station 112 as the program providing device 110 through the antenna 122 and from the program providing server 114 as the program providing device 110 through a communication network 124 such as the Internet, program streams of various programs such as terrestrial digital broadcasting, BS/CS digital broadcasting, cable television broadcasting, IP broadcasting, and video on demand, and generates a permission word table 200 for performing filtering.
- the program search device 420 holds the program, generates index data of the program using the permission word table 200, and assigns it to the held program.
- the program search device 420 quickly extracts a program desired by the user or a predetermined scene in the program based on the index data.
- in a configuration in which a plurality of programs are stored and a stored program is viewed afterwards (for example, an HDR: Hard Disk Recorder), if the program stream includes caption data, associating the caption data with each program as index data allows the HDR to quickly present a program desired by the user based on the index data.
- the program stream does not always include subtitle data. For example, for programs whose broadcast content cannot be prepared in advance, such as news and live broadcasts, subtitle data is either not included or, even if included, limited to information such as titles. As a result, some programs have index data associated with them and others do not.
- the program search device 420 attempts to obtain information corresponding to index data from a route other than broadcasting for a program stream that does not include caption data and associate the information with the program as index data.
- the service providing server 140 that publishes post data relating to a program broadcast on an arbitrary broadcasting station 112 as an electronic bulletin board described in the first embodiment is suitable.
- the program search device 420 compares the viewing time of the program with the posting date and time of the posted data, regards posted data whose date and time match as being related to the corresponding program, and uses that posted data as index data.
- if index data is generated directly from post data as it is, all kinds of text data, including words and sentences that are offensive to public order and morals, will be associated as index data, the capacity of the index data will become enormous, and search processing will be delayed.
- it may appear that the search hit rate increases as the index data grows, but in reality much of the index data is unsuitable for search, such as meaningless text data produced by ASCII art, so the hit rate is not necessarily high.
- if a substitute character or the like reflecting notational variation is registered as index data, it not only functions as index data for the program but is also caught by searches for other, unintended programs, resulting in a decrease in search accuracy.
- the amount and quality of index data differ between a program associated with a large amount of index data and a program associated with index data based on caption data, so the desired program cannot be extracted properly.
- FIG. 11 is a functional block diagram showing a schematic configuration of the program search device 420.
- the program search device 420 includes an operation unit 150, a tuner unit 152, a communication unit 154, a DEMUX unit 156, an AV decoding unit 158, a table holding unit 160, a central control unit 462, a program holding unit 464, a program information holding unit 466, an RTC (Real Time Clock) unit 468, and an index holding unit 470.
- the tuner unit 152, the communication unit 154, and the DEMUX unit 156 function as a program stream acquisition unit that acquires a program stream.
- the central control unit 462 also functions as a table update unit 180, a data acquisition unit 482, a data processing unit 184, a display control unit 186, a program storage control unit 488, a program information storage control unit 490, an index assignment unit 492, and a program extraction unit 494.
- the central control unit 462, the program holding unit 464, the program information holding unit 466, the RTC unit 468, the index holding unit 470, the data acquisition unit 482, the program storage control unit 488, the program information storage control unit 490, the index assignment unit 492, and the program extraction unit 494, which differ in configuration, will be mainly described.
- the program storage control unit 488 holds the program in the program holding unit 464 in a form that can be searched by the channel number and time data.
- the program holding unit 464 includes a storage medium such as a flash memory and an HDD, and holds one or a plurality of programs.
- besides a storage medium such as a flash memory or an HDD, an optical disk medium such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc), a magnetic medium such as a magnetic tape or a magnetic disk, or an external storage medium such as a flash memory or a portable HDD that can be detached from the program search device 420 may be applied.
- the program holding unit 464 is a randomly accessible file system, and other function units can read the video data, audio data, and subtitle data held in the program holding unit 464 by designating an arbitrary time range.
- since the random access method is an existing technology, it will not be described in detail. For example, a program is divided and stored every hour, and the divided file is given a name that includes the channel number and the storage start time, such as “27CH_September 30, 2009 17:00:00.TS”, which makes rough random access possible.
- random access to an arbitrary scene in a program can be performed by obtaining the file offset (bytes) at an arbitrary reproduction time. When the total file size (bytes) per hour is TOTAL, the absolute playback time of an arbitrary scene is T1, and the absolute time at the beginning of the file obtained from the file name is T0, the file offset is given by the following Formula 2.
  $$\text{file offset} = \frac{\mathrm{TOTAL}}{3600} \times (T1 - T0) \quad \text{(Formula 2)}$$
  Here, the result of (T1 - T0) is expressed in seconds.
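As a worked example of Formula 2, the following sketch computes the byte offset for a scene inside a one-hour file. The function name `file_offset`, the use of integer arithmetic, and the range check are assumptions added for illustration; the formula itself presumes a roughly constant bit rate within the file.

```python
from datetime import datetime


def file_offset(total_bytes: int, t0: datetime, t1: datetime) -> int:
    """Approximate byte offset of playback time t1 in a one-hour file.

    total_bytes is TOTAL, t0 is the start time taken from the file name,
    and t1 is the absolute playback time of the scene.
    """
    seconds = int((t1 - t0).total_seconds())
    if not 0 <= seconds < 3600:
        raise ValueError("t1 must fall within the hour that starts at t0")
    return total_bytes * seconds // 3600


t0 = datetime(2009, 9, 30, 17, 0, 0)
t1 = datetime(2009, 9, 30, 17, 45, 40)
print(file_offset(3_600_000, t0, t1))  # 2740 seconds into a 3,600,000-byte file
```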
- the program information storage control unit 490 extracts program information from the program stream when the program stream is acquired via the tuner unit 152 or the communication unit 154 as the program stream acquisition unit, and causes the program information holding unit 466 to hold the program information as a table.
- a command statement for generating such a program information table can be described as follows using SQL.
    create table epg_table (
      phych integer not null,
      serviceid integer not null,
      eventid integer not null,
      sttime timestamp not null,
      edtime timestamp not null,
      title text not null,
      capflg integer not null,
      UNIQUE (serviceid, eventid, sttime)
    );
- the program information includes at least a channel number phych, a service ID: serviceid, an event ID: eventid, a program start time sttime, a program end time edtime, a program title title, and a caption flag capflg.
- the combination of the service ID: serviceid, the event ID: eventid, and the program start time sttime is unique.
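The uniqueness constraint on the program information table can be exercised with an in-memory SQLite database. This is only a sketch: the choice of SQLite, the helper names `open_epg_db` and `register_program`, and the sample row values are assumptions, and the embodiment does not state which database engine is used.

```python
import sqlite3

EPG_DDL = """
create table epg_table (
  phych integer not null,
  serviceid integer not null,
  eventid integer not null,
  sttime timestamp not null,
  edtime timestamp not null,
  title text not null,
  capflg integer not null,
  UNIQUE (serviceid, eventid, sttime)
)
"""


def open_epg_db() -> sqlite3.Connection:
    """Create an in-memory database holding an empty epg_table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(EPG_DDL)
    return conn


def register_program(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one program; return False if the UNIQUE constraint rejects it."""
    try:
        with conn:
            conn.execute("insert into epg_table values (?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_epg_db()
row = (27, 1024, 1, "2009-09-30 17:00:00", "2009-09-30 18:00:00", "News", 0)
print(register_program(conn, row))  # first registration
print(register_program(conn, row))  # same (serviceid, eventid, sttime) again
```

New rows are stored with capflg set to 0, matching the "unprocessed" state used when program information is newly registered.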
- the program information storage control unit 490 can acquire information other than the caption flag capflg from the program information.
- the service ID is a unique numerical value corresponding to one or more organizations in one broadcasting station 112, and the event ID is a unique numerical value corresponding to one or more events in one organization.
- the program information storage control unit 490 registers the newly extracted program information after deleting the corresponding existing program information. By doing so, it is possible to exclude duplication of program frames in the same organization. Further, when newly registering program information, the program information storage control unit 490 sets the caption flag capflg of the program information to 0 (unprocessed).
- the program information holding unit 466 is configured by a storage medium such as a flash memory or an HDD, and holds a program information table in which program information included in the program stream is tabulated based on a control command from the program information storage control unit 490. Further, the program information holding unit 466 functions as an EPG database, and other functional units (for example, the index adding unit 492 and the program extracting unit 494) can search the program information table held by the program information holding unit 466 under arbitrary conditions.
- the data acquisition unit 482 acquires text data (second text data) related to the program.
- the data acquisition unit 482 acquires post data (second text data) related to a program, together with its posting date and time (acquisition date/time information), from the service providing server 140 that publishes, as an electronic bulletin board, post data related to programs broadcast on an arbitrary broadcasting station 112.
- an unspecified number of contributors are relaying live broadcasting of a series of programs broadcast on a specific broadcasting station 112 via the communication network 124.
- the posting data is posted almost in real time.
- the data acquisition unit 482 acquires post data from an electronic bulletin board provided exclusively for such an arbitrary broadcasting station 112.
- the data acquisition unit 482 may specify the title of a thread related to an arbitrary broadcast station 112 on the post-dedicated site and acquire the post data.
- the data acquisition unit 482 may acquire post data through such a site.
- the data acquisition unit 482 corresponds to a web browser: it establishes communication with the service providing server 140 through the communication unit 154, transmits request information including a time range and a channel number, and acquires the post data group (text data group) included in the time range as a response.
- the data processing unit 184 divides the posting data (second text data) into morphemes.
- if a divided morpheme is not registered in the permitted word table 200, or is registered but the number of appearances corresponding to the morpheme is less than the predetermined first threshold value ⁇, the data processing unit 184 replaces the morpheme with one or more predetermined characters and recombines the morphemes as post data (third text data).
- the RTC unit 468 is composed of an RTC circuit and plays a role as a clock of the program search device 420 itself.
- the index assigning unit 492 assigns (associates), as index data, the morphemes extracted from the program additional data or the posted data, together with the acquisition date/time information associated with that program additional data or posted data (second text data), to the program held in the program holding unit 464, and holds the index data in the index holding unit 470 as an index table.
- a command statement for generating such an index table can be described as follows using SQL.
    create table index_table (
      word text not null,
      postime timestamp not null,
      serviceid integer not null,
      eventid integer not null,
      UNIQUE (word, postime, serviceid, eventid)
    );
- the index table includes at least a search word word, a search time posttime, a service ID of the corresponding program: serviceid, and an event ID of the corresponding program: eventid.
- the index table has a unique combination of the search word word, the search time posttime, the service ID of the corresponding program: serviceid, and the event ID of the corresponding program: eventid.
- when caption data is included in the program stream, the index assigning unit 492 assigns the combination of the caption data and the acquisition date/time information, as index data, to the program corresponding to the caption data. On the other hand, when caption data is not included in the program stream (caption data is not added to the program), or can be regarded as not included, the index assigning unit 492 assigns a set of the recombined text data (third text data) and the acquisition date/time information, as index data, to the corresponding program.
- “can be regarded as not included” means that the subtitle rate described later is low.
- the index assigning unit 492 causes the data acquisition unit 482 to acquire post data (text data) from the service providing server 140, and causes the data processing unit 184 to generate index data with which the corresponding program can be searched. Then, the index assigning unit 492 registers the index data in the index table of the index holding unit 470 in order to assign the index data to the program.
- providing the index assigning unit 492 makes it possible to select appropriately which of the caption data included in the program stream and the posted data of the service providing server 140 should serve as the index data of the target program, so that appropriate index data can be generated. Because an index is added even when there is no caption data, the search accuracy can be improved.
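The selection between the two index sources can be summarized in a short sketch. The subtitle rate is defined elsewhere in the description, so both the `caption_rate` argument and the cutoff `min_caption_rate` are hypothetical placeholders here, as are the function name and the `(time, text)` entry layout.

```python
from datetime import datetime
from typing import List, Tuple

IndexEntry = Tuple[datetime, str]  # (acquisition date/time, text)


def choose_index_source(
    caption_entries: List[IndexEntry],
    filtered_post_entries: List[IndexEntry],
    caption_rate: float,
    min_caption_rate: float = 0.5,  # assumed cutoff; not given in this section
) -> List[IndexEntry]:
    """Use caption data when the program effectively has captions,
    otherwise fall back to the filtered post data (third text data)."""
    if caption_entries and caption_rate >= min_caption_rate:
        return caption_entries
    return filtered_post_entries


when = datetime(2009, 9, 30, 17, 45, 31)
captions = [(when, "weather forecast")]
posts = [(when, "nice forecast today")]
print(choose_index_source(captions, posts, caption_rate=0.9))
print(choose_index_source([], posts, caption_rate=0.0))
```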
- the table update unit 180 distinguishes the caption data in the program additional data used for updating the permitted word table 200 and the caption data used by the index adding unit 492 as the index data. It is also possible to update the permitted word table 200 using subtitle data used as index data.
- the index holding unit 470 is configured by a storage medium such as a flash memory or an HDD, and holds an index table in which index data is tabulated based on a control command from the index assigning unit 492.
- the program extraction unit 494 receives a user operation input through the operation unit 150 and displays the operation result on the display device 130 through a GUI (Graphical User Interface).
- the program extracting unit 494 extracts a program held in the program holding unit 464 or a predetermined scene in the program with reference to the index table based on a keyword or the like input for searching by the user.
- FIG. 12 is a flowchart illustrating the processing flow of the program search method.
- FIG. 12 illustrates index data assignment processing in the program search method.
- First, the index assigning unit 492 acquires the current time from the RTC unit 468 and assigns it to the time variable NOW (S500). It then searches the program information holding unit 466 for program information whose caption flag capflg is 0 (unprocessed) and whose program end time edtime is earlier than the time variable NOW, and acquires the result as a program information sequence (S502).
- The index assigning unit 492 determines whether program information remains in the program information sequence (S504). If it does (YES in S504), the unit extracts one item of program information from the head of the sequence, assigns its service ID serviceid to the service ID variable SERVICEID and its event ID eventid to the event ID variable EVENTID, and deletes that item from the sequence (S506). If no program information remains in the sequence (NO in S504), the index data assignment process ends.
- The index assigning unit 492 acquires, from the program holding unit 464, the caption data string contained in the program additional data that is stored in the file for the channel number phych and falls within the time range from the program start time sttime to the program end time edtime (S508). The index assigning unit 492 then assigns the total number of caption data items in the acquired caption data string to the variable CAPNUM (S510).
- FIG. 13 is an explanatory diagram showing an example of caption data. As shown in FIG. 13, for example, the caption data 550 includes at least a caption time 552 and a text body 554. In this embodiment, for simplification of explanation, only subtitle data is handled among program additional data. However, a set of time and text may be extracted from program additional data other than subtitles. For example, (program start time sttime, title title) in the program information may be added to the head of the caption data string as one set.
- The index assigning unit 492 determines whether one or more caption data items remain in the caption data string (S512). If so (YES in S512), it extracts one caption data item from the head of the caption data string, assigns the caption time 552 to the time variable POSTTIME and the text body 554 to the text variable TEXT2, and deletes that caption data item from the caption data string (S514).
- The index assigning unit 492 then performs lexical analysis on the text variable TEXT2 by replacing each run of one or more line breaks, symbols, or spaces with a single space (S516), and divides the text variable TEXT2 into morphemes using the morpheme dictionary (S518).
- A blank is used as the separator between morphemes.
- The above is the process of dividing the caption data string into a morpheme string, and it is repeated CAPNUM times. If no caption data remains in the caption data string (NO in S512), the process proceeds to the morpheme remaining determination step S520.
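The normalization and splitting of S516 to S518 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function names `normalize` and `split_morphemes`, the greedy longest-match strategy, and the tiny dictionary are hypothetical, since the text only says that a morpheme dictionary is used.

```python
import re


def normalize(text):
    """Collapse each run of line breaks, symbols, or spaces into one space (cf. S516)."""
    return re.sub(r"[\W_]+", " ", text).strip()


def split_morphemes(text, dictionary):
    """Greedy longest-match split against a dictionary (an assumed stand-in for S518)."""
    max_len = max((len(word) for word in dictionary), default=1)
    morphemes = []
    for chunk in normalize(text).split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate first; a single character always matches.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                if size == 1 or piece in dictionary:
                    morphemes.append(piece)
                    i += size
                    break
    return morphemes


TOY_DICTIONARY = {"今日", "は", "晴れ", "です"}  # hypothetical dictionary entries
morphemes = split_morphemes("今日は、晴れ!\nです", TOY_DICTIONARY)
morpheme_string = " ".join(morphemes)  # a blank separates morphemes
```

A real system would use a full morphological analyzer instead of this toy matcher; the point here is only the order of operations: normalize first, then split.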
- the index assigning unit 492 determines whether or not one or more morphemes remain in the morpheme string of the caption data (S520), and if they remain (YES in S520), takes out the first morpheme and extracts the morpheme.
- the index table has a unique combination of the search word word, the search time postime, the service ID serviceid of the corresponding program, and the event ID eventid of the corresponding program.
- The index assigning unit 492 calculates the caption rate CST using Formula 3 below (S526).
- Here, the result of (program end time edtime − program start time sttime) is converted into seconds, so the caption rate CST represents the number of caption data items per second.
- CST = CAPNUM / (edtime − sttime) … (Formula 3)
- Statistically, the caption rate CST of a program that can be regarded as captioned takes a value between 0.1 and 0.25, so the second threshold is set to β = 0.1, and the index assigning unit 492 determines whether the caption rate CST is greater than or equal to the second threshold β (S528).
- If the caption rate CST is greater than or equal to the second threshold β (YES in S528), the index assigning unit 492 regards the caption data string as valid, sets the caption flag capflg of the corresponding record in the program information table of the program information holding unit 466 to 1 (caption data present) (S530), and repeats from the program information remaining determination step S504.
- Here, of the program additional data, the appearance rate of caption data (the caption rate) is compared with the second threshold β. Similarly, the index assigning unit 492 may judge the validity of the caption data string by comparing the total amount of text body data in the program information with a third threshold.
- The index assigning unit 492 may also judge the validity of the caption data string by comparing the number of morphemes in the morpheme string output in S518 with a fourth threshold.
- If the caption rate CST is less than the second threshold β (NO in S528), the index assigning unit 492 determines that the caption data string is insufficient as index data, and has the data acquisition unit 482 and the data processing unit 184 acquire and process the post data that falls within the time range from the program start time sttime to the program end time edtime (S532).
- the processed post data is accumulated in an output buffer provided in the RAM of the central control unit 462.
- the posted data acquisition step S532 is substantially the same as the process described with reference to FIG. 7 in the first embodiment, and thus the description thereof is omitted here.
- A caption data string that is insufficient as index data arises, for example, with programs such as news or live broadcasts whose content cannot be presented in advance: caption data is either not included or, even if included, consists only of very limited information such as a title, so its reliability is low. In such cases, reliability is improved by adopting post data rather than using a small amount of caption data.
- The index assigning unit 492 determines whether a record remains in the output buffer (S534). If no record remains (NO in S534), it sets the caption flag capflg of the corresponding record in the program information table of the program information holding unit 466 to 2 (comments present) (S536) and repeats from the program information remaining determination step S504.
- the index assigning unit 492 takes out the record, substitutes the posting date and time post into the time variable POSTTIME, and acquires the morpheme string wlist (S538).
- the index assigning unit 492 determines whether or not one or more morphemes remain in the morpheme string of the record (S540), and if not (NO in S540), repeats from the record remaining determination step S534.
- With the index data generated by the index assigning unit 492, programs with many captions use caption data as the search information source, which enables high-accuracy search, while programs with few captions use post data as the search information source, which enables broad, shallow search.
- FIG. 14 is a flowchart for explaining the processing flow of the program search method.
- FIG. 14 illustrates a program search process in the program search method.
- the program extraction unit 494 substitutes the keyword into a morpheme variable WORD (S572).
- The program extraction unit 494 searches the index table of the index holding unit 470 (S574), then uses the service ID serviceid and event ID eventid contained in each row of the search result to search the program information table of the program information holding unit 466 and acquire the program name and the like (S576), and displays the resulting search list on the display device 130 to present it to the user (S578).
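The two-stage lookup of S574 and S576 can be approximated with a single join. The sketch below is an assumption-laden illustration: it loads the document's `epg_table` and `index_table` schemas into an in-memory SQLite database, and the function name `search_programs`, the exact-match comparison on word, and the sample rows are all hypothetical.

```python
import sqlite3


def search_programs(conn, keyword):
    """Return (title, phych, postime) for index rows whose word equals the keyword."""
    return conn.execute(
        """
        SELECT e.title, e.phych, i.postime
        FROM index_table AS i
        JOIN epg_table AS e
          ON e.serviceid = i.serviceid AND e.eventid = i.eventid
        WHERE i.word = ?
        ORDER BY i.postime
        """,
        (keyword,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
create table epg_table (
  phych integer not null, serviceid integer not null, eventid integer not null,
  sttime timestamp not null, edtime timestamp not null,
  title text not null, capflg integer not null,
  UNIQUE(serviceid, eventid, sttime));
create table index_table (
  word text not null, postime timestamp not null,
  serviceid integer not null, eventid integer not null,
  UNIQUE(word, postime, serviceid, eventid));
""")
# Hypothetical sample data: one recorded program and two index rows.
conn.execute(
    "INSERT INTO epg_table VALUES (1, 1024, 77, "
    "'2010-10-14 19:00:00', '2010-10-14 19:30:00', 'Evening News', 1)"
)
conn.executemany(
    "INSERT INTO index_table VALUES (?, ?, ?, ?)",
    [
        ("weather", "2010-10-14 19:25:10", 1024, 77),
        ("sports", "2010-10-14 19:20:00", 1024, 77),
    ],
)
hits = search_programs(conn, "weather")
```

Each hit carries the channel number phych and the search time postime, which is the information the later playback step needs to locate the scene.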
- FIG. 15 is an explanatory diagram showing a display example of a search list.
- The program extraction unit 494 searches for index data based on the keyword and, based on the index data found, lists and displays program information as shown in FIG. 15.
- The program extraction unit 494 converts each record in the program information table of the program information holding unit 466 into a form that is easy for the user to understand and displays it in an appropriate layout.
- The program extraction unit 494 searches the program holding unit 464 using the channel number phych acquired from the program information holding unit 466 and the search time postime acquired from the index holding unit 470 (S582), and the AV decoding unit 158 displays the program extracted by this search on the display device 130 (S584).
- FIG. 16 is an explanatory diagram showing a display example on the display device 130.
- It can be seen that the search time 620 associated with the search keyword is selected as the playback start point.
- the user can browse an arbitrary program or an arbitrary scene associated with a keyword for search from programs for thousands of hours.
- With the program search device 420 and the program search method described above, for a program stream that does not include caption data, information corresponding to index data is acquired through another route, for example post data on an electronic bulletin board, and can be associated with the program as index data. The program search device 420 and the program search method can therefore attach index data to all programs regardless of whether caption data is present, and can improve the search accuracy for programs.
- Because the program search device 420 and the program search method use as index data only post data that has been processed into text data conforming to the broadcast ethics regulations, they can eliminate unnecessary text data, such as words and sentences contrary to public order and morals, text unrelated to the program, and meaningless text such as ASCII art, and associate with the program only text data that is appropriate as index data. This avoids both an increase in the data volume of the index data and a deterioration in search accuracy caused by inappropriate index data.
- By filtering the post data and limiting the index data associated with a program, the program search device 420 and the program search method keep that index data quantitatively balanced with the caption data included in program streams, so the hit rate is not biased.
- Because the processed post data is text data conforming to the broadcast ethics regulations, its words and sentences are equal in quality to those of caption data included in a program stream in advance, so the uniformity of search can be maintained.
- The user can therefore appropriately extract a desired program or a predetermined scene within a program.
- Because the permitted word table 200 is updated in a closed manner within the filtering device 120, it is generated efficiently via the tuner unit 152 and the communication unit 154, and it is possible to cope with variations in wording intended to evade filtering while minimizing the risk of tampering.
- The permitted word table 200 strongly reflects the generation characteristics of the program additional data of the broadcast stations 112 in the area where the user lives and of the broadcast stations 112 that the user mainly watches. A permitted word table 200 that depends on locality and on the user's preferences is therefore obtained, and as a result, words that match regional characteristics and the user's preferences are more likely to remain in the filtered post data.
- each step of the filtering method and program search method of this specification does not necessarily have to be processed in time series in the order described in the flowchart, and may include processing in parallel or by a subroutine.
Abstract
Description
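(1) A program search device comprising: a table holding unit that holds a permitted word table associating a plurality of morphemes with their appearance counts; a program stream acquisition unit that acquires a program stream generated in accordance with broadcast ethics regulations; a table update unit that, when the acquired program stream includes caption data or program information that is first text data relating to program content, extracts the caption data or the program information from the program stream, divides it into morphemes, registers a divided morpheme in the permitted word table if the morpheme is not in the permitted word table, and updates the appearance count corresponding to the morpheme if the morpheme is in the permitted word table; a program holding unit that holds a program included in the acquired program stream; a data acquisition unit that acquires second text data relating to the held program and associates acquisition date/time information with it; a data processing unit that divides the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replaces that morpheme with a predetermined symbol and recombines the result as third text data; an index assigning unit that assigns to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and a program extraction unit that extracts a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.
(2) A program search device comprising: a table holding unit that holds a permitted word table associating a plurality of morphemes with their appearance counts; a program information acquisition unit that acquires program information that is first text data relating to program content and is generated in accordance with broadcast ethics regulations; a table update unit that divides the program information into morphemes, registers a divided morpheme in the permitted word table if the morpheme is not in the permitted word table, and updates the appearance count corresponding to the morpheme if the morpheme is in the permitted word table; a program holding unit that holds a program included in an acquired program stream; a data acquisition unit that acquires second text data relating to the held program and associates acquisition date/time information with it; a data processing unit that divides the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replaces that morpheme with a predetermined symbol and recombines the result as third text data; an index assigning unit that assigns to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and a program extraction unit that extracts a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.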
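(3) The program search device according to (1) or (2) above, wherein the index assigning unit assigns caption data to the held program as index data if caption data is added to the program, and assigns the recombined third text data to the program as index data if caption data is not added to the program or can be regarded as not added.
(4) The program search device according to (3) above, wherein the index assigning unit regards caption data as not added to the program if the caption rate, which is the number of caption data items per second, is less than a predetermined second threshold.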
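(5) A program search method comprising: acquiring a program stream generated in accordance with broadcast ethics regulations; when the acquired program stream includes caption data or program information that is first text data relating to program content, extracting the caption data or the program information from the program stream, dividing it into morphemes, registering a divided morpheme in a permitted word table, which associates a plurality of morphemes with their appearance counts, if the morpheme is not in the permitted word table, and updating the appearance count corresponding to the morpheme if the morpheme is in the permitted word table; holding a program included in the acquired program stream in a program holding unit; acquiring second text data relating to the held program and associating acquisition date/time information with it; dividing the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replacing that morpheme with a predetermined symbol and recombining the result as third text data; assigning to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and extracting a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.
(6) A program search method comprising: acquiring program information that is first text data relating to program content and is generated in accordance with broadcast ethics regulations; dividing the program information into morphemes, registering a divided morpheme in a permitted word table, which associates a plurality of morphemes with their appearance counts, if the morpheme is not in the permitted word table, and updating the appearance count corresponding to the morpheme if the morpheme is in the permitted word table; holding a program included in an acquired program stream in a program holding unit; acquiring second text data relating to the held program and associating acquisition date/time information with it; dividing the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replacing that morpheme with a predetermined symbol and recombining the result as third text data; assigning to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and extracting a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.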
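FIG. 1 is an explanatory diagram showing the schematic connection relationships of the program providing system 100 in the first embodiment. The program providing system 100 includes a program providing device 110, a filtering device 120, a display device 130, and a service providing server 140.
FIG. 2 is a functional block diagram showing the schematic configuration of the filtering device 120. The filtering device 120 includes an operation unit 150, a tuner unit 152, a communication unit 154, a DEMUX (DEMUltipleXer) unit 156, an AV decoding unit 158, a table holding unit 160, and a central control unit 162. Here, the tuner unit 152, the communication unit 154, and the DEMUX unit 156 function as a program stream acquisition unit that acquires a program stream. In FIG. 2, data flows are shown by solid arrows and control signal flows by dashed arrows.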
create table allowing_word_table(
pword text,
word text not null,
wnum integer,
UNIQUE(pword, word)
);
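FIG. 5 is a flowchart explaining the processing flow of the filtering method. In particular, FIG. 5 explains the part of the filtering method that generates the permitted word table 200.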
create table output_buffer (
post timestamp not null,
wlist text list,
UNIQUE(post)
);
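This output buffer has a table structure that combines the posting date/time post (acquisition date/time information) of the post data with the morpheme string wlist. The posting date/time post is the date and time at which the post was made, and the morpheme string wlist is the morpheme string after filtering. The output buffer is configured to be unique with respect to the posting date/time post.
wnum of the matching record / sum of wnum over all records … (Formula 1)
With this configuration, the data processing unit 184 can execute the presence determination step S376 based on the ratio to the population of the permitted word table 200. Consequently, even if a given morpheme became a permitted word while the population was small, if its appearance count is not subsequently updated, its appearance probability falls as the population grows, and it may be excluded from the permitted words. In this way, morphemes whose appearance frequency has declined can be excluded automatically.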
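The ratio-based check of Formula 1 might look like the following in outline. This is a hedged sketch, not the patent's code: the permitted word table is simplified to a plain mapping from word to wnum (the pword column is ignored), and the function names, the minimum probability of 0.1, and the `*` replacement symbol are assumptions made for illustration.

```python
def appearance_probability(table, word):
    """Formula 1: wnum of the matching record / sum of wnum over all records."""
    total = sum(table.values())
    if total == 0:
        return 0.0
    return table.get(word, 0) / total


def filter_morphemes(morphemes, table, min_probability, mask="*"):
    """Replace morphemes that are unregistered or too rare with a symbol, then rejoin."""
    total = sum(table.values())
    kept = []
    for morpheme in morphemes:
        probability = table.get(morpheme, 0) / total if total else 0.0
        kept.append(morpheme if probability >= min_probability else mask)
    return " ".join(kept)


# Hypothetical permitted word table: word -> wnum
permitted = {"news": 50, "weather": 45, "rareword": 5}
filtered = filter_morphemes(["news", "spam", "rareword", "weather"], permitted, 0.1)
```

Because the denominator grows whenever any word is counted, a word whose own count stops increasing drifts below the minimum probability over time, which is the automatic exclusion effect described above.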
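The first embodiment described a filtering device 120 and a filtering method that appropriately filter arbitrary text data. The second embodiment describes a program search device 420 and a program search method that use the filtering technique described in the first embodiment to appropriately search for programs and for predetermined scenes within programs.
In a configuration that stores multiple programs and lets them be viewed afterwards (for example, an HDR: Hard Disk Recorder), when a program stream includes caption data, associating that caption data with each program as index data allows the HDR to quickly present the program the user wants based on the index data. However, program streams do not always include caption data. For example, for news or live broadcasts, whose content cannot be presented in advance, caption data is not included, or, even if included, consists only of very limited information such as a title. As a result, some programs have index data associated with them and others do not.
TOTAL / 3600 × (T1 − T0) … (Formula 2)
Here, the result of (T1 − T0) is converted into seconds before use.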
create table epg_table (
phych integer not null,
serviceid integer not null,
eventid integer not null,
sttime timestamp not null,
edtime timestamp not null,
title text not null,
capflg integer not null,
UNIQUE(serviceid, eventid, sttime)
);
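Here, the program information includes at least the channel number phych, the service ID serviceid, the event ID eventid, the program start time sttime, the program end time edtime, the program name title, and the caption flag capflg. In the program information table, the combination of the service ID serviceid, the event ID eventid, and the program start time sttime is unique. The program information storage control unit 490 can obtain all information other than the caption flag capflg from the program information. The service ID is a unique numeric value corresponding to one or more programming schedules within one broadcast station 112, and the event ID is a unique numeric value corresponding to one or more events within one programming schedule.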
create table index_table (
word text not null,
postime timestamp not null,
serviceid integer not null,
eventid integer not null,
UNIQUE(word, postime, serviceid, eventid)
);
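Here, the index table includes at least the search word word, the search time postime, the service ID serviceid of the corresponding program, and the event ID eventid of the corresponding program. In the index table, the combination of the search word word, the search time postime, the service ID serviceid of the corresponding program, and the event ID eventid of the corresponding program is unique.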
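As a rough illustration of what the uniqueness constraint implies in practice, the sketch below loads the index table schema into an in-memory SQLite database and registers one row per morpheme. The helper name `register_index`, the use of SQLite's `INSERT OR IGNORE` to skip duplicate rows, and the sample values are assumptions; the source does not say how duplicates are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table index_table (
  word text not null,
  postime timestamp not null,
  serviceid integer not null,
  eventid integer not null,
  UNIQUE(word, postime, serviceid, eventid)
)""")


def register_index(conn, words, postime, serviceid, eventid):
    """Insert one row per morpheme; rows that would violate UNIQUE are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO index_table VALUES (?, ?, ?, ?)",
        [(word, postime, serviceid, eventid) for word in words],
    )


# "weather" appears twice at the same time for the same program (hypothetical data).
register_index(conn, ["weather", "typhoon", "weather"], "2010-10-14 19:25:10", 1024, 77)
row_count = conn.execute("SELECT COUNT(*) FROM index_table").fetchone()[0]
```

The same word at a different postime still produces a separate row, so each occurrence time remains available as a candidate scene.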
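FIG. 12 is a flowchart explaining the processing flow of the program search method. In particular, FIG. 12 explains the index data assignment processing within the program search method. First, the index assigning unit 492 acquires the current time from the RTC unit 468 and assigns it to the time variable NOW (S500). It then searches the program information holding unit 466 for program information whose caption flag capflg is 0 (unprocessed) and whose program end time edtime is earlier than the time variable NOW, and acquires the result as a program information sequence (S502).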
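The selection in S502 maps naturally onto a query against the program information table. The following is a minimal sketch under assumptions: timestamps are stored as `YYYY-MM-DD HH:MM:SS` strings so that string comparison matches chronological order, the current time is passed in as an argument instead of being read from an RTC, and the function name `pending_programs` and the sample rows are hypothetical.

```python
import sqlite3
from datetime import datetime


def pending_programs(conn, now):
    """Programs that have ended (edtime < now) and are still unprocessed (capflg = 0)."""
    return conn.execute(
        "SELECT serviceid, eventid, phych, sttime, edtime FROM epg_table "
        "WHERE capflg = 0 AND edtime < ? ORDER BY edtime",
        (now.strftime("%Y-%m-%d %H:%M:%S"),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("""
create table epg_table (
  phych integer not null, serviceid integer not null, eventid integer not null,
  sttime timestamp not null, edtime timestamp not null,
  title text not null, capflg integer not null,
  UNIQUE(serviceid, eventid, sttime))""")
conn.executemany(
    "INSERT INTO epg_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1024, 77, "2010-10-14 19:00:00", "2010-10-14 19:30:00", "Evening News", 0),
        (1, 1024, 78, "2010-10-14 19:30:00", "2010-10-14 19:55:00", "Quiz Show", 1),
        (1, 1024, 79, "2010-10-14 19:55:00", "2010-10-14 21:00:00", "Drama", 0),
    ],
)
NOW = datetime(2010, 10, 14, 20, 0, 0)  # stands in for the time read in S500
program_sequence = pending_programs(conn, NOW)
```

Only the first row qualifies here: the second is already processed, and the third has not yet ended at the assumed current time.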
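CST = CAPNUM / (edtime − sttime) … (Formula 3)
Statistically, the caption rate CST of a program that can be regarded as captioned takes a value between 0.1 and 0.25, so the second threshold is set to β = 0.1, and the index assigning unit 492 determines whether the caption rate CST is greater than or equal to the second threshold β (S528). If the caption rate CST is greater than or equal to the second threshold β (YES in S528), the index assigning unit 492 regards the caption data string as valid, sets the caption flag capflg of the corresponding record in the program information table of the program information holding unit 466 to 1 (caption data present) (S530), and repeats from the program information remaining determination step S504. Here, of the program additional data, the appearance rate of caption data (the caption rate) is compared with the second threshold β. Similarly, the index assigning unit 492 may judge the validity of the caption data string by comparing the total amount of text body data in the program information with a third threshold.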
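Formula 3 and the comparison in S528 can be written out as a short calculation. This is a sketch only: the function names and the sample caption counts are hypothetical, while the threshold β = 0.1 is the value stated in the text.

```python
from datetime import datetime

SECOND_THRESHOLD = 0.1  # β, the value given in the text


def caption_rate(capnum, sttime, edtime):
    """Formula 3: CST = CAPNUM / (edtime - sttime), with the duration in seconds."""
    duration = (edtime - sttime).total_seconds()
    if duration <= 0:
        raise ValueError("edtime must be later than sttime")
    return capnum / duration


def has_valid_captions(capnum, sttime, edtime, threshold=SECOND_THRESHOLD):
    """True if the caption data string is regarded as valid (YES in S528)."""
    return caption_rate(capnum, sttime, edtime) >= threshold


start = datetime(2010, 10, 14, 19, 0, 0)
end = datetime(2010, 10, 14, 19, 30, 0)   # a 1800-second program
well_captioned = has_valid_captions(360, start, end)    # 360 / 1800 = 0.2
sparsely_captioned = has_valid_captions(18, start, end)  # 18 / 1800 = 0.01
```

In the flow described above, a program like the first would keep its caption data as index data, while one like the second would fall back to post data.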
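120 … Filtering device
160 … Table holding unit
180 … Table update unit
182, 482 … Data acquisition unit
184 … Data processing unit
200 … Permitted word table
420 … Program search device
464 … Program holding unit
492 … Index assigning unit
494 … Program extraction unit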
Claims (6)
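- A program search device comprising:
a table holding unit that holds a permitted word table associating a plurality of morphemes with their appearance counts;
a program stream acquisition unit that acquires a program stream generated in accordance with broadcast ethics regulations;
a table update unit that, when the acquired program stream includes caption data or program information that is first text data relating to program content, extracts the caption data or the program information from the program stream, divides it into morphemes, registers a divided morpheme in the permitted word table if the morpheme is not in the permitted word table, and updates the appearance count corresponding to the morpheme if the morpheme is in the permitted word table;
a program holding unit that holds a program included in the acquired program stream;
a data acquisition unit that acquires second text data relating to the held program and associates acquisition date/time information with it;
a data processing unit that divides the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replaces that morpheme with a predetermined symbol and recombines the result as third text data;
an index assigning unit that assigns to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and
a program extraction unit that extracts a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.
- A program search device comprising:
a table holding unit that holds a permitted word table associating a plurality of morphemes with their appearance counts;
a program information acquisition unit that acquires program information that is first text data relating to program content and is generated in accordance with broadcast ethics regulations;
a table update unit that divides the program information into morphemes, registers a divided morpheme in the permitted word table if the morpheme is not in the permitted word table, and updates the appearance count corresponding to the morpheme if the morpheme is in the permitted word table;
a program holding unit that holds a program included in an acquired program stream;
a data acquisition unit that acquires second text data relating to the held program and associates acquisition date/time information with it;
a data processing unit that divides the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replaces that morpheme with a predetermined symbol and recombines the result as third text data;
an index assigning unit that assigns to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and
a program extraction unit that extracts a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.
- The program search device according to claim 1 or 2, wherein the index assigning unit assigns caption data to the held program as index data if caption data is added to the program, and assigns the recombined third text data to the program as index data if caption data is not added to the program or can be regarded as not added.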
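- The program search device according to claim 3, wherein the index assigning unit regards caption data as not added to the program if the caption rate, which is the number of caption data items per second, is less than a predetermined second threshold.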
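- A program search method comprising:
acquiring a program stream generated in accordance with broadcast ethics regulations;
when the acquired program stream includes caption data or program information that is first text data relating to program content, extracting the caption data or the program information from the program stream, dividing it into morphemes, registering a divided morpheme in a permitted word table, which associates a plurality of morphemes with their appearance counts, if the morpheme is not in the permitted word table, and updating the appearance count corresponding to the morpheme if the morpheme is in the permitted word table;
holding a program included in the acquired program stream in a program holding unit;
acquiring second text data relating to the held program and associating acquisition date/time information with it;
dividing the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replacing that morpheme with a predetermined symbol and recombining the result as third text data;
assigning to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and
extracting a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.
- A program search method comprising:
acquiring program information that is first text data relating to program content and is generated in accordance with broadcast ethics regulations;
dividing the program information into morphemes, registering a divided morpheme in a permitted word table, which associates a plurality of morphemes with their appearance counts, if the morpheme is not in the permitted word table, and updating the appearance count corresponding to the morpheme if the morpheme is in the permitted word table;
holding a program included in an acquired program stream in a program holding unit;
acquiring second text data relating to the held program and associating acquisition date/time information with it;
dividing the second text data into morphemes and, if a divided morpheme is not registered in the permitted word table, or is registered in the permitted word table but has a corresponding appearance count less than a predetermined first threshold, replacing that morpheme with a predetermined symbol and recombining the result as third text data;
assigning to the held program, as index data, a pair of the recombined third text data and the acquisition date/time information associated with the second text data corresponding to that third text data; and
extracting a program held in the program holding unit, or a predetermined scene within a program, based on a keyword input for search and the index data.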
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180007305XA CN102845075A (zh) | 2010-10-14 | 2011-09-15 | 节目检索装置及节目检索方法 |
EP11832383A EP2568397A1 (en) | 2010-10-14 | 2011-09-15 | Program retrieval device and program retrieval method |
KR1020127025351A KR20120127664A (ko) | 2010-10-14 | 2011-09-15 | 프로그램 검색 장치 및 프로그램 검색 방법 |
US13/599,982 US20120323564A1 (en) | 2010-10-14 | 2012-08-30 | Program search device and program search method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010232008A JP5392228B2 (ja) | 2010-10-14 | 2010-10-14 | 番組検索装置および番組検索方法 |
JP2010-232008 | 2010-10-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/599,982 Continuation US20120323564A1 (en) | 2010-10-14 | 2012-08-30 | Program search device and program search method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012049945A1 true WO2012049945A1 (ja) | 2012-04-19 |
Family
ID=45938178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/071091 WO2012049945A1 (ja) | 2010-10-14 | 2011-09-15 | 番組検索装置および番組検索方法 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20120323564A1 (ja) |
EP (1) | EP2568397A1 (ja) |
JP (1) | JP5392228B2 (ja) |
KR (1) | KR20120127664A (ja) |
CN (1) | CN102845075A (ja) |
WO (1) | WO2012049945A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105808560A (zh) * | 2014-12-29 | 2016-07-27 | 腾讯科技(深圳)有限公司 | 一种同机多业务的检索方法及系统 |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5952241B2 (ja) * | 2013-09-03 | 2016-07-13 | 日本電信電話株式会社 | 情報付与装置、情報付与方法および情報付与プログラム |
JP2015052897A (ja) * | 2013-09-06 | 2015-03-19 | 株式会社東芝 | 電子機器、電子機器の制御方法及びコンピュータプログラム |
KR101509727B1 (ko) * | 2013-10-02 | 2015-04-07 | 주식회사 시스트란인터내셔널 | 자율학습 정렬 기반의 정렬 코퍼스 생성 장치 및 그 방법과, 정렬 코퍼스를 사용한 파괴 표현 형태소 분석 장치 및 그 형태소 분석 방법 |
US9489986B2 (en) * | 2015-02-20 | 2016-11-08 | Tribune Broadcasting Company, Llc | Use of program-schedule text and teleprompter output to facilitate selection of a portion of a media-program recording |
US9792956B2 (en) * | 2015-02-20 | 2017-10-17 | Tribune Broadcasting Company, Llc | Use of program-schedule text and closed-captioning text to facilitate selection of a portion of a media-program recording |
US11132497B2 (en) * | 2018-10-14 | 2021-09-28 | Bonggeun Kim | Device and method for inputting characters |
CN109525301A (zh) * | 2018-10-25 | 2019-03-26 | 深圳市海勤科技有限公司 | 卫星信号接收方法及系统、服务器终端、用户终端 |
JP2020154395A (ja) * | 2019-03-18 | 2020-09-24 | 富士ゼロックス株式会社 | 情報処理装置及びプログラム |
CN110413735B (zh) * | 2019-07-25 | 2022-04-29 | 深圳供电局有限公司 | 一种问答检索方法及其系统、计算机设备、可读存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3538955B2 (ja) | 1995-04-26 | 2004-06-14 | 松下電器産業株式会社 | 情報収集支援装置 |
WO2006019101A1 (ja) * | 2004-08-19 | 2006-02-23 | Nec Corporation | コンテンツ関連情報取得装置、およびプログラム |
JP2006190019A (ja) * | 2005-01-05 | 2006-07-20 | Hitachi Ltd | コンテンツ視聴システム |
JP2008204425A (ja) * | 2007-01-26 | 2008-09-04 | Yahoo Japan Corp | Urlの類似性分析による処理省略判定プログラム、装置 |
JP2010067005A (ja) * | 2008-09-10 | 2010-03-25 | Yahoo Japan Corp | 検索装置、および検索装置の制御方法 |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5940624A (en) * | 1991-02-01 | 1999-08-17 | Wang Laboratories, Inc. | Text management system |
US5434678A (en) * | 1993-01-11 | 1995-07-18 | Abecassis; Max | Seamless transmission of non-sequential video segments |
JP3476237B2 (ja) * | 1993-12-28 | 2003-12-10 | 富士通株式会社 | 構文解析装置 |
US7139031B1 (en) * | 1997-10-21 | 2006-11-21 | Principle Solutions, Inc. | Automated language filter for TV receiver |
JPH11261908A (ja) * | 1998-03-06 | 1999-09-24 | Toshiba Corp | 番組及び又は情報の選択支援装置 |
JP3601653B2 (ja) * | 1998-03-18 | 2004-12-15 | 富士通株式会社 | 情報検索装置および方法 |
JP3781561B2 (ja) * | 1998-08-13 | 2006-05-31 | 日本電気株式会社 | 自然言語解析装置、システム及び記録媒体 |
US7286984B1 (en) * | 1999-11-05 | 2007-10-23 | At&T Corp. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US8051446B1 (en) * | 1999-12-06 | 2011-11-01 | Sharp Laboratories Of America, Inc. | Method of creating a semantic video summary using information from secondary sources |
US8006268B2 (en) * | 2002-05-21 | 2011-08-23 | Microsoft Corporation | Interest messaging entertainment system |
US7269548B2 (en) * | 2002-07-03 | 2007-09-11 | Research In Motion Ltd | System and method of creating and using compact linguistic data |
US8050970B2 (en) * | 2002-07-25 | 2011-11-01 | Google Inc. | Method and system for providing filtered and/or masked advertisements over the internet |
WO2005050474A2 (en) * | 2003-11-21 | 2005-06-02 | Philips Intellectual Property & Standards Gmbh | Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics |
US20060074660A1 (en) * | 2004-09-29 | 2006-04-06 | France Telecom | Method and apparatus for enhancing speech recognition accuracy by using geographic data to filter a set of words |
US7680648B2 (en) * | 2004-09-30 | 2010-03-16 | Google Inc. | Methods and systems for improving text segmentation |
US7549119B2 (en) * | 2004-11-18 | 2009-06-16 | Neopets, Inc. | Method and system for filtering website content |
US8185921B2 (en) * | 2006-02-28 | 2012-05-22 | Sony Corporation | Parental control of displayed content using closed captioning |
JP4913154B2 (ja) * | 2006-11-22 | 2012-04-11 | 春男 林 | 文書解析装置および方法 |
US8280871B2 (en) * | 2006-12-29 | 2012-10-02 | Yahoo! Inc. | Identifying offensive content using user click data |
US8712757B2 (en) * | 2007-01-10 | 2014-04-29 | Nuance Communications, Inc. | Methods and apparatus for monitoring communication through identification of priority-ranked keywords |
JP4760864B2 (ja) * | 2008-06-25 | 2011-08-31 | ソニー株式会社 | 情報処理装置、情報処理方法、プログラム、及び、情報処理システム |
WO2010079954A2 (en) * | 2009-01-06 | 2010-07-15 | Lg Electronics Inc. | An iptv receiver and an method of managing video functionality and video quality on a screen in the iptv receiver |
CN101751386B (zh) * | 2009-12-28 | 2012-05-23 | 华建机器翻译有限公司 | 一种未登录词的识别方法 |
WO2011112989A2 (en) * | 2010-03-11 | 2011-09-15 | Cypes Gregory B | Systems and methods for location tracking in a social network |
-
2010
- 2010-10-14 JP JP2010232008A patent/JP5392228B2/ja active Active
-
2011
- 2011-09-15 WO PCT/JP2011/071091 patent/WO2012049945A1/ja active Application Filing
- 2011-09-15 EP EP11832383A patent/EP2568397A1/en not_active Withdrawn
- 2011-09-15 KR KR1020127025351A patent/KR20120127664A/ko not_active Application Discontinuation
- 2011-09-15 CN CN201180007305XA patent/CN102845075A/zh active Pending
-
2012
- 2012-08-30 US US13/599,982 patent/US20120323564A1/en not_active Abandoned
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105808560A (zh) * | 2014-12-29 | 2016-07-27 | 腾讯科技(深圳)有限公司 | 一种同机多业务的检索方法及系统 |
CN105808560B (zh) * | 2014-12-29 | 2020-07-31 | 腾讯科技(深圳)有限公司 | 一种同机多业务的检索方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
US20120323564A1 (en) | 2012-12-20 |
CN102845075A (zh) | 2012-12-26 |
EP2568397A1 (en) | 2013-03-13 |
JP2012084094A (ja) | 2012-04-26 |
JP5392228B2 (ja) | 2014-01-22 |
KR20120127664A (ko) | 2012-11-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5392228B2 (ja) | 番組検索装置および番組検索方法 | |
US10371532B2 (en) | Method and apparatus for providing geospatial and temporal navigation | |
US8374845B2 (en) | Retrieving apparatus, retrieving method, and computer program product | |
JP5392227B2 (ja) | フィルタリング装置およびフィルタリング方法 | |
JP2007274604A (ja) | 電子装置、その情報閲覧方法及び情報閲覧プログラム | |
CN105230035A (zh) | 用于选择的时移多媒体内容的社交媒体的处理 | |
JP5919325B2 (ja) | コメント表示装置、コメント配信装置、コメント表示システム、コメント表示方法及びプログラム | |
JP2007267173A (ja) | コンテンツ再生装置および方法 | |
JP2004297245A (ja) | ストリーミング配信方法 | |
JP2010245853A (ja) | 動画インデクシング方法及び動画再生装置 | |
JP4656202B2 (ja) | 情報処理装置および方法、プログラム、並びに記録媒体 | |
US9606991B2 (en) | Comment distribution system, and a method and a program for operating the comment distribution system | |
KR20080048130A (ko) | 메타데이터를 이용한 멀티미디어 재생장치 간의 멀티미디어컨텐츠 북마크 공유 방법 및 시스템 | |
KR20140083637A (ko) | 사용자의 감성에 기반한 맞춤형 콘텐츠를 제공하는 서버 및 방법 | |
EP2560380B1 (en) | Chapter creation device, chapter creation method, and chapter creation program | |
JP2009295054A (ja) | 映像コンテンツ検索装置及びコンピュータプログラム | |
CN101212268A (zh) | 一种通过电子业务指南显示特定信息的方法 | |
JP5211091B2 (ja) | 端末装置、コンテンツナビゲーションプログラム、コンテンツナビゲーションプログラムを記録した記録媒体、およびコンテンツナビゲーション方法 | |
WO2014002728A1 (ja) | 録画装置、テレビジョン受信機及び録画方法 | |
JP6621691B2 (ja) | コンテンツ表示制御装置、コンテンツ表示制御装置の制御方法、トピック管理システム、制御プログラム、および記録媒体 | |
KR102664295B1 (ko) | 수어 자막 동영상 플랫폼 제공 방법 및 장치 | |
US11910064B2 (en) | Methods and systems for providing preview images for a media asset | |
US20230254350A1 (en) | Methods, systems, and media for presenting user comments containing timed references in synchronization with a media content item | |
JP2006163710A (ja) | 番組情報蓄積装置及び方法並びに番組情報蓄積用プログラム | |
Sánchez et al. | Video in the Spanish digital'press': 2010-2015 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 201180007305.X; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11832383; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2011832383; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 20127025351; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |