WO2012070182A1 - Estimation device, estimation method, and program
- Publication number
- WO2012070182A1 (PCT/JP2011/005735)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- blog
- program
- tag
- character string
- database
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/16—Analogue secrecy systems; Analogue subscription systems
- H04N7/173—Analogue secrecy systems; Analogue subscription systems with two-way working, e.g. subscriber sending a programme selection signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/44—Browsing; Visualisation therefor
- G06F16/447—Temporal browsing, e.g. timeline
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/489—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
- H04N21/4828—End-user interface for program selection for searching program descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/44—Receiver circuitry for the reception of television signals according to analogue transmission standards
Definitions
- The present invention relates to an estimation device, an estimation method, and a program, and more particularly to a technique that uses feature words in collected blogs to estimate the broadcasting station that broadcasts a program referred to in a blog, as well as the program itself.
- In this specification, a blog is defined as a comment or article posted by an individual on a web site.
- Real bloggers do not write about TV programs 365 days a year; they mix daily life, book impressions, and other topics on a single blog site. It has become established blogger behavior to attach tag names, like sticky notes, to such mixed content. For example, tags such as #daily, #books, and #tv are attached to categories such as daily life, books, and television. In addition, a unique tag is often assigned to a program that is viewed frequently. Many such tags are based on abbreviations, for example serial drama A → #rendoraA.
- An objective of the present invention is to accurately estimate programs referred to by social media such as blogs without extracting feature words from EPG or subtitle text and without maintaining a synonym dictionary.
- One aspect of the present invention is an estimation device.
- This device includes a tag extraction unit that collects, via a network, blogs each containing a character string written by an individual on a web site and time information indicating when the character string was written, extracts the tags appearing in the character string of each blog, and stores each blog in a tag appearance database in association with the extracted tags.
- It also includes a provisional broadcast station estimation unit that estimates, based on feature words appearing in the character string of a blog, the broadcasting station that broadcasts the program referred to in the blog, and stores the estimated station in a blog database as a provisional broadcast station in association with the blog.
- It further includes a broadcast station determination unit that, when the number of occurrences of a tag in the character strings of blogs stored in the tag appearance database and written within a predetermined time range exceeds a predetermined threshold, refers to the blog database, tallies the provisional broadcast stations of the programs referred to in blogs whose character strings contain the tag, and determines, based on the tally, the broadcasting station that broadcasts the program referred to in those blogs.
- In other aspects, a processor executes the same three operations as steps: collecting blogs and storing them with their extracted tags in the tag appearance database; estimating a provisional broadcast station from feature words and storing it in the blog database; and, when a tag's occurrences within the predetermined time range exceed the predetermined threshold, tallying the provisional broadcast stations of the blogs containing the tag and determining the broadcasting station from the tally.
- Brief description of the drawings: FIG. 1 is a block diagram of the estimation apparatus in Embodiment 1 of the present invention. FIG. 2 is a flowchart in Embodiment 1. FIG. 3 is an example of the blog DB data structure in Embodiment 1 (before program estimation). FIG. 4 is an example of the blog DB data structure in Embodiment 1 (after program estimation). FIG. 5 is an example of the tag appearance DB data structure in Embodiment 1. FIG. 6 is an example of the tag program DB data structure in Embodiment 1 (before program estimation). FIG. 7 is an example of the tag program DB data structure in Embodiment 1 (after program estimation). FIG. 8 is an example of the feature word data in Embodiment 1. FIG. 9 is a block diagram of the estimation apparatus in Embodiment 2 of the present invention. FIG. 10 is a flowchart in Embodiment 2.
- FIG. 1 is a block diagram of an estimation apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a flowchart according to Embodiment 1 of the present invention.
- The blog collection unit 101 of the estimation apparatus 100 collects blog articles through the network 200, such as the Internet (S1), and registers the post unique ID, text, and posting time in the blog DB (database) 102 shown in FIG. 3 (S2).
- At this point the provisional broadcast station, the confirmed broadcast station, and the program information (title, broadcast station name, program details, etc.), all described later, are unknown and remain empty.
- "Blog" in this specification refers to a single comment or article that a person posts (writes) at one time on a web site. It includes the text information (character string) that makes up the posted comment or article and the time information indicating when the comment or article was posted.
- The post unique ID is an ID (identification) unique to each blog article.
- The format of the post unique ID is not particularly limited.
- The blog sites from which blog articles are collected are not particularly limited either.
- The tag extraction unit 103 extracts any tags from the blog text (S3).
- In this embodiment, alphanumeric strings starting with "#" are extracted as tags, like #rendoraA, but the present invention does not limit the tag format. Any format can be used as long as tags can be extracted mechanically according to a predetermined rule, such as a format specific to blogs (for example, text enclosed in a specific HTML tag) or a pattern of user behavior that can be retrieved automatically.
- The tag extraction unit 103 registers each extracted tag in the tag appearance DB 104 in association with the blog's post unique ID and posting time (S4).
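Steps S3 and S4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the regular expression encodes only the "# followed by alphanumerics" rule mentioned above, and the function names and the list-of-dicts stand-in for the tag appearance DB are assumptions made for the example.

```python
import re

# Assumed tag rule: "#" followed by one or more ASCII letters or digits.
TAG_PATTERN = re.compile(r"#[A-Za-z0-9]+")


def extract_tags(text):
    """Return the tags found in a blog text, in order of appearance (S3)."""
    return TAG_PATTERN.findall(text)


def register_tags(tag_appearance_db, post_id, post_time, text):
    """Append one row per extracted tag to a list standing in for the
    tag appearance DB (S4). The row layout is hypothetical."""
    for tag in extract_tags(text):
        tag_appearance_db.append(
            {"tag": tag, "post_id": post_id, "post_time": post_time}
        )
    return tag_appearance_db
```

A different tag format (for example, text inside a particular HTML element) would only require swapping the pattern or the extraction function.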
- The tag appearance DB 104 has the data structure shown in FIG. 5. The tag extraction unit 103 also determines whether the extracted tag is a new tag that has not yet been registered in the tag program DB 105, which holds the association between a tag and the program information estimated from it (S5). If the tag is new, it is registered in the tag program DB 105 (S6).
- The tag program DB 105 has the data structure shown in FIG. 6. At this point the confirmed time (the time when the association between the tag and the program information was determined), the program information, and the confirmed broadcast station described later are unknown and are therefore empty.
- The provisional broadcast station estimation unit 106 estimates a provisional broadcast station from the blog text collected by the blog collection unit 101 (S7) and temporarily stores the estimated provisional broadcast station in the blog DB 102 (S8).
- In this embodiment, the provisional broadcast station is estimated using feature word data prepared in advance for each broadcasting station, as in the example of FIG. 8.
- The feature word data consists of terms that appear in blog text; they are not necessarily the tags described above.
- Unlike the prior art, this method does not require extracting feature words by morphological analysis of EPG or subtitle text, and it greatly reduces the computational cost of estimation.
- To improve the accuracy of the provisional broadcast station estimate, a priority score may be assigned to each feature word as shown in FIG. 8 and held temporarily with the provisional broadcast station in the blog DB 102.
- When one blog text matches the feature words of several broadcasting stations (for example, the blog text "Look at the satellite broadcast of broadcast station A" matches both broadcast station A and the satellite broadcast of broadcast station A), the scores are totaled per broadcasting station, and the station with the highest provisional total score may be estimated to be the provisional broadcast station.
- Like the feature words, the priority score is a predetermined value: static data set once for each broadcasting station. The matched provisional total score is temporarily stored in a predetermined column of the blog DB 102.
- The provisional broadcast station at this stage is only an estimate. Several provisional broadcast stations may match, or none may match, in which case the provisional broadcast station remains unknown.
- The broadcast station determination unit 107 periodically examines the tag appearance DB 104 over a predetermined time range Ra (for example, from 10 minutes ago to the current time) and determines whether any tag T appears at least a predetermined threshold number of times (for example, 50 times) within Ra (S9). When tag T appears that many times or more, a broadcast station is determined by the method described later.
- The predetermined time range Ra is the broadcast station estimation reference time range, used as the reference when determining the correspondence between a tag and a broadcast station.
- The threshold is the broadcast station determination reference value, consulted to decide whether to associate a tag with a broadcast station.
- In this embodiment, for simplicity, the predetermined time range is fixed at 10 minutes.
- The time range may instead be made variable: the start time and end time of the program currently on air are taken from its program information, and the time range Ra is determined when the current time reaches the end time.
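The check in S9 amounts to counting tag occurrences inside a sliding window. The sketch below assumes the fixed 10-minute window and the example threshold of 50 from the text; the function name and the row layout (`tag`, `post_time`) are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def tags_over_threshold(tag_rows, now, window=timedelta(minutes=10), threshold=50):
    """Return the tags that appear at least `threshold` times in [now - window, now].

    tag_rows is a list of dicts with "tag" and "post_time" (datetime) keys,
    standing in for the tag appearance DB.
    """
    start = now - window
    counts = Counter(
        row["tag"] for row in tag_rows if start <= row["post_time"] <= now
    )
    return [tag for tag, n in counts.items() if n >= threshold]
```

Making the window variable, as the text suggests, would mean passing the start and end of the current program instead of a fixed `window`.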
- For example, the broadcast station determination unit 107 uses the post unique IDs recorded in the tag appearance DB 104 to acquire from the blog DB 102 the blog list Lb within the time range Rb, which runs from the oldest posting time containing the tag #prog1 (20:50:22) to the latest posting time (22:02:20) (S10).
- The broadcast station determination unit 107 then creates a ranking by counting the appearances of each provisional broadcast station in the blog list Lb, and identifies the provisional broadcast station with the most appearances as the confirmed broadcast station indicated by #prog1 (S11).
- In this embodiment, the blog DB 102 stores the blogs collected by the blog collection unit 101 over the past week and deletes older blogs.
- For simplicity, the oldest posting time containing tag T is used as is for the time range Rb. A restriction may be added, however, such as requiring the oldest time to fall on the same date as the latest posting time, to handle schedules in which the same program is broadcast by different local stations on different dates and times.
- In this embodiment, for simplicity of explanation, a broadcasting station is always confirmed.
- To improve accuracy, the distribution of provisional broadcast stations may instead be evaluated statistically and checked against a rejection condition.
- The rejection condition refers, for example, to the case where there is almost no difference between the total priority scores of the first- and second-ranked stations, or where the unknown rate is markedly large relative to the total number of posts (for example, an unknown rate of 30% or more).
- When the rejection condition is met, it may be judged that tag T does not indicate a specific broadcasting station or program information, and the tag is not used for estimating program information.
- the program estimation unit 108 next acquires all the program candidates of the confirmed broadcast station corresponding to the time range Rb from the program information DB 109 (S12).
- the program information DB 109 is a database that accumulates at least information such as broadcast station names, broadcast times, titles, and program details, but the acquisition unit for these information is not particularly limited in the present invention. Information acquired via a network, information acquired from electronic program data included in a broadcast wave, or information acquired by other methods may be used.
- The program estimation unit 108 registers the information about "program 1" acquired from the program information DB 109 as the program information in the tag program DB 105 and registers the current time, 22:05, as the confirmed time. This completes the association between tag T and the program information (S14).
- The program information rewriting unit 110 rewrites the program information and the confirmed broadcast station in the blog DB 102, based on the estimated program information, for the entries of blog list Lb whose program information is empty (S15).
- As a result, program information that was unknown at the stage of S7 is filled in, and broadcast stations that had been erroneously estimated are corrected.
- According to this embodiment, a program referred to by social media such as a blog can be estimated with high accuracy without extracting feature words from EPG or subtitle text and without maintaining a synonym dictionary.
(Embodiment 2)
- FIG. 9 is a block diagram of the estimation apparatus according to Embodiment 2 of the present invention.
- FIG. 10 is a flowchart in Embodiment 2 of the present invention. Embodiment 2 has a block configuration in which a program information setting unit 111 is added to the estimation apparatus 100 of Embodiment 1.
- steps S1 to S4 and steps S6 to S15 are the same as those in the first embodiment, and thus description thereof is omitted.
- The tag extraction unit 103 determines whether the tag is a new tag that has not yet been registered in the tag program DB 105, which holds the association between a tag and the program information estimated from it (S5). If the tag is new, it is registered in the tag program DB 105 (S6). If the tag is already registered in the tag program DB 105, the program information setting unit 111 determines whether program information is associated with the tag (S16). If there is, it determines whether the posting time of the blog is within the confirmed time in the tag program DB 105 plus the threshold γ (S17). If the posting time is within that range, the tag is presumed to indicate the same program information, and the program information and broadcast station list in the blog DB 102 are rewritten (S18).
- The threshold γ is the program estimation reference time range, used as the reference when determining the correspondence between a tag and program information.
- The restriction to confirmed time + threshold γ prevents old program information from being forcibly associated when the same program is broadcast at different times by different broadcasting stations.
- For example, the same program may be broadcast in different time slots by broadcasting station F and broadcasting station G.
- In that case the broadcast station names differ, so the program information does not match completely.
- The value of the threshold γ may be determined by experiment, taking into account the program information broadcast by each station.
- The tag program DB 105 may also be checked periodically, and tags whose confirmed time is earlier than the current time minus the threshold γ may be deleted, so that the association between a tag and program information can be renewed.
- According to this embodiment, it is also possible to estimate the program for blog text that carries a tag T related to the program but cannot be estimated from the known feature word table (that is, blog text that was previously discarded unused).
- In the description above, the broadcast station determination unit 107 acquires the blog list Lb for a specific tag from the blog DB 102 based on the post unique IDs recorded in the tag appearance DB 104, creates a ranking by counting the appearances of each provisional broadcast station in the acquired blog list Lb, and identifies the provisional broadcast station with the most appearances as the confirmed broadcast station indicated by the tag.
- The way the broadcast station determination unit 107 associates a tag with a broadcasting station is not limited to the method based on the maximum number of appearances.
- When determining a confirmed broadcast station, the broadcast station determination unit 107 may go beyond simply counting the appearances of provisional broadcast stations in the blog list Lb and may determine the confirmed broadcast station by analyzing the tally further.
- Another example of how the broadcast station determination unit 107 associates a tag with a broadcast station is described below.
- The broadcast station determination unit 107 first calculates, for each provisional broadcast station, the time-series change in the number of times that station is estimated from the blog list Lb.
- The time-series change is expressed, for example, as a graph with time on the horizontal axis and, on the vertical axis, the count of each provisional broadcast station estimated from the blogs posted at each time.
- The broadcast station determination unit 107 calculates the time derivative of each time series and takes the provisional broadcast station whose graph has the largest peak derivative as the broadcast station corresponding to the tag.
- Taking the derivative of the time-series change in this way is equivalent to evaluating the instantaneous excitement in the blogs. This makes it possible to reflect changes in excitement that follow the progress of the program, such as the start of a program or the airing of a popular segment, in the determination of the broadcasting station.
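One way to read the derivative-based variant is as a finite difference over binned counts. The sketch below is an interpretation under stated assumptions: posts are given as hypothetical `(minute_of_day, provisional_station)` pairs, the bin width of 5 minutes is arbitrary, and a zero is prepended to each series so that a burst in the first bin also counts as an increase. None of these details come from the source.

```python
from collections import Counter


def station_by_max_increase(posts, bin_minutes=5):
    """Pick the provisional station whose per-bin count shows the largest
    single-step increase (a discrete stand-in for the time derivative)."""
    bins = {}
    for minute, station in posts:
        if station is None:  # posts with an unknown provisional station are skipped
            continue
        bins.setdefault(station, Counter())[minute // bin_minutes] += 1
    if not bins:
        return None
    all_bins = [b for counter in bins.values() for b in counter]
    lo, hi = min(all_bins), max(all_bins)
    best_station, best_slope = None, float("-inf")
    for station, counter in bins.items():
        series = [counter.get(b, 0) for b in range(lo, hi + 1)]
        # Difference between consecutive bins; the constant bin width is omitted.
        slopes = [cur - prev for prev, cur in zip([0] + series, series)]
        if max(slopes) > best_slope:
            best_station, best_slope = station, max(slopes)
    return best_station
```

With this rule a station that receives a sudden burst of posts can win over a station with a larger but flat total, which is the behavior the text describes as capturing instantaneous excitement.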
- The description above covered the case where the program estimation unit 108 estimates the program with the largest number of posts in the blog list Lb to be the program indicated by the tag.
- The way the program estimation unit 108 associates a tag with a program is not limited to the method based on the maximum number of posts.
- Another example of how the program estimation unit 108 associates a tag with a program is described below.
- The program estimation unit 108 may count the posts for each program in the blog list Lb and associate the tag with a program based on a normalized count, obtained by normalizing each program's post count by its broadcast time. More specifically, the program estimation unit 108 obtains the number of posts per unit of broadcast time by dividing a program's total post count by its broadcast time. In general, a program with a longer broadcast time is expected to attract more blog posts than a program with a shorter one. For example, if a 3-hour program is broadcast after a 10-minute program, the 3-hour program is expected to receive more posts in total. By associating a tag with a program based on the number of posts per unit of broadcast time, the program estimation unit 108 reduces the difference in post counts caused by broadcast time and improves program estimation accuracy.
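The normalization just described is a division of each program's post count by its broadcast time. A minimal sketch follows; the function name, the dictionary inputs, and the sample figures in the usage note are assumptions for illustration only.

```python
def estimate_program_normalized(post_counts, durations_min):
    """Return (best_title, rates), where rates maps each title to posts per minute.

    post_counts:   {title: number of posts in the blog list Lb}
    durations_min: {title: broadcast time in minutes}
    Programs without a positive duration are ignored.
    """
    rates = {
        title: post_counts[title] / durations_min[title]
        for title in post_counts
        if durations_min.get(title, 0) > 0
    }
    if not rates:
        return None, rates
    return max(rates, key=rates.get), rates
```

For instance, with hypothetical counts of 40 posts for a 10-minute program and 180 posts for a 3-hour program, the raw count favors the longer program, while the per-minute rate (4.0 versus 1.0) favors the shorter one.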
- Reference signs: 100 estimation device, 101 blog collection unit, 102 blog DB, 103 tag extraction unit, 104 tag appearance DB, 105 tag program DB, 106 provisional broadcast station estimation unit, 107 broadcast station determination unit, 108 program estimation unit, 109 program information DB, 110 program information rewriting unit, 111 program information setting unit, 200 network.
- The present invention relates to an estimation device, an estimation method, and a program, and in particular can be used in a technique that uses feature words in collected blogs to estimate the broadcasting station that broadcasts a program referred to in a blog, as well as the program itself.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Library & Information Science (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Information Transfer Between Computers (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
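Furthermore, actual subtitle text is correct Japanese that follows the script and broadcast ethics rules. When users write to a blog while watching television, in contrast, they tend to use casual Japanese full of abbreviations, slang, and tags. Names of celebrities and programs in particular are readily abbreviated, so the expected estimation accuracy cannot be obtained. One known approach to this problem is to raise estimation accuracy by building a synonym dictionary that maps correct Japanese to abbreviations, slang, and tags, but maintaining a synonym dictionary in which unknown words appear every day is costly.
In addition, although attaching tags is indeed common blogger behavior, the tags are never specified by the broadcasting station. They arise spontaneously and vary in how often they appear, which has made them difficult to link to program information.
Another aspect of the present invention is an estimation method. This method causes a processor to execute the steps of: collecting, via a network, blogs each containing a character string written by an individual on a web site and time information indicating when the character string was written, extracting the tags appearing in the character string of each blog, and storing each blog in a tag appearance database in association with the extracted tags; estimating, based on feature words appearing in the character string of a blog, the broadcasting station that broadcasts the program referred to in the blog, and storing the estimated station in a blog database as a provisional broadcast station in association with the blog; and, when the number of occurrences of a tag in the character strings of blogs that are stored in the tag appearance database and were written within a predetermined time range exceeds a predetermined threshold, referring to the blog database to tally the provisional broadcast stations of the programs referred to in blogs whose character strings contain the tag, and determining, based on the tally, the broadcasting station that broadcasts the program referred to in those blogs.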
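(Embodiment 1)
Here, the post unique ID is an ID (identification) unique to each blog article. The present invention does not limit the format of the post unique ID, nor does it limit the blog sites from which blog articles are collected.
The tag extraction unit 103 further determines whether the extracted tag is a new tag that has not yet been registered in the tag program DB 105, which holds the association between a tag and the program information estimated from it (S5), and registers the tag in the tag program DB 105 if it is new (S6). The tag program DB 105 has the data structure shown in FIG. 6. At this point the confirmed time (the time when the association between the tag and the program information was determined), the program information, and the confirmed broadcast station described later are unknown and are therefore empty.
In this embodiment of the present invention, the provisional broadcast station is estimated using feature word data prepared in advance for each broadcasting station, as in the example of FIG. 8. The feature word data consists of terms that appear in blog text; they are not necessarily the tags described above. A priority score is set in advance for each feature word. For example, for post unique ID = 06565406541 in FIG. 3, the blog text matches only the string "B TV", so "broadcast station B" is estimated to be the provisional broadcast station. Unlike the prior art, this method does not require morphological analysis of EPG or subtitle text to extract feature words, and it greatly reduces the computational cost of estimation.
To improve the accuracy of the provisional broadcast station estimate, a priority score may be assigned to each feature word as shown in FIG. 8 and held temporarily with the provisional broadcast station in the blog DB 102. When one blog text matches the feature words of several broadcasting stations (for example, the blog text "Look at the satellite broadcast of broadcast station A" matches both broadcast station A and the satellite broadcast of broadcast station A), the scores may be totaled per broadcasting station and the station with the highest provisional total score estimated to be the provisional broadcast station. Like the feature words, the priority score is a predetermined value: static data set once for each broadcasting station. The matched provisional total score is temporarily stored in a predetermined column of the blog DB 102.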
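The scoring step can be sketched as a substring match against a feature-word table followed by a per-station total. The table contents and scores below are invented for illustration (the actual data of FIG. 8 is not reproduced in the text), and the function name is hypothetical.

```python
# Hypothetical feature-word table: (feature word, broadcast station, priority score).
FEATURE_WORDS = [
    ("broadcast station A", "broadcast station A", 1.0),
    ("satellite broadcast of broadcast station A",
     "satellite broadcast of broadcast station A", 2.0),
    ("B TV", "broadcast station B", 1.0),
]


def estimate_provisional_station(text, feature_words=FEATURE_WORDS):
    """Return (station, totals); station is None when no feature word matches."""
    totals = {}
    for word, station, score in feature_words:
        if word in text:  # plain substring match, no morphological analysis
            totals[station] = totals.get(station, 0.0) + score
    if not totals:
        return None, totals
    return max(totals, key=totals.get), totals
```

With these assumed scores, the example sentence from the text matches both entries for station A and resolves to the satellite channel because its total is higher.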
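In this embodiment, for simplicity of explanation, the predetermined time range is fixed at 10 minutes. The time range may instead be made variable by monitoring the time periodically, taking the start time and end time of the program currently on air from its program information, and determining the time range Ra when the current time reaches the end time.
The broadcast station determination unit 107 uses the post unique IDs recorded in the tag appearance DB 104 to acquire from the blog DB 102 the blog list Lb within the time range Rb, which runs from the oldest posting time containing the tag #prog1 (20:50:22) to the latest posting time (22:02:20) (S10). The broadcast station determination unit 107 then tallies the appearances of each provisional broadcast station in the blog list Lb to create a ranking, and identifies the provisional broadcast station with the most appearances as the confirmed broadcast station indicated by #prog1 (S11).
In this embodiment, the blog DB 102 is assumed to store the blogs collected by the blog collection unit 101 over the past week and to delete older blogs. Also, for simplicity of explanation, the oldest posting time containing tag T is used as is for the time range Rb. A restriction may be added, however, such as requiring the oldest time to fall on the same date as the latest posting time, to handle schedules in which the same program is broadcast by different local stations on different dates and times.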
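1st: broadcast station A = 296 posts (total score 485.0)
2nd: broadcast station E = 6 posts (total score 8.5)
3rd: broadcast station F = 1 post (total score 2.0)
Unknown = 2 posts (unknown rate 0.7%)
If the ranking is as above, "broadcast station A" is taken as the confirmed broadcast station. This method suppresses the variation and errors in the provisional broadcast stations estimated from individual blog articles and markedly improves estimation accuracy.
In this embodiment, for simplicity of explanation, a broadcasting station is always confirmed. To improve accuracy further, the distribution of provisional broadcast stations may be evaluated statistically. If it meets a rejection condition (for example, when there is almost no difference between the total priority scores of the 1st and 2nd stations, or when the unknown rate is markedly large relative to the total number of posts, such as 30% or more), it may be judged that tag T does not indicate a specific broadcasting station or program information and is not to be used for estimating program information.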
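The ranking and the optional rejection check can be sketched as follows, using the example figures listed above. The 30% unknown-rate limit comes from the text; the ratio used to decide that the top two scores show "almost no difference" (`min_score_ratio`) is an assumption, as are the function and parameter names.

```python
def determine_station(counts, scores, unknown,
                      max_unknown_rate=0.30, min_score_ratio=1.5):
    """Return the confirmed station, or None if a rejection condition is met.

    counts:  {station: number of posts with that provisional station}
    scores:  {station: total priority score}
    unknown: number of posts whose provisional station is unknown
    """
    total_posts = sum(counts.values()) + unknown
    if total_posts == 0 or not counts:
        return None
    # Rejection: too many posts with no provisional station.
    if unknown / total_posts >= max_unknown_rate:
        return None
    # Rejection: 1st and 2nd total scores are nearly equal (assumed ratio test).
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) >= 2 and ranked[1] > 0 and ranked[0] / ranked[1] < min_score_ratio:
        return None
    # Otherwise the station with the most appearances is confirmed (S11).
    return max(counts, key=counts.get)
```

With the example tally, 2 unknown posts out of 305 give an unknown rate of about 0.7%, neither rejection condition applies, and broadcast station A is confirmed.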
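Furthermore, when the number of posts during each program's broadcast time is tallied based on the posting times in the blog list Lb, the number of posts for one specific program becomes very large, as shown below, because users tend to comment on programs live as they air.
"Broadcast station name: broadcast station A
Title: Program 5
Broadcast time: 20:45-21:00
Posts in the blog list Lb: 1",
"Broadcast station name: broadcast station A
Title: Program 6
Broadcast time: 21:00-22:00
Posts in the blog list Lb: 5",
"Broadcast station name: broadcast station A
Title: Program 1
Broadcast time: 22:00-22:45
Posts in the blog list Lb: 299".
From this result, the program estimation unit 108 estimates that "Program 1", which has the most posts in the blog list Lb, is the program indicated by the tag #prog1 (S13). Here too, estimation accuracy is improved by suppressing the variation and errors among blogs related to several programs.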
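Step S13 can be sketched by assigning each post to the program whose broadcast slot contains its posting time and picking the largest count. The program schedule below is taken from the example above; the function name, the half-open slot convention, and the assumption that no slot crosses midnight are choices made for this illustration.

```python
from datetime import time

# Schedule of the confirmed station, from the example above: (title, start, end).
PROGRAMS = [
    ("Program 5", time(20, 45), time(21, 0)),
    ("Program 6", time(21, 0), time(22, 0)),
    ("Program 1", time(22, 0), time(22, 45)),
]


def estimate_program(post_times, programs=PROGRAMS):
    """Return (best_title, counts) for posts falling in each slot [start, end)."""
    counts = {title: 0 for title, _, _ in programs}
    for t in post_times:
        for title, start, end in programs:
            if start <= t < end:
                counts[title] += 1
                break
    if not counts:
        return None, counts
    return max(counts, key=counts.get), counts
```

Feeding in posting times distributed as in the example (1, 5, and 299 posts across the three slots) selects "Program 1".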
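For example, for post unique ID = 06565406567 shown in FIG. 3, no provisional broadcast station could be estimated from the blog text, but with the method above the broadcasting station has been confirmed, as shown in FIG. 4. FIG. 4 and FIG. 7 likewise show the state after the program information has been confirmed.
Post unique ID = 06565406542 in FIG. 4 contains the tag #dairy, but no result associating that tag with a specific program was obtained, so the post can be judged not to be a blog that refers to a program.
(Embodiment 2)
If the tag is already registered in the tag program DB 105, the program information setting unit 111 determines whether program information is associated with the tag (S16). If there is, it determines whether the posting time of the blog is within the confirmed time in the tag program DB 105 plus the threshold γ (S17). If the posting time is within that range, the tag is presumed to indicate the same program information, and the program information and broadcast station list in the blog DB 102 are rewritten (S18). Here, the threshold γ is the program estimation reference time range, used as the reference when determining the correspondence between a tag and program information.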
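Steps S16 and S17 reduce to a dictionary lookup plus a time comparison. In the sketch below the tag program DB is a plain dict, the field names (`program`, `station`, `confirmed_time`) and the function name are hypothetical, and the value of γ is left to the caller because the text says it is to be determined by experiment.

```python
from datetime import datetime, timedelta


def lookup_program(tag_program_db, tag, post_time, gamma):
    """Return (program, station) if the tag already has program information (S16)
    and the post falls within confirmed_time + gamma (S17); otherwise None."""
    entry = tag_program_db.get(tag)
    if entry is None or entry.get("program") is None:
        return None
    if post_time <= entry["confirmed_time"] + gamma:
        return entry["program"], entry["station"]
    return None
```

A caller would use the returned pair to rewrite the blog DB entry (S18) and fall back to the Embodiment 1 flow when the result is `None`.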
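The description above covered the case where the broadcast station determination unit 107 acquires the blog list Lb for a specific tag from the blog DB 102 based on the post unique IDs recorded in the tag appearance DB 104, tallies the appearances of each provisional broadcast station in the acquired blog list Lb to create a ranking, and identifies the provisional broadcast station with the most appearances as the confirmed broadcast station indicated by the tag. The way the broadcast station determination unit 107 associates a tag with a broadcasting station is not limited to the method based on the maximum number of appearances. When determining a confirmed broadcast station, the broadcast station determination unit 107 may go beyond simply tallying the appearances of provisional broadcast stations in the blog list Lb and may determine the confirmed broadcast station by analyzing the tally further. Another example of how the broadcast station determination unit 107 associates a tag with a broadcasting station is described below.
The description above covered the case where the program estimation unit 108 estimates the program with the most posts in the blog list Lb to be the program indicated by the tag. The way the program estimation unit 108 associates a tag with a program is not limited to the method based on the maximum number of posts. Another example of how the program estimation unit 108 associates a tag with a program is described below.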
Claims (6)
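- An estimation device comprising:
a tag extraction unit that collects, via a network, blogs each containing a character string written by an individual on a web site and time information indicating when the character string was written, extracts the tags appearing in the character string of each blog, and stores each blog in a tag appearance database in association with the extracted tags;
a provisional broadcast station estimation unit that estimates, based on feature words appearing in the character string of the blog, the broadcasting station that broadcasts the program referred to in the blog, and stores the estimated station in a blog database as a provisional broadcast station in association with the blog; and
a broadcast station determination unit that, when the number of occurrences of a tag in the character strings of blogs that are stored in the tag appearance database and were written within a predetermined time range exceeds a predetermined threshold, refers to the blog database to tally the provisional broadcast stations of the programs referred to in blogs whose character strings contain the tag, and determines, based on the tally, the broadcasting station that broadcasts the program referred to in the blogs.
- The estimation device according to claim 1, further comprising a program estimation unit that refers to a program information database storing the broadcast time slots of the programs broadcast by each broadcasting station, estimates the program indicated by the tag, from among the programs broadcast by the station determined by the broadcast station determination unit, based on the number of occurrences of the tag in the character strings of the blogs written during the broadcast time slot of each program, and stores the tag in a tag program database in association with the estimated program.
- The estimation device according to claim 2, further comprising a program information rewriting unit that stores in the blog database, in association with the program and the broadcasting station determined by the broadcast station determination unit, those blogs that are stored in the tag appearance database, were written within a predetermined time range, and contain the tag for which the program estimation unit estimated the program.
- The estimation device according to claim 2 or 3, further comprising a program information setting unit that, when the character string of a blog collected via the network contains a tag that the program estimation unit has associated with a program, stores the program associated with the tag and the broadcasting station that broadcasts the program in the blog database in association with the blog, on condition that the time at which the blog was written falls within a predetermined time range of the tag.
- An estimation method that causes a processor to execute the steps of:
collecting, via a network, blogs each containing a character string written by an individual on a web site and time information indicating when the character string was written, extracting the tags appearing in the character string of each blog, and storing each blog in a tag appearance database in association with the extracted tags;
estimating, based on feature words appearing in the character string of the blog, the broadcasting station that broadcasts the program referred to in the blog, and storing the estimated station in a blog database as a provisional broadcast station in association with the blog; and
when the number of occurrences of a tag in the character strings of blogs that are stored in the tag appearance database and were written within a predetermined time range exceeds a predetermined threshold, referring to the blog database to tally the provisional broadcast stations of the programs referred to in blogs whose character strings contain the tag, and determining, based on the tally, the broadcasting station that broadcasts the program referred to in the blogs.
- A program that causes a computer to implement:
a function of collecting, via a network, blogs each containing a character string written by an individual on a web site and time information indicating when the character string was written, extracting the tags appearing in the character string of each blog, and storing each blog in a tag appearance database in association with the extracted tags;
a function of estimating, based on feature words appearing in the character string of the blog, the broadcasting station that broadcasts the program referred to in the blog, and storing the estimated station in a blog database as a provisional broadcast station in association with the blog; and
a function of, when the number of occurrences of a tag in the character strings of blogs that are stored in the tag appearance database and were written within a predetermined time range exceeds a predetermined threshold, referring to the blog database to tally the provisional broadcast stations of the programs referred to in blogs whose character strings contain the tag, and determining, based on the tally, the broadcasting station that broadcasts the program referred to in the blogs.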
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
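CN2011800158325A CN102822821A (zh) | 2010-11-24 | 2011-10-13 | Estimating apparatus, estimating method, and program |
KR1020127025031A KR101381138B1 (ko) | 2010-11-24 | 2011-10-13 | Estimating apparatus, estimating method, and recording medium having the program recorded thereon |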
EP11843976.9A EP2573688A4 (en) | 2010-11-24 | 2011-10-13 | ESTIMATING APPARATUS, ESTIMATING METHOD, AND PROGRAM |
US13/612,161 US20130013625A1 (en) | 2010-11-24 | 2012-09-12 | Estimating apparatus, estimating method, and program |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010261775 | 2010-11-24 | ||
JP2010-261775 | 2010-11-24 | ||
JP2011215271A JP2012129982A (ja) | 2010-11-24 | 2011-09-29 | Estimating apparatus, estimating method, and program |
JP2011-215271 | 2011-09-29 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/612,161 Continuation US20130013625A1 (en) | 2010-11-24 | 2012-09-12 | Estimating apparatus, estimating method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012070182A1 (ja) | 2012-05-31 |
Family ID: 46145556
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2011/005735 WO2012070182A1 (ja) | 2010-11-24 | 2011-10-13 | Estimating apparatus, estimating method, and program |
Country Status (6)
Country | Link |
---|---|
US (1) | US20130013625A1 (ja) |
EP (1) | EP2573688A4 (ja) |
JP (1) | JP2012129982A (ja) |
KR (1) | KR101381138B1 (ja) |
CN (1) | CN102822821A (ja) |
WO (1) | WO2012070182A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014021727A (ja) * | 2012-07-18 | 2014-02-03 | Nippon Hoso Kyokai <Nhk> | Information extraction apparatus and program |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5723844B2 (ja) * | 2012-10-01 | 2015-05-27 | シャープ株式会社 | Information communication system and mobile terminal device |
RU2595524C2 (ru) * | 2014-09-29 | 2016-08-27 | Общество С Ограниченной Ответственностью "Яндекс" | Device and method for processing the content of a web resource in a browser |
JP6889323B2 (ja) * | 2019-07-16 | 2021-06-18 | 株式会社 ディー・エヌ・エー | System, method, and program for distributing live video |
KR20220031551A (ko) | 2019-07-16 | 2022-03-11 | 가부시키가이샤 디에누에 | System, method, and program for distributing live video |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008124861A (ja) | 2006-11-14 | 2008-05-29 | Funai Electric Co Ltd | Television broadcast viewing system and television broadcast receiving apparatus |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040268403A1 (en) * | 2003-06-26 | 2004-12-30 | Microsoft Corporation | Context-sensitive television tags |
JP4333516B2 (ja) * | 2004-08-05 | 2009-09-16 | ソニー株式会社 | Recording control apparatus and method, and program |
US8055715B2 (en) * | 2005-02-01 | 2011-11-08 | i365 MetaLINCS | Thread identification and classification |
JP2007274605A (ja) * | 2006-03-31 | 2007-10-18 | Fujitsu Ltd | Electronic device, broadcast program information collection method, collection program therefor, and collection system therefor |
JP2008099172A (ja) * | 2006-10-16 | 2008-04-24 | Sony Corp | Recording apparatus and method, and program |
US20080215607A1 (en) * | 2007-03-02 | 2008-09-04 | Umbria, Inc. | Tribe or group-based analysis of social media including generating intelligence from a tribe's weblogs or blogs |
US20090037387A1 (en) * | 2007-08-02 | 2009-02-05 | Alticast Corp. | Method for providing contents and system therefor |
US7519658B1 (en) * | 2008-05-02 | 2009-04-14 | International Business Machines Corporation | Automatic blogging during media viewing |
US8346708B2 (en) * | 2009-01-22 | 2013-01-01 | Nec Laboratories America, Inc. | Social network analysis with prior knowledge and non-negative tensor factorization |
US9165085B2 (en) * | 2009-11-06 | 2015-10-20 | Kipcast Corporation | System and method for publishing aggregated content on mobile devices |
- 2011-09-29 JP: application JP2011215271A, publication JP2012129982A (ja), not active, Abandoned
- 2011-10-13 EP: application EP11843976.9A, publication EP2573688A4 (en), not active, Withdrawn
- 2011-10-13 KR: application KR1020127025031A, publication KR101381138B1 (ko), active, IP Right Grant
- 2011-10-13 WO: application PCT/JP2011/005735, publication WO2012070182A1 (ja), active, Application Filing
- 2011-10-13 CN: application CN2011800158325A, publication CN102822821A (zh), active, Pending
- 2012-09-12 US: application US13/612,161, publication US20130013625A1 (en), not active, Abandoned
Non-Patent Citations (4)
Title |
---|
KO FUJIMURA: "Blog Mining and Visualization Technologies equipped with BLOGRANGER TG", IEICE TECHNICAL REPORT, vol. 108, no. 119, 23 June 2008 (2008-06-23), pages 57 - 62, XP008166600 * |
See also references of EP2573688A4 |
TAKANORI OIKAWA ET AL.: "Jimaku Text no Riyo ni yoru Blog de In'yo sareta Television Bangumi no Suitei", DAI 2 KAI FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT -DEIM 2010- RONBUNSHU, 25 May 2010 (2010-05-25), pages 1 - 5, XP008166684, Retrieved from the Internet <URL:http://db-event.jpn.org/deim2010/proceedings/files/D6-4.pdf> * |
YASUHIKO WATANABE ET AL.: "Multiple Media Database System for TV Newscasts and Newspapers", IEICE TECHNICAL REPORT, vol. 97, no. 593, 12 March 1998 (1998-03-12), pages 57 - 64, XP008166602 * |
Also Published As
Publication number | Publication date |
---|---|
KR101381138B1 (ko) | 2014-04-10 |
EP2573688A4 (en) | 2014-03-19 |
JP2012129982A (ja) | 2012-07-05 |
KR20120133387A (ko) | 2012-12-10 |
EP2573688A1 (en) | 2013-03-27 |
US20130013625A1 (en) | 2013-01-10 |
CN102822821A (zh) | 2012-12-12 |
Legal Events
Code | Title | Description |
---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 201180015832.5; Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11843976; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 20127025031; Country of ref document: KR; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 2011843976; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |