Claims (30)
1. A method of identifying media event structure and content using short messages, the method comprising: obtaining, using at least one computing device, a short message sample of a plurality of users by sampling a set of short messages, the short messages being obtained during a broadcast of a media event; identifying, using the at least one computing device and the short message sample, a segment of the media event, including identifying a beginning of the identified segment using the short message sample; and identifying, using the at least one computing device, at least one term used in the short message sample, the at least one term being indicative of a context of the identified segment.
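Claim 1 does not say how the set of short messages is sampled. The sketch below assumes uniform random sampling, at a hypothetical `rate`, of the messages time-stamped within the broadcast interval; `ShortMessage` and `sample_broadcast_messages` are illustrative names, not part of the claims.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortMessage:
    """Hypothetical message record: author, post time (seconds), and text."""
    user: str
    timestamp: float
    text: str


def sample_broadcast_messages(messages, start, end, rate, seed=None):
    """Uniformly sample a fraction `rate` of the messages posted in [start, end).

    `rate` and the uniform-sampling strategy are illustrative assumptions;
    the claims do not specify how the set of short messages is sampled.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    # Keep only messages obtained during the broadcast of the media event.
    in_broadcast = [m for m in messages if start <= m.timestamp < end]
    k = round(len(in_broadcast) * rate)
    # Sample without replacement, then restore chronological order.
    return sorted(rng.sample(in_broadcast, k), key=lambda m: m.timestamp)
```

Passing a fixed `seed` makes the sample reproducible, which is convenient when the same broadcast is analyzed more than once.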
2. The method of claim 1, further comprising: selecting, using the at least one computing device, the short message sample from a set of short messages, the selecting comprising selecting short messages from at least one user of the plurality of users, the at least one user being a followcaster having at least a threshold number of subscribers.
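The followcaster selection of claim 2 amounts to a threshold filter on subscriber counts. A minimal sketch, assuming messages are `(user, text)` tuples and subscriber counts are available as a dictionary; both representations and the function names are hypothetical.

```python
def followcasters(subscriber_counts, threshold):
    """Users with at least `threshold` subscribers."""
    return {user for user, n in subscriber_counts.items() if n >= threshold}


def select_followcaster_messages(messages, subscriber_counts, threshold):
    """Keep (user, text) messages whose author is a followcaster.

    Users missing from `subscriber_counts` are treated as non-followcasters.
    """
    fc = followcasters(subscriber_counts, threshold)
    return [msg for msg in messages if msg[0] in fc]
```

For example, with a threshold of 10,000, a user with 50,000 subscribers qualifies while a user with 120 does not.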
3. The method of claim 2, wherein identifying a segment of the media event further comprises: identifying, using the at least one computing device, the segment of the media event using short message activity of the at least one user identified as a followcaster having at least a threshold number of subscribers.
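Claim 3 does not state how followcaster activity reveals a segment. One plausible reading, sketched here under loudly stated assumptions, is to bucket the followcasters' message timestamps into fixed-width windows and treat a sharp jump in activity as the beginning of a segment. The window width, the jump `factor`, and `min_count` are hypothetical parameters, and this spike heuristic is a stand-in, not the patented procedure.

```python
def window_counts(timestamps, start, width, n_windows):
    """Count messages per fixed-width time window starting at `start`."""
    counts = [0] * n_windows
    for t in timestamps:
        i = int((t - start) // width)
        if 0 <= i < n_windows:  # ignore messages outside the analyzed span
            counts[i] += 1
    return counts


def segment_starts(counts, factor=2.0, min_count=3):
    """Indices of windows whose count reaches `min_count` and is at least
    `factor` times the previous window's count (assumed spike rule)."""
    return [
        i
        for i in range(1, len(counts))
        if counts[i] >= min_count and counts[i] >= factor * counts[i - 1]
    ]
```

The same two functions apply unchanged to any filtered sample, such as the conversational-form messages of claim 7.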
4. The method of claim 1, further comprising: selecting, using the at least one computing device, the short message sample from a set of short messages, the selecting comprising selecting short messages in conversational form.
5. The method of claim 4, wherein a conversational-form message comprises an indication that the message is directed to one or more users.
6. The method of claim 5, wherein the indication comprises an indicator linking the sender of the message to the one or more users.
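Claims 4 to 6 leave the form of the indicator open. The sketch below assumes, purely for illustration, that the indicator is an `@username` mention; the regular expression, the `(sender, text)` tuple layout, and the function names are hypothetical.

```python
import re

# Assumed indicator: an "@username" mention not preceded by a word character,
# so that e-mail addresses such as "x@y.com" are not treated as mentions.
MENTION = re.compile(r"(?<!\w)@(\w+)")


def addressees(text):
    """Usernames the message is directed to, in order of appearance."""
    return MENTION.findall(text)


def is_conversational(text):
    """Treat a message as conversational if it addresses at least one user."""
    return bool(addressees(text))


def select_conversational(messages):
    """Keep (sender, text) messages in conversational form."""
    return [m for m in messages if is_conversational(m[1])]
```

Each `(sender, addressee)` pair produced this way corresponds to the link between the sender and the one or more users described in claim 6.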
7. The method of claim 1, wherein identifying a segment of the media event further comprises: identifying, using the at least one computing device, the segment of the media event using short messages identified as conversational-form messages.
8. The method of claim 1, wherein identifying a segment of the media event further comprises: determining, using the at least one computing device and the short message sample, a plurality of term frequency scores for a term used in the short message sample, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating the number of short messages in the corresponding time window that contain the term; determining, using the at least one computing device and the short message sample, a plurality of normalized frequency scores corresponding to the plurality of term frequency scores, one for each term frequency score, each normalized frequency score comprising a ratio of the term frequency score to a corpus term frequency, the corpus term frequency indicating the number of short messages in the sample that contain the term; determining, using the at least one computing device and the plurality of normalized term frequency scores identified for the term, a maximum normalized term frequency score; and identifying, using the at least one computing device, the segment from the time window corresponding to the maximum normalized frequency score determined for the term.
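The steps of claim 8 can be sketched as follows. This is a minimal illustration, assuming `(timestamp, text)` messages, fixed-width windows, and a simple lower-case word tokenizer; none of these choices, nor the function names, come from the claims.

```python
import re


def tokens(text):
    """Lower-cased word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def term_frequency_scores(messages, term, start, width, n_windows):
    """tf[i]: number of (timestamp, text) messages in window i containing `term`."""
    term = term.lower()
    tf = [0] * n_windows
    for t, text in messages:
        i = int((t - start) // width)
        if 0 <= i < n_windows and term in tokens(text):
            tf[i] += 1
    return tf


def peak_window(messages, term, start, width, n_windows):
    """Return (index of the window with the maximum normalized score, scores)."""
    tf = term_frequency_scores(messages, term, start, width, n_windows)
    # Corpus term frequency: messages anywhere in the sample containing the term.
    corpus_tf = sum(1 for _, text in messages if term.lower() in tokens(text))
    if corpus_tf == 0:
        return None, [0.0] * n_windows
    normalized = [f / corpus_tf for f in tf]
    # max() returns the earliest window when several share the top score.
    best = max(range(n_windows), key=lambda i: normalized[i])
    return best, normalized
```

For instance, if four sampled messages contain "goal" and three of them fall in the second 10-second window, the normalized scores are `[0.0, 0.75, 0.25]` and the second window is reported as the segment.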
9. The method of claim 1, wherein identifying at least one term used in the short message sample further comprises: determining, using the at least one computing device and the short message sample, a plurality of term frequency scores for a term used in the short message sample, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating the number of short messages in the corresponding time window that contain the term; determining, using the at least one computing device and the plurality of term frequency scores identified for the term, whether the term is used relatively frequently at a time corresponding to the identified segment; and identifying, using the at least one computing device, the term as indicative of a context of the identified segment if the term is used relatively frequently at the time corresponding to the identified segment.
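Claim 9 requires deciding whether a term is used "relatively frequently" during the identified segment but does not fix a test. A minimal sketch, assuming a hypothetical rule: the term's count in the segment window must be non-zero and at least `ratio` times its mean count over the other windows.

```python
def is_context_term(tf, segment_window, ratio=2.0):
    """True if the term's count in the segment window is non-zero and at least
    `ratio` times its mean count over all other windows (assumed rule)."""
    others = [f for i, f in enumerate(tf) if i != segment_window]
    baseline = sum(others) / len(others) if others else 0.0
    return tf[segment_window] > 0 and tf[segment_window] >= ratio * baseline


def context_terms(tf_by_term, segment_window, ratio=2.0):
    """Terms used relatively frequently during the identified segment."""
    return sorted(
        term
        for term, tf in tf_by_term.items()
        if is_context_term(tf, segment_window, ratio)
    )
```

Comparing against the term's own baseline filters out words such as "the" that are frequent in every window and therefore say little about the segment.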
10. The method of claim 9, wherein each of the term frequency scores comprises a normalized frequency score, the normalized frequency score comprising a ratio of a term frequency, indicating the number of short messages in the time window that contain the term, to a corpus term frequency, indicating the number of short messages in the sample that contain the term.
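The ratio in claim 10 can be written compactly. The notation is introduced here for illustration only and does not appear in the claims: $S$ is the short message sample, $S_t \subseteq S$ the messages in time window $t$, and $w$ the term.

```latex
tf_{w,t} = \left|\{\, m \in S_t : w \in m \,\}\right|, \qquad
F_w = \left|\{\, m \in S : w \in m \,\}\right|, \qquad
\hat{f}_{w,t} = \frac{tf_{w,t}}{F_w}, \qquad
t^{*} = \arg\max_{t} \hat{f}_{w,t}
```

For example, if 4 messages in the sample contain $w$ and 3 of them fall in window $t$, then $\hat{f}_{w,t} = 3/4 = 0.75$. Because $F_w$ is the same for every window of a given term, the normalization does not change which window $t^{*}$ (as used in claim 8) maximizes that term's score; its effect is presumably to put different terms on a comparable scale, since each $\hat{f}_{w,t}$ lies between 0 and 1.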
11. A system for identifying media event structure and content using short messages, the system comprising: at least one computing device configured to: obtain a short message sample of a plurality of users by sampling a set of short messages, the short messages being obtained during a broadcast of a media event; identify a segment of the media event using the short message sample, including identifying a beginning of the identified segment using the short message sample; and identify at least one term used in the short message sample, the at least one term being indicative of a context of the identified segment.
12. The system of claim 11, wherein the at least one computing device is further configured to: select the short message sample from a set of short messages, the selecting comprising selecting short messages from at least one user of the plurality of users, the at least one user being a followcaster having at least a threshold number of subscribers.
13. The system of claim 12, wherein the at least one computing device configured to identify a segment of the media event is further configured to: identify the segment of the media event using short message activity of the at least one user identified as a followcaster having at least a threshold number of subscribers.
14. The system of claim 11, wherein the at least one computing device is further configured to: select the short message sample from a set of short messages, the selecting comprising selecting short messages in conversational form.
15. The system of claim 14, wherein a conversational-form message comprises an indication that the message is directed to one or more users.
16. The system of claim 15, wherein the indication comprises an indicator linking the sender of the message to the one or more users.
17. The system of claim 11, wherein the at least one computing device configured to identify a segment of the media event is further configured to: identify the segment of the media event using short messages identified as conversational-form messages.
18. The system of claim 11, wherein the at least one computing device configured to identify a segment of the media event is further configured to: determine, using the short message sample, a plurality of term frequency scores for a term used in the short message sample, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating the number of short messages in the corresponding time window that contain the term; determine, using the short message sample, a plurality of normalized frequency scores corresponding to the plurality of term frequency scores, one for each term frequency score, each normalized frequency score comprising a ratio of the term frequency score to a corpus term frequency, the corpus term frequency indicating the number of short messages in the sample that contain the term; determine a maximum normalized term frequency score using the plurality of normalized term frequency scores identified for the term; and identify the segment from the time window corresponding to the maximum normalized frequency score determined for the term.
19. The system of claim 11, wherein the at least one computing device configured to identify at least one term used in the short message sample is further configured to: determine, using the short message sample, a plurality of term frequency scores for a term used in the short message sample, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating the number of short messages in the corresponding time window that contain the term; determine, using the plurality of term frequency scores identified for the term, whether the term is used relatively frequently at a time corresponding to the identified segment; and identify the term as indicative of a context of the identified segment if the term is used relatively frequently at the time corresponding to the identified segment.
20. The system of claim 19, wherein each of the term frequency scores comprises a normalized frequency score, the normalized frequency score comprising a ratio of a term frequency, indicating the number of short messages in the time window that contain the term, to a corpus term frequency, indicating the number of short messages in the sample that contain the term.
21. A computer-readable storage medium tangibly storing computer-readable instructions for identifying media event structure and content using a short message sample, the instructions comprising instructions for: obtaining a short message sample of a plurality of users by sampling a set of short messages, the short messages being obtained during a broadcast of a media event; identifying a segment of the media event using the short message sample, including identifying a beginning of the identified segment using the short message sample; and identifying at least one term used in the short message sample, the at least one term being indicative of a context of the identified segment.
22. The computer-readable storage medium of claim 21, the instructions further comprising instructions for: selecting the short message sample from a set of short messages, the selecting comprising selecting short messages from at least one user of the plurality of users, the at least one user being a followcaster having at least a threshold number of subscribers.
23. The computer-readable storage medium of claim 22, wherein the instructions for identifying a segment of the media event further comprise instructions for: identifying the segment of the media event using short message activity of the at least one user identified as a followcaster having at least a threshold number of subscribers.
24. The computer-readable storage medium of claim 21, the instructions further comprising instructions for: selecting the short message sample from a set of short messages, the selecting comprising selecting short messages in conversational form.
25. The computer-readable storage medium of claim 24, wherein a conversational-form message comprises an indication that the message is directed to one or more users.
26. The computer-readable storage medium of claim 25, wherein the indication comprises an indicator linking the sender of the message to the one or more users.
27. The computer-readable storage medium of claim 21, wherein the instructions for identifying a segment of the media event further comprise instructions for: identifying the segment of the media event using short messages identified as conversational-form messages.
28. The computer-readable storage medium of claim 21, wherein the instructions for identifying a segment of the media event further comprise instructions for: determining, using the short message sample, a plurality of term frequency scores for a term used in the short message sample, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating the number of short messages in the corresponding time window that contain the term; determining, using the short message sample, a plurality of normalized frequency scores corresponding to the plurality of term frequency scores, one for each term frequency score, each normalized frequency score comprising a ratio of the term frequency score to a corpus term frequency, the corpus term frequency indicating the number of short messages in the sample that contain the term; determining a maximum normalized term frequency score using the plurality of normalized term frequency scores identified for the term; and identifying the segment from the time window corresponding to the maximum normalized frequency score determined for the term.
29. The computer-readable storage medium of claim 21, wherein the instructions for identifying at least one term used in the short message sample further comprise instructions for: determining, using the short message sample, a plurality of term frequency scores for a term used in the short message sample, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating the number of short messages in the corresponding time window that contain the term; determining, using the plurality of term frequency scores identified for the term, whether the term is used relatively frequently at a time corresponding to the identified segment; and identifying the term as indicative of a context of the identified segment if the term is used relatively frequently at the time corresponding to the identified segment.
30. The computer-readable storage medium of claim 29, wherein each of the term frequency scores comprises a normalized frequency score, the normalized frequency score comprising a ratio of a term frequency, indicating the number of short messages in the time window that contain the term, to a corpus term frequency, indicating the number of short messages in the sample that contain the term.