TWI722327B - Audio-visual content and user interaction sequence analysis system and method - Google Patents

Audio-visual content and user interaction sequence analysis system and method

Info

Publication number
TWI722327B
Authority
TW
Taiwan
Prior art keywords
user
sequence
interactive
audio
interaction
Prior art date
Application number
TW107136683A
Other languages
Chinese (zh)
Other versions
TW202016905A (en)
Inventor
葉丙成
鄭曜忻
郭家良
周靖昌
Original Assignee
泛學優有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 泛學優有限公司
Priority to TW107136683A
Priority to CN201910881880.0A
Publication of TW202016905A
Application granted
Publication of TWI722327B

Abstract

The present invention provides an audio-visual content and user interaction sequence analysis system and method, comprising a user-defined interactive element, a user analysis module, an audio-visual content analysis module, a message analysis module, a time-series interaction module, and an integrated multi-sequence analysis module. The integrated multi-sequence analysis module collects the user grouping data, image-object sequence, audio sequence, message sequence, and time-series interaction sequence provided by the foregoing modules to produce, at any interaction time point on the multimedia information timeline, a quantified learning-behavior result associated with the interaction factors. It also produces a concentration assessment that predicts, and helps remedy, the points in time at which the current user's concentration drops during learning, so that the user actually watches the multimedia content and gains interest in learning.

Description

Audio-visual content and user interaction sequence analysis system and method

The present invention relates to a learning system and method, and more particularly to a system and method that collects user interaction data to achieve adaptive evaluation of learning information and analysis of user interaction sequences.

With the advance of the Internet and personal electronic devices, older forms of instruction such as cram schools and private tutoring have gradually been replaced by multimedia teaching. In personal study and corporate training in particular, online multimedia teaching is favored mainly because it is not tied to a fixed class time or location and can be paused, replayed, or fast-forwarded at any time.

Existing multimedia teaching typically has the teaching system transmit course-related multimedia information to the user, but the user's actual participation and learning outcomes can be neither measured nor fed back, let alone used to adjust the learning content to the user's level, so the transmission of learning information is one-way. Although metrics such as play counts and viewing hours are currently used to gauge learning effectiveness, they still yield no user feedback from which to tailor the content. In corporate training, for example, users often play the learning material on an idle computer to inflate their viewing hours, without genuinely participating or responding with any learning behavior. Moreover, the meaning of a user interaction differs across fields, content types, and even lecturers, so learning effectiveness must be measured with a weighted calculation.

It is therefore a pressing goal for all parties to exploit users' learning histories, and to collect and analyze the feedback produced during learning, so as to adapt learning information to each user.

The present invention provides an audio-visual content and user interaction sequence analysis system, mainly comprising a user-defined interactive element, a user analysis module, an audio-visual content analysis module, a message analysis module, a time-series interaction module, and an integrated multi-sequence analysis module. The user-defined interactive element is placed in the multimedia information interface and contains several interaction factors, such as asking a question, answering a question, taking notes, highlighting, emoticons, fast-forward, and rewind, each of which accepts interactive data as input. The user analysis module takes the historical learning-behavior data of all users in the database, computes distances from that data and clusters the users, classifies the current user into the corresponding user group, and produces user grouping data describing the grouping result. The audio-visual content analysis module marks the objects in the image data of the input multimedia information, producing an image-object sequence associated with each time point of the multimedia information; it also performs audio analysis on the audio data of the input multimedia information, computing the pitch of each audio frame to produce an audio sequence associated with each time point. The message analysis module uses text extracted from the database to classify the purpose of the interactive data, producing a message sequence related to that purpose. The time-series interaction module links each input time of interactive data with the corresponding point on the multimedia timeline to produce an interaction time point, then combines the interaction time points on the timeline into a time-series interaction sequence. The integrated multi-sequence analysis module collects the foregoing user grouping data, image-object sequence, audio sequence, message sequence, and time-series interaction sequence to produce, at any interaction time point on the multimedia timeline, a quantified learning-behavior result associated with the interaction factors.

The weight of each interaction factor may further be set; through the association the integrated multi-sequence analysis module maintains with the user grouping data, a quantified learning-behavior result is produced for each user group and its corresponding interaction factors.

The present invention also provides an audio-visual content and user interaction sequence analysis method whose steps comprise: entering interactive data into an interaction factor of a user-defined interactive element placed in the multimedia information interface; having a user analysis module compute distances and clusters from the historical learning-behavior data of all users in a database, classify the current user into the corresponding user group, and produce user grouping data; marking, via an audio-visual content analysis module, the objects in the image data of the multimedia information to produce an image-object sequence associated with each time point; performing, via the audio-visual content analysis module, audio analysis that computes the pitch of each audio frame of the multimedia information's audio data to produce an audio sequence associated with each time point; extracting text from the database and classifying the purpose of the interactive data via a message analysis module to produce a message sequence related to that purpose; producing, via a time-series interaction module, interaction time points from each input time of interactive data and the corresponding time points in the multimedia information, then combining the interaction time points on the timeline into a time-series interaction sequence; and associating the user grouping data, image-object sequence, audio sequence, message sequence, and time-series interaction sequence with the interactive data through an integrated multi-sequence analysis module to produce a quantified learning-behavior result at any time point of the multimedia information.

In the method, the time-series interaction module collects the interaction time points on the multimedia timeline and the corresponding quantified learning-behavior results to produce a long short-term memory model.

The current user is compared against the long short-term memory model and, according to the quantified learning-behavior results, classified into the user group within the model whose learning behavior is most similar.

Through the long short-term memory model, the user group to which the current user belongs is determined, and a concentration index is evaluated from the group's distribution of interaction time points on the multimedia timeline and the quantified learning-behavior result at each of those time points.

Through the long short-term memory model, the user group to which the current user belongs is determined, and during the time intervals on the multimedia timeline in which that group's concentration index is low, the interaction factors on the user-defined interactive element are activated to invite input of interactive data.

The detailed description above is a specific description of one feasible embodiment of the present invention, but the embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the technical spirit of the invention shall be included within the patent scope of this application.

In summary, this application is truly innovative in its spatial form and improves on conventional articles in the several respects above; it should fully satisfy the statutory requirements of novelty and inventive step for an invention patent. The application is filed in accordance with the law, and the Office is respectfully requested to grant this invention patent application so as to encourage invention.

100‧‧‧User-defined interactive element

110‧‧‧Interaction factor

200‧‧‧User analysis module

300‧‧‧Message analysis module

400‧‧‧Audio-visual content analysis module

500‧‧‧Time-series interaction module

600‧‧‧Integrated multi-sequence analysis module

S301~S308‧‧‧Process steps

FIG. 1 is a schematic diagram of the user-defined interactive element of the present invention.

FIG. 2 is a schematic diagram of the audio-visual content and user interaction sequence analysis system of the present invention.

FIG. 3 is a flowchart of the audio-visual content and user interaction sequence analysis method of the present invention.

FIG. 4 is a schematic diagram of the long short-term memory model of the present invention.

FIG. 5 is a schematic diagram of the concentration prediction and incentive mechanism of the present invention.

To help the examiners understand the technical features, content, and advantages of the present invention and the effects it can achieve, the invention is described in detail below in the form of embodiments with reference to the accompanying drawings. The drawings are intended only as illustrations and aids to the description and do not necessarily reflect the true proportions or precise configuration of the invention as implemented; the proportions and layout of the drawings should therefore not be read as limiting the scope of the invention's rights in actual practice.

Please refer to FIG. 1, a schematic diagram of the user-defined interactive element of the present invention. As shown, the user-defined interactive element 100 is placed in a software interface or multimedia information interface and provides several interaction factors 110 for entering interactive data; the interaction factors 110 include asking a question, answering a question, taking notes, highlighting, emoticons, fast-forward, and rewind. When the user opens an interaction factor 110, text, symbols, mouse clicks, and the like can be entered in the dialog box or field it opens as the input interactive data. The system administrator can adjust the weight and interaction meaning of each interaction factor 110, for example as shown in FIG. 1: cooperative behavior 30%, communicative behavior 40%, conflict behavior 30%.
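By way of illustration and not limitation, such a weighted interactive element might be recorded as in the following Python sketch; the class, method, and field names are illustrative assumptions, and only the 30%/40%/30% weights are taken from FIG. 1.

```python
# Sketch of a user-defined interactive element: each interaction factor carries
# an administrator-set meaning and weight (cooperation 30%, communication 40%,
# conflict 30% per FIG. 1). Names are illustrative, not the patent's API.
from dataclasses import dataclass, field

@dataclass
class InteractiveElement:
    weights: dict = field(default_factory=lambda: {
        "cooperation": 0.30, "communication": 0.40, "conflict": 0.30})
    events: list = field(default_factory=list)

    def record(self, factor: str, meaning: str, video_time: float, payload: str = ""):
        """Store one piece of interactive data with its position on the timeline."""
        self.events.append({"factor": factor, "meaning": meaning,
                            "t": video_time, "payload": payload})

    def weighted_score(self) -> float:
        """Sum the administrator-set weight of every recorded interaction."""
        return sum(self.weights.get(e["meaning"], 0.0) for e in self.events)

ui = InteractiveElement()
ui.record("question", "communication", video_time=72.5, payload="Why k centers?")
ui.record("note", "cooperation", video_time=120.0)
print(round(ui.weighted_score(), 2))  # 0.7
```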

Please refer to FIG. 2, a schematic diagram of the audio-visual content and user interaction sequence analysis system of the present invention, which comprises a user analysis module 200, a message analysis module 300, an audio-visual content analysis module 400, a time-series interaction module 500, and an integrated multi-sequence analysis module 600. The user analysis module 200 takes the historical learning-behavior data of all users in the database, computes distances from that data with a data-clustering technique such as the K-means algorithm or hierarchical clustering, groups the users accordingly, classifies the current user into the corresponding user group, and outputs the grouping result as the user grouping data. The user grouping data are statistics drawn from the users' past learning history or platform usage, for example the answer-correct rate, average viewing time, and number of friends. Taking K-means as an example, suppose the answer-correct rate and average viewing time serve as the two grouping variables, so the relationship can be represented in a two-dimensional coordinate system: k centers are first chosen at random; the Euclidean distance from each data point x_j to each center μ_i is computed; each point is grouped with its nearest center; a new set of centers S is recomputed from the grouping result; and this process is iterated with the following formula to minimize J:

J = \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^{2}

where S_i denotes the set of points currently assigned to the i-th center.
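By way of example, the clustering step just described might be sketched in Python as follows; the two toy features stand in for the answer-correct rate and average viewing time, and the data, k, and stopping rule are illustrative assumptions.

```python
# K-means sketch matching the description above: random initial centers,
# Euclidean assignment of each point to its nearest center, center update,
# iterate until the centers stop moving, then report the objective J.
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point x_j to every center mu_i.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # nearest-center grouping
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    J = sum(np.sum((X[labels == i] - centers[i]) ** 2) for i in range(k))
    return labels, centers, J

# Toy data: [answer-correct rate, average viewing time], comparably scaled.
X = np.array([[0.90, 0.80], [0.85, 0.90], [0.20, 0.10], [0.30, 0.15]])
labels, centers, J = kmeans(X, k=2)
```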
The message analysis module 300 extracts the corresponding text from the database and classifies the purpose of the interactive data entered by the current user, for example with an intent-classification method, producing the message sequence. Intent classification takes the text message entered by the user and infers the purpose behind it: after multiple categories are defined in the database in advance, for example vehicles such as airplane, train, and metro, a long short-term memory (LSTM) model judges from the content of the entered text which category of vehicle the user most likely needs; for instance, for the input "I want to go from Taiwan to Japan", the LSTM outputs "airplane". This intent analysis determines the purpose of a message sequence, such as a user question, complaint, or casual chat.
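The intent-classification step could be sketched as the following untrained PyTorch model, shown only to make the data flow concrete (token ids in, one of several predefined categories out, as in the "Taiwan to Japan" example); the vocabulary, dimensions, and class list are assumptions.

```python
# Intent-classification sketch: an LSTM reads the tokenized message and its
# final hidden state is mapped to logits over predefined intent categories.
# The model is untrained here, so its output is arbitrary until fitted.
import torch
import torch.nn as nn

CLASSES = ["airplane", "train", "metro"]  # predefined categories (illustrative)

class IntentLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed=32, hidden=64, n_classes=len(CLASSES)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):      # token_ids: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return self.head(h_n[-1])      # logits over the intent categories

model = IntentLSTM()
tokens = torch.randint(0, 1000, (1, 8))   # stand-in for a tokenized message
intent = CLASSES[model(tokens).argmax(dim=1).item()]
```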
The audio-visual content analysis module 400 marks the objects in the image data of the input multimedia information, i.e., identifies what each object is, producing the image-object sequence associated with each time point of the multimedia information. It also performs audio analysis on the audio data of the input multimedia information: the audio is first cut into frames; with the harmonic product spectrum method the original audio is down-sampled several times and the compressed copies are merged with the original to accentuate the peak at the fundamental frequency; the pitch of each frame is then computed, unstable pitches are discarded, and the result is smoothed, producing the audio sequence associated with each time point, i.e., the pitch and frequency at each time point. The time-series interaction module 500 takes the time point at which interactive data is entered, together with the corresponding point on the multimedia timeline at that moment, as an interaction time point, and produces a time-series interaction sequence containing multiple interaction time points. The integrated multi-sequence analysis module 600 combines the above user grouping data, image-object sequence, audio sequence, message sequence, and time-series interaction sequence to produce a quantified learning-behavior result, from which it can be known, at any interaction time point on the multimedia timeline, how the current user and the users of his or her group interacted with the video or audio in the multimedia information. A long short-term memory model can also be built from the distribution of interaction time points along the multimedia timeline.
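By way of example, the pitch-analysis step (framing, spectral down-sampling, merging to accentuate the fundamental, and discarding unstable pitches) might look like the following sketch; the frame length, harmonic count, and stability tolerance are illustrative choices.

```python
# Harmonic product spectrum (HPS) sketch: per frame, down-sample the magnitude
# spectrum several times and multiply the copies so the harmonics reinforce the
# fundamental's bin; then drop frames whose pitch jumps away from its neighbors.
import numpy as np

def hps_pitch(frame: np.ndarray, sr: int, harmonics: int = 4) -> float:
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    n = len(spec) // harmonics
    hps = spec[:n].copy()
    for h in range(2, harmonics + 1):
        hps *= spec[::h][:n]           # spectrum down-sampled by factor h
    peak = int(hps[1:].argmax()) + 1   # skip the DC bin
    return peak * sr / len(frame)      # bin index -> frequency in Hz

def audio_sequence(signal: np.ndarray, sr: int, frame_len: int = 2048):
    pitches = [hps_pitch(signal[i:i + frame_len], sr)
               for i in range(0, len(signal) - frame_len, frame_len)]
    # Crude stability pass: keep pitches close to their local median.
    return [p for i, p in enumerate(pitches)
            if abs(p - np.median(pitches[max(0, i - 2): i + 3])) < 50.0]

sr = 16000
t = np.arange(sr) / sr
seq = audio_sequence(np.sin(2 * np.pi * 220.0 * t), sr)  # ~220 Hz in every frame
```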

Please refer to FIG. 3, a flowchart of the audio-visual content and user interaction sequence analysis method of the present invention, whose steps are as follows:
S301: the user plays multimedia information, with a user-defined interactive element provided on the multimedia information interface;
S302: the user enters interactive data into an interaction factor of the user-defined interactive element;
S303: the user analysis module computes distances and clusters from the historical learning-behavior data of all users in the database, classifies the current user into the corresponding user group, and takes the grouping result as the user grouping data;
S304: the audio-visual content analysis module marks the objects in the image data of the multimedia information, producing an image-object sequence associated with each time point;
S305: the audio-visual content analysis module performs audio analysis that computes the pitch of each audio frame of the multimedia information's audio data, producing an audio sequence associated with each time point;
S306: text is extracted from the database and the message analysis module classifies the purpose of the entered interactive data, producing the message sequence related to that purpose;
S307: the input time point of the interactive data and the corresponding point on the multimedia timeline are taken as an interaction time point, and multiple interaction time points on the timeline are collected into a time-series interaction sequence; and
S308: the integrated multi-sequence analysis module combines the user grouping data, image-object sequence, audio sequence, message sequence, and time-series interaction sequence to produce the quantified learning-behavior result corresponding to that interaction time point on the multimedia timeline.
Whenever the user opens an interaction factor of the user-defined interactive element again, steps S302~S308 are repeated to produce the quantified learning-behavior result corresponding to the new interaction time point.

Steps S302~S307 above are executed upon receiving the interactive data the current user enters in an interaction factor; their execution order is not limited to the one given.

Building the long short-term memory model

As shown in FIG. 4, the time-series interaction module collects the interaction time points on the multimedia timeline together with the corresponding quantified learning-behavior results and builds a long short-term memory model; in other words, the user grouping data, image-object sequence, audio sequence, message sequence, and time-series interaction sequence serve as input parameters for building a long short-term memory model capable of self-learning, concentration assessment, and adaptation.
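By way of illustration, such a model might be realized as the following PyTorch sketch, in which the five kinds of sequence data are concatenated per timeline step and an LSTM regresses the quantified learning-behavior value; the feature sizes and the single training step are assumptions.

```python
# LSTM sketch over the combined sequences: per timeline step, grouping, image-
# object, audio, message, and interaction features form one input vector, and
# the model predicts a quantified learning-behavior value for that step.
import torch
import torch.nn as nn

FEATS = {"group": 4, "image_obj": 8, "audio": 2, "message": 5, "interaction": 3}

class LearningBehaviorLSTM(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(sum(FEATS.values()), hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # quantified result per timeline step

    def forward(self, x):                  # x: (batch, steps, features)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)  # (batch, steps)

model = LearningBehaviorLSTM()
x = torch.randn(2, 30, sum(FEATS.values()))   # 2 users, 30 timeline steps
y = torch.rand(2, 30)                          # observed quantified results
loss = nn.MSELoss()(model(x), y)
loss.backward()                                # one self-learning update step
```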

Self-learning of the long short-term memory model

The long short-term memory model compares the quantified learning-behavior results entered by the current user against the model and, according to those results, classifies the current user into the user group within the model with similar learning behavior, thereby learning on its own.

This long short-term memory model keeps learning as the volume of user learning-behavior data in the database grows, and classifies the current user into the corresponding user group in real time for the subsequent predictions.

Concentration assessment with the long short-term memory model

Through the long short-term memory model, the user group to which the current user belongs is determined, and a concentration index is evaluated from that group's distribution of interaction time points on the multimedia timeline and the quantified learning-behavior result at each interaction time point.
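One simple realization, offered only as a sketch, buckets the group's interaction time points into fixed windows and weights each interaction by its quantified result; the 60-second window and max-normalization are illustrative assumptions.

```python
# Concentration-index sketch: score-weighted interaction mass per timeline
# window, normalized so the busiest window scores 1.0; sparse windows then
# read as intervals of lower concentration for the group.
import numpy as np

def concentration_index(times, scores, duration, window=60.0):
    """times: interaction time points (s); scores: quantified results."""
    edges = np.arange(0.0, duration + window, window)
    idx = np.clip(np.digitize(times, edges) - 1, 0, len(edges) - 2)
    raw = np.zeros(len(edges) - 1)
    np.add.at(raw, idx, scores)          # accumulate weighted interactions
    return raw / raw.max() if raw.max() > 0 else raw

times = np.array([12.0, 45.0, 75.0, 300.0])   # a group's interaction points
scores = np.array([0.9, 0.7, 0.8, 0.2])       # quantified learning behavior
ci = concentration_index(times, scores, duration=600.0)
low = np.where(ci < 0.3)[0]                   # low-concentration windows
```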

From how the concentration index of the current user, or of the group the user belongs to, varies along the timeline as the multimedia information plays, further predictions of likely interactive behavior can be made, or the system administrator can use the variation as a basis for adjusting the multimedia content.

Concentration prediction and incentive mechanism of the long short-term memory model

As shown in FIG. 5, the long short-term memory model determines the user group to which the current user belongs and, during the time intervals on the multimedia timeline in which that group's concentration index is low, activates the interaction factors on the user-defined interactive element to invite input of interactive data; increasing the weight of those interaction factors increases the user's motivation to enter interactive data and raises the quantified learning-behavior result.
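The triggering logic itself can be sketched as follows: when playback enters a window whose predicted concentration index falls below a threshold for the user's group, an interaction factor is opened and its weight boosted; the threshold, window size, and bonus are illustrative assumptions.

```python
# Incentive-trigger sketch: open an interaction factor and raise its weight
# in predicted low-concentration windows so responding there earns more.
def maybe_trigger(video_time, conc_index, weights,
                  window=60.0, threshold=0.3, bonus=1.5):
    bucket = int(video_time // window)
    if bucket < len(conc_index) and conc_index[bucket] < threshold:
        boosted = {k: w * bonus for k, w in weights.items()}
        return {"open_factor": "question", "weights": boosted}  # prompt the user
    return None  # concentration is fine; leave the interface unchanged

conc_index = [0.9, 0.8, 0.2, 0.7]             # e.g. predicted by the model above
action = maybe_trigger(video_time=150.0, conc_index=conc_index,
                       weights={"cooperation": 0.3, "communication": 0.4})
# 150 s falls in the third window (index 2, 0.2 < 0.3), so the factor opens.
```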

By replacing viewing hours, the current basis for assessing learning effectiveness, with the concentration assessment of the long short-term memory model, the points in time at which the current user's concentration index is likely to drop during the multimedia information can be predicted, and interaction factors can be activated at those points to offer incentives such as rewards and points. This captures the current user's interactive feedback, recovers attention, effectively remedies the problem of learning material being played on idle computers, and makes the current user actually watch the multimedia content and take greater interest in it.

In summary, this application is truly innovative in its technical ideas and achieves the several effects above that conventional existing methods cannot; it fully satisfies the statutory requirements of novelty and inventive step for an invention patent. The application is filed in accordance with the law, and the Office is respectfully requested to grant this invention patent application so as to encourage invention.


Claims (10)

1. An audio-visual content and user interaction sequence analysis system, comprising: an integrated multi-sequence analysis module that uses user grouping data, an image-object sequence, an audio sequence, a message sequence, and a time-series interaction sequence to produce, for the current user at any interaction time point on the multimedia information timeline, a quantified learning-behavior result associated with an interaction factor; a user-defined interactive element placed in the multimedia information interface and containing a plurality of the interaction factors, each of which accepts interactive data as input; a user analysis module that computes distances and clusters from the historical learning-behavior data of all users in a database, classifies the current user into the corresponding user group, and produces the user grouping data associated with the grouping result; an audio-visual content analysis module that marks the objects in the image data of the multimedia information to produce the image-object sequence associated with each time point of the multimedia information, and that performs audio analysis on the audio data of the multimedia information, computing the pitch of each audio frame, to produce the audio sequence associated with each time point; a message analysis module that uses text extracted from the database to classify the purpose of the interactive data, producing the message sequence related to that purpose; and a time-series interaction module that links the input time of the interactive data with the corresponding point on the multimedia timeline to produce the interaction time point, then combines the interaction time points on the multimedia timeline to produce the time-series interaction sequence.
2. The audio-visual content and user interaction sequence analysis system of claim 1, wherein the interaction factors comprise asking a question, answering a question, taking notes, highlighting, emoticons, fast-forward, and rewind.
3. The audio-visual content and user interaction sequence analysis system of claim 1, wherein a weight is further set for each interaction factor, and a quantified learning-behavior result corresponding to each user group and interaction factor is produced through the association between the integrated multi-sequence analysis module and the user grouping data.
4. An audio-visual content and user interaction sequence analysis method, comprising the steps of: entering interactive data into an interaction factor on a user-defined interactive element placed in the multimedia information interface; having a user analysis module compute distances and clusters from the historical learning-behavior data of all users in a database, classify the current user into the corresponding user group, and produce user grouping data; marking, via an audio-visual content analysis module, the objects in the image data of the multimedia information to produce an image-object sequence associated with each time point of the multimedia information; performing, via the audio-visual content analysis module, audio analysis that computes the pitch of each audio frame of the multimedia information's audio data to produce an audio sequence associated with each time point; extracting text from the database and classifying the purpose of the interactive data via a message analysis module to produce a message sequence related to that purpose; producing, via a time-series interaction module, an interaction time point from each input time of interactive data and the corresponding time point in the multimedia information, then combining the interaction time points on the multimedia timeline to produce a time-series interaction sequence; and associating the user grouping data, the image-object sequence, the audio sequence, the message sequence, and the time-series interaction sequence with the interactive data through an integrated multi-sequence analysis module to produce a quantified learning-behavior result at any time point of the multimedia information.
5. The audio-visual content and user interaction sequence analysis method of claim 4, wherein the interaction factors comprise asking a question, answering a question, taking notes, highlighting, emoticons, fast-forward, and rewind.
6. The audio-visual content and user interaction sequence analysis method of claim 4, wherein a weight is further set for each interaction factor, and a quantified learning-behavior result corresponding to each user group and interaction factor is produced through the association between the integrated multi-sequence analysis module and the user grouping data.
7. The audio-visual content and user interaction sequence analysis method of claim 4, wherein the time-series interaction module collects the interaction time points on the multimedia timeline and the corresponding quantified learning-behavior results to produce a long short-term memory model.
8. The audio-visual content and user interaction sequence analysis method of claim 7, wherein the current user is compared against the long short-term memory model and, according to the quantified learning-behavior results, classified into the user group within the long short-term memory model with similar learning behavior.
9. The audio-visual content and user interaction sequence analysis method of claim 8, wherein, through the long short-term memory model, the user group to which the current user belongs is determined, and a concentration index is evaluated from that group's distribution of interaction time points on the multimedia timeline and the corresponding quantified learning-behavior result at each interaction time point.
10. The audio-visual content and user interaction sequence analysis method of claim 9, wherein, through the long short-term memory model, the user group to which the current user belongs is determined, and during the time intervals on the multimedia timeline in which the concentration index is low, the interaction factor on the user-defined interactive element is activated to accept input of interactive data.
TW107136683A 2018-10-18 2018-10-18 Audio-visual content and user interaction sequence analysis system and method TWI722327B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW107136683A TWI722327B (en) 2018-10-18 2018-10-18 Audio-visual content and user interaction sequence analysis system and method
CN201910881880.0A CN111081095A (en) 2018-10-18 2019-09-18 Video and audio teaching platform, analysis subsystem and method, recommendation subsystem and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW107136683A TWI722327B (en) 2018-10-18 2018-10-18 Audio-visual content and user interaction sequence analysis system and method

Publications (2)

Publication Number Publication Date
TW202016905A TW202016905A (en) 2020-05-01
TWI722327B true TWI722327B (en) 2021-03-21

Family

ID=71895433

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107136683A TWI722327B (en) 2018-10-18 2018-10-18 Audio-visual content and user interaction sequence analysis system and method

Country Status (1)

Country Link
TW (1) TWI722327B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI759016B (en) * 2020-12-17 2022-03-21 正文科技股份有限公司 Testee learning status detection method and testee learning status detection system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102624679A (en) * 2011-01-28 2012-08-01 陶祖南 Realization method for multilevel intelligent multifunctional multimedia information interaction system
CN103154924A (en) * 2011-06-17 2013-06-12 权伍成 Personal tutoring system of using the opened and layered structured learning material
TWI437875B (en) * 2011-03-04 2014-05-11 Tung Fa Wu Instant Interactive 3D stereo imitation music device
CN207742815U (en) * 2017-09-27 2018-08-17 武汉生物工程学院 Interactive MOOC learning machine


Also Published As

Publication number Publication date
TW202016905A (en) 2020-05-01
