TW201918901A - Topic providing apparatus and could file prompting method thereof - Google Patents

Info

Publication number
TW201918901A
Authority
TW
Taiwan
Prior art keywords
database
vocabulary
keyword
topics
content
Prior art date
Application number
TW106137724A
Other languages
Chinese (zh)
Other versions
TWI656448B (en)
Inventor
許庭瑋
王昱鈞
林春風
陳嬿如
翁慈佳
Original Assignee
中華電信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中華電信股份有限公司
Priority to TW106137724A
Application granted
Publication of TWI656448B
Publication of TW201918901A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A topic providing apparatus and a cloud file prompting method thereof are provided. In the method, interactive text content is obtained; the interactive text content is converted from the multimedia conversation content of at least two users. Key vocabulary is obtained from the interactive text content. Multiple topics recorded in a file database are screened according to the key vocabulary. The content (such as cloud files or teaching materials) of the screened topics can then be provided to the users. Accordingly, the method can be applied in multiple fields, and users in a discussion can obtain extra reference data.

Description

Topic providing apparatus and cloud storage file prompting method thereof

The invention relates to the field of semantic analysis in machine learning and artificial intelligence, and in particular to a topic providing apparatus and a cloud storage file prompting method thereof.

In recent years, artificial intelligence has been one of the most popular technologies, and every major electronics company has invested substantial manpower and funds in its research and development. Natural language processing is an important research topic within artificial intelligence; it mainly studies how to process and use natural language so that computers can understand human language. In the past, people might have needed to type a few preset keywords through an input device such as a keyboard or a mouse before a computer could respond. With natural language processing, computers can understand ordinary human conversation, so the public can interact with smart products (for example, smartphones, robots, and smart butlers) in a simpler and more convenient way. Another popular research topic in artificial intelligence is machine learning, which mainly enables computers to learn automatically from input data. Machine learning is now widely applied in data mining, natural language processing, biometric recognition, search engines, and other fields, which shows its importance.

On the other hand, with the rapid development of the Internet, modern people can hardly stay away from the online world. However, the electronic products used by ordinary users have limited functions and may even suffer from insufficient performance. To solve these problems, many providers offer cloud processing services in which the parts that require computation are executed by a server, and the user only needs to issue a request through a browser or an application.

In view of this, the invention provides a topic providing apparatus and a cloud storage file prompting method thereof, which provide files, teaching materials, and other content on suitable topics according to the content of an interactive conversation.

The cloud storage file prompting method of the invention includes the following steps. Interactive text content is obtained, where the interactive text content is converted from the multimedia conversation content of at least two users. Key vocabulary is obtained from the interactive text content. Several topics recorded in a file database are screened according to the key vocabulary. The content of the screened topics is provided.

The topic providing apparatus of the invention includes a communication unit, a storage unit, and a processing unit. The communication unit transmits or receives data. The storage unit records several modules and a file database, and the file database stores the content of several topics. The processing unit is coupled to the communication unit and the storage unit, and accesses and loads the modules recorded in the storage unit. The modules include a message exchange module, an interactive vocabulary extraction module, a topic analysis module, and a topic providing module. The message exchange module obtains the multimedia conversation content of at least two users through the communication unit and converts it into interactive text content. The interactive vocabulary extraction module obtains key vocabulary from the interactive text content. The topic analysis module screens the topics according to the key vocabulary. The topic providing module provides the content of the screened topics through the communication unit.

Based on the above, the embodiments of the invention use a machine computation method to tag the topics of the online interactive text content of two or more people (including voice discussions converted from speech to text) with attention words, so as to automatically prompt or recommend files or teaching materials on the corresponding topics in the cloud storage space. This makes the file system intelligent, increases the usage of the cloud storage file system, promotes knowledge learning among people, and stimulates economic development.

To make the above features and advantages of the invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the components of a topic providing apparatus 1 according to an embodiment of the invention. The topic providing apparatus 1 may be an electronic device such as a server, a computer host, or a workstation, and includes at least, but is not limited to, a communication unit 110, a storage unit 120, and a processing unit 130.

The communication unit 110 may be a communication transceiver that supports wired network technologies such as optical fiber and Ethernet, or wireless networks such as Wi-Fi, mobile communication networks, and WiMAX. It can receive files or data such as messages, chat content, and multimedia content from other user equipment (for example, computers, smartphones, and tablets), and can send the content of various topics (for example, files and teaching materials) to the corresponding user equipment.

The storage unit 120 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive, solid-state drive, or similar component, or a combination of these components. It records software programs such as a message exchange module 121, an interactive vocabulary extraction module 122, a file database 123, a database vocabulary extraction module 124, a topic analysis module 125, and a topic providing module 126, as well as related information such as topic models, stop words, key vocabulary, and database vocabulary. These modules, databases, files, and data are described in detail in the embodiments that follow.

The processing unit 130 is connected to the storage unit 120 and the communication unit 110. It may be a central processing unit (CPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), a similar component, or a combination of these components. In the embodiments of the invention, the processing unit 130 performs all operations of the topic providing apparatus 1 and can access and execute the modules recorded in the storage unit 120.

To make the operation flow of the embodiments easier to understand, several embodiments are given below to describe the operation of the topic providing apparatus 1 in detail. FIG. 2 is a flowchart of a cloud storage file prompting method according to an embodiment of the invention. Referring to FIG. 2, the method of this embodiment applies to the components of the topic providing apparatus 1 of FIG. 1. The method is described below with reference to the components and modules of the topic providing apparatus 1. The steps of the method may be adjusted according to the implementation and are not limited to those described here.

First, the topic providing apparatus 1 obtains, by capturing network packets, through user uploads, or through external or internal storage media (for example, flash drives, optical discs, and external hard drives), file-related data of any type (for example, teaching materials, image libraries, and news), such as file content, file annotation content, and file attributes, or multimedia files such as video and audio files. It converts these files into information content in the form of text, speech, and the like, and stores the information content in the file database 123 as training samples.

Meanwhile, the message exchange module 121 obtains the multimedia conversation content of at least two users through the communication unit 110 and converts it into interactive text content, thereby obtaining the interactive text content (step S210). Specifically, the message exchange module 121 may itself operate a chat room, discussion forum, or other message exchange platform, or may collect from an external message exchange platform (such as the chat room 310 of FIG. 3) the multimedia conversation content, such as plain text messages and video or audio recordings, left by users (for example, teachers and students, customers, and service providers).

Next, the database vocabulary extraction module 124 performs vocabulary extraction on the information content recorded in the file database 123 and filters out stop words to generate the database vocabulary. Specifically, the database vocabulary extraction module 124 analyzes the text of the file content, file annotations, file attributes, bookmark category names, and search keywords to capture the vocabulary that users care about. Because file content, file attributes, and file annotations are mostly ordinary sentences, the module can extract the important terms in those sentences with a term extraction method such as a suffix array or a PAT-tree. The extracted terms are first filtered with predefined rules to delete strings that do not form words. The module then uses pre-collected Chinese and English stop word lists to filter the extracted terms further and separates the remaining terms with spaces. The resulting terms serve as the database vocabulary.
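The filtering stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a regular-expression tokenizer stands in for the suffix array or PAT-tree extraction, the stop word list is a small hypothetical sample rather than the pre-collected Chinese and English lists, and the names `STOP_WORDS` and `extract_vocabulary` are invented for this example.

```python
import re

# Hypothetical stop word sample; the patent relies on pre-collected
# Chinese and English stop word lists that are not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "how", "do", "we"}


def extract_vocabulary(text, stop_words=STOP_WORDS, min_length=2):
    """Return the terms kept from `text` as one space-separated string."""
    # Stand-in for suffix array / PAT-tree extraction: split on letter runs.
    candidates = re.findall(r"[a-z]+", text.lower())
    # Rule-based pre-filter: drop fragments too short to form a word.
    candidates = [term for term in candidates if len(term) >= min_length]
    # Stop word filter.
    kept = [term for term in candidates if term not in stop_words]
    # The description separates the surviving terms with spaces.
    return " ".join(kept)


print(extract_vocabulary("How do we compute the sine and cosine of an angle?"))
```

The same function could be applied to both the file database content and the interactive text content, since the description states that the two extraction flows follow the same steps.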

In addition, the interactive vocabulary extraction module 122 obtains key vocabulary from the interactive text content (step S220). Similarly, it performs vocabulary extraction on the interactive text content and filters out stop words to generate the key vocabulary. The detailed steps follow the process for obtaining the database vocabulary described above and are not repeated here.

The topic analysis module 125 learns the database vocabulary through a topic model from natural language processing to generate the topics implied by the information content. Specifically, latent Dirichlet allocation (LDA), latent semantic analysis (LSA), and probabilistic latent semantic analysis (PLSA) are topic models commonly used in natural language processing.

FIG. 4 is a schematic diagram of the conceptual mathematical formulation of latent Dirichlet allocation. The labels in the figure have the following meanings:
  • $\alpha$: Dirichlet distribution of the topics to which each document (i.e., information content) belongs
  • $\beta$: Dirichlet distribution of the words within each topic
  • $\theta_m$: topic probability distribution of document $m$
  • $\varphi_k$: word probability distribution of topic $k$
  • $Z_{m,n}$: topic to which the $n$-th word of document $m$ belongs
  • $w_{m,n}$: the $n$-th word of document $m$
  • $K$: total number of topics
  • $M$: total number of documents
  • $N$: total number of words in a document
The document generation steps of the latent Dirichlet allocation model are:
1. Sample from the Dirichlet distribution $\alpha$ to generate the topic distribution $\theta_m$ of document $m$.
2. Sample from the multinomial topic distribution $\theta_m$ to generate the topic $Z_{m,n}$ of the $n$-th word of document $m$.
3. Sample from the Dirichlet distribution $\beta$ to generate the word distribution $\varphi_{Z_{m,n}}$ of topic $Z_{m,n}$.
4. Sample from the multinomial word distribution $\varphi_{Z_{m,n}}$ to generate the final word $w_{m,n}$.
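The generative process of latent Dirichlet allocation can be made concrete with a short simulation. The sketch below uses only the Python standard library and assumes symmetric Dirichlet priors (a single scalar for $\alpha$ and for $\beta$); the function names `sample_dirichlet` and `generate_corpus` and the toy vocabulary are hypothetical and do not come from the patent.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, K, M, N, alpha, beta, seed=0):
    """Generate M documents of N words each from K topics (symmetric priors assumed)."""
    rng = random.Random(seed)
    # phi_k ~ Dirichlet(beta): word distribution of each topic k.
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(K)]
    docs = []
    for _ in range(M):
        # theta_m ~ Dirichlet(alpha): topic distribution of document m.
        theta = sample_dirichlet([alpha] * K, rng)
        words = []
        for _ in range(N):
            # Z_{m,n} ~ Multinomial(theta_m): topic of the n-th word.
            z = rng.choices(range(K), weights=theta)[0]
            # w_{m,n} ~ Multinomial(phi_z): the word itself.
            words.append(rng.choices(vocab, weights=phi[z])[0])
        docs.append(words)
    return docs


docs = generate_corpus(["sine", "cosine", "verb", "noun"], K=2, M=3, N=5,
                       alpha=0.5, beta=0.5)
for doc in docs:
    print(" ".join(doc))
```

Training runs this process in reverse: given only the observed words, it infers the topic and word distributions that most plausibly produced them.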

For the training samples D (i.e., the information content), the likelihood of the data can be expressed as: …(1) Using maximum likelihood estimation as the training criterion together with a corresponding training method, such as Gibbs sampling or variational inference, the topic analysis module 125 can obtain the model parameters α, β, θ, and φ.
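Equation (1) did not survive in this copy of the text. For orientation only, the likelihood of smoothed latent Dirichlet allocation is commonly written as follows in the notation of FIG. 4; this is the textbook form and is an assumption here, not necessarily the exact expression used in the patent.

```latex
p(D \mid \alpha, \beta)
  = \int \prod_{k=1}^{K} p(\varphi_k \mid \beta)
    \prod_{m=1}^{M} \int p(\theta_m \mid \alpha)
    \prod_{n=1}^{N} \sum_{Z_{m,n}=1}^{K}
      p(Z_{m,n} \mid \theta_m)\, p(w_{m,n} \mid \varphi_{Z_{m,n}})
    \, d\theta_m \, d\varphi
```

The integrals over $\theta_m$ and $\varphi$ have no closed form, which is why approximate methods such as Gibbs sampling or variational inference are used for training.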

FIG. 5 illustrates an example of the document generation process. For an article, the topic analysis module 125 can use the topic probability distribution (for example, the bar chart on the right of the figure, whose differently shaded bars represent the distribution over three topics) and the vocabulary distribution of each topic (for example, the first topic contains terms such as "trigonometric function", "sine", and "cosine" with probabilities of 0.05, 0.02, and 0.02, respectively, and so on). It then generates a sequence of topic samples according to the topic distribution, such as the circles in the figure: first topic, second topic, first topic, second topic, third topic, first topic. Finally, the corresponding words are generated through the topic vocabulary distributions (as shown by the arrows and labels in the figure). By training the model, the topic analysis module 125 obtains the topic distribution, vocabulary distributions, and topic sample sequence that best fit the training data.

Next, the topic analysis module 125 screens the topics recorded in the file database 123 according to the key vocabulary of the interactive text content (step S230). In this embodiment, the topic analysis module 125 determines the similarity between the key vocabulary and the topics as the basis for screening. Specifically, the topic analysis module 125 uses the topic model described above (for example, latent Dirichlet allocation) to analyze the probabilities of the database vocabulary and of the key vocabulary for the different topics, and treats each set of probability values as a vector. Because the interactive text content and the information content have both been mapped into the same topic space, the topic analysis module 125 can use the cosine similarity formula (2), where A and B denote the two vectors, to calculate the similarity between the probability vector of the database vocabulary and that of the key vocabulary:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{2}$$
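As a minimal sketch of this similarity step, the function below computes the cosine of the angle between two topic probability vectors. The example vectors are made up for illustration, and returning 0.0 for a zero vector is a design choice of this sketch rather than something the patent specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical topic probability vectors over three topics.
chat_vector = [0.7, 0.2, 0.1]      # interactive text content
material_vector = [0.6, 0.3, 0.1]  # one item of information content
print(cosine_similarity(chat_vector, material_vector))
```

Because topic probabilities are non-negative, the result lies between 0 and 1, which makes a fixed threshold straightforward to apply.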

Taking FIG. 6 as an example, the teacher-student interaction content (i.e., the interactive text content) and the cloud teaching materials (i.e., the information content) are input into the latent Dirichlet allocation topic model, which produces for each of the two inputs a probability vector over topic 1 to topic n (n being a positive integer). Finally, the topic analysis module 125 applies formula (2) to the two vectors to calculate their similarity across the topics.

Finally, the topic providing module 126 provides the content of the screened topics to the corresponding user equipment through the communication unit 110 (step S240). The topic providing module 126 may select the topics whose similarity is higher than a specific threshold (for example, 0.5, 0.8, or 0.9) and then retrieve the corresponding content of the selected topics (for example, teaching materials, pictures, documents, and lists) from the file database 123. As shown in FIG. 7, the screen 700 presented on the display of the user equipment shows the teaching material provided by the topic providing apparatus 1.
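The threshold-based selection can be sketched as follows. The dictionary layout, the function names `select_topics` and `fetch_content`, and the sample scores and file names are all hypothetical; the patent does not specify how the file database 123 is organized.

```python
def select_topics(similarities, threshold=0.8):
    """Return topic names whose similarity exceeds `threshold`, best match first."""
    selected = [(topic, score) for topic, score in similarities.items()
                if score > threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return [topic for topic, _ in selected]


def fetch_content(topics, file_database):
    """Look up the stored content of each selected topic (empty list if none)."""
    return {topic: file_database.get(topic, []) for topic in topics}


# Hypothetical similarity scores and file database.
scores = {"trigonometry": 0.92, "grammar": 0.31, "geometry": 0.85}
database = {"trigonometry": ["sine_and_cosine.pdf"], "geometry": ["triangles.pdf"]}

chosen = select_topics(scores, threshold=0.8)
print(fetch_content(chosen, database))
```

A lower threshold such as 0.5 would return more topics at the cost of relevance, so the value is a tuning choice for each deployment.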

It is worth noting that the embodiments of the invention apply to many scenarios. Besides providing cloud files, information search, picture information, and educational learning, the topic providing apparatus 1 can even serve as an expert system that gives users additional supporting content after a conversation. For example, during or after the conversation between the teacher and the student in the chat room 310 shown in FIG. 3, the topic providing apparatus 1 can provide cloud teaching material 350 on a suitable topic to the student's device.

In summary, the embodiments of the invention mainly proceed as follows. Information related to cloud electronic files (i.e., the information content) is collected. Interactive text content exchanged among multiple users is collected. Vocabulary extraction is performed on the cloud file information, and stop words are filtered out. Vocabulary extraction is performed on the interactive discussion text, and stop words are filtered out. A topic model is used to learn the topics implied by the information content. The similarity of topic matches is calculated from the topic probability distributions. Finally, information content that matches the relevant topics is recommended to the users according to the similarity. The information content may be cloud teaching material for education, or information files from other fields such as legal documents, news articles, and pictures.

Although the invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary knowledge in the relevant technical field may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

1: topic providing apparatus
110: communication unit
120: storage unit
121: message exchange module
122: interactive vocabulary extraction module
123: file database
124: database vocabulary extraction module
125: topic analysis module
126: topic providing module
S210~S240: steps
310: chat room
350: cloud teaching material
700: presented screen
α: Dirichlet distribution of the topics to which each document belongs
β: Dirichlet distribution of the words within each topic
θ: topic probability distribution
φ: word probability distribution
Z: topic to which a word belongs
w: word
K: total number of topics
M: total number of documents
N: total number of words in a document

FIG. 1 is a block diagram of the components of a topic providing apparatus according to an embodiment of the invention.
FIG. 2 is a flowchart of a cloud storage file prompting method according to an embodiment of the invention.
FIG. 3 illustrates an example of a chat room and the provision of topic content.
FIG. 4 is a schematic diagram of the conceptual mathematical formulation of latent Dirichlet allocation (LDA).
FIG. 5 illustrates an example of the document generation process.
FIG. 6 illustrates an example of the similarity calculation flow.
FIG. 7 illustrates an example of the topic content provided.

Claims (11)

1. A cloud storage file prompting method, comprising: obtaining interactive text content, wherein the interactive text content is converted from multimedia conversation content of at least two users; obtaining at least one key vocabulary term from the interactive text content; screening a plurality of topics recorded in a file database according to the at least one key vocabulary term; and providing content of the screened topics.

2. The cloud storage file prompting method of claim 1, wherein the step of obtaining the at least one key vocabulary term from the interactive text content comprises: performing vocabulary extraction on the interactive text content and filtering out stop words to generate the at least one key vocabulary term.

3. The cloud storage file prompting method of claim 1, wherein the step of screening the topics recorded in the file database according to the at least one key vocabulary term comprises: determining a similarity between the at least one key vocabulary term and the topics as a basis for screening.

4. The cloud storage file prompting method of claim 3, further comprising, before the step of determining the similarity between the at least one key vocabulary term and the topics: performing vocabulary extraction on information content recorded in the file database and filtering out stop words to generate at least one database vocabulary term; and learning the at least one database vocabulary term through a topic model related to natural language processing to generate the topics.

5. The cloud storage file prompting method of claim 4, wherein the step of determining the similarity between the at least one key vocabulary term and the topics comprises: analyzing probabilities of the at least one database vocabulary term and the at least one key vocabulary term for different topics as vectors of a plurality of probability values; and calculating a similarity between the vectors of probability values of the at least one database vocabulary term and the at least one key vocabulary term.

6. The cloud storage file prompting method of claim 4, wherein the information content recorded in the file database is cloud teaching material.

7. A topic providing apparatus, comprising: a communication unit, transmitting or receiving data; a storage unit, recording a plurality of modules and a file database, wherein the file database stores content of a plurality of topics; and a processing unit, coupled to the communication unit and the storage unit, and accessing and loading the modules recorded in the storage unit, wherein the modules comprise: a message exchange module, obtaining multimedia conversation content of at least two users through the communication unit and converting the multimedia conversation content of the at least two users into interactive text content; an interactive vocabulary extraction module, obtaining at least one key vocabulary term from the interactive text content; a topic analysis module, screening the topics according to the at least one key vocabulary term; and a topic providing module, providing content of the screened topics through the communication unit.

8. The topic providing apparatus of claim 7, wherein the interactive vocabulary extraction module performs vocabulary extraction on the interactive text content and filters out stop words to generate the at least one key vocabulary term.

9. The topic providing apparatus of claim 7, wherein the topic analysis module determines a similarity between the at least one key vocabulary term and the topics as a basis for screening.

10. The topic providing apparatus of claim 9, wherein the file database further stores information content, and the modules further comprise: a database vocabulary extraction module, performing vocabulary extraction on the information content recorded in the file database and filtering out stop words to generate at least one database vocabulary term; and wherein the topic analysis module learns the at least one database vocabulary term through a topic model related to natural language processing to generate the topics.

11. The topic providing apparatus of claim 9, wherein the topic analysis module analyzes probabilities of the at least one database vocabulary term and the at least one key vocabulary term for different topics as vectors of a plurality of probability values, and the topic analysis module calculates a similarity between the vectors of probability values of the at least one database vocabulary term and the at least one key vocabulary term.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106137724A TWI656448B (en) 2017-11-01 2017-11-01 Topic providing apparatus and could file prompting method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW106137724A TWI656448B (en) 2017-11-01 2017-11-01 Topic providing apparatus and could file prompting method thereof

Publications (2)

Publication Number Publication Date
TWI656448B TWI656448B (en) 2019-04-11
TW201918901A true TW201918901A (en) 2019-05-16

Family

ID=66996104

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106137724A TWI656448B (en) 2017-11-01 2017-11-01 Topic providing apparatus and could file prompting method thereof

Country Status (1)

Country Link
TW (1) TWI656448B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385902B (en) * 2020-10-22 2024-01-30 腾讯科技(深圳)有限公司 Content recommendation method, device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW589589B (en) * 2002-06-03 2004-06-01 Yuan Chiou Internat Co Ltd An interactive teaching system and method provided through internet
TW201044332A (en) * 2009-06-03 2010-12-16 Qing-Rong Liao Multiple-user on-line interaction system and the method thereof
US20160164815A1 (en) * 2014-12-08 2016-06-09 Samsung Electronics Co., Ltd. Terminal device and data processing method thereof
CN104978878A (en) * 2015-06-26 2015-10-14 苏州点通教育科技有限公司 Microlecture teaching system and method
CN106649405A (en) * 2015-11-04 2017-05-10 陈包容 Method and device for acquiring reply prompt content of chat initiating sentence
WO2017090954A1 (en) * 2015-11-24 2017-06-01 Samsung Electronics Co., Ltd. Electronic device and operating method thereof

Also Published As

Publication number Publication date
TWI656448B (en) 2019-04-11
