TW202407575A - An editing system used for recommending articles
- Publication number
- TW202407575A (TW111129666A)
- Authority
- TW
- Taiwan
- Prior art keywords
- draft
- articles
- article
- vector
- editing system
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Description
The present invention relates generally to an editing system, and more particularly to an editing system for recommending articles.
Traditionally, an author writes an article based on content or experience that is already familiar. As a result, what the author writes is often constrained by past reading, so the article overemphasizes some points and discusses others insufficiently. Even if the author spends time reading widely to enrich the article, there are too many sources to find, in limited time, the articles most closely related to the author's line of thought. Moreover, some website platforms recommend related content at the end of an article, typically two or three articles as extended reading, but these articles are sometimes not actually relevant to the topic the author is writing about.
On the other hand, when the author reads multiple articles obtained through a search, each article covers the aspects of the topic in a different proportion and depth. This makes it hard for the author to compare how relevant each article is to the one being written, and hard to read efficiently and selectively. Consequently, when consulting references, the author may overlook articles that are highly relevant to the article in progress and fail to obtain the most needed content in time.
The goal is to let the author efficiently gather articles that are highly relevant to the content being written, broaden the author's knowledge, and help the author quickly recognize the aspects and depth of what has been read, so that the written content becomes more complete. The problem that urgently needs solving is therefore how to recommend the needed articles in real time, so that the author wastes no time searching and can take a shortcut to the knowledge required.
An object of the present invention is to provide an editing system for recommending articles. When a user writes a first draft of an article with the editing system, the user can use the system to find and read reference articles related to the draft, so that the comments or ideas the user wants to express are fully developed in the draft. To find articles highly relevant to the draft, the system compares the draft against groups of articles of different types and provides the highly relevant articles to the user. After consulting these articles, the user can further adjust or supplement the draft. On the one hand, this widens the user's understanding of a particular body of knowledge or issue, so that the user's ideas are no longer confined to a partial view. On the other hand, the recommended articles found by the system's analysis let the user re-examine whether the topic direction and key content of the draft are deficient.
Another object of the present invention is to build a plurality of topic models and construct a plurality of two-dimensional maps from them, so that the user can find a suitable topic model and, within it, a number of recommended articles, which are displayed on the two-dimensional map by number. For the articles on the cloud server, a word combination is first separated from each article, and the word combinations are converted into corresponding vectors, so that the plurality of vectors corresponds to the plurality of articles. The articles and their corresponding vectors are then divided into multiple groups, and a topic model is built from those groups. A topic model may be, for example, lifestyle investment, personal finance, real estate living, or regional macroeconomics. Each topic model is further subdivided into groups (for example, the lifestyle investment topic model is subdivided into three subcategories: industry thinking, business thinking, and investment thinking). A two-dimensional map is constructed from the topic model, and the vectors are converted into coordinates that are presented on the two-dimensional map.
In addition, a first-draft word combination is separated from the user's first draft and converted into a corresponding first-draft vector, so that the first-draft vector corresponds to the first draft. The system computes a similarity value between the first-draft vector and a vector belonging to a group in a topic model. When the similarity value is higher than a threshold, the article corresponding to that vector is recommended. By reading the recommended articles, the user can absorb new knowledge and adjust the draft to enrich its content.
In a first aspect of the invention, the system recommends to the user, while the user is writing a first draft of an article, a plurality of articles that are highly relevant to that draft. The system comprises a writing application and a cloud server. The writing application resides in a first electronic device (for example, a handheld electronic device or a desktop computer), which has a first processing device, a first memory, and a wireless transmission module. The first memory, for example a cloud drive (Microsoft SkyDrive, Google Drive, Apple iTune) or an ordinary hard disk, is coupled to the first processing device (which may include processing units such as a CPU, buffers, and multiplexers) and stores the writing application. Content (including but not limited to text and pictures) is entered through an operation interface provided by the writing application to produce the first draft. The draft is also stored in the first memory and is further output to the cloud server through the wireless transmission module of the first electronic device.
In a second aspect of the invention, a cloud server is disclosed. The cloud server comprises a second processing device, a communication interface, and a plurality of articles. Once the communication interface of the cloud server is connected to the wireless transmission module, the first draft is transmitted from the writing application to the cloud server. The cloud server further comprises a corpus, a conversion module, a grouping module, and a computing module.

The corpus is coupled to the second processing device and stores the first draft, the plurality of articles, a first-draft word combination corresponding to the first draft, and a plurality of article word combinations corresponding to the plurality of articles (that is, each article has one corresponding article word combination). The first-draft word combination is separated from the first draft, and each article word combination is separated from its article.

The conversion module is coupled to the second processing device and the corpus. It has a machine learning model that converts the first-draft word combination into a corresponding first-draft vector and converts the article word combinations into a corresponding plurality of vectors, so that the first-draft vector corresponds to the first draft and the plurality of vectors corresponds to the plurality of articles.

To classify the articles, the cloud server also includes a grouping module coupled to the second processing device. The grouping module divides the articles and their corresponding vectors into multiple groups and builds a topic model from those groups.

The computing module is coupled to the second processing device. It imports the first-draft vector into the topic model, selects a group, and uses a first calculation model to compute the similarity value between a vector of that group and the first-draft vector. When the similarity value is higher than a threshold, the article corresponding to that vector is recommended. The words referred to above are, more specifically, keywords.
In a third aspect of the invention, in addition to the editing system for recommending articles, an article recommendation method of the system is disclosed. Its steps comprise: generating a first draft of an article through a writing application and transmitting it to the cloud server, which contains a plurality of articles; separating a first-draft word combination from the first draft and a plurality of article word combinations from the plurality of articles; using a machine learning model to convert the first-draft word combination into a corresponding first-draft vector and the article word combinations into a corresponding plurality of vectors, so that the first-draft vector corresponds to the first draft and the plurality of vectors corresponds to the plurality of articles; dividing the articles and their corresponding vectors into multiple groups and building a topic model from those groups; and importing the first-draft vector into the topic model, selecting a group, and using a first calculation model to compute the similarity value between a vector of the group and the first-draft vector. When the similarity value is higher than a threshold, the article corresponding to that vector is recommended; when the similarity value is lower than the threshold, the article is not recommended. In one embodiment, the machine learning model is a Doc2Vec model. In some embodiments, the similarity value is a cosine similarity value. In one embodiment, the first calculation model computes the cosine similarity between the first-draft vector and the article vector: the smaller the angle between the two vectors, the higher the resulting cosine similarity, which indicates that the corresponding article is highly similar to the user's first draft. Therefore, when the similarity value is higher than the threshold, the system recommends the article corresponding to that vector to the user for reference.
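The cosine similarity described above can be written out explicitly. In the sketch below, a is the first-draft vector and b is an article vector, following the reference signs used later in the description. The symbol \tau for the threshold and the dimension n are introduced here only for illustration, since the source gives the threshold neither a symbol nor a value.

```latex
\[
\operatorname{sim}(\mathbf{a},\mathbf{b})
  = \cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}},
\qquad
\operatorname{sim}(\mathbf{a},\mathbf{b}) > \tau
  \;\Longrightarrow\; \text{recommend the article of } \mathbf{b}.
\]
```

Because the cosine function decreases on the interval from 0 to \pi, a smaller angle \theta between the two vectors gives a larger similarity value, which matches the statement in the text. The source does not say what happens when the similarity equals the threshold exactly.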
In one embodiment, the system further comprises a separation module coupled to the second processing device. A second calculation model of the separation module separates the first-draft word combination from the first draft and the plurality of article word combinations from the plurality of articles. In some embodiments, the second calculation model includes an Elasticsearch component that finds the first-draft word combination and the article word combinations in the first draft and the articles, respectively. In another embodiment, the second calculation model further includes the ik word segmentation algorithm, which segments the first draft and the articles into words and removes unnecessary tokens (for example, punctuation marks and function words), so that the first-draft word combination is separated from the first draft and one article word combination is separated from each article.
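The separation step above (segment the text, then discard punctuation and function words) can be illustrated with a minimal sketch. It is not the ik segmenter and does not call Elasticsearch: it is a plain-Python stand-in that only handles whitespace-delimited text, and the function name and stop list are hypothetical.

```python
import re

# Hypothetical stop list standing in for the "function words" that the
# source says are removed; a real deployment would use the segmenter's own list.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "is", "to", "in"}


def separate_word_combination(text):
    """Return the word combination of a text: lowercase word tokens
    with punctuation and stop words removed."""
    tokens = re.findall(r"\w+", text.lower())  # punctuation is dropped here
    return [token for token in tokens if token not in STOP_WORDS]


# Draft A and its word combination A1 (sample sentence is made up).
draft_a = "The price of a three-bedroom flat, and the cost of a loan."
a1 = separate_word_combination(draft_a)
print(a1)
```

Each stored article would be passed through the same function, so that every article B to J yields its own word combination B1 to J1.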
100: Editing system for recommending articles
102: Writing application
104: Cloud server
106: First electronic device
108: First processing device
110: First memory
112: Wireless transmission module
114: Display
116: Second processing device
118: Communication interface
120: Corpus
122: Conversion module
124: Grouping module
126: Computing module
128: Separation module
130: Data visualization module
A: First draft of the article
A1: First-draft word combination
B-J: Plurality of articles
B1-J1: Plurality of article word combinations
202-210: Steps
300: Editing system for recommending articles
302: Separation module
304: Conversion module
306: Grouping module
a: First-draft vector
b-j: Plurality of vectors
Mo: Topic model
Go1: First group
Go2: Second group
Go3: Third group
Ma: Two-dimensional map
1-7: Numbers
400: Editing system for recommending articles
402: Table
The embodiments of the present invention are illustrated by the examples in the accompanying drawings, which are not intended to limit the invention. Like reference numerals in the drawings refer to like elements.
FIG. 1 is a block diagram showing the basic architecture of an editing system for recommending articles and its application architecture.
FIG. 2 is a flow chart of the article recommendation method of the system.
FIG. 3(a) is a data flow diagram of the editing system according to one embodiment.
FIG. 3(b) shows the two-dimensional map according to one embodiment.
FIG. 4 shows, for one embodiment, the articles recommended by the system.
The invention is described below by way of preferred embodiments and aspects. These descriptions explain the system and method of the invention and are illustrative only; they do not limit the scope of the claims. Accordingly, in addition to the preferred embodiments in this specification, the invention can be practiced widely in other embodiments.
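FIG. 1 is a block diagram showing the basic architecture of an editing system 100 for recommending articles and its application architecture. Based on the first draft written by the user, the system 100 finds and recommends, from a plurality of articles on the cloud server, the articles that are highly relevant to the draft. The system comprises a writing application 102 and a cloud server 104. The writing application resides in a first electronic device 106 (for example, a handheld electronic device or a desktop computer), which has a first processing device 108, a first memory 110, a wireless transmission module 112, and a display 114. The first memory 110, for example a cloud drive (Microsoft SkyDrive, Google Drive, Apple iTune) or an ordinary hard disk, is coupled to the first processing device 108 (which may include processing units such as a CPU, buffers, and multiplexers) and stores the writing application 102. Content (including but not limited to text and pictures) is entered through an operation interface 1021 provided by the writing application 102 to produce a first draft A. The first draft A is also stored in the first memory 110 and is further output to the cloud server 104 through the wireless transmission module 112 of the first electronic device 106. The display 114 is coupled to the first processing device 108 and displays the writing application 102.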
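The system further discloses a cloud server 104, which comprises a second processing device 116, a communication interface 118, and a plurality of articles B-J. Once the communication interface 118 of the cloud server 104 is connected to the wireless transmission module 112, the first draft A is transmitted from the writing application 102 to the cloud server. The cloud server 104 further comprises a corpus 120, a conversion module 122, a grouping module 124, and a computing module 126. The corpus 120 is coupled to the second processing device 116 and stores the first draft A, the articles B-J, the first-draft word combination A1, and the article word combinations B1-J1. The first-draft word combination A1 is separated from the first draft A, and the article word combinations B1-J1 are separated from the articles B-J. In other words, the first draft A has a corresponding first-draft word combination A1, and each article has one corresponding article word combination (for example, article B has the corresponding article word combination B1). In some embodiments, the first-draft word combination A1 is composed of multiple words of the first draft, and each article word combination is composed of multiple words of its article (for example, article word combination B1 is composed of multiple words of article B). The words referred to here are, more specifically, keywords.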
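In some embodiments, to separate the first-draft word combination A1 from the first draft A and the article word combinations B1-J1 from the articles B-J, the system 100 further comprises a separation module 128 coupled to the second processing device 116. A second calculation model of the separation module 128 separates the first-draft word combination and the article word combinations from the first draft A and the articles B-J, respectively. The second calculation model includes an Elasticsearch component that finds the first-draft word combination A1 in the first draft A and the article word combinations B1-J1 in the articles B-J. In one embodiment, the second calculation model further includes the ik word segmentation algorithm, which separates the first-draft word combination A1 from the first draft A and the article word combinations B1-J1 from the articles B-J.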
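In one embodiment, the conversion module 122 is coupled to the second processing device 116. It has a machine learning model that converts the first-draft word combination A1 into a corresponding first-draft vector and converts the article word combinations B1-J1 into a corresponding plurality of vectors, so that the first-draft vector corresponds to the first draft A and the plurality of vectors corresponds to the articles B-J (see FIG. 3). In one embodiment, the machine learning model is a Doc2Vec model.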
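In some embodiments, to classify the articles B-J, the cloud server further includes a grouping module 124 coupled to the second processing device 116. The grouping module divides the articles and their corresponding vectors into multiple groups and builds a topic model from those groups. The computing module 126, coupled to the second processing device 116, imports the first-draft vector into the topic model, selects a group, and uses a first calculation model to compute the similarity value between a vector of that group and the first-draft vector. When the similarity value is higher than a threshold, the article corresponding to that vector is recommended (see FIG. 3(a) below).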
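In one embodiment, the system further comprises a data visualization module 130, which constructs a two-dimensional map from the topic model (see FIG. 3(b)) and displays the article corresponding to the vector on the two-dimensional map by number. In addition, the computing module converts the plurality of vectors and the first-draft vector into a plurality of coordinates, which are presented on the two-dimensional map through the data visualization module. In one embodiment, the system further comprises a table generation module that numbers the articles corresponding to the vectors and arranges them in a table.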
FIG. 2 is a flow chart of the article recommendation method of the system, which corresponds to the editing system for recommending articles described above. Steps 202 to 210 of the method are described in detail below.
In addition to the editing system for recommending articles, the invention discloses an article recommendation method of the system. In step 202, a first draft of an article is generated through a writing application and transmitted to the cloud server.
In step 204, a first-draft word combination is separated from the first draft, and a plurality of article word combinations are separated from the plurality of articles.
In step 206, a machine learning model converts the first-draft word combination into a corresponding first-draft vector and converts the article word combinations into a corresponding plurality of vectors, so that the first-draft vector corresponds to the first draft and the plurality of vectors corresponds to the plurality of articles.
In step 208, the articles and their corresponding vectors are divided into multiple groups, and a topic model is built from those groups.
In step 210, the first-draft vector is imported into the topic model and a group is selected. A first calculation model computes the similarity value between a vector of the group and the first-draft vector, and when the similarity value is higher than a threshold, the article corresponding to that vector is recommended. In some embodiments, the similarity value is a cosine similarity value. In one embodiment, the first calculation model computes the cosine similarity between the first-draft vector and the article vector: the smaller the angle between the two vectors, the higher the resulting cosine similarity, which indicates that the corresponding article is highly similar to the user's first draft. Therefore, when the similarity value is higher than the threshold, the system recommends the article corresponding to that vector to the user for reference.
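The relationship between angle and cosine similarity stated in step 210 can be checked with a short sketch. The function name and the sample vectors are illustrative assumptions; the source does not specify the vector dimension, and treating a zero-length vector as having zero similarity is a choice made here, not something the source states.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as not similar
    return dot / (norm_u * norm_v)


# Made-up 2-D vectors: one at 10 degrees and one at 80 degrees from the draft.
draft_vector = [1.0, 0.0]
near_vector = [math.cos(math.radians(10)), math.sin(math.radians(10))]
far_vector = [math.cos(math.radians(80)), math.sin(math.radians(80))]

# The vector at the smaller angle has the higher similarity value.
print(cosine_similarity(draft_vector, near_vector) > cosine_similarity(draft_vector, far_vector))
```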
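FIGS. 3(a) and 3(b) show, for one embodiment, the data flow in the editing system 300 for recommending articles. The first draft A is transmitted from the writing application to the cloud server, which contains a plurality of articles B-J. A second calculation model of the separation module 302 separates the first-draft word combination A1 from the first draft A and the article word combinations B1-J1 from the articles B-J. The machine learning model of the conversion module 304 then converts the first-draft word combination A1 into a corresponding first-draft vector a and the article word combinations B1-J1 into a corresponding plurality of vectors b-j, so that the first-draft vector a corresponds to the first draft A and the vectors b-j correspond to the articles B-J. The grouping module 306, coupled to the second processing device, divides the articles B-J and their corresponding vectors b-j into multiple groups and builds a topic model from those groups. For example, articles B-D and their vectors b-d form a first group Go1, articles E-G and their vectors e-g form a second group Go2, and articles H-J and their vectors h-j form a third group Go3, so that the topic model Mo comprises the first group Go1, the second group Go2, and the third group Go3. The first-draft vector a is imported into the topic model Mo and the first group Go1 is selected. A first calculation model computes the similarity value between the vector b of the first group Go1 and the first-draft vector a. When the similarity value is higher than a threshold, the article B corresponding to the vector b is recommended; when the similarity value is lower than the threshold, the article B is not recommended.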
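The FIG. 3(a) example can be sketched as follows, under several assumptions: the topic model Mo is represented as a plain dictionary from group name to article vectors, all vector values and the threshold of 0.8 are made up, the group is passed in explicitly because the source does not say how a group is chosen automatically, and the results are sorted by similarity as in the FIG. 4 table. None of these names come from the source.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


# Hypothetical topic model Mo: three groups, each mapping an article to its vector.
TOPIC_MODEL_MO = {
    "Go1": {"B": [0.9, 0.1], "C": [0.7, 0.3], "D": [0.1, 0.9]},
    "Go2": {"E": [-0.8, 0.2], "F": [-0.6, 0.4], "G": [-0.9, 0.1]},
    "Go3": {"H": [0.1, -0.9], "I": [0.2, -0.8], "J": [0.3, -0.7]},
}


def recommend(draft_vector, topic_model, group, threshold):
    """Return (article, similarity) pairs from the selected group whose
    similarity to the draft vector exceeds the threshold, highest first."""
    scored = [
        (article, cosine_similarity(draft_vector, vector))
        for article, vector in topic_model[group].items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(article, score) for article, score in scored if score > threshold]


draft_vector_a = [1.0, 0.0]
recommended = recommend(draft_vector_a, TOPIC_MODEL_MO, "Go1", threshold=0.8)
print([article for article, _ in recommended])
```

With these sample values, articles B and C exceed the threshold and article D does not, so D is left out of the recommendation.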
In one embodiment, the system further comprises a data visualization module, which constructs a two-dimensional map from the topic model and displays the article corresponding to the vector b on the two-dimensional map by number. In addition, the computing module converts the vectors b-j and the first-draft vector a into a plurality of coordinates, which are presented on the two-dimensional map through the data visualization module.
FIG. 3(b) shows the two-dimensional map according to one embodiment. For example, FIG. 3(b) is the two-dimensional map Ma built from the results of the data flow in FIG. 3(a). Because the articles and their corresponding vectors are divided into three groups and a topic model Mo is built from those three groups, the two-dimensional map shows the topic model Mo as made up of the first group Go1, the second group Go2, and the third group Go3. The first-draft vector a is converted into a coordinate a, and, in order to recommend the article B corresponding to the vector b, the vector b is converted into a coordinate b. The article B corresponding to the vector b is displayed by the data visualization module at coordinate b of the two-dimensional map with the number "1", while the first draft A corresponding to the first-draft vector a is displayed at coordinate a with the label "this article". When the similarity values between other vectors of the first group Go1 and the first-draft vector are also higher than the threshold, those vectors are likewise converted into coordinates, and the corresponding articles are also recommended and displayed by number at those coordinates, as shown by the numbers "2" to "7" in FIG. 3(b). In one embodiment, when the article B corresponding to the vector b is displayed with the number "1", the vector b of article B has the highest similarity value with the first-draft vector.
Please refer to FIG. 4, which shows, by way of an embodiment, articles recommended through the present invention. The editing system 400 for recommending articles of the present invention further includes a table generation module that generates a table 402. For example, suppose the user imports the first draft vector into the topic model of the "Real Estate Life" category, selects the "Investment Thinking" group, and uses the above-mentioned first calculation model to calculate the similarity values between the vectors of the "Investment Thinking" group and the first draft vector. When five vectors each have a similarity value with the first draft vector that is higher than a threshold, the five articles corresponding to those five vectors are recommended. To recommend these five articles, the table generation module arranges them from highest to lowest similarity value, numbers them, and tabulates them. The article ranked first among the titles in the table, "都是三房差在哪?大三房、小三房、2+1房哪個好?" ("They are all three-bedroom units, so what is the difference? Which is better: a large three-bedroom, a small three-bedroom, or a 2+1-bedroom unit?"), is therefore the article whose vector has the highest similarity value with the first draft vector. Table 402 also presents the number, the title and author of each article, the topic model (for example, the "Real Estate Life" category), the group within the topic model (for example, "Investment Thinking"), and the total click traffic and past-month traffic of each recommended article.
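A table generation step of this kind might look like the minimal sketch below. It assumes the similarity values have already been computed and the threshold already applied. The `Recommendation` class, the `build_table` and `render` functions, the column labels, and every value in the sample data are hypothetical placeholders, not content of table 402.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """One recommended article together with the fields shown in the table."""
    title: str
    author: str
    topic_model: str
    group: str
    total_clicks: int
    clicks_last_month: int
    similarity: float  # used for ordering only, not displayed


def build_table(recommendations):
    """Order by similarity (highest first), number from 1, return rows of strings."""
    ordered = sorted(recommendations, key=lambda r: r.similarity, reverse=True)
    rows = [("No.", "Title", "Author", "Topic model", "Group",
             "Total clicks", "Clicks (past month)")]
    for number, r in enumerate(ordered, start=1):
        rows.append((str(number), r.title, r.author, r.topic_model, r.group,
                     str(r.total_clicks), str(r.clicks_last_month)))
    return rows


def render(rows):
    """Join the rows into a plain-text, pipe-separated table."""
    return "\n".join(" | ".join(row) for row in rows)


# Placeholder data, for illustration only.
sample = [
    Recommendation("Article X", "Author X", "Real Estate Life",
                   "Investment Thinking", 1200, 300, 0.71),
    Recommendation("Article Y", "Author Y", "Real Estate Life",
                   "Investment Thinking", 800, 150, 0.93),
]
table_rows = build_table(sample)
print(render(table_rows))
```

Sorting before numbering is what makes row 1 the article whose vector is closest to the first draft vector, which matches the ordering described for the table.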
Several of the methods of the present invention are described in their most basic form, but methods may be added to or deleted from any of them, and information may be added to or removed from any of the messages described herein, without departing from the basic scope of the present invention. Those of ordinary skill in the art will appreciate that further modifications and changes may be made to the present invention. The specific embodiments provided herein are intended to illustrate the present invention, not to limit it.
The above describes preferred embodiments of the present invention. Those skilled in the art should appreciate that this description is intended to illustrate the present invention and not to limit the scope of the patent rights claimed. The scope of patent protection shall be determined by the appended claims and their equivalents. Any changes or refinements made by those skilled in the art without departing from the spirit or scope of this patent are equivalent changes or designs completed within the spirit disclosed by the present invention and shall be included in the claims below.
100: Editing system for recommending articles
102: Writing application
104: Cloud server
106: First electronic device
108: First processing device
110: First memory
112: Wireless transmission module
114: Display
116: Second processing device
118: Communication interface
120: Corpus
122: Conversion module
124: Grouping module
126: Operation module
128: Separation module
130: Data visualization module
A: First draft of the article
A1: Word combination of the first draft
B-J: Plurality of articles
B1-J1: Plurality of article word combinations
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW111129666A TWI849472B (en) | 2022-08-04 | An editing system used for recommending articles |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW111129666A TWI849472B (en) | 2022-08-04 | An editing system used for recommending articles |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202407575A (en) | 2024-02-16 |
TWI849472B (en) | 2024-07-21 |