TWI676110B - Semantic feature analysis system for article analysis based on readers - Google Patents

Info

Publication number
TWI676110B
Authority
TW
Taiwan
Prior art keywords
information
vector
semantic
read
reader
Prior art date
Application number
TW107129087A
Other languages
Chinese (zh)
Other versions
TW202009746A (en)
Inventor
陳信翰
莊繼興
黃彥鈞
陳俊維
Original Assignee
良知股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 良知股份有限公司
Priority to TW107129087A
Application granted
Publication of TWI676110B
Publication of TW202009746A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a reader-centered semantic feature analysis system for article analysis, comprising a cloud server and one or more reader electronic devices. The cloud server connects to the reader electronic devices over a network and receives information on one or more read articles. Based on the read article information, the cloud server executes a semantic calculation procedure and a vocabulary calculation procedure to obtain semantic vector information and vocabulary vector information, respectively. Using the semantic vector information and the vocabulary vector information, the cloud server evaluates the semantic features of the articles the reader has read and then provides information on articles likely to interest the reader. By performing semantic feature analysis centered on the reader and offering matching articles for reference, the system aims to improve the accuracy of article analysis and recommendation.

Description

Semantic feature analysis system for reader-centered article analysis

The invention relates to an analysis system, and in particular to a semantic feature analysis system that performs reader-centered article analysis.

With the rapid growth of data and information, each reader accepts and enjoys different kinds of content to a different degree. This content includes newspapers and magazines, travel articles, shopping articles, consumer recommendation articles, and other types of article information. For example, some readers prefer shopping articles while others prefer travel articles. Shopping companies, travel companies, and similar businesses therefore all want to recommend articles related to their own products to suitable readers.

At present, most information services are one-directional: each company recommends its article information to all readers. Not every reader is interested in these articles, however, so readers tend to reject them and the service is perceived as poor.

Another approach is for companies to use third-party systems such as Google Analytics, Facebook Pixel, or URL-shortening services to collect data on how all of their articles are read, and then to organize and interpret the data manually to identify which articles have a high read rate (click-through rate). This approach is time-consuming and still does not reveal a user's preferences for article information. How to analyze the articles a user likes from the reader's perspective, in order to provide better service, therefore remains a problem to be solved.

In view of the problems of the prior art described above, the invention provides a reader-centered semantic feature analysis system for article analysis. The system receives information on articles the reader has already read and analyzes their semantic features, so as to identify the article information the reader prefers and provide the most suitable article information to the reader.

To achieve the above object, one principal technical means is a reader-centered semantic feature analysis system for article analysis, comprising: one or more reader electronic devices on which a reader reads article information; and a cloud server connected to the reader electronic devices via a network; wherein the cloud server receives information on one or more read articles fed back by the reader electronic device and executes a semantic calculation procedure, the semantic calculation procedure performing the following steps: performing word segmentation analysis on the read article information to obtain a first word segmentation result; and performing a high-dimensional vector calculation on the first word segmentation result according to a semantic vector algorithm to generate semantic vector information.

As can be seen from the above, the cloud server receives the read article information and executes the semantic calculation procedure to obtain the semantic vector information of the articles the reader has read. The cloud server can then provide suitable article information to the reader based on that semantic vector information. By analyzing the semantic features of the read articles with the reader at the center and offering matching articles for reference, the system aims to improve the accuracy of article analysis and recommendation.

To achieve the above object, another principal technical means is a reader-centered semantic feature analysis system for article analysis, comprising: one or more reader electronic devices on which a reader reads article information; and a cloud server connected to the reader electronic devices via a network; wherein the cloud server receives information on one or more read articles fed back by the reader electronic device and executes a vocabulary calculation procedure, the vocabulary calculation procedure performing the following steps: performing word segmentation analysis on the read article information to obtain a second word segmentation result; and performing a dimensional vector calculation on the second word segmentation result according to a vocabulary-usage vector algorithm to generate vocabulary-usage vector information.

As can be seen from the above, the cloud server receives the read article information and executes the vocabulary calculation procedure to obtain the vocabulary-usage vector information of the articles the reader has read. The cloud server can then provide suitable article information to the reader based on that vocabulary-usage vector information. By analyzing the semantic features of the read articles with the reader at the center and offering matching articles for reference, the system aims to improve the accuracy of article analysis and recommendation.

To achieve the above object, yet another principal technical means is a reader-centered semantic feature analysis system for article analysis, comprising: one or more reader electronic devices on which a reader reads article information; and a cloud server connected to the reader electronic devices via a network; wherein the cloud server receives information on one or more read articles fed back by the reader electronic device and executes a semantic calculation procedure and a vocabulary calculation procedure. The semantic calculation procedure performs the following steps: performing word segmentation analysis on the read article information to obtain a first word segmentation result; and performing a high-dimensional vector calculation on the first word segmentation result according to a semantic vector algorithm to generate semantic vector information. The vocabulary calculation procedure performs the following steps: performing word segmentation analysis on the read article information to obtain a second word segmentation result; and performing a dimensional vector calculation on the second word segmentation result according to a vocabulary-usage vector algorithm to generate vocabulary-usage vector information.

As can be seen from the above, the cloud server receives the read article information and executes both the semantic calculation procedure and the vocabulary calculation procedure to generate the semantic vector information and the vocabulary-usage vector information corresponding to the articles the reader has read. The semantic features of those articles are thus analyzed with the reader at the center and captured accurately, so that articles suited to the reader can be provided accurately. In this way the system aims to improve the accuracy of article analysis and recommendation.

A preferred embodiment of the reader-centered semantic feature analysis system of the invention is shown in FIG. 1. It comprises a cloud server 10 and one or more reader electronic devices 20, the cloud server 10 being connected to the reader electronic devices 20 via a network. In this embodiment, the reader electronic devices 20 include personal computers, notebook computers, smartphones, tablet computers, smart wearable devices, and other electronic devices with networking, display, and operating functions.

The cloud server 10 comprises a data analysis module 11, a data acquisition module 12, a data storage module 13, and a reading behavior feedback module 14. In this embodiment it further comprises a data reporting module 15. The data analysis module 11 analyzes the received data. The data acquisition module 12 connects via the network to one or more databases to retrieve data and stores the data in the data storage module 13. The data storage module 13 stores data. The reading behavior feedback module 14 connects via the network to the reader electronic device 20 to receive information on one or more read articles fed back by the reader electronic device 20. The data reporting module 15 pushes information to the reader electronic device 20.

In this embodiment, the databases connected to the data acquisition module 12 include an academic paper database, a conference database, a news database, a dictionary database, a medical news database, and other kinds of databases. These are examples only and are not limiting. Retrieving data from these databases improves the analysis accuracy of the data analysis module 11.

The system is intended to be reader-centered: it analyzes the information of every article a reader reads in order to identify the articles the reader is interested in and likes, and then provides articles the reader is likely to be interested in. In use, the reading behavior feedback module 14 receives, via the network, the read article information fed back by the reader electronic device 20 and passes it to the data analysis module 11. The data analysis module 11 receives the read article information and executes a semantic calculation procedure and a vocabulary calculation procedure, respectively, to analyze it. The steps by which the data analysis module 11 executes the semantic calculation procedure are shown in FIG. 2. The semantic calculation procedure performs the following steps:

Performing word segmentation analysis on the read article information to obtain a first word segmentation result (S31). The data analysis module 11 segments the words of the read article information according to the database data stored in the data storage module 13 and deletes tokens with low discriminative value, yielding an optimized first word segmentation result.

Performing a high-dimensional vector calculation on the first word segmentation result according to a semantic vector algorithm to generate semantic vector information (S32). In this embodiment, the semantic vector algorithm is a word2vec neural network algorithm, which is used to perform the high-dimensional vector calculation on the first word segmentation result; the vectors have more than 400 dimensions.
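Steps S31 and S32 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the article has already been segmented into tokens, that the low-value tokens are supplied as a set, and that per-word vectors come from a previously trained word2vec model exposed as a plain dictionary. The source does not say how word vectors are combined into one article vector, so the averaging used here is an assumption, and the names `filter_tokens`, `article_vector`, and `embeddings` are hypothetical. The toy vectors have 2 dimensions only for readability; the embodiment calls for more than 400.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def filter_tokens(tokens: Sequence[str], low_value_tokens: Set[str]) -> List[str]:
    """Step S31 (sketch): drop tokens judged to have low discriminative value."""
    return [t for t in tokens if t not in low_value_tokens]


def article_vector(tokens: Sequence[str],
                   embeddings: Dict[str, Vector],
                   dim: int) -> Vector:
    """Step S32 (sketch): combine per-word vectors into one article vector.

    Averaging the word vectors is an assumption; tokens without a
    known vector are skipped.
    """
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue
        known += 1
        for i, value in enumerate(vec):
            total[i] += value
    if known == 0:
        return total  # no known tokens: return the zero vector
    return [x / known for x in total]


# Toy data: hypothetical tokens and 2-dimensional word vectors.
tokens = filter_tokens(["the", "travel", "guide", "the"], {"the"})
vector = article_vector(tokens, {"travel": [1.0, 0.0], "guide": [0.0, 1.0]}, dim=2)
```

In a real deployment the dictionary lookup would be replaced by a trained word2vec model and the token list by the output of a Chinese word segmenter.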

By executing the semantic calculation procedure, the data analysis module 11 analyzes the received read article information and obtains the semantic vector information. This makes it possible to analyze the semantic content of the articles the reader has read and thereby identify the article information the reader is interested in reading.

In this embodiment, the semantic calculation procedure further includes the following step: performing a normalization calculation on the semantic vector information and the total number of read articles to generate a semantic vector standard value (S33). In this embodiment, the semantic vector information is summed as vectors and the sum is divided by the total number of read articles to produce the semantic vector standard value. This quantifies the semantics of the article information so that a company can quickly understand the semantic features of the articles the reader has read.
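The normalization in step S33 is an element-wise sum of the per-article semantic vectors divided by the number of read articles. A minimal sketch follows; the function name `semantic_standard_value` is hypothetical, and the vectors are plain Python lists for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def semantic_standard_value(article_vectors: Sequence[Vector]) -> Vector:
    """Step S33 (sketch): sum the semantic vectors, then divide by the article count."""
    if not article_vectors:
        raise ValueError("at least one read article is required")
    dim = len(article_vectors[0])
    if any(len(v) != dim for v in article_vectors):
        raise ValueError("all semantic vectors must have the same dimension")
    count = len(article_vectors)
    # Element-wise sum across articles, divided by the number of articles.
    return [sum(v[i] for v in article_vectors) / count for i in range(dim)]


# Two read articles with toy 2-dimensional semantic vectors.
standard = semantic_standard_value([[1.0, 2.0], [3.0, 4.0]])
```

The result is the mean vector of the reader's articles, so readers who have read different numbers of articles can be compared on the same scale.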

In this embodiment, when a reader reads multiple articles within a set period, the reading behavior feedback module 14 receives the information on those read articles fed back by the reader electronic device 20, and the data analysis module 11 combines it into aggregated read article information. The data analysis module 11 executes the semantic calculation procedure on the aggregated read article information to generate the corresponding semantic vector information. It then applies the same normalization calculation, dividing the semantic vector information corresponding to the multiple read articles by the total number of read articles to obtain the corresponding semantic vector standard value. The aggregated read article information is formed by appending the second read article to the end of the first, so that the read articles are joined head to tail in order.

In this embodiment, to evaluate whether the articles read by two or more readers are similar, the data analysis module 11 performs a vector inner product calculation on the semantic vector information obtained from the articles those readers have read, yielding semantic vector similarity information. This reveals which readers read similar articles. By analyzing a large volume of articles read by different readers, readers can be grouped and matched, which makes it easier to push information to them.
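The inner product comparison between two readers can be sketched as below. The source specifies only a vector inner product; the cosine variant is included as an assumption, since a raw inner product also grows with vector length and a length-normalized score is often easier to threshold. Function names are hypothetical.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Vector inner product, as named in the embodiment."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Length-normalized inner product (an assumption, not stated in the source)."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return inner_product(a, b) / (norm_a * norm_b)


# Toy semantic vectors for two readers.
reader_a = [1.0, 2.0]
reader_b = [3.0, 4.0]
similarity = inner_product(reader_a, reader_b)
```

Readers whose scores exceed a chosen threshold could then be placed in the same group; the threshold itself is not given in the source.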

The steps by which the data analysis module 11 executes the vocabulary calculation procedure are shown in FIG. 3. The vocabulary calculation procedure performs the following steps:

Performing word segmentation analysis on the read article information to obtain a second word segmentation result (S41). The data analysis module 11 segments the words of the read article information according to the database data stored in the data storage module 13 and deletes tokens with low discriminative value, yielding an optimized second word segmentation result.

Performing a dimensional vector calculation on the second word segmentation result according to a vocabulary-usage vector algorithm to generate vocabulary-usage vector information (S42). In this embodiment, the vocabulary-usage vector algorithm is a tf-idf algorithm, which is used to perform the dimensional vector calculation on the second word segmentation result. The tf-idf calculation places no limit on the number of dimensions, and it screens out repetitive, meaningless tokens so that they are not taken into account.
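Step S42 can be illustrated with a small tf-idf calculation. This is a sketch under stated assumptions: the source names tf-idf but gives no formula, so the common definition (term frequency within the article multiplied by the logarithm of the inverse document frequency over a reference corpus) is assumed here. The function name, the toy corpus, and the fixed vocabulary order are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def tf_idf_vector(tokens: Sequence[str],
                  corpus: Sequence[Sequence[str]],
                  vocabulary: Sequence[str]) -> List[float]:
    """Step S42 (sketch): one tf-idf weight per vocabulary term."""
    counts = Counter(tokens)
    n_docs = len(corpus)
    doc_sets = [set(doc) for doc in corpus]
    vector = []
    for term in vocabulary:
        tf = counts[term] / len(tokens) if tokens else 0.0
        df = sum(1 for doc in doc_sets if term in doc)
        # A term found in every document gets idf = log(1) = 0, so
        # ubiquitous tokens contribute nothing to the vector.
        idf = math.log(n_docs / df) if df > 0 else 0.0
        vector.append(tf * idf)
    return vector


# Toy reference corpus of two segmented articles.
corpus = [["the", "travel"], ["the", "shopping"]]
weights = tf_idf_vector(["the", "travel"], corpus, ["the", "travel", "shopping"])
```

Here "the" appears in both documents and receives a weight of zero, which matches the embodiment's remark that tf-idf screens out repetitive, uninformative tokens. The vector has one dimension per vocabulary term, so its size is not fixed in advance.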

In this embodiment, the vocabulary calculation procedure further includes the following step: performing a normalization calculation on the vocabulary-usage vector information and the total word count of the read article information to generate a vocabulary-usage vector standard value (S43). In this embodiment, the vocabulary-usage vector information is summed as vectors and the sum is divided by the total word count of the read article information to produce the vocabulary-usage vector standard value. This quantifies the word usage of the article information so that a company can quickly understand the word-count profile of the articles the reader has read.
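Written as a formula, step S43 divides the summed vocabulary-usage vectors by the total word count rather than by the number of articles. The symbols below are introduced here for illustration and do not appear in the source: \(\mathbf{u}_k\) is the vocabulary-usage vector of the \(k\)-th read article, \(K\) is the number of read articles, and \(W\) is the total word count of those articles.

```latex
\[
\mathbf{u}_{\mathrm{std}} \;=\; \frac{1}{W}\sum_{k=1}^{K}\mathbf{u}_k
\]
```

For example, with two articles whose vectors are \((2, 4)\) and \((4, 2)\) and a combined length of \(W = 200\) words, the sum is \((6, 6)\) and the standard value is \((0.03, 0.03)\).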

In this embodiment, when a reader has read multiple articles within a set period, the reading behavior feedback module 14 receives the information on the multiple read articles fed back by the reader electronic device 20 within that period, and the data analysis module 11 combines it into aggregated read article information. The data analysis module 11 executes the vocabulary calculation procedure on the aggregated read article information to generate the corresponding vocabulary-usage vector information. It then applies the same normalization calculation, dividing the vocabulary-usage vector information corresponding to the multiple read articles by the total word count of the read article information to obtain the corresponding vocabulary-usage vector standard value.

In this embodiment, to evaluate whether the articles read by two or more readers are similar, the data analysis module 11 performs a vector inner product calculation on the vocabulary-usage vector information obtained from the articles those readers have read, yielding vocabulary-usage vector similarity information. This reveals which readers read similar articles. By analyzing the vocabulary used in a large volume of articles read by different readers, readers with related reading habits can be grouped and matched, which makes it easier to push information to them.

After the system has analyzed the article information read by the reader, the data analysis module 11 generates one or more items of recommended article information based on the semantic vector information and the vocabulary-usage vector information. The data reporting module 15 pushes the recommended article information to the reader electronic devices 20 through social media, a web page, an app, email, or a digital audio-video playback medium, so that different readers can consult the article information that interests them.
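The source does not describe how the two kinds of vector are combined when choosing recommendations. One possible approach, offered purely as an assumption, is to score each candidate article by a weighted sum of its semantic and vocabulary-usage inner products with the reader's vectors and return the highest-scoring articles. The function `recommend`, the `weight` parameter, and the candidate identifiers are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend(reader_semantic: Vector,
              reader_vocab: Vector,
              candidates: Dict[str, Tuple[Vector, Vector]],
              weight: float = 0.5,
              top_k: int = 3) -> List[str]:
    """Rank candidates by a weighted sum of the two inner products (assumed scheme)."""
    scored = []
    for article_id, (semantic, vocab) in candidates.items():
        score = (weight * dot(reader_semantic, semantic)
                 + (1.0 - weight) * dot(reader_vocab, vocab))
        scored.append((score, article_id))
    # Highest score first; ties broken by article id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:top_k]]


# Toy candidates: (semantic vector, vocabulary-usage vector) per article.
candidates = {
    "a": ([1.0, 0.0], [0.0, 1.0]),
    "b": ([0.0, 1.0], [1.0, 0.0]),
    "c": ([0.5, 0.0], [0.0, 0.5]),
}
picked = recommend([1.0, 0.0], [0.0, 1.0], candidates, top_k=2)
```

The identifiers returned by such a function would then be handed to the data reporting module 15 for delivery.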

In this embodiment, the social media include LINE, Facebook, WeChat, WhatsApp, Weibo, Instagram, Twitter, Snapchat, and other social media.

In this embodiment, the web page includes http, html, php, asp, jsp, and similar web pages.

In this embodiment, the digital audio-video playback media include YouTube, Netflix, iQiyi, and other digital audio-video playback media that can display text messages during playback.

In this embodiment, the recommended article information pushed through the social media, web pages, apps, email, or digital audio-video playback media can be displayed on the reader electronic devices 20 as a two-dimensional barcode or a web link.

Referring to FIG. 4, LINE is used as an example. The data reporting module 15 pushes recommended article information that may interest the reader to LINE on the reader's electronic device 20 for the reader's reference, and displays a web link that the reader can click to open the corresponding web page and read the recommended article.

10‧‧‧Cloud server

11‧‧‧Data analysis module

12‧‧‧Data acquisition module

13‧‧‧Data storage module

14‧‧‧Reading behavior feedback module

15‧‧‧Data reporting module

20‧‧‧Reader electronic device

FIG. 1 is a block diagram of the system architecture of a preferred embodiment of the invention.
FIG. 2 is a flowchart of the semantic calculation procedure of a preferred embodiment of the invention.
FIG. 3 is a flowchart of the vocabulary calculation procedure of a preferred embodiment of the invention.
FIG. 4 is a schematic diagram of providing article information in a preferred embodiment of the invention.

Claims (8)

1. A reader-centered semantic feature analysis system for article analysis, comprising: one or more reader electronic devices on which a reader reads article information; and a cloud server connected to the reader electronic devices via a network and connected via the network to one or more databases to retrieve data; wherein the cloud server receives information on one or more read articles fed back by the reader electronic device and executes a semantic calculation procedure, the semantic calculation procedure performing the following steps: performing word segmentation analysis on the read article information to obtain a first word segmentation result; performing a high-dimensional vector calculation on the first word segmentation result according to a semantic vector algorithm to generate semantic vector information; and performing a normalization calculation on the semantic vector information and the total number of read articles to generate a semantic vector standard value, wherein the semantic vector information is summed as vectors and the sum is divided by the total number of read articles to generate the semantic vector standard value, thereby quantifying the semantics of the article information.
2. The reader-centered semantic feature analysis system for article analysis of claim 1, wherein: when the cloud server receives information on multiple read articles, it combines the read article information into aggregated read article information and executes the semantic calculation procedure on the aggregated read article information to generate corresponding semantic vector information; and the cloud server performs the normalization calculation on the semantic vector information generated from the aggregated read article information to generate a corresponding semantic vector standard value.

3. The reader-centered semantic feature analysis system for article analysis of claim 2, wherein: when the cloud server receives read article information fed back by different reader electronic devices, it executes the semantic calculation procedure on the read article information of each reader device to generate the corresponding semantic vector information, and performs a vector inner product calculation on the semantic vector information of the different reader devices to obtain semantic vector similarity information.
4. A reader-centered semantic feature analysis system for article analysis, comprising: one or more reader electronic devices on which a reader reads article information; and a cloud server connected to the reader electronic devices via a network and connected via the network to one or more databases to retrieve data; wherein the cloud server receives information on one or more read articles fed back by the reader electronic device and executes a vocabulary calculation procedure, the vocabulary calculation procedure performing the following steps: performing word segmentation analysis on the read article information to obtain a second word segmentation result; performing a dimensional vector calculation on the second word segmentation result according to a vocabulary-usage vector algorithm to generate vocabulary-usage vector information; and performing a normalization calculation on the vocabulary-usage vector information and the total word count of the read article information to generate a vocabulary-usage vector standard value, wherein the vocabulary-usage vector information is summed as vectors and the sum is divided by the total word count of the read article information to generate the vocabulary-usage vector standard value, thereby quantifying the word count of the article information.
5. The reader-centered semantic feature analysis system for article analysis of claim 4, wherein: when the cloud server receives information on multiple read articles, it combines the read article information into aggregated read article information and executes the vocabulary calculation procedure on the aggregated read article information to generate corresponding vocabulary-usage vector information; and the cloud server performs the normalization calculation on the vocabulary-usage vector information generated from the aggregated read article information to generate a corresponding vocabulary-usage vector standard value.

6. The reader-centered semantic feature analysis system for article analysis of claim 5, wherein: when the cloud server receives read article information fed back by different reader electronic devices, it executes the vocabulary calculation procedure on the read article information of each reader device to generate the corresponding vocabulary-usage vector information, and performs a vector inner product calculation on the vocabulary-usage vector information of the different reader devices to obtain vocabulary-usage vector similarity information.
A reader-centered semantic feature analysis system for article analysis, comprising: one or more reader electronic devices on which a reader reads article information; and a cloud server connected to the reader electronic devices via a network, and connected via the network to one or more databases to retrieve data; wherein the cloud server receives information on one or more read articles fed back by the reader electronic device and executes a semantic calculation procedure and a vocabulary calculation procedure, the semantic calculation procedure comprising the following steps: performing a word segmentation analysis on the read article information to obtain a first word segmentation result; performing a high-dimensional vector calculation on the first word segmentation result according to a semantic vector algorithm to generate semantic vector information; and performing a normalization calculation on the semantic vector information and the total number of read articles to generate a semantic vector standard value, wherein the semantic vector information is summed as vectors and then divided by the total number of read articles to generate the semantic vector standard value, so as to quantify the semantics of the article information; and wherein the vocabulary calculation procedure comprises the following steps: performing a word segmentation analysis on the read article information to obtain a second word segmentation result; performing a dimensional vector calculation on the second word segmentation result according to a used-vocabulary vector algorithm to generate used-vocabulary vector information; and performing a normalization calculation on the used-vocabulary vector information and the total word count of the read article information to generate a used-vocabulary vector standard value, wherein the used-vocabulary vector information is summed as vectors and then divided by the total word count of the read article information to generate the used-vocabulary vector standard value, so as to quantify the words of the article information.
The reader-centered semantic feature analysis system for article analysis according to claim 7, wherein: the cloud server generates information on one or more recommended articles according to the semantic vector information and the used-vocabulary vector information, and pushes it to the reader electronic devices through social media, a web page, an app, an email, or digital audio/video playback media.
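The semantic normalization step (vector sum divided by the number of read articles) and one possible use of it for recommendation can be sketched as below. The claims specify neither the semantic vector algorithm nor how recommended articles are selected, so the per-article vectors are made-up values and the ranking by inner product is purely an assumption for illustration; all function names are hypothetical.

```python
def semantic_standard_value(article_vectors):
    """Sum the per-article semantic vectors and divide by the article count."""
    if not article_vectors:
        raise ValueError("at least one read article is required")
    dimension = len(article_vectors[0])
    summed = [0.0] * dimension
    for vector in article_vectors:
        if len(vector) != dimension:
            raise ValueError("all semantic vectors must have the same dimension")
        for i, value in enumerate(vector):
            summed[i] += value
    article_count = len(article_vectors)
    return [total / article_count for total in summed]


def rank_candidates(reader_profile, candidates):
    """Order candidate articles by inner product with the reader profile.

    Assumed ranking rule; the claims do not state how articles are chosen.
    """
    scores = {
        title: sum(a * b for a, b in zip(reader_profile, vector))
        for title, vector in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


profile = semantic_standard_value([[1.0, 0.0], [3.0, 2.0]])
print(profile)
# [2.0, 1.0]
print(rank_candidates(profile, {"a": [0.0, 1.0], "b": [1.0, 0.0]}))
# ['b', 'a']
```

Here the semantic vectors are averaged per article while the used-vocabulary vectors are averaged per word, matching the two different denominators named in the claim.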
TW107129087A 2018-08-21 2018-08-21 Semantic feature analysis system for article analysis based on readers TWI676110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107129087A TWI676110B (en) 2018-08-21 2018-08-21 Semantic feature analysis system for article analysis based on readers

Publications (2)

Publication Number Publication Date
TWI676110B (en) 2019-11-01
TW202009746A (en) 2020-03-01

Family

ID=69189191

Family Applications (1)

Application Number Priority Date Filing Date Title
TW107129087A TWI676110B (en) 2018-08-21 2018-08-21 Semantic feature analysis system for article analysis based on readers

Country Status (1)

Country Link
TW (1) TWI676110B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI747246B (en) * 2020-04-24 2021-11-21 孫光天 A method for comprehension of text semantics based on neural network computing module and case change grammar

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130179252A1 (en) * 2012-01-11 2013-07-11 Yahoo! Inc. Method or system for content recommendations
CN103678620A (en) * 2013-12-18 2014-03-26 国家电网公司 Knowledge document recommendation method based on user historical behavior features
TW201508508A (en) * 2013-08-19 2015-03-01 Hon Hai Prec Ind Co Ltd System and method for recommending files
CN105022840A (en) * 2015-08-18 2015-11-04 新华网股份有限公司 News information processing method, news recommendation method and related devices
CN107133315A (en) * 2017-05-03 2017-09-05 有米科技股份有限公司 Smart media recommendation method based on semantic analysis
CN107832306A (en) * 2017-11-28 2018-03-23 武汉大学 Similar entity mining method based on Doc2vec

Also Published As

Publication number Publication date
TW202009746A (en) 2020-03-01

Similar Documents

Publication Title
CN107679211B (en) Method and device for pushing information
Verma et al. Big data analytics: Challenges and applications for text, audio, video, and social media data
US9892109B2 (en) Automatically coding fact check results in a web page
JP2020509449A (en) Method and device for warning
CN112889042A (en) Identification and application of hyper-parameters in machine learning
US20140337328A1 (en) System and method for retrieving and presenting concept centric information in social media networks
CN108021651B (en) Network public opinion risk assessment method and device
US11423096B2 (en) Method and apparatus for outputting information
CN108959329B (en) Text classification method, device, medium and equipment
CN109325121B (en) Method and device for determining keywords of text
CN113806588B (en) Method and device for searching video
CN111427974A (en) Data quality evaluation management method and device
CN110737824B (en) Content query method and device
CN113836128A (en) Abnormal data identification method, system, equipment and storage medium
US9020962B2 (en) Interest expansion using a taxonomy
TWI676110B (en) Semantic feature analysis system for article analysis based on readers
CN113220974A (en) Click rate prediction model training and search recall method, device, equipment and medium
CN117391824A (en) Method and device for recommending articles based on large language model and search engine
CN112989118A (en) Video recall method and device
CN110971973A (en) Video pushing method and device and electronic equipment
CN113535939A (en) Text processing method and device, electronic equipment and computer readable storage medium
US9785404B2 (en) Method and system for analyzing data in artifacts and creating a modifiable data network
CN115269998A (en) Information recommendation method and device, electronic equipment and storage medium
CN114550157A (en) Bullet screen gathering identification method and device
CN114742573A (en) Marketing analysis method and device, electronic equipment and storage medium