TWI556121B - Gender prediction method by using webpage surfing behavior - Google Patents


Info

Publication number
TWI556121B
Authority
TW
Taiwan
Prior art keywords
website
blog
prediction method
gender
user
Application number
TW104128081A
Other languages
Chinese (zh)
Other versions
TW201709088A (en)
Inventor
呂承諭
施晨揚
Original Assignee
優像數位媒體科技股份有限公司
Priority date
2015-08-27 (assumed by Google; not a legal conclusion)
Filing date
2015-08-27
Publication date
2016-11-01
Application filed by 優像數位媒體科技股份有限公司
Priority to TW104128081A
Priority to CN201610723750.0A (published as CN106484762A)
Application granted
Publication of TWI556121B
Publication of TW201709088A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Description

Gender prediction method using web browsing behavior

The following description relates to a gender prediction method, and in particular to a method for predicting a user's gender from that user's web browsing behavior on a blog website.

A blog website is generally a site whose entries are ordered by time and updated frequently. Most blogs take the form of a diary and provide posts and a comment function. A blog may focus on anything from personal life to political topics, and its subject may be a niche issue or a broad range of topics.

For today's better-known blog websites, advertising revenue from their web pages is one of the main sources of income, and to attract well-known advertisers a website must have a very high click-through rate or popularity. However, both the advertisements and the site's own content are currently delivered mostly one-way: the same page content is shown repeatedly over a fixed period of time. This is very likely to annoy users and lower the click-through rate. For example, when a male user browses a blog website but sees posts or advertisements aimed at women, he may impatiently leave the site instead of continuing to browse.

The main reason for this is that the website cannot know the gender of the user in front of the computer, and therefore cannot serve advertisements or posts associated with that gender to keep the user on the blog website. As a result, some current blog websites tend to concentrate on a single theme, for example attracting only young women or office workers, and cannot appeal broadly to different users.

Consequently, a blog website's advertisers are confined to a single sector, which reduces the site's revenue. Users, for their part, can only see the posts and advertisements that the blog website has placed statically. Advertisers, moreover, cannot effectively expand their marketing channels. All three parties lose.

The inventors therefore propose a method of predicting gender through a blog website to solve the above problems.

In view of the above problems of the prior art, one object of the present invention is to solve the problem that the gender of a user browsing a blog website cannot be known.

In view of the above problems of the prior art, another object of the present invention is to solve the problem that appropriate information and advertising content cannot be served to the user.

To achieve these objects, the present invention provides a gender prediction method using web browsing behavior, applicable to a blog website hosted on a web server. The web server stores the gender data of a plurality of first users and the web browsing information of the plurality of first users when they log in to the blog website. The gender prediction method comprises the following steps: discretizing a continuous feature of the plurality of first users to form at least one discrete feature, where the continuous feature may contain a continuous value; generating a plurality of test features from the at least one discrete feature and a plurality of first features, where the first features include an article category, a login source URL, article browsing information, and information on the website visited immediately before logging in to the blog website; calculating a plurality of male proportions and a plurality of female proportions corresponding to the test features; and substituting a plurality of second features, collected while a second user browses the blog website, into a classifier, which calculates a gender ratio for the second user from the male proportions and the female proportions. Both the first users and the second users browse web pages on the blog website at least twice in succession.

Preferably, the gender prediction method of the present invention further comprises applying Laplace smoothing to normalize the male proportions and the female proportions.

Preferably, the classifier may be a naïve Bayes classifier.

Preferably, the login source URL may include country information or organization information.

Preferably, the article browsing information may include author information.

Preferably, the continuous feature may be the time at which the blog website is browsed.

Preferably, the gender prediction method of the present invention further comprises using a hypothesis test to reduce the number of first features; the hypothesis test may be a chi-square test.

Preferably, the information on the website visited immediately before logging in to the blog website may cover both an external web page outside the blog website and another web page within the blog website.

Preferably, the gender prediction method of the present invention further comprises changing, in real time according to the gender ratio, the content of advertisements on the blog website, the page layout, and the recommended articles.

100‧‧‧Blog website

101‧‧‧Web server

20‧‧‧Second user

21‧‧‧Second feature

22‧‧‧Update page

S11~S14‧‧‧Steps

The above and other features and advantages of the present invention will become more apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings, in which: Fig. 1 is a flowchart of the gender prediction method according to the present invention.

Fig. 2 is a first schematic diagram of a gender prediction method according to another embodiment of the present invention.

Fig. 3 is a second schematic diagram of a gender prediction method according to another embodiment of the present invention.

To help the examiner understand the features, content and advantages of the present invention and the effects it can achieve, the invention is described in detail below with reference to the accompanying drawings and in the form of embodiments. The drawings are intended only for illustration and to support the description; they do not necessarily reflect the actual proportions or precise configuration of the invention as implemented, so the proportions and configuration shown should not be read as limiting the scope of the invention in practice.

The advantages and features of the present invention, and the technical means of achieving them, will be more readily understood from the detailed description below with reference to exemplary embodiments and the accompanying drawings. The invention may be embodied in different forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those of ordinary skill in the art. The invention is defined only by the appended claims.

Refer to Fig. 1, which is a flowchart of the gender prediction method according to the present invention. As shown, the gender prediction method using web browsing behavior is applied to a blog website installed on a web server. The web server may be a workstation or a computer host, and it stores the gender data of a plurality of first users and the web browsing information of the first users when they log in to the blog website. A first user's gender can be recorded when the user registers as a member of the blog website; afterwards, each time the first user logs in, the user's browsing records, such as the IP address and the pages clicked, are stored while the user browses. The gender prediction method of the present invention comprises the following steps.

Step S11: discretize a continuous feature of the plurality of first users to form at least one discrete feature. The continuous feature may contain a continuous value, such as the login time or the total time spent browsing web pages.
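A minimal Python sketch of step S11, assuming login timestamps are available as `datetime` objects. The hourly binning follows the 24-interval discretization described for Table 1; the bin edges in `DURATION_EDGES` are hypothetical, since the patent does not say how total browsing time would be discretized.

```python
from bisect import bisect_right
from datetime import datetime

def hour_feature(ts: datetime) -> str:
    """Discretize a login timestamp into one of 24 one-hour bins, e.g. 06:30 -> 'hour_6'."""
    return f"hour_{ts.hour}"

# Hypothetical bin edges in seconds for total browsing time (not given in the patent).
DURATION_EDGES = (60, 300, 900)

def duration_feature(total_seconds: float) -> str:
    """Discretize total browsing time into bins split at DURATION_EDGES."""
    # bisect_right counts the edges <= total_seconds, which is the bin index 0..3.
    return f"duration_bin_{bisect_right(DURATION_EDGES, total_seconds)}"

print(hour_feature(datetime(2015, 8, 27, 6, 30)))   # hour_6
print(hour_feature(datetime(2015, 8, 27, 7, 15)))   # hour_7
print(duration_feature(420))                        # duration_bin_2
```

Equal-width hourly bins keep the feature names readable (`hour_6`, as used in Table 2); any other binning scheme would simply produce a different set of discrete feature names.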

Step S12: generate a plurality of test features from the at least one discrete feature and a plurality of first features. The first features may include an article category, a login source URL, article browsing information, and information on the website visited immediately before logging in to the blog website. The login source URL may carry country information (such as .tw or .cn) or organization information (such as .com or .org). The article browsing information may include author information. The previous-website information may cover both an external web page outside the blog website, i.e., the user reached the blog website through a link on an external page, and another web page within the blog website, i.e., the user reached the current page through a link on another page of the same blog website.
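One possible way to derive the URL-based first features of step S12, sketched with Python's standard `urllib.parse`. It assumes the login source URL is available as a full URL, treats a two-letter top-level domain as country information and a generic domain such as `com` or `org` as organization information, and compares the referrer's host with the blog's host to tell an external previous page from an internal one. The feature names (`country_tw`, `org_com`, `prev_internal`, ...) and the `ORG_TLDS` set are illustrative choices, not taken from the patent.

```python
from typing import Optional, Set
from urllib.parse import urlparse

# Illustrative set of generic domains treated as "organization information".
ORG_TLDS = {"com", "org", "net", "edu", "gov"}

def source_url_features(url: str) -> Set[str]:
    """Country/organization features from the host of a login source URL."""
    labels = (urlparse(url).hostname or "").split(".")
    feats: Set[str] = set()
    last = labels[-1]
    if len(last) == 2:                                   # country-code TLD such as .tw or .cn
        feats.add(f"country_{last}")
        if len(labels) >= 3 and labels[-2] in ORG_TLDS:  # e.g. example.com.tw
            feats.add(f"org_{labels[-2]}")
    elif last in ORG_TLDS:                               # generic TLD such as .com or .org
        feats.add(f"org_{last}")
    return feats

def previous_page_feature(referrer: Optional[str], blog_host: str) -> str:
    """Tell whether the previous page was outside or inside the blog website."""
    if not referrer:
        return "prev_none"
    host = urlparse(referrer).hostname or ""
    return "prev_internal" if host == blog_host else "prev_external"

print(sorted(source_url_features("https://www.example.com.tw/post/1")))  # ['country_tw', 'org_com']
```

In practice the previous-page information would most likely come from the HTTP `Referer` header; that source is an assumption here.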

Step S13: calculate a plurality of male proportions and a plurality of female proportions corresponding to the test features, and supply them to a classifier.

Step S14: substitute a plurality of second features, collected while a second user browses the blog website, into the classifier, which calculates a gender ratio for the second user from the male proportions and the female proportions. The classifier may be a naïve Bayes classifier, and the second features are included among the test features. Both the first users and the second users browse web pages on the blog website at least twice in succession.

The gender prediction method of the present invention may further include changing, in real time according to the gender ratio calculated in step S14, the content of advertisements on the blog website, the page layout, and the recommended articles.
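The patent does not say how the classifier's output becomes a "gender ratio". One plausible reading, sketched here purely as an assumption, is to normalize the two base-10 log scores into a posterior probability of being male and switch page variants on a threshold. The function names, the 0.5 threshold and the variant labels are hypothetical.

```python
def male_probability(log10_male: float, log10_female: float) -> float:
    """Turn two base-10 log scores into P(male | features), read here as the gender ratio."""
    top = max(log10_male, log10_female)      # shift so the larger weight is 1 (avoids underflow)
    w_male = 10 ** (log10_male - top)
    w_female = 10 ** (log10_female - top)
    return w_male / (w_male + w_female)

def choose_page_variant(p_male: float, threshold: float = 0.5) -> str:
    """Hypothetical switch between page variants (ads, layout, recommended posts)."""
    return "male_variant" if p_male >= threshold else "female_variant"

p = male_probability(-2.35, -3.55)
print(round(p, 3), choose_page_variant(p))   # 0.941 male_variant
```

Shifting by the larger score before exponentiating leaves the ratio unchanged but keeps long sessions, whose log scores grow very negative, from underflowing to zero.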

The gender prediction method of the present invention may further comprise using a hypothesis test to reduce the number of first features; the hypothesis test may be a chi-square test.
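As an illustration of chi-square feature selection, the sketch below computes Pearson's chi-square statistic (without continuity correction) for a 2x2 table of gender against feature presence, and keeps features at or above 3.841, the critical value for one degree of freedom at a 0.05 significance level. The significance level and the table layout are assumptions; the patent only names the chi-square test.

```python
from typing import Dict, List, Tuple

CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table

                 feature present   feature absent
        male            a                b
        female          c                d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: the feature cannot separate the genders
    return n * (a * d - b * c) ** 2 / denom

def select_features(tables: Dict[str, Tuple[int, int, int, int]],
                    critical: float = CRITICAL_1DF_005) -> List[str]:
    """Keep features whose gender association is significant at the chosen level."""
    return [name for name, t in tables.items() if chi_square_2x2(*t) >= critical]

# hour_6 uses the Table 2 counts (2 of 2 males, 1 of 3 females); the second row is made up.
tables = {"hour_6": (2, 0, 1, 2), "hypothetical_feature": (40, 10, 10, 40)}
print(select_features(tables))  # ['hypothetical_feature']
```

With only a handful of members, as in the five-member example, expected cell counts fall below 5 and the chi-square approximation becomes unreliable; an exact test (e.g. Fisher's) is usually preferred for such small samples.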

As described above, the gender prediction method of the present invention can use the web browsing behavior of members registered on a blog website to predict the gender of non-member users browsing the site, and immediately generate content, advertisements and the like related to the predicted gender. This addresses the problem that conventional blog websites cannot serve appropriate information and advertising content to their users.

Another preferred embodiment of the present invention is illustrated with logged-in members and non-logged-in visitors of a blog website hosted on a web server. The web server records the browsing information of members when they log in to the blog website, including the member's name, gender, age, occupation and country, the IP address used to log in, the browser used, the pages viewed on the blog website, the blog posts viewed and their categories, the login time, and the pages viewed before and after a given blog post.

Table 1 shows the article categories viewed and the login times recorded for five members (p1 to p5) when they logged in to the blog website. As shown in Table 1, p1 and p2 are male users and p3 to p5 are female users. Note that Table 1 contains records that have already been discretized: the day is divided into 24 equal intervals, i.e., by hour, and the categories of articles each member viewed within each hour are recorded. For example, the male user p1 viewed articles in the "Finance" and "Sports" categories in the 06:00 AM and 07:00 AM time slots, while the female user p3 viewed "Beauty" articles at 03:00 PM and "Beauty" and "Travel" articles at 04:00 PM. The choice of "article category viewed" and "login time" in Table 1 is only an example and is not limiting; other features of user browsing behavior may also be used, such as the author and category of the blog posts viewed, the page visited immediately before a given page, or the country (inferred from the IP address).

Table 2 is the feature characteristics table compiled from Table 1. Taking the feature "hour_6" as an example, the male characteristic "2/2" means that of the 2 male members, both logged in at 6:00, and the female characteristic "1/3" means that of the 3 female members, only one logged in at 6:00. In other words, the table gives the conditional probability of each "article category viewed" and "login time" feature given gender. The male and female characteristics of the other features are read in the same way as those of "hour_6" and are not described further here.
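A short sketch of how the raw proportions in Table 2 could be counted: for each gender, the number of members who exhibit a feature over the number of members of that gender. Table 1 itself is not reproduced in this text, so the `members` list below is hypothetical data, arranged only so that `hour_6` gives the 2/2 and 1/3 figures quoted above.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical members (gender, discretized features); NOT the patent's Table 1.
members: List[Tuple[str, Set[str]]] = [
    ("M", {"hour_6", "hour_7", "cat_finance", "cat_sports"}),
    ("M", {"hour_6", "cat_sports"}),
    ("F", {"hour_15", "hour_16", "cat_beauty", "cat_travel"}),
    ("F", {"hour_6", "cat_travel"}),
    ("F", {"hour_20", "cat_beauty"}),
]

def count_features(rows: List[Tuple[str, Set[str]]]):
    """Return members per gender and, per gender, how many members show each feature."""
    n_by_gender: Counter = Counter(g for g, _ in rows)
    k_by_gender: Dict[str, Counter] = {g: Counter() for g in n_by_gender}
    for g, feats in rows:
        k_by_gender[g].update(feats)  # a set, so each feature counts at most once per member
    return n_by_gender, k_by_gender

n, k = count_features(members)
for g in ("M", "F"):
    print(g, f"{k[g]['hour_6']}/{n[g]}")  # prints "M 2/2", then "F 1/3"
```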

Further, Laplace smoothing can be applied to Table 2 to normalize the male and female proportions, giving Table 3 below. This Laplace smoothing adds 1 to the numerator and 2 to the denominator of each proportion.
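The add-one smoothing described above, as a tiny sketch using exact fractions. Adding 2 to the denominator reflects the two possible outcomes of a binary feature (present or absent), and it keeps a feature that no member of one gender showed from forcing that gender's product of probabilities to zero.

```python
from fractions import Fraction

def laplace_smooth(k: int, n: int) -> Fraction:
    """Add-one (Laplace) smoothing for a binary feature: (k + 1) / (n + 2)."""
    return Fraction(k + 1, n + 2)

print(laplace_smooth(2, 2))  # 3/4  (hour_6, male: raw 2/2)
print(laplace_smooth(1, 3))  # 2/5  (hour_6, female: raw 1/3)
print(laplace_smooth(0, 3))  # 1/5  (a feature no female member showed)
```

The first two results match the hour-6 values used for p6 in the worked example (3/4 for male, 2/5 for female).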

After the male and female proportions in Table 3 are established, the web server can store the result in its storage unit, which may be physical memory or a hard disk. When a non-member user browses the blog website, the web server can use the male and female proportions stored in Table 3 to predict that user's gender.

For example, when a non-member user p6 browses the blog website and views an article in the Sports category at 06:30 AM and one in the Travel category at 07:15 AM, the "article category viewed" and "login time" recorded for p6 can first be discretized as shown in Table 4.

In this embodiment, a naïve Bayes classifier is used as the classifier. With binary features $x_i \in \{0, 1\}$ ($x_i = 1$ if the user exhibits feature $i$) and classes $y_j \in \{\text{male}, \text{female}\}$, the formulas are:

$$\hat{y} = \arg\max_{y_j} \; P(y_j)\, P(\mathbf{x} \mid y_j)$$

$$P(\mathbf{x} \mid y_j) = \prod_{i=1}^{n} \Big[ P(i \mid y_j)\, x_i + (1 - x_i)\big(1 - P(i \mid y_j)\big) \Big], \qquad i \in \{1, 2, \dots, n\}$$

To simplify the calculation, we take the base-10 logarithm of the above formulas. Using them, the score for user p6 being male is log(2/5) + log(3/4) + log(2/4) + log(1-2/4) + log(1-1/4) + log(1-1/4) + log(1-1/4) + log(1-2/4) + log(3/4) + log(2/4) + log(1-1/4) ≈ -2.35. Here the first term, 2/5, is the proportion of males among the five members; the values for six o'clock, seven o'clock, Sports and Travel, which p6 exhibited, are 3/4, 2/4, 3/4 and 2/4 respectively; and for every other feature, which p6 did not exhibit, the value is subtracted from 1, giving (1-2/4), (1-1/4), (1-1/4), (1-1/4), (1-2/4) and (1-1/4). The score for p6 being female is log(3/5) + log(2/5) + log(1/5) + log(1-1/5) + log(1-3/5) + log(1-2/5) + log(1-2/5) + log(1-1/5) + log(2/5) + log(2/5) + log(1-3/5) ≈ -3.55. Here the first term, 3/5, is the proportion of females among the five members; the values for six o'clock, seven o'clock, Sports and Travel are 2/5, 1/5, 2/5 and 2/5 respectively; and the remaining values are subtracted from 1, giving (1-1/5), (1-3/5), (1-2/5), (1-2/5), (1-1/5) and (1-3/5). Because the male score is higher, the gender prediction method of the present invention predicts that user p6 is male.
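The arithmetic above can be checked with a short Bernoulli naive Bayes scorer in base-10 logs. The smoothed probabilities are taken from the worked example. Only six o'clock, seven o'clock, Sports and Travel are named in the text, so the six features p6 did not exhibit are given placeholder names `f1` to `f6`; their pairing across the two genders is arbitrary and does not change either total.

```python
import math
from typing import Dict, Set

def log10_score(prior: float, probs: Dict[str, float], present: Set[str]) -> float:
    """Bernoulli naive Bayes score: log10 P(y) + sum of log10(p_i if present else 1 - p_i)."""
    s = math.log10(prior)
    for feat, p in probs.items():
        s += math.log10(p if feat in present else 1 - p)
    return s

# Smoothed P(feature | gender) from the worked example; f1..f6 are placeholder names.
male = {"hour_6": 3/4, "hour_7": 2/4, "cat_sports": 3/4, "cat_travel": 2/4,
        "f1": 2/4, "f2": 1/4, "f3": 1/4, "f4": 1/4, "f5": 2/4, "f6": 1/4}
female = {"hour_6": 2/5, "hour_7": 1/5, "cat_sports": 2/5, "cat_travel": 2/5,
          "f1": 1/5, "f2": 3/5, "f3": 2/5, "f4": 2/5, "f5": 1/5, "f6": 3/5}
p6 = {"hour_6", "hour_7", "cat_sports", "cat_travel"}

m = log10_score(2/5, male, p6)    # prior: 2 of 5 members are male
f = log10_score(3/5, female, p6)  # prior: 3 of 5 members are female
print(round(m, 2), round(f, 2), "male" if m > f else "female")  # -2.35 -3.55 male
```

Summing logs instead of multiplying probabilities gives the same decision, because the logarithm is monotonic, while avoiding floating-point underflow when many features are involved.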

After the web server determines that user p6 is male, it can update the page p6 is viewing, for example with advertising aimed at male users, such as sports equipment, 3C products (computers, communications and consumer electronics) or cars, or with blog posts that interest male users. This lengthens p6's stay on the blog website and thereby increases the site's click-through rate or popularity.

Note that the web browsing records of users p1 to p6 used in the present invention are only those in which the user browsed web pages on the blog website at least twice in succession; in other words, the user must stay on the blog website and switch pages at least once. If a user lands on the blog website and then leaves it directly, that browsing record is not used in the present invention.
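A minimal sketch of the two-page-view rule, assuming page views arrive as `(session_id, url)` pairs. How sessions are identified (cookie, IP address or otherwise) is not specified in the patent and is an assumption here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def usable_sessions(page_views: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Keep only sessions with two or more page views, i.e. at least one page switch."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for session_id, url in page_views:
        sessions[session_id].append(url)
    return {sid: urls for sid, urls in sessions.items() if len(urls) >= 2}

views = [("s1", "/post/1"), ("s2", "/post/7"), ("s1", "/post/2")]
print(usable_sessions(views))  # {'s1': ['/post/1', '/post/2']}
```

Session `s2`, which landed on one page and left, is dropped, matching the rule that single-page visits are not used.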

Refer to Figs. 2 and 3, which are the first and second schematic diagrams of a gender prediction method according to another embodiment of the present invention. In this embodiment, the blog website 100 is hosted on a web server 101, which may include a computer host, a workstation or a server. As shown in Fig. 2, when a second user 20 uses a computer to browse pages of the blog website 100 without logging in as a member, and views at least two pages on the blog website 100, the web server 101 can receive over the network the packets sent when the second user 20 clicks on pages, and thereby obtain a plurality of second features 21 describing how the second user 20 browses. The second features 21 may include the login time, the article category, the login source URL, and the category, author and other related information of the articles viewed.

As shown in Fig. 3, once the web server 101 receives the second features 21, it can use the gender prediction method of the present invention to predict the gender of the second user 20 currently browsing the blog website 100. The calculation that uses the first users' web browsing information for prediction has been disclosed in the embodiment above and is not repeated here. When the second user 20 is predicted to be female, the blog website 100 can send an update page 22, containing advertising pages or blog post pages aimed at women, or a softer page layout, to the page the second user 20 is viewing, to increase the second user 20's dwell time or click-through rate on the blog website 100.

The embodiments described above merely illustrate the technical ideas and features of the present invention, so that those skilled in the art can understand and implement it; they shall not limit the scope of the patent. All equivalent changes or modifications made in accordance with the spirit disclosed by the present invention shall remain within the scope of the present invention.


Claims (8)

1. A gender prediction method using web browsing behavior, applicable to a blog website hosted on a web server, the web server storing gender data of a plurality of first users and web browsing information of the plurality of first users when logging in to the blog website, the gender prediction method comprising: discretizing a continuous feature of the plurality of first users to form at least one discrete feature, wherein the continuous feature comprises a continuous value; generating a plurality of test features according to the at least one discrete feature and a plurality of first features, wherein the plurality of first features comprise an article category, a login source URL, article browsing information, and information on a website visited immediately before logging in to the blog website; calculating a plurality of male proportions and a plurality of female proportions corresponding to the plurality of test features; and substituting a plurality of second features, obtained when a second user browses the blog website, into a classifier, the classifier calculating a gender ratio of the second user according to the plurality of male proportions and the plurality of female proportions, wherein the plurality of second features comprise the continuous feature corresponding to the second user; wherein the plurality of first users and the plurality of second users browse web pages through the blog website at least twice in succession, and the classifier is a naïve Bayes classifier.
2. The gender prediction method of claim 1, further comprising applying Laplace smoothing to normalize the male proportions and the female proportions.
3. The gender prediction method of claim 1, wherein the login source URL comprises country information or organization information.
4. The gender prediction method of claim 1, wherein the article browsing information comprises author information.
5. The gender prediction method of claim 1, wherein the continuous feature is a time at which the blog website is browsed.
6. The gender prediction method of claim 1, further comprising using a hypothesis test to reduce the number of the plurality of first features, wherein the hypothesis test is a chi-square test.
7. The gender prediction method of claim 1, wherein the information on the website visited immediately before logging in to the blog website comprises an external web page different from the blog website and another web page within the blog website.
8. The gender prediction method of claim 1, further comprising changing, in real time according to the gender ratio, the content of advertisements on the blog website, the page layout, and the recommended articles.

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW104128081A TWI556121B (en) 2015-08-27 2015-08-27 Gender prediction method by using webpage surfing behavior
CN201610723750.0A CN106484762A (en) 2015-08-27 2016-08-25 Method for predicting gender by using webpage browsing behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW104128081A TWI556121B (en) 2015-08-27 2015-08-27 Gender prediction method by using webpage surfing behavior

Publications (2)

Publication Number Publication Date
TWI556121B (en) 2016-11-01
TW201709088A (en) 2017-03-01

Family

ID=57851441

Family Applications (1)

Application Number Title Priority Date Filing Date
TW104128081A TWI556121B (en) 2015-08-27 2015-08-27 Gender prediction method by using webpage surfing behavior

Country Status (2)

Country Link
CN (1) CN106484762A (en)
TW (1) TWI556121B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145932A (en) * 2017-06-28 2019-01-04 中兴通讯股份有限公司 User gender prediction method, device and equipment
CN109034868A (en) * 2018-06-21 2018-12-18 上海二三四五网络科技有限公司 Control method and control device for determining user gender based on user browsing information
CN109766955A (en) * 2019-02-12 2019-05-17 深圳乐信软件技术有限公司 Gender identification method, device, equipment and storage medium
CN113268654A (en) * 2020-02-17 2021-08-17 北京搜狗科技发展有限公司 User gender identification method and device and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
TW200802162A (en) * 2006-06-20 2008-01-01 Nu Channel Beijing Xplus Method and system for precisely distributing data messages
US7660468B2 (en) * 2005-05-09 2010-02-09 Like.Com System and method for enabling image searching using manual enrichment, classification, and/or segmentation
CN103186565A (en) * 2011-12-28 2013-07-03 中国移动通信集团浙江有限公司 Method and device for judging user preference according to web browsing behavior of user
CN104008184A (en) * 2014-06-10 2014-08-27 百度在线网络技术(北京)有限公司 Method and device for pushing information

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8041082B1 (en) * 2007-11-02 2011-10-18 Google Inc. Inferring the gender of a face in an image
CN102902986A (en) * 2012-06-13 2013-01-30 上海汇纳网络信息科技有限公司 Automatic gender identification system and method
WO2013191931A1 (en) * 2012-06-21 2013-12-27 Thomson Licensing Method and apparatus for inferring user demographics
CN104036291A (en) * 2014-06-11 2014-09-10 杭州巨峰科技有限公司 Race classification based multi-feature gender judgment method

Also Published As

Publication number Publication date
TW201709088A (en) 2017-03-01
CN106484762A (en) 2017-03-08

Similar Documents

Publication Title
US11049138B2 (en) Systems and methods for targeted advertising
US11681750B2 (en) System and method for providing content to users based on interactions by similar other users
US8799260B2 (en) Method and system for generating web pages for topics unassociated with a dominant URL
CN103440286B (en) Method and device for providing recommendation information based on search results
US9754044B2 (en) System and method for trail identification with search results
US7921156B1 (en) Methods and apparatus for inserting content into conversations in on-line and digital environments
US8938463B1 (en) Modifying search result ranking based on implicit user feedback and a model of presentation bias
US8380563B2 (en) Using previous user search query to target advertisements
US8060524B2 (en) History answer for re-finding search results
US9798820B1 (en) Classification of keywords
US8775355B2 (en) Dynamic online communities
US20160299899A1 (en) Generating a user-specific ranking model on a user electronic device
US20140222834A1 (en) Content summarization and/or recommendation apparatus and method
US20150379571A1 (en) Systems and methods for search retargeting using directed distributed query word representations
US9679043B1 (en) Temporal content selection
US20110125759A1 (en) Method and system to contextualize information being displayed to a user
US20130080428A1 (en) User-Centric Opinion Analysis for Customer Relationship Management
US20080077494A1 (en) Advertisement Selection For Peer-To-Peer Collaboration
WO2010085773A1 (en) Hybrid contextual advertising and related content analysis and display techniques
TWI556121B (en) Gender prediction method by using webpage surfing behavior
JP2009532774A5 (en)
US9171045B2 (en) Recommending queries according to mapping of query communities
US20140025496A1 (en) Social content distribution network
US20140214548A1 (en) User Profiling Using Submitted Review Content
Martin et al. Mining newsworthy topics from social media