TW201003434A - Manpower data mining association rule and the system thereof - Google Patents

Manpower data mining association rule and the system thereof


Publication number
TW201003434A
Authority
TW
Taiwan
Prior art keywords
data
level
association
conditions
manpower
Prior art date
Application number
TW97125638A
Other languages
Chinese (zh)
Inventor
xian-fa Yang
Sheng-Zhe Gao
Zhuan-Kai Zhang
wen-xing Gao
Original Assignee
All Chinese Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by All Chinese Internet Co Ltd filed Critical All Chinese Internet Co Ltd
Priority to TW97125638A priority Critical patent/TW201003434A/en
Publication of TW201003434A publication Critical patent/TW201003434A/en


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to association rules for mining manpower data and a system thereof. It examines the relationships among the recruitment conditions posted on an online job bank. Using data mining, it analyzes how these recruitment conditions relate to one another, in order to understand the association between an enterprise's demand for manpower and the conditions job seekers must meet. In particular it applies association rule techniques, which find associations among items (originally commodity items) in a database. Used properly, this information helps the user discover implicit, potentially useful associations and gives job seekers and recruiters a real-time reference for matching each other during hiring. It also serves to summarize recruitment analysis and to provide an index of condition associations.

Description

IX. Description of the Invention:

[Technical Field]

The present invention provides association rules for mining a manpower database and a system thereof. The key to association rule mining is to find the items in a set of transactions that are likely to be related. The architecture of the invention comprises four main processing stages: data collection, data mining, association analysis, and analysis conclusions. Working from sampled data and a selected association rule method, it analyzes the information to be mined from several viewpoints. By examining the resulting association trends against the needs of the target users, it determines which characteristics hold among the recruitment conditions, summarizes the preferences of the recruitment environment as a whole, and presents the associations among application conditions to job seekers as a timely reference.

[Prior Art]

A conventional job-search website merely publishes the recruitment conditions that each enterprise submits (see Figure 1). The invention aims to remedy this shortcoming in several ways. Applicants can use the qualifications and skills they already have to find matching or similar jobs. They can also learn which primary and secondary conditions a given job requires, so that they can strengthen their weak areas in advance and improve their chances of being hired. Applicants thus find positions they can fill more quickly, and enterprises find the people they need more quickly. Students can learn what the work they hope to do will require and prepare for it early. Schools can offer courses that train the people nearby enterprises need, improving students' job prospects and working ability after graduation. Human resources managers and hiring managers can likewise see which professional skills are in demand and whether a gap exists between market supply and demand, and can use the company's current situation as a basis for suggestions on applicant requirements and on the direction of the enterprise's development.

Data analysis has so far relied mostly on statistics. As the volume of data keeps growing, simple statistical methods alone no longer meet the need. Statistical analysis is verification-driven: the analyst first assumes an answer to the question and then goes back to check whether the assumption is correct. Data mining techniques, by contrast, can be run automatically or semi-automatically. Data mining follows much the same logic as the statistical methods above in that the problem is known, but it makes no assumption about the result. Until a solution emerges it requires more complex computation, and the result cannot be guessed beforehand, as the comparison in Table 1 shows:

Table 1
Analysis factor | Data mining | Statistical analysis
Data attributes clearly defined | Required | Required
Problem and goal clearly stated | Required | Required
Analysis algorithms provided | Statistical methods, artificial intelligence, decision trees, neural networks | Statistical methods
Model building | Several models are offered, from which a suitable one can be chosen | The analyst must judge the importance of the variables before a model can be built
Related variables | Can find correlations among many variables | Can check the effect of only one variable on the result at a time
Result can be anticipated | No | Yes
Mode of execution | A process of continual iteration and revision | Problem-oriented; the same problem usually needs to be analyzed only once

In summary, data mining resembles a discovery-driven form of statistics. A purely statistical approach cannot verify anything without a hypothesized target result, so ordinary statistical testing is ill suited to analyzing the relationships implied among the data. How to apply data mining methods effectively is the technical feature the present invention sets out to improve.

US Patent No. 6,735,571, "a system and method for providing an estimated salary", discloses the following: "The system comprises a communication network interface, a storage device, and a processor. The communication network interface couples to a communication network. The storage device stores salary data. The processor is coupled to the communication network interface and the storage device and uses simple linear, multiple, or polynomial curve regression to identify the regression factors of salary payment, build a regression formula, and calculate the salary for each position." On its theoretical basis, that invention states: "Salary data is obtained, and the standardized salary structure is checked and adjusted. The influencing factors include regression parameters such as seniority, industry, company, and market pay references. The ratio between the salary estimated for the future and the actual pay currently received is computed, with the regression reference factors serving as adjustment coefficients. When the gap between the estimated salary and the actual pay falls within an acceptable range, the regression is considered valid. When a possible future level is estimated, other factors must be added and combined by weighting to obtain an appropriately applicable estimated salary."

However, this prior art cannot examine the salary structure of a position in depth. Nor can it, when salaries vary along a curve and the influencing factors shift with a dynamic environment, adjust in a timely way for position, promotion, job transfer, and so on in proportion to the estimated salary. In view of this, the inventors build on the theory of database mining and, by exploiting the functional relationships among the variables processed, derive a more objective and accurate data reference. This is the technical problem the inventors set out to solve.

[Summary of the Invention]

The present invention examines the relationships among the recruitment conditions of an online job bank. Using data mining, it analyzes how the recruitment conditions relate to one another and how the conditions that enterprises set for the talent they want are connected, in the hope of discovering useful and important latent relationships that job seekers can use as a reference when applying. The processing flow has the following steps:

- Data selection 10: data preprocessing. The required data fields are selected first.
- Data purification 20: the data is then cleaned. Some records contain blank values, and some are exaggerated or wrong. Such noisy data must be excluded from the analysis so that the results are not distorted.
- Enrichment 30: data gathered from a variety of sources is used to check whether the generated association rules meet the needs of the target users.
- Encoding 40: once the relevant data is complete, the data to be mined is encoded and the relevant numerical parameters are set.
- Data mining 50: data mining techniques are used to analyze the relationships among recruitment conditions and to understand how the conditions required of the desired talent are connected, in the hope of discovering useful and important latent relationships.
- Evaluation and report 60: based on the condition characteristics found by the association rules, the overall recruitment environment and company preferences are summarized, and the associations among application conditions are provided to job seekers as a reference.
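The first steps of this flow (data selection 10, data purification 20, and encoding 40) can be sketched as follows. This is only an illustrative sketch: the field names, the sample postings, and the helper names `select_fields`, `purify`, and `encode` are assumptions, not part of the patent, which does not specify a record layout.

```python
# Hypothetical job postings; field names and values are illustrative assumptions.
postings = [
    {"job": "engineer", "education": "bachelor", "skill": "java", "salary": "40000"},
    {"job": "engineer", "education": "", "skill": "java", "salary": "40000"},  # blank value
    {"job": "clerk", "education": "high school", "skill": "typing", "salary": None},
    {"job": "analyst", "education": "master", "skill": "sql", "salary": "55000"},
]


def select_fields(records, fields):
    """Data selection (10): keep only the fields needed for mining."""
    return [{f: r.get(f) for f in fields} for r in records]


def purify(records):
    """Data purification (20): drop records that contain blank or missing values."""
    return [
        r for r in records
        if all(v is not None and str(v).strip() for v in r.values())
    ]


def encode(records):
    """Encoding (40): turn each record into a transaction of 'field=value' items."""
    return [frozenset(f"{k}={v}" for k, v in r.items()) for r in records]


selected = select_fields(postings, ["job", "education", "skill"])
clean = purify(selected)
transactions = encode(clean)

for t in transactions:
    print(sorted(t))
```

The second posting is dropped because its education field is blank. The third survives because its missing salary is not among the selected fields.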
[Embodiments]

This section describes the association rules for mining a manpower database and the system thereof. So that the examiners can readily understand the other features, advantages, and effects of the invention, it is described in detail below with reference to the drawings.

In data mining, association rules are among the most commonly used techniques, and association rule mining is a method that can produce a large number of rules. Agrawal and colleagues first proposed association rule mining, for which Apriori is the best-known algorithm (Agrawal, Imielinski, and Swami, 1993). Its main function is to find the relationships among the items in a database. Association rules are generally judged by two measures, support and confidence (Agrawal, Imielinski, and Swami, 1993). For example, from the products that consumers buy in each transaction at a store, a rule such as the following can be mined: "[...]% of customers who buy a toner cartridge also buy report paper."

The association rule mining problem is defined as follows. Let I be the set of items sold in the store. In the transaction database, each transaction consists of a transaction number and a set of purchased items. A set of items is called an itemset. Let X be an itemset. If every item in X is contained in transaction T, we say that T supports X. The support count of X is the number of transactions that support X, and the support of X is that number as a proportion of the total number of transactions.
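The two definitions above, support count and support, can be written down directly. The sketch below is a minimal illustration; the five transactions are made-up shop data, and the function names are chosen here rather than taken from the patent.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset, transactions):
    """Support count as a proportion of all transactions."""
    return support_count(itemset, transactions) / len(transactions)


# Hypothetical transactions, for illustration only.
transactions = [
    frozenset({"toner", "report paper"}),
    frozenset({"toner", "report paper", "stapler"}),
    frozenset({"toner"}),
    frozenset({"report paper"}),
    frozenset({"stapler"}),
]

print(support_count({"toner", "report paper"}, transactions))
print(support({"toner", "report paper"}, transactions))
```

A transaction supports an itemset exactly when the itemset is a subset of it, which is what the `<=` comparison on frozensets checks.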

An association rule has the form X → Y [support, confidence], where X and Y are itemsets. X is called the antecedent (the "condition") and Y the consequent (the "conclusion"). The support of the rule X → Y is defined as the support of the itemset X ∪ Y. Its confidence is the number of transactions that satisfy both the antecedent and the consequent as a proportion of the transactions that satisfy the antecedent, that is:

confidence(X → Y) = support(X ∪ Y) / support(X)

Support and confidence are the two criteria by which association rules are evaluated, and they are normally used to decide whether a rule holds. If the thresholds are set too high, few rules are produced and associations are easily missed. If they are set too low, many messy and unreliable rules appear. Setting them precisely and realistically therefore depends on the analyst's experience.

An association rule must satisfy two preset parameters: minimum support and minimum confidence. The product of the minimum support and the total number of transactions in the database is the minimum support count.

Suppose the minimum support and the minimum confidence are 0.2 and 0.5. The rule {1,3} → {5} has a support count of 2, so its support is 0.2 (the database in this example therefore holds ten transactions). The support of the itemset {1,3} is 0.3, so the confidence of {1,3} → {5} is 0.2/0.3 = 0.67.

Association rule mining can be divided into two subproblems:

First, find every itemset whose support is greater than or equal to the minimum support. These are called large itemsets.

Second, generate from the large itemsets the association rules whose confidence is greater than or equal to the minimum confidence. Suppose Z is a large itemset. Every rule of the form X → Y that satisfies X ∪ Y = Z and X ∩ Y = ∅ and whose confidence is greater than or equal to the minimum confidence should then be generated.
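The worked numbers and the second subproblem can be checked with a short sketch. The ten transactions below are invented so that {1,3,5} has support count 2 and {1,3} has support count 3, matching the example; the patent does not list them. `rules_from_itemset` is a hypothetical helper that enumerates every split of a large itemset Z into X and Y.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def confidence(x, y, transactions):
    """confidence(X -> Y) = support(X ∪ Y) / support(X), computed from counts."""
    both = support_count(set(x) | set(y), transactions)
    return both / support_count(x, transactions)


def rules_from_itemset(z, transactions, min_conf):
    """All rules X -> Y with X ∪ Y = Z, X ∩ Y = ∅ and confidence >= min_conf."""
    z = frozenset(z)
    rules = []
    for size in range(1, len(z)):
        for x in combinations(sorted(z), size):
            x = frozenset(x)
            y = z - x
            conf = confidence(x, y, transactions)
            if conf >= min_conf:
                rules.append((x, y, conf))
    return rules


# Hypothetical database of ten transactions (an assumption for illustration).
transactions = [frozenset(t) for t in (
    {1, 3, 5}, {1, 3, 5}, {1, 3}, {2, 4}, {2, 5},
    {1, 2}, {3, 4}, {2, 3}, {4, 5}, {2, 4, 5},
)]

rules = rules_from_itemset({1, 3, 5}, transactions, min_conf=0.5)
for x, y, conf in rules:
    print(sorted(x), "->", sorted(y), round(conf, 2))
```

Working from counts rather than from the two support fractions gives the same ratio and avoids dividing one rounded float by another.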

In the present invention we examine the relationships among the recruitment conditions of an online job bank. Using data mining, we analyze how the recruitment conditions relate to one another and how the conditions required of the talent that enterprises want are connected, in the hope of discovering useful and important latent relationships that job seekers can use as a reference when applying. The method comprises six steps: data selection 10, data purification 20, enrichment 30, encoding 40, data mining 50, and evaluation and report 60. The flow is shown in Figure 2. The invention applies at least one of the following association rule techniques in order to meet the analysis needs fully and effectively. The techniques are described below.

1. Multilevel Association Rule Mining

Multilevel association rule mining uses a concept hierarchy to mine across levels and allows a different minimum support to be set for each level. Rules discovered through a concept hierarchy are called multilevel association rules. In practice, mining association rules at different concept levels is very useful. For example:

Rule 1: [...]% of customers who buy a PC also buy a monitor.
Rule 2: 70% of customers who buy an IBM PC also buy a [...] monitor.

Rule 2 is expressed at a lower concept level, but it provides more detailed information than Rule 1. Although rules at a lower concept level carry more information than rules at a higher level, the items at the lower level may have low support.

To discover rules at a lower concept level, the minimum support must be lowered accordingly, for example: HP commercial computer → ViewSonic LCD monitor (support = 0.01). Doing so, however, may greatly reduce the practical value of the rules produced.

Rules generated at a higher concept level have higher support, but they may be obvious results that experience alone would predict, for example: personal computer → monitor (support = 0.95). Such obvious results are of little practical value either.

The basic idea of multilevel association rule mining is to work top-down: first compute the large itemsets at level 1, then the large itemsets at level 2, and so on until no more large itemsets are produced. At each level the Apriori algorithm can be used to generate the large itemsets.

During multilevel mining, different levels may use different minimum supports, and the following two restrictions may be added:

(1) Filtering by parent item: an item x at level l is considered only if the item represented by its parent node at level l-1 is a large 1-itemset. Otherwise x is ignored. For example, in the personal computer concept hierarchy, "home computer" and "commercial computer" need to be considered only if "desktop computer" is a large item. If "desktop computer" is not a large item, neither needs to be considered.

(2) Filtering by parent k-itemset: a k-itemset I at level l is considered only if its corresponding parent itemset at level l-1 is a large itemset. Otherwise I is not examined.

The steps of the multilevel association rule mining algorithm are outlined below:

Step 1: for (l = 1; L[l, 1] ≠ ∅ and l < maximum level; l++) do begin /* generate the large itemsets of each level in order, starting from level 1 */
Step 2: if l = 1 then {
Step 3: L[l, 1] = Large_item_gen(T[1], l); /* find the large 1-itemsets at level 1 in transaction database T[1] */
Step 4: T[2] = Filtered_table(T[1], L[1, 1]); /* use L[1, 1] to filter T[1] */
Step 5: }
Step 6: else L[l, 1] = Large_item_gen(T[2], l);
Step 7: for (k = 2; L[l, k-1] ≠ ∅; k++) do begin /* generate the large k-itemsets at level l */
Step 8: C_k = Candidate_gen(L[l, k-1]);
Step 9: for each transaction t in T[2]
Step 10: for each candidate itemset c in C_k, if t contains c, increase the support count of c by 1;
Step 11: L[l, k] = the set of candidate k-itemsets in C_k that satisfy the minimum support count minsup[l];
Step 12: end
Step 13: return LL[l] = the set of all large itemsets at level l
Step 14: end

A practical example of these steps follows. The relevant definitions are:

T is the original transaction database.
T[1] is the database obtained by converting the purchased items in each transaction to their item codes.
L[l, k] is the set of large k-itemsets at level l.
LL[l] is the set of all large itemsets at level l.
minsup[l] is the minimum support count at level l.

Suppose the products are numbered and grouped into top-level categories such as personal computers and printers, as in the tables that follow.
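The top-down procedure in Steps 1 to 14 can be sketched compactly in Python. This is a simplified illustration under stated assumptions, not the patent's implementation: `generalize`, `apriori`, and `mine_multilevel` are names chosen here, candidate generation uses a plain join without the pruning step, and the level-2 threshold of 3 is made up. The ten encoded transactions and the level-1 minimum support count of 4 are those of the worked example in this description.

```python
from collections import Counter


def generalize(code, level):
    """Keep the first `level` digits of a hierarchy code and mask the rest."""
    return code[:level] + "*" * (len(code) - level)


def apriori(transactions, min_count):
    """Plain Apriori over a list of frozensets; returns {itemset: support count}."""
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): n for i, n in counts.items() if n >= min_count}
    large = dict(current)
    k = 2
    while current:
        # Join step only: unions of two large (k-1)-itemsets that have k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        counted = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counted.items() if n >= min_count}
        large.update(current)
        k += 1
    return large


def mine_multilevel(t1, minsup, max_level):
    """Return {level: {itemset: count}}, descending only below large parents."""
    result = {}
    table = t1          # T[1]; replaced by the filtered table T[2] after level 1
    parents = None      # large 1-items of the previous level
    for level in range(1, max_level + 1):
        view = [
            frozenset(
                generalize(c, level) for c in t
                if parents is None or generalize(c, level - 1) in parents
            )
            for t in table
        ]
        large = apriori(view, minsup[level])
        if not large:
            break
        result[level] = large
        parents = {next(iter(s)) for s in large if len(s) == 1}
        if level == 1:
            # Filtered_table: drop non-large items, then drop empty transactions.
            table = [frozenset(c for c in t if generalize(c, 1) in parents) for t in t1]
            table = [t for t in table if t]
    return result


T1 = [frozenset(t) for t in (
    {"1111", "2111"}, {"1113", "3214", "4111"}, {"1112", "2211", "4111"},
    {"1212", "2112"}, {"1211", "2112", "4212"}, {"1221", "2231", "5214"},
    {"1222", "2223"}, {"1212", "2112", "4111"}, {"3112", "5211"}, {"5213"},
)]

# The level-2 threshold is an assumption; the text only gives level 1.
res = mine_multilevel(T1, minsup={1: 4, 2: 3}, max_level=2)
for level, itemsets in res.items():
    for itemset, count in sorted(itemsets.items(), key=lambda kv: sorted(kv[0])):
        print(level, sorted(itemset), count)
```

Counting level-1 itemsets on the unfiltered table gives the same counts for large items as counting on the filtered one, because filtering only removes items that are not large.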

Table 3: Personal computer classification (desktop computers, subdivided into home and commercial, and portable computers)
Product number | Brand
1600 | IBM
1601 | COMPAQ
1602 | ASUS
1603 | HP
1604 | IBM
1605 | ACER
1606 | IBM
1607 | ACER
1608 | TOSHIBA

Table 2: Original transaction database
Transaction number | Product numbers
1 | {1600, 2301}
2 | {1602, 3457, 4563}
3 | {1601, 2305, 4563}
4 | {1606, 2302}
5 | {1605, 2302, 4623}
6 | {1607, 2307, 5457}
7 | {1608, 2306}
8 | {1606, 2302, 4563}
9 | {3001, 5453}
10 | {5455}

The product numbers are then re-encoded according to the concept hierarchy.

For example, the first transaction in Table 2 contains product number "1600". According to the personal computer classification in Table 3 this is an IBM home computer, and following the classification hierarchy for personal computers its code is "1111".

Items in the personal computer hierarchy are encoded as follows. An IBM commercial computer is written "1122": the first digit "1" stands for "personal computer" at level 1; the second digit "1" for the first branch at level 2, "desktop computer"; the third digit "2" for the second branch at level 3, "commercial computer"; and the fourth digit "2" for the second branch at level 4, "IBM computer".

Encoded transaction database T[1]
Transaction number | Encoded itemset
1 | {1111, 2111}
2 | {1113, 3214, 4111}
3 | {1112, 2211, 4111}
4 | {1212, 2112}
5 | {1211, 2112, 4212}
6 | {1221, 2231, 5214}
7 | {1222, 2223}
8 | {1212, 2112, 4111}
9 | {3112, 5211}
10 | {5213}

In Step 3, the Large_item_gen(T[1], l) procedure finds the large 1-itemsets at level l, that is L[l, 1], in the transaction database T[1].

◎ For level 1, Large_item_gen(T[1], 1) generates the large 1-itemsets L[1, 1].
◎ For any other level l (l > 1), Large_item_gen(T[2], l) generates the large 1-itemsets L[l, 1].
◎ Only items whose parent is in L[l-1, 1] can be considered for L[l, 1]. Taking the personal computer hierarchy as an example: if "desktop computer" ("11**") at level 2 is not a large 1-itemset, then "home computer" ("111*") and "commercial computer" ("112*") at level 3 are not considered.

In Step 4, the Filtered_table(T[1], L[1, 1]) procedure uses L[1, 1] to check each transaction t in T[1] and filters it as follows:

◎ Delete from t any item whose support count is less than the minimum support count.
◎ If t contains no large item, delete t from T[1].

The purpose of Filtered_table is to reduce the size of the database. Its result is stored in database T[2].

As described above, the large itemsets of each level are generated in order, starting from level 1. Assume the minimum support count at level 1 is 4. T[1] is scanned to produce the large 1-itemsets at level 1. For example, the item {4***} appears in transactions 2, 3, 5, and 8, so its support count is 4 and it is large. The item {3***} has a support count of only 2, which is less than 4, so it is deleted during filtering. None of the items in transactions 9 and 10 is large, so these two transactions are deleted during filtering as well.
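The two procedures named in Steps 3 and 4 can be tried out on the encoded database above. The sketch below keeps the patent's procedure names but the Python signatures and bodies are assumptions made for illustration; the data and the level-1 minimum support count of 4 come from the example.

```python
from collections import Counter

# Encoded transaction database T[1] from the worked example.
T1 = {
    1: {"1111", "2111"},
    2: {"1113", "3214", "4111"},
    3: {"1112", "2211", "4111"},
    4: {"1212", "2112"},
    5: {"1211", "2112", "4212"},
    6: {"1221", "2231", "5214"},
    7: {"1222", "2223"},
    8: {"1212", "2112", "4111"},
    9: {"3112", "5211"},
    10: {"5213"},
}


def mask(code, level):
    """Generalize a code to `level` digits, e.g. mask('4111', 1) == '4***'."""
    return code[:level] + "*" * (len(code) - level)


def large_item_gen(table, level, min_count):
    """Large 1-itemsets at `level`: generalized items with enough support."""
    counts = Counter()
    for items in table.values():
        # A set comprehension counts each generalized item once per transaction.
        counts.update({mask(code, level) for code in items})
    return {item for item, n in counts.items() if n >= min_count}


def filtered_table(table, large_level1):
    """Drop items whose level-1 ancestor is not large, then drop empty transactions."""
    out = {}
    for tid, items in table.items():
        kept = {code for code in items if mask(code, 1) in large_level1}
        if kept:
            out[tid] = kept
    return out


L11 = large_item_gen(T1, level=1, min_count=4)
T2 = filtered_table(T1, L11)

print(sorted(L11))
for tid in sorted(T2):
    print(tid, sorted(T2[tid]))
```

With these inputs the filtered table keeps transactions 1 to 8 and loses the items under {3***} and {5***}, which agrees with the description above.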

2. Quantitative Association Rule Mining

When "item" and "purchase quantity" are considered together, fewer itemsets satisfy the minimum support, and it may even be impossible to produce any association rule that satisfies the minimum support. Yet the relationship between an item and the quantity purchased has an important influence on marketing decisions.

◎ For example: "40% of customers who buy one toner cartridge also buy three packs of report paper." Such a rule is called a quantitative association rule.

Quantitative association rules help decision makers devise more effective marketing strategies.

◎ For example, when planning a promotion for toner cartridges, the mining results may suggest an offer such as "buy one toner cartridge and get two packs of report paper free".

The quantities can be partitioned into a number of intervals so as to raise the support of each item within its interval and uncover more potentially useful association rules. The relevant definitions are:

(1) Each transaction in the database consists of a transaction number and a set of q-items. A q-item has the form <x, q> and represents item x together with the quantity q purchased.
(2) A set of q-items is called a q-itemset.
(3) The support of a q-item <x, q> is defined as the number of transactions that contain it as a proportion of the total number of transactions.

Because q-items that share the same item may carry different quantities, the support of any single q-item may be very small.
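The effect of partitioning quantities into intervals can be seen on a toy example. Everything in this sketch is assumed for illustration: the four purchases, the interval boundaries, and the helper names `interval_of` and `to_q_itemset` are not taken from the patent.

```python
from collections import Counter


def interval_of(quantity, intervals):
    """Return the (low, high) interval, inclusive, that contains `quantity`."""
    for low, high in intervals:
        if low <= quantity <= high:
            return (low, high)
    raise ValueError(f"quantity {quantity} falls outside every interval")


def to_q_itemset(purchase, partitions):
    """Turn {item: quantity} into a q-itemset of (item, interval) pairs."""
    return frozenset(
        (item, interval_of(q, partitions[item])) for item, q in purchase.items()
    )


# Hypothetical purchases: item -> quantity bought in one transaction.
purchases = [
    {"toner": 1, "paper": 3},
    {"toner": 1, "paper": 4},
    {"toner": 2, "paper": 1},
    {"paper": 2},
]

# Hypothetical partition of each item's quantities into intervals.
partitions = {"toner": [(1, 1), (2, 3)], "paper": [(1, 2), (3, 4)]}

# Support counts with exact quantities versus with intervals.
exact = Counter((item, q) for p in purchases for item, q in p.items())
binned = Counter(qi for p in purchases for qi in to_q_itemset(p, partitions))

print(exact[("paper", 3)], exact[("paper", 4)])
print(binned[("paper", (3, 4))])
```

With exact quantities, "3 packs of paper" and "4 packs of paper" are each bought once. After partitioning, both fall into the interval [3..4], whose support count is the sum of the two.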

If most q-items have low support, even fewer quantitative association rules can be mined. All the possible quantities of each item can therefore be partitioned into several intervals, which raises the support of each item within the interval it belongs to. An example follows.

Assume the minimum support count is 2. All large q-itemsets are found from Table 4 and presented in Table 5.

q-items and their information (TS: set of transactions containing the q-item; SP: support count)
q-item | TS | SP
<A, 1> | {5, 12, 14} | 3
<A, 2> | {4, 10, 15} | 3
<B, 1> | {1, 4, 5, 8} | 4
<B, 2> | {10, 11, 12, 13, 14} | 5
<B, 3> | {6, 7, 9} | 3
<C, 1> | {1, 2, 3, 6, 7, 8, 13} | 7
<C, 2> | {4, 10, 11, 14, 15} | 5
<D, 1> | {2, 7} | 2
<E, 1> | {7} | 1
<F, 1> | {1, 7, 10, 11} | 4
<F, 2> | {3, 9, 14} | 3
<G, 1> | {2, 3, 4, 12} | 4
<G, 2> | {1, 8, 13, 14, 15} | 5

Table 5: Large 1-q-itemsets
Large 1-q-itemset | TS | SP
{<A, 1>} | {5, 12, 14} | 3
{<A, 2>} | {4, 10, 15} | 3
{<B, 1>} | {1, 4, 5, 8} | 4
{<B, 2>} | {10, 11, 12, 13, 14} | 5
{<B, 3>} | {6, 7, 9} | 3
{<C, 1>} | {1, 2, 3, 6, 7, 8, 13} | 7
{<C, 2>} | {4, 10, 11, 14, 15} | 5
{<D, 1>} | {2, 7} | 2
{<E, 1>} | {7} | 1
{<F, 1>} | {1, 7, 10, 11} | 4
{<F, 2>} | {3, 9, 14} | 3
{<G, 1>} | {2, 3, 4, 12} | 4
{<G, 2>} | {1, 8, 13, 14, 15} | 5

In the large 2-q-itemset generation phase, the information in Table 5 is used to generate the candidate 2-q-itemsets C2 and to decide whether each is a large 2-q-itemset. Table 6 shows the information for all large 2-q-itemsets.

Table 6: Large 2-q-itemsets
Large 2-q-itemset | TS | SP
[...]
{<B, 2>, <F, 1>} | {10, 11} | 2
{<C, 1>, <D, 1>} | {2, 7} | 2
{<C, 1>, <F, 1>} | {1, 7} | 2
{<C, 1>, <G, 1>} | {2, 3} | 2
{<C, 1>, <G, 2>} | {1, 8, 13} | 3

1〇^&gt;^2&gt;} ΚΑ^χ&lt;〇2&gt;} 201003434 {&lt;C,2&gt;,&lt;F, 1&gt;} {10,11} 2 {&lt;C, 2&gt;, &lt;G, 2&gt;} {14,15} 2 在大型3-g_項目集產生階段,使用表六的資訊產生候選t項目集 &lt;3,並且決定其是否為大型3-&lt;7_項目集。表七顯示所有大型3-(?_ 項目集的相關資訊, 表七:大型3-ζ?_項目集 大型3U員目集 TS SP {&lt;B, 2&gt;, &lt;C, 2&gt;, &lt;F, 1&gt;} {10,11} 2 {&lt;B, 1&gt;, &lt;C, 1&gt;, &lt;G, 2&gt;} {1,8} 2 因為64=0,Ζ4=0,所以結束大型項目集的產生程序。 接續上述,假設最小信心水準為0. 65,使用表七的對應資訊可以產生 的數量化關聯法則如下: {&lt;Α,1&gt;} ◊ {&lt;Β,2&gt;} 信心水準=2/3=0. 67 {&lt;Α, [2.. 3]&gt;} ◊ {&lt;C,[3.. 4]&gt;}信心水準=3/3=1 {&lt;B,[3. .4]〉} ◊ {C,[1..2]} 信心水準=2/3=0. 67 信心水準=2/2=1 {&lt;D,1&gt;丨◊ {&lt;C, [1..2]〉} {〈B, 2&gt;,&lt;C, [3.. 4]&gt;} ◊ {&lt;F,1&gt;}信心水準=2/3=0. 67 {&lt;B,2&gt;,&lt;F,1&gt;丨◊ {&lt;C,[3.. 4]&gt;丨信心水準=2/2=1 {&lt;C,[3.. 4]&gt;,&lt;F, 1&gt;} ◊ {&lt;B,2&gt;}信心水準=2/2=1 {&lt;B,1&gt;,&lt;C,[1. · 2]&gt;丨◊ {&lt;G,[3.. 5]&gt;}信心水準=2/2=1 22 201003434 {&lt;B’1&gt;,&lt;G,[3. .5]〉} ◊ {&lt;C,[1..2]&gt;}信心水準=2/2=1 {&lt;C,[1.. 2]&gt;,&lt;G,[3. · 5]&gt;} ◊ {&lt;β,;[〉丨信心水準=2/3=〇. 67 、综合以上相關關聯法則探勘之應用,最後經關聯分析之統計, 而求得目標資料之挖掘結論,該關聯分析過程至少可達下有效 之資訊輪廓: (一) 、滿足最小支持度與最小信心水準的關聯法則不一定保 證可以提供具體有用的資訊。 ◎假設在包含10000筆交易記錄的資料庫中’有6〇〇〇筆交 易包含奴粉匣,7500筆交易包含報表紙,並且有筆交易同時 包含碳粉匣和報表紙。假設最小支持度為3〇%,最小信心水準為 60%。下列的關聯法則會產生誤導的作用: 碳粉報表紙[支持度=40%,信心水準=67%] ◎購買報表紙的機率為75%,大於信心水準67%,但是購買碳 粉匣反而降低購買報表紙的可能性。 (二) 、若=八j)八釣,則表示項目集j的出現和 項目集厶的出現無關(independent),否則表示項目集 J和項目集方是有關的(dependent and correlated)。 ◎項目集4和項目集厶的相關程度(correlati〇n)之計算方式如 下:1〇^&gt;^2&gt;} ΚΑ^χ&lt;〇2&gt;} 201003434 {&lt;C,2&gt;,&lt;F, 1&gt;} {10,11} 2 {&lt;C, 2&gt;, &lt;G , 2&gt;} {14,15} 2 In the large 3-g_ project set generation phase, use the information in Table 6 to generate the candidate t item set &lt;3, and determine whether it is a large 3-&lt;7_ item set. Table 7 shows all the large 3-(?_ project sets related information, Table 7: Large 3-ζ?_ project set large 3U staff set TS SP {&lt;B, 2&gt;, &lt;C, 2&gt;, &lt;;F,1&gt;} {10,11} 2 {&lt;B, 1&gt;, &lt;C, 1&gt;, &lt;G, 2&gt;} {1,8} 2 Since 64=0, Ζ4=0, so End the generation process of the large project set. 
Continued above, assuming a minimum confidence level of 0.65, the quantitative association rule that can be generated using the corresponding information in Table 7 is as follows: {&lt;Α,1&gt;} ◊ {&lt;Β,2&gt ;} Confidence level = 2/3=0. 67 {&lt;Α, [2.. 3]&gt;} ◊ {&lt;C,[3.. 4]&gt;}Confidence level=3/3=1 { &lt;B,[3. .4]〉} ◊ {C,[1..2]} Confidence level = 2/3=0. 67 Confidence level = 2/2=1 {&lt;D,1&gt;丨◊ {&lt;C, [1..2]〉} {<B, 2&gt;,&lt;C, [3.. 4]&gt;} ◊ {&lt;F,1&gt;}Confidence level = 2/3=0 . 67 {&lt;B,2&gt;,&lt;F,1&gt;丨◊ {&lt;C,[3.. 4]&gt;丨Confidence level=2/2=1 {&lt;C,[3.. 4 ]&gt;,&lt;F, 1&gt;} ◊ {&lt;B,2&gt;}Confidence level=2/2=1 {&lt;B,1&gt;,&lt;C,[1. · 2]&gt;丨◊ {&lt;G,[3.. 5]&gt;}Confidence level=2/2=1 22 201003434 {&lt;B'1&gt;,&lt;G,[3. .5]〉} ◊ {&lt;C,[1..2]&gt;}Confidence level=2/2=1 {&lt;C,[1.. 2]&gt;, &lt;G,[3. · 5]&gt;} ◊ {&lt;β,;[〉丨 confidence level=2/3=〇. 67, the application of the above related correlation law exploration, and finally the statistics of association analysis, To obtain the conclusion of the target data mining, the correlation analysis process can at least reach the effective information profile: (1) The association rule that satisfies the minimum support level and the minimum confidence level does not necessarily guarantee that the specific useful information can be provided. In the database containing 10,000 transaction records, there are 6 transactions including slaves, 7500 transactions including report paper, and a transaction containing both toner and report paper. Assuming a minimum support of 3〇% The minimum confidence level is 60%. The following correlation method will be misleading: Toner Report Paper [Support = 40%, Confidence Level = 67%] ◎ The probability of purchasing report paper is 75%, which is greater than the confidence level of 67%. However, buying toner cartridges reduces the possibility of purchasing report paper. 
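The worked example above (the large 2-q-itemsets and 3-q-itemsets and the rule confidences) can be checked mechanically. The following is a minimal sketch, not the patent's procedure: it copies the TS sets of Table 5, enumerates every combination by brute force instead of generating candidates level by level, and uses hypothetical function names.

```python
from itertools import combinations

# Transaction-number sets (TS) of the 1-q-itemsets, copied from Table 5.
TS = {
    ("A", 1): {5, 12, 14},
    ("A", 2): {4, 10, 15},
    ("B", 1): {1, 4, 5, 8},
    ("B", 2): {10, 11, 12, 13, 14},
    ("B", 3): {6, 7, 9},
    ("C", 1): {1, 2, 3, 6, 7, 8, 13},
    ("C", 2): {4, 10, 11, 14, 15},
    ("D", 1): {2, 7},
    ("E", 1): {7},
    ("F", 1): {1, 7, 10, 11},
    ("F", 2): {3, 9, 14},
    ("G", 1): {2, 3, 4, 12},
    ("G", 2): {1, 8, 13, 14, 15},
}
MIN_COUNT = 2  # minimum support count assumed in the example

def tid_set(q_itemset):
    """Transactions that contain every q-item of the set (intersection of TS)."""
    return set.intersection(*(TS[q] for q in q_itemset))

def large_q_itemsets(k):
    """All k-q-itemsets over k distinct items whose support count reaches MIN_COUNT."""
    result = {}
    for combo in combinations(sorted(TS), k):
        if len({item for item, _ in combo}) < k:
            continue  # a transaction records one quantity per item
        tids = tid_set(combo)
        if len(tids) >= MIN_COUNT:
            result[combo] = tids
    return result

def confidence(antecedent, consequent):
    """Support count of the whole rule divided by that of its antecedent."""
    return len(tid_set(antecedent + consequent)) / len(tid_set(antecedent))

L2, L3, L4 = large_q_itemsets(2), large_q_itemsets(3), large_q_itemsets(4)
print(len(L2), sorted(L3), len(L4))
print(round(confidence((("A", 1),), (("B", 2),)), 2))
```

Run on the Table 5 data, the sketch finds the two large 3-q-itemsets of Table 7, no large 4-q-itemset, and a confidence of 0.67 for the rule {<A,1>} → {<B,2>}.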
(2) If $P(A \cup B) = P(A)\,P(B)$, where $P(A \cup B)$ is the probability that a transaction contains all items of both A and B, the occurrence of itemset A is independent of the occurrence of itemset B; otherwise itemsets A and B are dependent and correlated.
◎ The degree of correlation between itemset A and itemset B is computed as follows:

$\mathrm{Corr}_{A,B} = \dfrac{P(A \cup B)}{P(A)\,P(B)}$

If the correlation is less than 1, A and B are negatively correlated: the occurrence of A lowers the probability that B occurs.
If the correlation is greater than 1, A and B are positively correlated: the occurrence of A raises the probability that B occurs.
If the correlation equals 1, A and B are independent.
When the correlation is greater than 1, applying the rule tends to be effective; when it is less than 1, applying the rule may work poorly.

In summary, the invention builds on an online job-bank platform for recruiting and job seeking. Relying on the speed of the network, and in support of government efforts to reduce unemployment, it helps job seekers join good companies and helps recruiting companies find the talent they need as quickly as possible. This meets the manpower needs of all industries and lets people who are unemployed or between jobs find the work they want as soon as possible. The practical data-mining application follows the steps of data mining to draw up an implementation plan for analyzing the associations among hiring conditions. The first step is to define the problem and the requirements. Among existing job-search channels, an online job bank mainly serves the population that uses the Internet. Because it differs from traditional channels, the basic abilities expected by its recruiting companies and held by its job seekers may differ from those found in traditional channels. The purpose of the data mining is therefore to analyze the associations among hiring conditions in the job-bank database, identifying conditions that occur frequently or that frequently occur together. The results can be combined with school planning, and the job bank can apply them to recruiting.

The data for the following application example are provided by a job-bank database. The records belong to the information-technology category, and the requirements posted for design-related talent in that category are the subject of the mining. The steps are as follows:

(1) Data preprocessing: select the required fields. Many fields in the data provided by the job-bank database are not applicable or must be processed before use. The fields used by the invention are A_CodeNameA, B_CodeNameA, C_CodeNameA, D_CodeNameA, and _CodeNameA; these five fields are merged into one and assigned a number.
(2) Data filtering: some records contain blank values, and others are exaggerated or wrong. Such noisy data must be excluded from the analysis so that the results are not distorted.
(3) Association rule mining and rule analysis: with the relevant data and equipment in place, the parameter values are set according to the data to be mined. This study uses the computer skills required by the job conditions as the condition for execution. After filtering, 8,862 records remain. The software used is SQL 2005, and its association-rule procedure processes the filtered data. The results are as follows.

Figure 3 defines the itemset-related settings:

Minimum support: the minimum support of the association rules. Items whose support is below this value are filtered out. Taking MySQL as an example, MySQL appears 981 times in the data.
Minimum itemset size: itemsets with fewer items than this value are filtered out.
Show full name: when checked, the itemset contents are shown with their full names (including the table name).
Filter itemsets: after a keyword is typed in the box and ENTER is pressed, only itemsets containing the keyword are shown.
Display: switches between showing attribute names and values. If entries such as "xx = existing" clutter the screen, the display can be switched to "show attribute name only".
Maximum rows: the number of itemsets the viewer displays.

Figure 4 defines the rule-related settings:

Minimum probability: the minimum confidence of the association rules. Rules whose confidence is below this value are filtered out.
Minimum importance: a high probability does not necessarily make a rule meaningful. Consider the rule "80% of the cases that have A also have B." If the hiring conditions of a randomly chosen company include B with a probability of only 20%, the rule is meaningful, because it identifies a group that is more likely to have B. If a randomly chosen company has B with a probability as high as 90%, the rule means little. The probability that B occurs when A is present must therefore be compared with the probability that B occurs when A is absent. Because this ratio can vary widely, its logarithm is taken as the importance index:

$\mathrm{Importance}(A \Rightarrow B) = \log \dfrac{P(B \mid A)}{P(B \mid \lnot A)}$

According to the formula, when the probability of B given A is higher than the probability of B given the absence of A, the logarithm is greater than zero, and the larger the index, the more significant the rule. Conversely, a value below zero means that A inhibits the occurrence of B.

In SQL 2005, the support figure commonly used with association rules is not listed directly; it must be read from the "Itemsets" page, as shown in Figure 3. The second row shows that MySQL and MSSQL occur together 664 times, out of 8,862 records in total. Together with the first entry of the "Rules" page (MSSQL -> MySQL, Figure 4), this can be read as: "If MSSQL is required, MySQL is also required", with support = 664/8862 = 0.074926 = 7.4926%, confidence = 40.6%, and importance = 0.965.

Figure 5 defines the dependency network of the itemsets. The dependency network is a graphical view that helps the user understand the relationships among variables. Each arrow represents a predictive relationship (the tail is the input variable and the head is the output variable), and the strength of each link indicates the strength of the predictive association between the variables. With the slider shown on the left of Figure 5, the view can be restricted to the stronger links. Clicking with the mouse highlights, by color, the dependencies between products.

The mining in this embodiment mainly proceeds through the following four processes:

1. Data collection: the job-bank database provides the sample data.
2. Data mining: the association-rule approach is chosen in order to mine the associations among hiring conditions.
3. Evaluation: determine whether the generated association rules meet the needs of the target problem. The original goal was to understand the characteristics of the job conditions, so the evaluation asks whether the rules obtained from the experiment relate to the job conditions, whether the conditions in the rules are those a company really considers first when hiring, and whether they are what companies in general choose. The rules are analyzed in terms of conditions, positions, company names, applicant data, and so on. If the discovered rules are unreasonable, incorrect, or unsuitable, the process returns to an earlier stage and is repeated.
4. Analysis and conclusions: based on the condition characteristics revealed by the association rules, and taking the overall hiring environment and company preferences into account, the associations among application conditions are provided to job seekers for reference.

The detailed description above concerns a feasible embodiment of the invention. The embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the spirit of the invention shall fall within the patent scope of this case.

To show the inventive step and practical utility of the invention more clearly, it is compared with the prior art as follows:

Shortcomings of the prior art
1. Users reach their career goals inefficiently and adjust too slowly to changes in the job market.
2. Guidance for adjusting the future career plans of current job seekers and of those who have just begun their studies is insufficient.

Advantages of the invention
1. It lets users set goals for their personal career planning and offers objective and accurate prediction and evaluation.

2. It combines analysis grounded in data-mining theory and is suitable as a reference for teaching staff who guide students on career trends.

In summary, the invention breaks through the prior technical structure and achieves the intended improvements, and it would not readily occur to a person skilled in the art. Moreover, the invention was not disclosed before filing. Its inventive step and practical utility satisfy the requirements for an invention patent, and this application is filed accordingly. The Office is respectfully requested to approve this application, in encouragement of invention.

[Brief description of the drawings]
Figure 1 is a schematic of a job-requirement posting on a conventional job-search website.
Figure 2 is a flowchart of the technical implementation of the invention.
Figure 3 shows the itemset content definitions of the embodiment.
Figure 4 shows the itemset rule definitions of the embodiment.
Figure 5 shows the itemset dependency-network definition of the embodiment.

[Description of main reference numerals]
10 Data selection
20 Data cleaning
Enrichment
40 Coding
50 Data mining
Report evaluation
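The correlation measure and the importance index discussed in the description can both be computed from four transaction counts. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the importance index is taken as log(P(B|A) / P(B|not A)), and base 10 is assumed because the text only says that a logarithm is taken. The input figures are those of the toner cartridge and report paper example.

```python
import math

def rule_measures(n_total, n_a, n_b, n_ab, log_base=10):
    """Measures for the rule A -> B from plain transaction counts.

    n_a, n_b: transactions containing A, B; n_ab: transactions containing both.
    Assumes 0 < n_ab < n_b and 0 < n_a < n_total so that every ratio is defined.
    """
    support = n_ab / n_total
    confidence = n_ab / n_a                             # P(B | A)
    correlation = support / ((n_a / n_total) * (n_b / n_total))
    p_b_without_a = (n_b - n_ab) / (n_total - n_a)       # P(B | not A)
    # Log base is an assumption; the description does not state it.
    importance = math.log(confidence / p_b_without_a, log_base)
    return support, confidence, correlation, importance

# Toner cartridge -> report paper: 10,000 transactions, 6,000 with toner,
# 7,500 with report paper, 4,000 with both.
support, confidence, correlation, importance = rule_measures(10_000, 6_000, 7_500, 4_000)
print(f"support={support:.2f} confidence={confidence:.2f} "
      f"correlation={correlation:.3f} importance={importance:.3f}")
```

For these counts the correlation is below 1 and the importance is negative, which agrees with the description's conclusion that the rule is misleading even though it passes both the support and the confidence thresholds.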

Claims (1)

X. Scope of the patent application:

1. A manpower database mining association rule and system thereof, whose processing steps are:
a data selection step, which refers to data preprocessing, in which the required fields are selected first;
a data cleaning step, in which the data are filtered: some records contain blank values and others are exaggerated or wrong, and such noisy data must be excluded from the analysis so that the results are not distorted;
an enrichment step, in which data are collected from various databases and it is determined whether the resulting association rules meet the needs of the target problem;
a coding step, in which, with the relevant data and equipment in place, the relevant numerical parameters are set according to the data to be mined;
a data mining step, in which data mining methods are used to analyze the relationships among hiring conditions and to understand the associations among the job conditions that companies set for talent, with the aim of discovering useful and important latent relationships; and
a report evaluation step, in which, based on the condition characteristics revealed by the association rules, and taking the overall hiring environment and company preferences into account, the associations among application conditions are provided to job seekers for reference.

2. The manpower database mining association rule and system of claim 1, wherein the linkage of data fields of different types is merged and converted.

3. The manpower database mining association rule and system of claim 1, wherein the enrichment step communicates mainly through links between different networks, such as the Internet, an extranet, and an intranet, exchanging and transmitting messages so that the target data are collected in real time.

4. The manpower database mining association rule and system of claim 1, wherein the data mining step combines one or more association-rule mining techniques, the techniques including association rule mining, multi-level association rule mining, and quantitative association rule mining.

5. The manpower database mining association rule and system of claim 4, wherein association rule mining mainly handles the following two sub-problems:
first, finding all itemsets whose support is greater than or equal to the minimum support, called large itemsets;
second, generating from the large itemsets the association rules whose confidence is greater than or equal to the minimum confidence.

6. The manpower database mining association rule and system of claim 4, wherein the technical feature of multi-level association rule mining is a top-down approach: the large itemsets at level 1 are computed first, then the large itemsets at level 2, and so on, until no further large itemsets are produced; at each level, the Apriori algorithm can be used to generate the large itemsets.
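The two sub-problems named in claim 5, and the Apriori algorithm mentioned in claim 6, can be sketched as follows. This is a minimal single-level illustration rather than the claimed system: the job postings are hypothetical, the function names are invented for the example, and support is counted by scanning every transaction for every candidate.

```python
from itertools import combinations

def large_itemsets(transactions, min_support):
    """Sub-problem 1: return {itemset: support count} for every large itemset."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    large, k = {}, 1
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        large.update(level)
        k += 1
        # Join: merge large (k-1)-itemsets into candidates with k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are large.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return large

def association_rules(large, min_confidence):
    """Sub-problem 2: rules whose confidence reaches min_confidence."""
    found = []
    for itemset, count in large.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                conf = count / large[antecedent]
                if conf >= min_confidence:
                    found.append((antecedent, itemset - antecedent, conf))
    return found

# Hypothetical skill requirements of five job postings.
postings = [
    {"MSSQL", "MySQL", "PHP"},
    {"MSSQL", "MySQL"},
    {"MySQL", "PHP"},
    {"MSSQL", "Java"},
    {"MySQL", "PHP", "Java"},
]
large = large_itemsets(postings, min_support=0.4)
for antecedent, consequent, conf in association_rules(large, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

The prune step relies on the property that every subset of a large itemset is itself large, which is also why the confidence lookup in the second function never misses an antecedent.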
TW97125638A 2008-07-08 2008-07-08 Manpower data mining association rule and the system thereof TW201003434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97125638A TW201003434A (en) 2008-07-08 2008-07-08 Manpower data mining association rule and the system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97125638A TW201003434A (en) 2008-07-08 2008-07-08 Manpower data mining association rule and the system thereof

Publications (1)

Publication Number Publication Date
TW201003434A true TW201003434A (en) 2010-01-16

Family

ID=44825563

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97125638A TW201003434A (en) 2008-07-08 2008-07-08 Manpower data mining association rule and the system thereof

Country Status (1)

Country Link
TW (1) TW201003434A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI426395B (en) * 2010-12-01 2014-02-11 Inventec Corp System for displaying query result using relation graph representation and method thereof
TWI490704B (en) * 2013-03-07 2015-07-01 Univ Southern Taiwan Sci & Tec Related vocabulary generation system and method
TWI646477B (en) * 2017-08-03 2019-01-01 崑山科技大學 Method for applying exploration technology into work schedule for operators

