TWI807780B - Turnover rate prediction method and electronic apparatus thereof

Info

Publication number: TWI807780B
Application number: TW111114385A
Authority: TW
Inventors: 陳沛瑜
Original assignee: 和碩聯合科技股份有限公司
Priority date: 2022-04-15
Filing date: 2022-04-15
Publication date: 2023-07-01
Also published as: TW202343317A

Abstract

A turnover rate prediction method and an electronic apparatus thereof are provided. The turnover rate prediction method includes the following steps. An employee data set corresponding to a plurality of employees is obtained. The employees are grouped into multiple subsets based on the employee data set and an influencing factor, and multiple survival curves respectively corresponding to the subsets are calculated. The survival curves are grouped into a plurality of curve groups based on the similarity of the survival curves. A plurality of clustered survival curves respectively corresponding to the curve groups are calculated, and at least one segmentation time point is obtained based on slope changes of the clustered survival curves. Based on the segmentation time point, a proportional hazards model is used to calculate a turnover rate prediction result of the employees corresponding to the influencing factor.

Description

Turnover rate prediction method and its electronic device

The disclosure relates to a statistical analysis mechanism, and in particular to a turnover rate prediction method and an electronic device thereof.

In general, the turnover rate varies with seniority and with other important variables. Therefore, an objective evaluation result cannot be obtained if the turnover rate is judged directly on the basis of a single variable. For example, the turnover rate of employees who live in the dormitory may appear to be many times higher than that of employees who do not. In practice, however, the turnover rate changes with the length of employment. For example, employees without dormitory housing are more likely to leave early on, but once they have been employed beyond a certain period, whether they have dormitory housing has little effect on whether they leave. Accordingly, continuous variables need to be categorized, and how to divide them appropriately is a factor that must be considered.

The disclosure provides a turnover rate prediction method and an electronic device thereof, which can improve the reference value of the analysis results.

The turnover rate prediction method of the disclosure is suitable for execution by a processor and includes: obtaining an employee data set corresponding to a plurality of employees; grouping the employees into a plurality of subsets according to the employee data set and an influencing variable, and calculating a plurality of survival curves respectively corresponding to the subsets; grouping the survival curves into a plurality of curve groups according to the similarity of the survival curves; calculating a plurality of clustered survival curves respectively corresponding to the curve groups, and obtaining at least one segmentation time point according to slope changes of the clustered survival curves; and, based on the segmentation time point, using a proportional hazards model to calculate a turnover rate prediction result of the employees corresponding to the influencing variable.

The electronic device for survival rate analysis of the present invention includes: a storage that stores at least one program code instruction; and a processor coupled to the storage and configured to execute the at least one program code instruction to implement the turnover rate prediction method.

Based on the above, the disclosure divides the variables in a reasonable way before evaluating them, thereby improving the reference value of the analysis results.

100: Electronic device

110: Processor

120: Storage

121: Database

S205~S225: Steps of the turnover rate prediction method

301~306: Survival curves

401~403: Clustered survival curves

T1, T2: Segmentation time points

FIG. 1 is a block diagram of an electronic device according to an embodiment of the disclosure.

FIG. 2 is a flowchart of a turnover rate prediction method according to an embodiment of the disclosure.

FIG. 3 is a schematic diagram of survival curves according to an embodiment of the disclosure.

FIG. 4 is a schematic diagram of clustered survival curves according to an embodiment of the disclosure.

FIG. 1 is a block diagram of an electronic device according to an embodiment of the disclosure. Referring to FIG. 1, the electronic device 100 includes a processor 110 and a storage 120. The processor 110 is coupled to the storage 120. The storage 120 includes a database 121 and at least one program code instruction, and the database 121 stores an employee data set corresponding to a plurality of employees.

The processor 110 is, for example, a central processing unit (CPU), a physics processing unit (PPU), a programmable microprocessor, an embedded control chip, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or another similar device.

The storage 120 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk, another similar device, or a combination of these devices. The storage 120 stores the database 121 and one or more code snippets. After the code snippets are installed, the processor 110 executes them to carry out the turnover rate prediction method described below.

FIG. 2 is a flowchart of a turnover rate prediction method according to an embodiment of the disclosure. Referring to FIG. 2, in step S205, an employee data set corresponding to a plurality of employees is obtained. Next, in step S210, the employees are grouped into a plurality of subsets according to the employee data set and an influencing variable, and a plurality of survival curves respectively corresponding to the subsets are calculated. In some embodiments, the influencing variable is a variable that affects the turnover rate and is a continuous variable, such as age, seniority, overtime hours, or salary.

Specifically, the processor 110 divides the numerical range of the influencing variable into multiple intervals, and divides the employee data set into multiple subsets corresponding to those intervals. In an embodiment, a set value v can be used to define the intervals. The intervals are, for example: <N-2×v, N-2×v~N-v, N-v~N, N~N+v, N+v~N+2×v, >N+2×v, where N is a reference value greater than 2×v. For example, if N=35000 and v=5000, the following intervals are obtained: <25000, 25000~30000, 30000~35000, 35000~40000, 40000~45000, >45000. This is only an example and is not a limitation.

Next, the processor 110 divides the employee data set into subsets corresponding to the intervals. For example, assuming that the influencing variable is "salary" and the unit is "yuan", the salary can be divided into the following six intervals: <25000 (corresponding to subset A1), 25000~30000 (corresponding to subset A2), 30000~35000 (corresponding to subset A3), 35000~40000 (corresponding to subset A4), 40000~45000 (corresponding to subset A5), and >45000 (corresponding to subset A6). The employee data set includes, for each employee, data such as days of service, salary, age, and overtime hours. Based on these salary intervals, the employee data set is divided into six data groups: salary <25000, salary between 25000 and 30000, salary between 30000 and 35000, salary between 35000 and 40000, salary between 40000 and 45000, and salary greater than 45000. These six data groups are subsets A1~A6, respectively.
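The interval split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the record layout (`id`, `salary`), and the sample employees are hypothetical, and the choice to place a value that equals a cut point in the upper interval is an assumption, since the text does not say which side the boundaries belong to.

```python
from bisect import bisect_right


def make_edges(n, v):
    """Return the cut points N-2v, N-v, N, N+v, N+2v that bound the six intervals."""
    if n <= 2 * v:
        raise ValueError("N must be greater than 2*v")
    return [n - 2 * v, n - v, n, n + v, n + 2 * v]


def assign_subset(value, edges):
    """Map a value of the influencing variable to a subset label A1..A6.

    Assumption: a value equal to a cut point goes to the upper interval.
    """
    return f"A{bisect_right(edges, value) + 1}"


def split_employees(employees, key, edges):
    """Group employee records by the interval their `key` value falls into."""
    subsets = {f"A{i}": [] for i in range(1, len(edges) + 2)}
    for emp in employees:
        subsets[assign_subset(emp[key], edges)].append(emp)
    return subsets


edges = make_edges(35000, 5000)  # N and v from the example in the text
employees = [  # hypothetical records
    {"id": 1, "salary": 24000},
    {"id": 2, "salary": 27500},
    {"id": 3, "salary": 38000},
    {"id": 4, "salary": 52000},
]
subsets = split_employees(employees, "salary", edges)
```

With the sample records, employee 1 lands in A1, employee 2 in A2, employee 3 in A4, and employee 4 in A6.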

After obtaining the subsets, the processor 110 uses a survival analysis algorithm to calculate a plurality of survival curves respectively corresponding to the subsets. The survival analysis algorithm is, for example, the Kaplan-Meier method. In detail, the processor 110 uses the survival analysis algorithm to calculate the survival rates at different time points in each of the subsets A1~A6, so as to obtain the corresponding survival curves.

Table 1 is taken as an example below. In Table 1, subsets A1~A6 are obtained for the six intervals (<N-2×v, N-2×v~N-v, N-v~N, N~N+v, N+v~N+2×v, >N+2×v), and the survival rates at time points t1~tn are calculated for each subset. For subset A1, the survival rates S(A1,t1)~S(A1,tn) at time points t1~tn are calculated from the data group of its interval <N-2×v. The remaining subsets are handled in the same way, yielding the survival rates of each of the subsets A1~A6 at time points t1~tn and, from them, the corresponding survival curves.

The following example illustrates how the survival rate is calculated.

Refer to Table 2, which uses subset A1 and the data group of its interval <N-2×v as an example. Multiple time points t1~tn are set for the interval corresponding to subset A1, and the corresponding data group is queried for each time point in turn to obtain the number of employees in service at time point t-1, I_{t-1}, and the number of employees who left between time point t-1 and time point t, d_t.

Next, the turnover rate H(t) between time point t-1 and time point t is calculated with the following formula: H(t)=d_t/I_{t-1}.

Then, the survival rate S(t) at time point t is calculated with the following formula: S(t)=S(t-1)×(1-H(t)), where S(0)=1.

For time point t1, the data group of the interval <N-2×v (subset A1) is queried to obtain the number of employees in service at time point t0 (=0), I_{t0}=300, and the number of employees who left between t0 and t1, d_{t1}=98. The turnover rate is then H(t1)=98/300≈0.327, and the survival rate is S(t1)=1×(1-0.327)=0.673. The remaining values in Table 2 are obtained in the same way.
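The recurrence H(t)=d_t/I_{t-1}, S(t)=S(t-1)×(1-H(t)) can be sketched in a few lines. Only the first interval (300 in service, 98 leaving) comes from the text; the later departure counts are hypothetical. The sketch also assumes that the headcount at the start of each interval is the previous headcount minus the departures, with no new hires or censored records, which the text does not state.

```python
def survival_curve(initial_headcount, departures):
    """Return S(t1), S(t2), ... from per-interval departure counts.

    departures[i] is the number of employees who left between t_i and t_(i+1).
    Assumption: headcount only shrinks by departures (no hires, no censoring).
    """
    in_service = initial_headcount
    survival = 1.0  # S(0) = 1
    curve = []
    for left in departures:
        if in_service > 0:
            hazard = left / in_service      # H(t) = d_t / I_{t-1}
            survival *= 1.0 - hazard        # S(t) = S(t-1) * (1 - H(t))
            in_service -= left
        curve.append(survival)
    return curve


# 300 and 98 are from the worked example; 40 and 20 are made-up follow-on values.
curve = survival_curve(300, [98, 40, 20])
```

The first entry reproduces the worked value S(t1)≈0.673. Under the no-censoring assumption each S(t) equals the fraction of the original 300 still in service.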

After the survival rate at each time point has been calculated, the corresponding survival curve can be plotted with time on the horizontal axis and survival rate on the vertical axis. FIG. 3 is a schematic diagram of survival curves according to an embodiment of the disclosure. In FIG. 3, the survival curves 301~306 correspond to subsets A1~A6, respectively.

After the survival curve of each subset has been obtained, in step S215, the survival curves are grouped into a plurality of curve groups according to the similarity of the survival curves. Each curve group includes at least one survival curve. For example, a clustering algorithm such as hierarchical clustering or the K-means method can be used for the grouping.

Taking FIG. 3 as an example, each of the survival curves 301~306 is treated as a matrix and fed into the clustering algorithm. For example, survival curves 301 and 302 (corresponding to subsets A1 and A2) are grouped into curve group G1, survival curves 303 and 304 (corresponding to subsets A3 and A4) are grouped into curve group G2, and survival curves 305 and 306 (corresponding to subsets A5 and A6) are grouped into curve group G3. Therefore, based on the intervals of the subsets, the intervals corresponding to curve groups G1~G3 are <N-v, N-v~N+v, and >N+v, respectively, as shown in Table 3 below.
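As an illustration of grouping curves by similarity, the sketch below runs a small agglomerative (hierarchical) clustering over curves sampled at the same time points. The text names hierarchical clustering and K-means but does not specify a distance or a linkage rule, so Euclidean distance and average linkage are assumptions here, and the curve values are invented for the example.

```python
from math import dist


def cluster_curves(curves, n_groups):
    """Merge the two closest clusters until n_groups remain.

    curves maps a curve label to its survival rates at common time points.
    Assumptions: Euclidean distance between curves, average linkage.
    """
    clusters = [[name] for name in curves]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist(curves[i], curves[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_groups:
        pairs = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


curves = {  # hypothetical survival rates at four time points
    "301": [0.67, 0.50, 0.40, 0.35],
    "302": [0.70, 0.52, 0.43, 0.36],
    "303": [0.80, 0.68, 0.60, 0.55],
    "304": [0.82, 0.70, 0.61, 0.57],
    "305": [0.93, 0.88, 0.85, 0.83],
    "306": [0.95, 0.90, 0.86, 0.84],
}
groups = cluster_curves(curves, 3)
```

For these made-up values the result mirrors the grouping in the text: curves 301 and 302 form one group, 303 and 304 a second, and 305 and 306 a third.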

Then, in step S220, a plurality of clustered survival curves respectively corresponding to the curve groups are calculated, and at least one segmentation time point is obtained according to slope changes of the clustered survival curves.

The processor 110 divides the employee data set into data groups corresponding to the curve groups G1~G3. The survival analysis algorithm is then used to calculate, in each data group, the survival rates at multiple time points, so as to obtain the clustered survival curve corresponding to each curve group. The survival rates are calculated as described for Table 2 above.

For curve group G1, the survival rates S(G1,t1)~S(G1,tn) at time points t1~tn are calculated from the data group of its interval <N-v. The remaining curve groups are handled in the same way, yielding the survival rates of each of the curve groups G1~G3 at time points t1~tn and, from them, the corresponding clustered survival curves.

For example, FIG. 4 is a schematic diagram of clustered survival curves according to an embodiment of the disclosure. In FIG. 4, the clustered survival curves 401~403 correspond to curve groups G1~G3, respectively.

After the clustered survival curves 401~403 have been obtained, the segmentation time points are located in them. In some embodiments, the clustered survival curves 401~403 are first smoothed with a moving average, the slopes at multiple time points are then calculated on each smoothed curve, and the time point with the largest slope change is taken as the segmentation time point. For example, in the embodiment shown in FIG. 4, the segmentation time point T1 is found in the clustered survival curve 401, and the segmentation time point T2 is found in the clustered survival curve 403.
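The smoothing and slope-change search can be sketched as below. The window size, the use of a centered moving average that drops the edge points, and the sample curve are all assumptions made for illustration; the text only says that a moving average is applied and that the time point with the largest slope change is selected.

```python
def moving_average(times, values, window=3):
    """Centered moving average over an odd window; edge points are dropped."""
    half = window // 2
    idx = range(half, len(values) - half)
    smoothed = [sum(values[i - half:i + half + 1]) / window for i in idx]
    return [times[i] for i in idx], smoothed


def segmentation_point(times, survival, window=3):
    """Return the time point where the slope of the smoothed curve changes most."""
    ts, ss = moving_average(times, survival, window)
    # Slope of each segment between neighbouring smoothed points.
    slopes = [(ss[i + 1] - ss[i]) / (ts[i + 1] - ts[i]) for i in range(len(ts) - 1)]
    # Change in slope at each interior point; changes[i] belongs to ts[i + 1].
    changes = [abs(slopes[i + 1] - slopes[i]) for i in range(len(slopes) - 1)]
    k = max(range(len(changes)), key=changes.__getitem__)
    return ts[k + 1]


# Hypothetical clustered survival curve: steep early attrition that levels off.
times = [0, 30, 60, 90, 120, 150, 180, 210, 240]
survival = [1.00, 0.90, 0.80, 0.72, 0.69, 0.67, 0.65, 0.63, 0.61]
t_cut = segmentation_point(times, survival)
```

For this sample curve the largest slope change after smoothing falls at day 90, where the steep early decline flattens out.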

Finally, in step S225, based on the segmentation time point, a proportional hazards model is used to calculate the turnover rate prediction result of the employees corresponding to the influencing variable. For example, the obtained segmentation time points are used to set the corresponding time ranges. With the segmentation time points T1 and T2 of FIG. 4, three time ranges can be set: <T1, T1~T2, and >T2. The proportional hazards model is, for example, the Cox proportional hazards model.

The Cox proportional hazards model is a semiparametric regression model that can be used to estimate the effect of one or more variables on the survival rate at a given time. The formula of the Cox proportional hazards model is as follows:

h(t|x)=h₀(t)×exp(β₁x₁+…+β_k x_k)

where β (β₁~β_k) is the regression coefficient; t is the survival time point; h(t|x) is the hazard at time point t given x; h₀(t) is the baseline hazard at time point t, for example an arbitrary baseline hazard function; and x=I(influencing variable, seniority) is an indicator that equals 1 when the condition is met and 0 otherwise. For example, assume the influencing variable is "salary" and the condition is seniority<30, that is, I(salary, seniority<30); then x=1 when the condition holds, and x=0 otherwise. The influencing variable may also be age, overtime hours, and so on.
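One way to read the indicator x=I(influencing variable, seniority) is as a 0/1 flag that is set when an employee belongs to a given curve group and has a seniority inside a given time range. The sketch below follows that reading; the record fields (`group`, `seniority`), the function names, and the cut points 30 and 90 are hypothetical, and placing a seniority equal to a cut point in the later range is an assumption.

```python
from bisect import bisect_right


def time_range_index(seniority, cut_points):
    """Return 0 for <T1, 1 for T1~T2, 2 for >T2 (and so on for more cut points).

    Assumption: a seniority equal to a cut point falls in the later range.
    """
    return bisect_right(sorted(cut_points), seniority)


def indicator(employee, group, range_index, cut_points):
    """x = 1 if the employee is in `group` and in the given seniority range, else 0."""
    in_group = employee["group"] == group
    in_range = time_range_index(employee["seniority"], cut_points) == range_index
    return int(in_group and in_range)


cut_points = [30, 90]  # hypothetical T1 and T2
employee = {"group": "G1", "seniority": 20}  # hypothetical record
x = indicator(employee, "G1", 0, cut_points)  # condition: group G1, seniority < 30
```

Here x is 1 because the employee is in G1 with seniority below T1; the same employee yields 0 for any other group or time range.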

The formula of the Cox proportional hazards model can be expanded as follows, where ε is an error term:

First, the employee data set is used to estimate β (including β₁~β_k), for example by maximum likelihood estimation (MLE). Then β is substituted into the formula to calculate the hazard ratio HR. A larger β indicates a higher turnover risk for the influencing variable x_i.

In addition, one of the curve groups can be used as a baseline, and the hazard ratios of the other curve groups are compared with that of the baseline. For example, the following formula gives the degree of risk relative to the baseline: HR(x=x₁)=exp(β₁).

Here β₁ is obtained from the maximum likelihood estimation described above. HR>1 means that the influencing variable x₁ carries a higher turnover risk than the baseline, and HR<1 means that the influencing variable x₁ carries a lower turnover risk than the baseline.
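To make the estimation step concrete, the sketch below fits a Cox model with a single covariate by Newton-Raphson on the partial likelihood and then converts β to a hazard ratio. It is a teaching sketch under stated assumptions, not the patent's procedure: the single-covariate restriction, the Breslow-style handling of ties, the function names, and the six-employee data set are all chosen for illustration, and a real analysis would normally rely on an established survival analysis library.

```python
from math import exp


def fit_cox_single(times, events, x, iterations=25):
    """Estimate beta for one covariate by maximising the Cox partial likelihood.

    times:  days of service for each employee
    events: 1 if the employee left at that time, 0 if still in service (censored)
    x:      covariate value, e.g. the 0/1 indicator for a curve group
    """
    beta = 0.0
    for _ in range(iterations):
        grad, hess = 0.0, 0.0
        for i, (t_i, left) in enumerate(zip(times, events)):
            if not left:
                continue
            # Risk set: everyone still in service just before t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            weights = [exp(beta * x[j]) for j in risk]
            total = sum(weights)
            mean = sum(w * x[j] for w, j in zip(weights, risk)) / total
            mean_sq = sum(w * x[j] ** 2 for w, j in zip(weights, risk)) / total
            grad += x[i] - mean
            hess -= mean_sq - mean ** 2
        if hess == 0.0:
            break
        step = grad / hess
        beta -= step  # Newton-Raphson update
        if abs(step) < 1e-10:
            break
    return beta


def hazard_ratio(beta):
    """HR = exp(beta): risk relative to the baseline group (x = 0)."""
    return exp(beta)


# Hypothetical data: x = 0 marks the baseline group, x = 1 the compared group.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [0, 0, 1, 0, 1, 1]
beta = fit_cox_single(times, events, x)
hr = hazard_ratio(beta)
```

In this toy data set the compared group tends to stay longer, so the fitted β is negative and the hazard ratio is below 1, which the text interprets as a lower turnover risk than the baseline.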

As shown in Table 4, curve group G1 is used as the baseline, which shows how much higher the risk of the other curve groups G2 and G3 is relative to the baseline.

To sum up, the disclosure divides the influencing variables in a reasonable way before evaluating them, thereby improving the reference value of the analysis results. Specifically, the disclosure clusters the estimated survival curves, which reduces the arbitrariness of manually grouping a continuous variable. It also uses the segmentation time points to derive time-interval (seniority) variables, so that seniority and other influencing variables can be evaluated together and subsequent talent retention policies can be formulated more effectively.

Claims

A method for predicting turnover rate, suitable for execution by a processor, comprising: obtaining an employee data set corresponding to a plurality of employees; grouping the employees into a plurality of subsets according to the employee data set and an influencing variable, and using a survival analysis algorithm to calculate, for each of the subsets, a plurality of first survival rates corresponding to a plurality of time points, so as to obtain a plurality of survival curves respectively corresponding to the subsets; grouping the survival curves into a plurality of curve groups according to the similarity of the survival curves; dividing the employee data set into a plurality of data groups corresponding to the curve groups; using the survival analysis algorithm to calculate, in the data groups, a plurality of second survival rates corresponding to the time points, so as to obtain a plurality of clustered survival curves respectively corresponding to the curve groups, and obtaining at least one segmentation time point according to slope changes of the clustered survival curves; and based on the at least one segmentation time point, using a proportional hazards model to calculate a turnover rate prediction result of the employees corresponding to the influencing variable.

The turnover rate prediction method according to claim 1, wherein the step of grouping the employees into the subsets according to the employee data set and the influencing variable includes: dividing a plurality of intervals based on the numerical range corresponding to the influencing variable; and dividing the employee data set into the subsets corresponding to the intervals.

The turnover rate prediction method according to claim 1, wherein the step of obtaining the at least one segmentation time point according to the slope changes of the clustered survival curves includes: using a moving average method to smooth the clustered survival curves; and finding, in the smoothed clustered survival curves, at least one time point with the largest slope change as the at least one segmentation time point.

The turnover rate prediction method according to claim 1, wherein the step of using the proportional hazards model to calculate the turnover rate prediction result of the employees corresponding to the influencing variable includes: using the proportional hazards model to calculate the hazard ratios of the curve groups in a plurality of time ranges, wherein the time ranges are set based on the at least one segmentation time point; using one of the curve groups as a baseline, and comparing the hazard ratios corresponding to the other curve groups with the hazard ratio corresponding to the baseline; and using the comparison result as the turnover rate prediction result.

The turnover rate prediction method according to claim 1, wherein the influencing variable is a continuous variable.

The turnover rate prediction method according to claim 1, wherein the influencing variable includes one of age, seniority, overtime hours, and salary.

An electronic device for predicting turnover rate, comprising: a memory storing an employee data set corresponding to a plurality of employees and at least one code instruction; and a processor coupled to the memory to execute the at least one code instruction to implement a turnover rate prediction method, wherein the processor is configured to: obtain the employee data set; group the employees into a plurality of subsets according to the employee data set and an influencing variable, and use a survival analysis algorithm to calculate a plurality of survival curves respectively corresponding to the subsets; group the survival curves into a plurality of curve groups according to the similarity of the survival curves; divide the employee data set into a plurality of data groups corresponding to the curve groups; use the survival analysis algorithm to calculate, in the data groups, a plurality of survival rates corresponding to a plurality of time points, so as to obtain a plurality of clustered survival curves respectively corresponding to the curve groups, and obtain at least one segmentation time point according to slope changes of the clustered survival curves; and based on the at least one segmentation time point, use a proportional hazards model to calculate a turnover rate prediction result of the employees corresponding to the influencing variable.

The electronic device according to claim 7, wherein the processor is further configured to: divide a plurality of intervals based on the numerical range corresponding to the influencing variable; and divide the employee data set into the subsets corresponding to the intervals.

The electronic device according to claim 7, wherein the processor is further configured to: use a moving average method to smooth the clustered survival curves; and find, in the smoothed clustered survival curves, at least one time point with the largest slope change as the at least one segmentation time point.

The electronic device according to claim 7, wherein the processor is further configured to: use the proportional hazards model to calculate the hazard ratios of the curve groups in a plurality of time ranges, wherein the time ranges are set based on the at least one segmentation time point; use one of the curve groups as a baseline, and compare the hazard ratios corresponding to the other curve groups with the hazard ratio corresponding to the baseline; and use the comparison result as the turnover rate prediction result.