TWI607324B

TWI607324B - Data feature selection and data grouping system and method for vehicle travel time

Info

Publication number: TWI607324B
Application number: TW104144148A
Authority: TW
Inventors: Chi Hua Chen; Ya Ting Yang; Shih Chuan Liao; Chia Min Hsieh; Jia Hong Lin; Ta Sheng Kuan
Original assignee: Chunghwa Telecom Co Ltd
Priority date: 2015-12-29
Filing date: 2015-12-29
Publication date: 2017-12-01
Also published as: CN105913154A; TW201723890A

Description

Data feature selection and data grouping system and method for vehicle travel time

The present invention relates to a data selection and grouping system and method. In particular, it relates to a data feature selection and data grouping system and method that effectively partitions data into different clusters and improves intra-cluster dependency.

Current arrival-time estimation methods mainly collect arrival information previously reported by on-board units. They use it to estimate the average speed and travel time between stations, and these statistics can be broken down by day of the week and time period. When a user queries, the historical average speed and travel time are returned. Although this approach quickly provides an estimated arrival time, it relies on historical averages and cannot account for real-time road conditions, so its arrival-time predictions may carry large errors.

The conventional approach described above therefore has several shortcomings and leaves room for improvement.

In view of these shortcomings of the conventional approach, the inventors set out to improve on it. After years of dedicated research, they developed the data feature selection and data grouping system and method of the present invention.

The present invention provides a data feature selection and data grouping system comprising the following components:
A data platform server is provided with a plurality of server communication modules. Each server communication module receives a plurality of pieces of sensing data and stores them in a database module, and the server receives, through a processing module, data prediction and analysis query requests on the sensing data.
A plurality of sensing modules each collect, in real time, sensing data containing environmental information. Each sensing module transmits this data in real time to the data platform server via a sensor communication module.
A plurality of client modules each send data prediction and analysis query requests to the data platform server, periodically or aperiodically, via a client communication module. Each client module then receives the results of those requests.
A data analysis module performs data prediction and analysis on the sensing data stored in the database module of the data platform server. It returns the results of the data prediction and analysis query requests to the data platform server.

The data analysis module performs data feature selection and data grouping on the sensing data. It then returns the results of the data prediction and analysis query requests to the data platform server.

The present invention also provides a data feature selection and data grouping method comprising the following steps:
1. Set a dependency threshold for merging data clusters; data clusters whose mutual dependency is higher than the threshold are merged.
2. Randomly select a plurality of pieces of sensing data and group them into data clusters according to a set feature attribute.
3. Collect the sensing data sharing the same feature attribute value into data clusters and calculate the group center of each cluster.
4. Calculate the dependency between each pair of data clusters.
5. When a dependency is higher than the threshold, merge clusters in descending order of dependency and calculate the group center of each merged cluster.
6. Recalculate the dependencies and continue merging clusters whose dependency exceeds the threshold, stopping when all dependencies are below the threshold.
7. Group again according to another set feature attribute, stopping when no feature attribute remains to be selected.

The dependency between each pair of data clusters is calculated using the cumulative distribution function of the chi-square distribution.

Compared with the prior art, the data feature selection and data grouping system and method of the present invention group data by feature attribute value and compute the dependency between groups. They then merge highly dependent clusters and calculate the centers of the merged clusters. This makes it possible to analyze whether the clusters produced by each feature attribute are dependent, to select suitable feature attributes accordingly, and to analyze the influence of each feature attribute on the data. In addition, the system and method may use a squared-difference ratio to measure the difference between groups and the chi-square cumulative distribution to judge inter-group dependency. Highly dependent clusters are then merged into new clusters.

100‧‧‧Data analysis module

200‧‧‧Data platform server

300‧‧‧Sensing module

400‧‧‧Client module

210‧‧‧Database module

310‧‧‧Sensor communication module

S201 to S207‧‧‧Method steps

FIG. 1 is a schematic diagram of the data feature selection and data grouping system of the present invention.

FIG. 2 is a flow chart of the data feature selection and data grouping method of the present invention.

Embodiments of the data feature selection and data grouping system and method according to the present invention are described below with reference to the drawings. For ease of understanding, identical components in the following embodiments are denoted by the same reference numerals.

This embodiment uses arrival-time estimation for garbage trucks as an example. The station-to-station travel times for each road section and time period are collected. A grouping method is then applied to analyze the data features and determine data dependency, so that mutually independent data are split into different groups. This limits the influence of extreme values and improves prediction accuracy. The arrival time is predicted on this basis, as detailed below.

Please refer to FIG. 1, a schematic diagram of the data feature selection and data grouping system of the present invention. The system includes a data analysis module 100, a data platform server 200, a plurality of sensing modules 300, and a plurality of client modules 400.

The sensing modules 300 sense environmental information periodically or aperiodically and report the results to the data platform server 200. Each sensing module 300 is an on-board unit that uses a positioning device (e.g., a Global Positioning System receiver) to determine the vehicle's position. It can also detect environmental information, including temperature and humidity. The unit determines whether the vehicle has reached a collection station. On arrival, it reports its arrival information to the data platform server 200 through the sensor communication module 310. For example, on-board unit No. 1 runs route No. 1 on 2014/04/01 and arrives at station No. 1 at 14:08:02; the relative humidity that day is 85.8%.

When the data platform server 200 receives the sensing data reported by the sensing modules 300, it may convert continuous data into nominal or discrete data. It then stores the data in the database module 210 of the data platform server 200. For relative humidity, for example, each 10% may form one interval: 0% to 10% is interval 1, 10% to 20% is interval 2, and so on, as shown in Table 1.
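The following is a minimal sketch of this discretization, not taken from the original disclosure. It maps a relative humidity reading to a 1-based interval index using the stated rule of one interval per 10%. The function name `humidity_interval` is an assumption, as is the boundary handling: intervals are closed on the left, and exactly 100% goes into the top interval.

```python
def humidity_interval(rh_percent: float, width: float = 10.0) -> int:
    """Map a relative humidity reading (%) to a 1-based interval index.

    Follows the stated rule: 0% to 10% -> 1, 10% to 20% -> 2, and so on.
    Assumption: intervals are closed on the left ([0, 10) -> 1), and a
    reading of exactly 100% is placed in the top interval.
    """
    if not 0.0 <= rh_percent <= 100.0:
        raise ValueError(f"relative humidity out of range: {rh_percent}")
    top = int(100.0 // width)             # number of intervals (10 for width 10)
    index = int(rh_percent // width) + 1  # 1-based interval index
    return min(index, top)
```

For example, a reading of 42.0% falls in interval 5 under this rule.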

In addition, the data platform server 200 may preprocess the data by computing the travel time for each station-to-station segment. For example, the travel time from station 1 to station 2 on 2014/04/01 is 2.77 minutes, as shown in Table 2. The data analysis module 100 obtains data from the data platform server 200 and applies the data feature selection and data grouping method to analyze features and group the data. It then stores the analysis results back in the data platform server 200. The client modules 400 may send data prediction and analysis query requests to the data platform server 200 periodically or aperiodically. The data platform server 200 then looks up the analysis results and returns the prediction and analysis results to the client module 400.
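The station-to-station travel time computed by the data platform server 200 can be sketched as the difference between consecutive arrival timestamps. The record layout `(station_no, arrival_time)` and the function name `station_travel_times` are hypothetical. The sketch also assumes that all arrivals belong to one on-board unit, one route, and one day.

```python
from datetime import datetime
from typing import List, Tuple


def station_travel_times(
    arrivals: List[Tuple[int, datetime]],
) -> List[Tuple[int, int, float]]:
    """Return (from_station, to_station, minutes) for consecutive arrivals.

    `arrivals` holds (station_no, arrival_time) pairs reported by one
    on-board unit for one route on one day (hypothetical record layout).
    """
    ordered = sorted(arrivals, key=lambda rec: rec[1])  # order by arrival time
    segments = []
    for (s_from, t_from), (s_to, t_to) in zip(ordered, ordered[1:]):
        minutes = (t_to - t_from).total_seconds() / 60.0
        segments.append((s_from, s_to, round(minutes, 2)))
    return segments
```

For instance, suppose the vehicle arrives at station 1 at 14:08:02 and then at station 2 at 14:10:48 (the second time is hypothetical). The 166-second gap gives a segment of 2.77 minutes.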

Please refer to FIG. 2, a flow chart of the data feature selection and data grouping method of the present invention. The main steps are as follows:
Step S201: set the parameters;
Step S202: select a feature attribute;
Step S203: group the data by feature attribute value and set the initial clusters;
Step S204: compute the inter-group dependencies;
Step S205: merge highly dependent clusters and compute the merged cluster centers;
Step S206: check whether any clusters remain uncomputed, and repeat until no clusters can be merged; and
Step S207: check whether any feature attributes remain uncomputed.

In step S201, the dependency threshold for cluster merging is set. In later steps, any two clusters whose inter-group dependency is higher than this threshold are merged. In this embodiment, the dependency threshold is set to 90%, so two clusters are merged only if their dependency is higher than 90%.

In step S202, a feature attribute is selected. A feature attribute may first be chosen at random from the data set for grouping and computing inter-group dependencies. After that, feature attributes whose clusters show high dependency are given priority in the grouping computation.
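The text does not define how "high dependency" is scored when prioritizing feature attributes. One plausible reading, sketched below under that assumption, ranks each candidate feature by the largest pairwise dependency among the initial clusters it produces. The helper name `order_features` and the max-based score are hypothetical. `dependency` can be any function returning a value in [0, 1], such as formula (5).

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Optional, Sequence

Vector = Sequence[Optional[float]]


def order_features(
    feature_clusters: Dict[str, Dict[Hashable, Vector]],
    dependency: Callable[[Vector, Vector], float],
) -> List[str]:
    """Rank feature attributes by the highest inter-group dependency among
    the initial clusters each one produces (hypothetical scoring rule)."""
    scores = {}
    for feature, clusters in feature_clusters.items():
        pair_scores = [dependency(a, b) for a, b in combinations(clusters.values(), 2)]
        scores[feature] = max(pair_scores, default=0.0)
    # Highest-scoring features are grouped first; ties keep the input order.
    return sorted(scores, key=lambda f: scores[f], reverse=True)
```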

Take the station-to-station travel time records from 2014/04/01 to 2014/10/23 as an example. The main nominal or discrete features are week number, day of week, and relative humidity interval, and several of these may be chosen as the initial grouping setting. Suppose week number and day of week are selected and the travel time from station 5 to station 6 is analyzed. The travel time from station 5 to station 6 on Tuesday of week 1 is then 4.90 minutes, and continuing in this way gives the results in Table 3. Other feature attributes may also be selected for grouping analysis.

In step S203, the data are grouped by feature attribute value and the initial clusters are set. The initial clusters follow the values of the selected feature attribute: the records sharing an attribute value form one cluster. This divides the data into a plurality of clusters, and the group center of each cluster is then calculated. Take the station 5 to station 6 travel time records under week number and day of week as an example; each attribute value forms an initial cluster:
Monday is cluster 1, $C_1 = \{\mathrm{null}, \mathrm{null}, 13.33, 14.20, \ldots, 16.70\}$;
Tuesday is cluster 2, $C_2 = \{4.90, \mathrm{null}, \mathrm{null}, \ldots, 4.77\}$;
and so on, up to Saturday, which is cluster 5, $C_5 = \{\mathrm{null}, \mathrm{null}, 3.92, \ldots, \mathrm{null}\}$.
Each cluster is expressed as formula (1), where there are $n$ clusters in total (here $n = 5$). Each cluster has $m$ entries, one per week (here $m = 30$).

$$C_i = \{c_{i,1}, c_{i,2}, c_{i,3}, \ldots, c_{i,m}\} \tag{1}$$

Alternatively, suppose week number and relative humidity interval are selected and the station 5 to station 6 travel time is analyzed. Each attribute value again forms an initial cluster, so each relative humidity interval is one cluster. A given week can contain several records in the same humidity interval, so those travel times are averaged to obtain the group center, as shown in Table 4.

For example, in week 1 the relative humidity interval was 8 on two days, 2014/04/01 and 2014/04/03. The station 5 to station 6 travel times on those days were 4.90 minutes and 15.74 minutes. The week 1 value for humidity interval 8 is therefore their mean, 10.32 minutes, and the group center of each cluster is obtained the same way. In this embodiment, week number and relative humidity interval yield four clusters:
- relative humidity interval 6 is cluster 1;
- interval 7 is cluster 2;
- interval 8 is cluster 3;
- interval 9 is cluster 4.
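The construction of the initial clusters in formula (1), including the per-week averaging used in the relative humidity example, can be sketched as follows. Each cluster is a list indexed by week number, with `None` standing in for the null entries. The record layout `(week_no, attribute_value, travel_minutes)` and the function name `build_initial_clusters` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def build_initial_clusters(
    records: Iterable[Tuple[int, Hashable, float]],
    num_weeks: int,
) -> Dict[Hashable, List[Optional[float]]]:
    """Build C_i = {c_i,1, ..., c_i,m} for each attribute value.

    Each record is (week_no, attribute_value, travel_minutes) with a 1-based
    week number. Entry w of a cluster is the mean travel time of that week's
    records for the attribute value, or None if there is no record, so
    m = num_weeks.
    """
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * num_weeks)
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0] * num_weeks)
    for week, value, minutes in records:
        sums[value][week - 1] += minutes
        counts[value][week - 1] += 1
    return {
        value: [s / c if c else None for s, c in zip(sums[value], counts[value])]
        for value in sums
    }
```

Take the two week-1 records for humidity interval 8 (4.90 and 15.74 minutes). Entry 1 of that cluster is their mean, 10.32 minutes.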

Step S204 computes the inter-group dependencies, that is, the dependency between each cluster's data and every other cluster's data. In this embodiment, four formulas are used:
- formula (2) computes the inter-group chi-square value $x_{i,j}$ between cluster $C_i$ and cluster $C_j$;
- formula (3) computes the number of data entries $k_{i,j}$;
- formula (4), the cumulative distribution function of the chi-square distribution, is used in the dependency analysis;
- formula (5) gives the inter-group dependency $s_{i,j}$.
Take the station 5 to station 6 travel time records from 2014/04/01 to 2014/10/23 under week number and day of week as an example. The chi-square value $x_{1,2}$ between cluster $C_1$ (Monday) and cluster $C_2$ (Tuesday) is 637.77, with $k_{1,2} = 18$ entries. Formulas (4) and (5) then give an inter-group dependency $s_{1,2}$ of 0%. The pairwise dependencies between all clusters are obtained in the same way, as shown in Table 5.
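Formula (4) does not survive in this text. The text identifies it as the cumulative distribution function of the chi-square distribution, whose standard form for $k$ degrees of freedom is given below for reference. Here $\gamma$ is the lower incomplete gamma function and $\Gamma$ the gamma function. Formula (5) evaluates this function at $x_{i,j}$ with $k_{i,j} - 1$ degrees of freedom.

```latex
F(x; k) = \frac{\gamma\!\left(\tfrac{k}{2}, \tfrac{x}{2}\right)}{\Gamma\!\left(\tfrac{k}{2}\right)}, \qquad x \ge 0
```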

$$s_{i,j} = 1 - F(x_{i,j};\, k_{i,j} - 1) \tag{5}$$
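Formulas (2) and (3) are not reproduced in this text, so the sketch below labels its versions of them as assumptions. The statistic $x_{i,j}$ is taken as a squared-difference ratio, in line with the summary's description: $\sum_l (c_{i,l} - c_{j,l})^2 / \tfrac{1}{2}(c_{i,l} + c_{j,l})$ over the weeks where both clusters have data. The count $k_{i,j}$ is taken as the number of such weeks. The sketch has not been checked against the 637.77 value in Table 5. Formula (5) is applied as written, and the chi-square CDF is computed with the standard series and continued-fraction expansions of the regularized incomplete gamma function so that only the standard library is needed; `scipy.stats.chi2.cdf` is an alternative. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vector = Sequence[Optional[float]]


def chi2_cdf(x: float, df: float) -> float:
    """Chi-square CDF F(x; df) = P(df/2, x/2), regularized lower incomplete gamma."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 0.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of P(a, z).
        term = total = 1.0 / a
        n = a
        for _ in range(10_000):
            n += 1.0
            term *= z / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z) (modified Lentz method), then P = 1 - Q.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def paired_stats(ci: Vector, cj: Vector) -> Tuple[float, int]:
    """ASSUMED forms of formulas (2) and (3): squared-difference ratio and
    count over the weeks where both clusters have a value."""
    x, k = 0.0, 0
    for a, b in zip(ci, cj):
        if a is None or b is None:
            continue
        expected = (a + b) / 2.0
        if expected > 0.0:
            x += (a - b) ** 2 / expected
        k += 1
    return x, k


def dependency_from_stats(x: float, k: int) -> float:
    """Formula (5): s = 1 - F(x; k - 1)."""
    if k < 2:
        return 0.0  # assumption: too few paired entries to judge dependency
    return 1.0 - chi2_cdf(x, k - 1)


def dependency(ci: Vector, cj: Vector) -> float:
    """Inter-group dependency s_ij of two clusters, in [0, 1]."""
    return dependency_from_stats(*paired_stats(ci, cj))
```

With the Monday and Tuesday values from the text ($x_{1,2} = 637.77$, $k_{1,2} = 18$), `dependency_from_stats` returns essentially 0, matching the 0% reported there.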

In step S205, highly dependent clusters are merged and the merged cluster center is calculated. When an inter-cluster dependency is higher than the dependency threshold, the pair with the highest dependency is merged first, and the group center of the merged cluster is calculated after the merge. Take the inter-group dependencies under week number and day of week as an example. Cluster 2 (Tuesday) and cluster 4 (Friday) have the highest dependency, 99.47%, which is above the 90% threshold, so they are merged first. The group center is computed with formula (6) during the merge, and the result is shown in Table 6.
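Formula (6) is not reproduced in this text, but the claims spell out the per-attribute rule for the center of a merged cluster. A value present in only one cluster is kept, values present in both are averaged, and the entry stays null when neither cluster has a value. Below is a minimal sketch of that rule, using `None` for null; the function name `merge_centers` is hypothetical.

```python
from typing import List, Optional, Sequence


def merge_centers(
    ci: Sequence[Optional[float]], cj: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Combine two cluster centers attribute by attribute (None = null)."""
    if len(ci) != len(cj):
        raise ValueError("clusters must have the same number of attributes")
    merged: List[Optional[float]] = []
    for a, b in zip(ci, cj):
        if a is not None and b is not None:
            merged.append((a + b) / 2.0)  # value in both: take the average
        elif a is not None:
            merged.append(a)              # value only in cluster i
        elif b is not None:
            merged.append(b)              # value only in cluster j
        else:
            merged.append(None)           # value in neither: null
    return merged
```

As the claims describe it, the average is unweighted: it does not account for how many original clusters each side already contains.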

In step S206, the process checks whether any clusters remain uncomputed and repeats until no clusters can be merged. For each cluster it (1) computes the inter-group dependencies and (2) merges highly dependent clusters and computes the merged cluster center. It stops when no clusters can be merged. In the week number and day-of-week example, the inter-group dependencies are recomputed after cluster 2 (Tuesday) and cluster 4 (Friday) are merged, with the results shown in Table 7. The dependency between cluster 1 (Monday) and cluster 3 (Thursday) is 92.84%, above the 90% threshold, so clusters 1 and 3 are merged.

After cluster 1 (Monday) and cluster 3 (Thursday) are merged, the inter-group dependencies are recomputed again. No remaining inter-group dependency exceeds the 90% threshold, so grouping under this feature attribute is complete.
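Steps S204 to S206 form a greedy merge loop, which the sketch below implements under a few assumptions. It repeatedly finds the pair of clusters with the highest dependency above the threshold, merges it, and recomputes all pairwise dependencies until no pair qualifies. The dependency function and the merge rule are passed in; a chi-square dependency in the sense of formula (5) and the merge rule of formula (6) fit here. The function name, the cluster naming scheme (`"a+b"`), and the strict `>` comparison with the threshold are all assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional

Vector = List[Optional[float]]


def merge_until_stable(
    clusters: Dict[str, Vector],
    dependency: Callable[[Vector, Vector], float],
    merge: Callable[[Vector, Vector], Vector],
    threshold: float = 0.90,
) -> Dict[str, Vector]:
    """Steps S204 to S206: merge the most dependent pair of clusters whose
    dependency exceeds `threshold`, recompute all dependencies, and repeat
    until no pair exceeds it."""
    clusters = dict(clusters)  # leave the caller's dict untouched
    while len(clusters) > 1:
        best = None  # (dependency, name_a, name_b)
        for a, b in combinations(sorted(clusters), 2):
            s = dependency(clusters[a], clusters[b])
            if s > threshold and (best is None or s > best[0]):
                best = (s, a, b)
        if best is None:
            break  # no pair is above the threshold: grouping is complete
        _, a, b = best
        clusters[f"{a}+{b}"] = merge(clusters.pop(a), clusters.pop(b))
    return clusters
```

This mirrors the day-of-week example. With a toy dependency that pairs Tuesday with Friday and Monday with Thursday, the loop ends with two merged clusters and stops once no remaining pair exceeds the threshold.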

In step S207, the process checks whether any feature attribute remains uncomputed, so that every feature attribute is eventually computed. The data set obtained after cluster merging is processed with each feature attribute not yet selected, by the following sub-steps:
(1) select the feature attribute;
(2) group by attribute value and set the initial clusters;
(3) compute the inter-group dependencies;
(4) merge highly dependent clusters and compute the merged cluster centers;
(5) check whether any clusters remain uncomputed, and repeat until no clusters can be merged.
The process stops when no feature attribute remains to be selected. In this embodiment, once grouping under week number and day of week is complete, the process determines whether other feature attributes can be used for grouping. The data show that week number and relative humidity interval can be used (Table 4), and the inter-group dependencies of the resulting groups are shown in Table 8. None of these dependencies is higher than the 90% threshold, so no clusters are merged.

The data feature selection and data grouping system and method can thus group the data and identify feature attributes that produce highly dependent clusters. In this embodiment, 'day of week' is a more suitable reference attribute for data grouping than 'relative humidity interval'.

In summary, the present invention achieves the intended improvements over the prior art and would not readily occur to those skilled in the art. Its inventiveness and practicality meet the requirements for patentability, and this application is therefore filed in accordance with the law. The Office is respectfully requested to grant this invention patent.

The above description is illustrative only and not limiting. Any equivalent modification or alteration that does not depart from the spirit and scope of the present invention shall be included in the scope of the appended claims.

Claims

A data feature selection and data grouping method for vehicle travel time, comprising the steps of:
obtaining, by a data platform server, the time taken by a vehicle to travel from one station to another station in different time periods;
setting a dependency threshold for merging a plurality of data clusters, such that data clusters whose dependency is higher than the dependency threshold are merged in the data platform server;
collecting, through sensing modules, a plurality of pieces of sensing data containing environmental information associated with the vehicle; randomly selecting sensing data; grouping the data clusters according to a set feature attribute and the vehicle's station-to-station travel time in different time periods; and returning the result to the data platform server;
collecting, by the data platform server, the sensing data sharing the same feature attribute to generate the data clusters, separately calculating the group center of each data cluster, and calculating the dependency between each pair of data clusters;
when the dependency between data clusters is higher than the dependency threshold, merging the data clusters in descending order of dependency until all dependencies between data clusters are lower than the dependency threshold, the data platform server calculating the group center of each merged data cluster and recalculating the inter-group dependencies; and
predicting the arrival time of the vehicle according to the dependencies of the different groups.

The data feature selection and data grouping method for vehicle travel time according to claim 1, wherein the dependency between each pair of data clusters is calculated using the cumulative distribution function of the chi-square distribution.

The data feature selection and data grouping method for vehicle travel time according to claim 1, wherein, when two clusters i and j are merged, each data attribute of the group center of the merged data cluster is calculated as follows:
if the data attribute has a value in cluster i but no value in cluster j, the merged data cluster takes the value of cluster i;
if the data attribute has no value in cluster i but has a value in cluster j, the merged data cluster takes the value of cluster j;
if the data attribute has a value in both cluster i and cluster j, the merged data cluster takes the average of the two values; and
if the data attribute has no value in either cluster i or cluster j, the data attribute of the merged data cluster is set to null.

A data grouping method for vehicle travel time, comprising the steps of:
obtaining, by a data platform server, the time taken by a vehicle to travel from one station to another station in different time periods, and collecting, through sensing modules, a plurality of pieces of sensing data containing environmental information associated with the vehicle's travel;
collecting the sensing data sharing the same feature attribute to generate data clusters and separately calculating the group center of each data cluster;
calculating, by the data platform server, the dependency between each pair of data clusters;
when the dependency between data clusters is higher than a dependency threshold, merging the data clusters in descending order of dependency and calculating the group center of each merged data cluster;
recalculating, by the data platform server, the dependencies between data clusters and continuing to merge the data clusters whose dependency is higher than the dependency threshold, stopping when all dependencies between data clusters are lower than the dependency threshold; and
predicting the arrival time of the vehicle according to the dependencies of the different groups.

A data feature selection and data grouping method for vehicle travel time, comprising the steps of:
obtaining a plurality of pieces of sensing data and vehicle travel times from a data platform server or a database module;
setting a dependency threshold for merging a plurality of data clusters, such that data clusters whose dependency is higher than the dependency threshold are merged;
randomly selecting a plurality of pieces of sensing data and grouping them into data clusters according to a set feature attribute;
collecting the sensing data sharing the same feature attribute to generate the data clusters and separately calculating the group center of each data cluster;
calculating the dependency between each pair of data clusters;
when the dependency between data clusters is higher than the dependency threshold, merging the data clusters in descending order of dependency and calculating the group center of each merged data cluster;
recalculating the dependencies between data clusters and continuing to merge the data clusters whose dependency is higher than the dependency threshold, stopping when all dependencies between data clusters are lower than the dependency threshold;
grouping the data clusters again according to another set feature attribute, stopping when no feature attribute remains to be selected; and
according to the results of the data feature selection and data grouping, building a separate travel-time prediction model for the data of each cluster and predicting travel time within each cluster.

The data feature selection and data grouping method for vehicle travel time according to claim 5, wherein the feature attributes include at least week number, day of week, relative humidity, and travel time.

The data feature selection and data grouping method for vehicle travel time according to claim 5, wherein the dependency between each pair of data clusters is calculated using the cumulative distribution function of the chi-square distribution.

The data feature selection and data grouping method for vehicle travel time according to claim 5, wherein, when two clusters i and j are merged, each data attribute of the group center of the merged data cluster is calculated as follows:
if the data attribute has a value in cluster i but no value in cluster j, the merged data cluster takes the value of cluster i;
if the data attribute has no value in cluster i but has a value in cluster j, the merged data cluster takes the value of cluster j;
if the data attribute has a value in both cluster i and cluster j, the merged data cluster takes the average of the two values; and
if the data attribute has no value in either cluster i or cluster j, the data attribute of the merged data cluster is set to null.