TWI759785B - System and method for recommending audit criteria based on integration of qualitative data and quantitative data

Info

Publication number: TWI759785B
Application number: TW109122785A
Authority: TW
Inventors: 王其宏
Original assignee: 王其宏
Priority date: 2019-07-22
Filing date: 2020-07-06
Publication date: 2022-04-01
Also published as: TW202107348A; CN112256740A

Abstract

A system for recommending audit criteria based on integration of qualitative data and quantitative data, which executes a method for recommending audit criteria based on integration of qualitative data and quantitative data, includes: a storage module for receiving an in-progress analysis data of a supplier audit and storing a historical analysis data; a topic model transformation module for analyzing an audit finding of the historical analysis data to obtain a topic model probability distribution so that a feature vector module can generate a feature vector set, according to a supplier business data relevant to the topic model probability distribution and the historical analysis data, as well as a feature vector value of the in-progress analysis data; a categorization module for determining a cluster to which the feature vector value belongs so that a recommendation module can, according to the determined cluster, offer recommended audit items which are pre-calculated by an audit criteria list.

Description

A system and method for integrating qualitative data and quantitative data to recommend audit criteria

The invention belongs to the field of natural language processing and relates to a recommendation system and method, in particular to a system and method for recommending audit criteria by integrating qualitative data and quantitative data.

In financial auditing, US 8,050,988 B2 and US 2006/0106686 A1 proposed structured audit systems addressing financial risk, and identified opportunities and suggestions for financial audits from a risk perspective. Other patents, such as US 7,885,841 B2, US 5,765,138, US 7,346,527 B2, US 2008/019546 A1 and US 8,504,412 B1, also cover automation such as the generation of audit plans and audit items.

Although recommendation systems using techniques such as natural language processing have been published or granted in the past, for example US 2016/0148327 A1, US 2018/0165696 A1 and CN 107,807,962 B, they fail to consider that a supplier's risk may be related to quantitative background information such as its scale, business performance and operating time.

Therefore, in order to better recommend audit criteria, the present inventor proposes a system for recommending audit criteria by integrating qualitative data and quantitative data, comprising a storage module, a topic model conversion module, a feature vector module, a classification module and a recommendation module. The storage module is used for receiving in-progress analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past; the in-progress analysis data and the historical analysis data each comprise qualitative data, namely an audit finding, and quantitative data, namely supplier business data. The topic model conversion module is connected to the storage module and is used for analyzing the audit findings of the historical analysis data so as to establish and update a topic model and obtain a topic model probability distribution of the audit findings of the historical analysis data, and for converting the audit finding of the in-progress analysis data according to the topic model so as to obtain a topic model probability distribution of the audit finding of the in-progress analysis data. The feature vector module is connected to the topic model conversion module and the storage module, and is used for generating a corresponding feature vector set according to the topic model probability distribution of the historical analysis data and the supplier business data of the historical analysis data; the feature vector module is also used for generating a feature vector value corresponding to the in-progress analysis data according to the topic model probability distribution of the in-progress analysis data and the supplier business data of the in-progress analysis data. The classification module is connected to the feature vector module, and is used for performing cluster analysis on the feature vector set to divide it into a plurality of clusters, and for determining the cluster, among the plurality of clusters, to which the feature vector value belongs. The recommendation module is connected to the classification module and the topic model conversion module, and is used for receiving an audit criteria list used for supplier audits and, according to the cluster to which the feature vector value belongs, generating corresponding recommended audit criteria items for a related topic.

The present invention is also a method for recommending audit criteria by integrating qualitative data and quantitative data, comprising: receiving, by a storage module, in-progress analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past, the in-progress analysis data and the historical analysis data each comprising qualitative data, namely an audit finding, and quantitative data, namely supplier business data; analyzing, by a topic model conversion module, the audit findings of the historical analysis data so as to establish and update a topic model and obtain a topic model probability distribution of the audit findings of the historical analysis data, the topic model conversion module also converting the audit finding of the in-progress analysis data according to the topic model and obtaining a topic model probability distribution of the audit finding of the in-progress analysis data, so that a feature vector module can generate a corresponding feature vector set according to the topic model probability distribution and the supplier business data of the historical analysis data, and can generate a feature vector value corresponding to the in-progress analysis data according to the topic model probability distribution and the supplier business data of the in-progress analysis data; and performing, by a classification module, cluster analysis on the feature vector set to divide it into a plurality of clusters, and determining the cluster, among the plurality of clusters, to which the feature vector value belongs, so that a recommendation module can receive an audit criteria list used for supplier audits and, according to the cluster to which the feature vector value belongs, generate corresponding recommended audit criteria items for a related topic.

The feature vector set may be cluster-analyzed using the K-means algorithm (K-means clustering).

In the cluster analysis, the dimensionality of the feature vectors used to build the cluster analysis may be reduced by a weighted K-means (Weighted K-means) feature selection algorithm.

The classification module may calculate a distance value between the feature vector value and the centroid of each cluster, and take the cluster with the smallest distance value as the cluster to which the feature vector value belongs.

The topic model may be established using at least one of Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF). The topic model conversion module maps the audit findings of the historical analysis data, according to the topic model, to the topic model probability distribution of the audit findings of the historical analysis data, and maps the audit finding of the in-progress analysis data, according to the topic model, to the topic model probability distribution of the audit finding of the in-progress analysis data.

The quantitative data of the supplier business data may comprise at least any one or a combination of supplier headcount data, turnover data and operating time data.

The above technical features achieve the following effects:

1. Because the recommendation of audit criteria takes into account background information on the supplier's operation (quantitative information such as scale, business performance and operating time), it provides more suitable audit criteria than a recommendation based on natural language processing alone.

2. Using the qualitative information of previously collected audit findings and the supplier-related quantitative information, suppliers are periodically cluster-analyzed, with feature selection, by natural language processing and unsupervised learning, so that the correlation between audit findings and business indicators can be established objectively.

100: System

1: Storage module

11: In-progress analysis data

111: In-progress audit finding

112: In-progress supplier business data

12: Historical analysis data

121: Completed audit finding

122: Historical supplier business data

2: Topic model conversion module

3: Feature vector module

30: Feature vector set

31: Feature vector value

4: Classification module

40: Cluster analysis

5: Recommendation module

50: Audit criteria list

51: Recommended audit criteria items

S10: Modeling step

S100: Step 1 of the modeling step

S101: Step 2 of the modeling step

S102: Step 3 of the modeling step

S103: Step 4 of the modeling step

S104: Step 5 of the modeling step

S105: Step 6 of the modeling step

S106: Step 7 of the modeling step

S107: Step 8 of the modeling step

S108: Step 9 of the modeling step

S109: Step 10 of the modeling step

S110: Step 11 of the modeling step

S111: Step 12 of the modeling step

S112: Step 13 of the modeling step

S20: Audit criteria recommendation step

S200: Step 1 of the audit criteria recommendation step

S201: Step 2 of the audit criteria recommendation step

S202: Step 3 of the audit criteria recommendation step

S203: Step 4 of the audit criteria recommendation step

[Fig. 1] is a schematic block diagram of a system according to an embodiment of the present invention.

[Fig. 2] is a detailed flow chart, including the modeling step and the audit criteria recommendation step, according to another embodiment of the present invention.

In view of the above technical features, the main effects of the system and method of the present invention for recommending audit criteria by integrating qualitative data and quantitative data will be clearly presented in the following embodiments.

Please refer to Fig. 1, which shows a system 100 for recommending audit criteria by integrating qualitative data and quantitative data according to an embodiment of the present invention. The system 100 may be implemented as a cloud system or a stand-alone device, and mainly comprises a storage module 1, a topic model conversion module 2, a feature vector module 3, a classification module 4 and a recommendation module 5. The system 100 is used to execute the method for recommending audit criteria by integrating qualitative data and quantitative data according to another embodiment of the present invention. The system 100 is first described in further detail below. The storage module 1 is used for receiving in-progress analysis data 11 of a supplier audit and storing historical analysis data 12 of supplier audits completed in the past. The in-progress analysis data 11 comprises qualitative data, namely an in-progress audit finding 111, and quantitative data, namely in-progress supplier business data 112. The in-progress audit finding 111 is the auditor's objective statement of what was observed at the audited supplier during the audit, and is in text form; once the audit is completed, the status of the in-progress audit finding 111 is updated to a completed audit finding 121. The in-progress supplier business data 112 is a set of numerical data, which may include, but is not limited to, supplier headcount data, turnover data and operating time data. The in-progress supplier business data 112 may be collected in advance and, after the audit is completed, its status is updated to historical supplier business data 122. The historical analysis data 12 is the collective term for the completed audit findings 121 and the historical supplier business data 122.

The topic model conversion module 2 is connected to the storage module 1 and periodically updates a topic model from the completed audit findings 121 so as to obtain a topic model probability distribution. The topic model may be established using at least one of the Latent Dirichlet Allocation (LDA) algorithm or Non-Negative Matrix Factorization (NMF). Using the latest topic model, the topic model conversion module 2 maps the completed audit findings 121 stored in the storage module 1 and the in-progress audit finding 111 received by the storage module 1, respectively, into linear combinations of the topics of the topic model, thereby producing the topic model probability distributions.

The feature vector module 3 is connected to the topic model conversion module 2 and the storage module 1. It reads in the topic model probability distributions of the completed audit findings 121 and combines them with the historical supplier business data 122 stored in the storage module 1 to generate a feature vector set 30. It likewise reads in the topic model probability distribution of the in-progress audit finding 111 and combines it with the in-progress supplier business data 112 in the storage module 1 to generate a feature vector value 31.

The classification module 4 is connected to the feature vector module 3. For the feature vector set 30, it may determine an optimal number of clusters using, for example, the within-cluster sum of squares, and then, with that optimal number of clusters, perform cluster analysis 40 on the feature vector set 30 using, for example, the K-means algorithm (K-means clustering) to divide it into a plurality of clusters. In the cluster analysis 40, the feature vector set 30 results from combining the historical supplier business data 122 with the topic model probability distributions of the completed audit findings 121, and the dimensions of a feature vector differ in their contribution to and influence on the result of the cluster analysis 40; the classification module 4 may therefore use Weighted K-means for feature selection to reduce the dimensionality of the feature vectors used to build the cluster analysis 40. The classification module 4 also determines the cluster, among the plurality of clusters, to which the feature vector value 31 belongs; specifically, it calculates a distance value between the feature vector value 31 and the centroid of each cluster, and may determine the cluster with the smallest distance value as the cluster to which the feature vector value 31 belongs.
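The nearest-centroid rule described for the classification module can be illustrated with a minimal sketch. It assumes Euclidean distance (the description does not name a metric), and the centroid and feature values below are invented purely for illustration.

```python
import math


def assign_to_cluster(feature_vector, centroids):
    """Return (index of nearest centroid, list of distances to every centroid)."""
    distances = [math.dist(feature_vector, c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    return nearest, distances


# Hypothetical centroids of two clusters over five selected features.
centroids = [
    [0.2, 0.1, 0.7, 0.2, 0.1],
    [0.8, 0.9, 0.1, 0.3, 0.6],
]
# Hypothetical feature vector value of an in-progress audit.
feature_vector_value = [0.25, 0.2, 0.6, 0.3, 0.1]

cluster, distances = assign_to_cluster(feature_vector_value, centroids)
```

Because only the smallest distance matters, any monotone transform of the distance (for example the squared distance) would select the same cluster.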

Next, the recommendation module 5, which is connected to the classification module 4 and the topic model conversion module 2, receives an audit criteria list 50 used for supplier audits. According to the cluster to which the classification module 4 has determined the feature vector value 31 belongs, it obtains at least one highly relevant topic from the coordinates of that cluster's centroid and, using the topic model, queries the audit criteria list 50 by term frequency-inverse document frequency (tf-idf), returning the recommended audit criteria items 51 corresponding to each topic, sorted by relevance.
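One possible reading of the tf-idf query is sketched below. It assumes that a topic is represented by its top keywords, that each audit criteria item is a short text, and that relevance is the cosine similarity of tf-idf vectors; the smoothed idf formula, the keywords and the criteria texts are all illustrative assumptions rather than details given in the description.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def rank_criteria(topic_keywords, criteria):
    """Rank criteria by tf-idf cosine similarity to the topic keywords.

    Returns (indices sorted by descending score, score per criterion).
    """
    docs = [Counter(tokens(c)) for c in criteria]
    n = len(docs)
    # Smoothed inverse document frequency (an assumed variant; several exist).
    idf = {}
    for term in {t for d in docs for t in d}:
        df = sum(1 for d in docs if term in d)
        idf[term] = math.log((1 + n) / (1 + df)) + 1.0

    def weigh(counts):
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    query = weigh(Counter(topic_keywords))
    scores = [cosine(query, weigh(d)) for d in docs]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical audit criteria list and topic keywords.
criteria = [
    "Emergency exits shall be unobstructed and clearly marked",
    "Working hours and overtime records shall be kept for all employees",
    "Chemical containers shall be labeled and stored safely",
]
topic_keywords = ["overtime", "hours", "records"]

order, scores = rank_criteria(topic_keywords, criteria)
```

Here the second criterion is the only one sharing terms with the topic, so it is returned first; criteria with no overlapping terms score zero.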

The following embodiment, with reference to Fig. 2, describes in further detail the method for recommending audit criteria by integrating qualitative data and quantitative data, which mainly comprises a modeling step S10 and an audit criteria recommendation step S20. The modeling step S10 performs cluster analysis based on the completed audit findings in a storage module, an audit criteria list, and historical supplier business data (for example supplier headcount data, turnover data and operating time data); it may be executed only once, or updated regularly or irregularly. The audit criteria recommendation step S20 classifies a newly provided in-progress audit finding and in-progress supplier business data so as to provide the corresponding recommended audit criteria items.

The modeling step S10 includes:

Step 1 of the modeling step, S100: create an audit event, input an audit criteria list to a recommendation module, and export from the storage module the identifier of every existing supplier together with the completed audit findings corresponding to each identifier (as a csv file).

Step 2 of the modeling step, S101: a topic model conversion module uses the pandas tool to read in the completed audit findings exported in S100.

Step 3 of the modeling step, S102: the topic model conversion module uses the gensim tool to tokenize the completed audit findings read in S101.

Step 4 of the modeling step, S103: the topic model conversion module uses the spaCy tool and the NLTK (Natural Language Toolkit) tool to pre-process the tokenized completed audit findings from S102, for example by stop-word removal and stemming. Note that pandas, gensim, spaCy and NLTK are all natural language processing or data analysis software tools for the Python programming language.

Step 5 of the modeling step, S104: the topic model conversion module converts the completed audit findings processed in S103 into term frequency vectors.
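Steps S102 to S104 can be sketched with the Python standard library alone. This is only an approximation of the pipeline: the description uses gensim, spaCy and NLTK, whereas the sketch below uses a regular-expression tokenizer and a tiny hypothetical stop-word list, and it omits stemming. The example findings are invented.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real run would use the
# lists shipped with spaCy or NLTK, as the description suggests.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "was", "were",
              "and", "to", "no", "not", "for"}


def tokenize(text):
    """Lower-case the finding and keep alphabetic tokens that are not stop words."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOP_WORDS]


def term_frequency_vectors(findings):
    """Return (vocabulary, vectors): one raw term-count vector per finding."""
    tokenized = [tokenize(f) for f in findings]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical completed audit findings.
findings = [
    "Fire exits were blocked in the warehouse",
    "Overtime records are missing for the warehouse staff",
]
vocabulary, vectors = term_frequency_vectors(findings)
```

Each finding becomes a count vector over a shared vocabulary, which is the form of input a topic model such as LDA typically consumes.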

Step 6 of the modeling step, S105: the topic model conversion module applies the Latent Dirichlet Allocation (LDA) algorithm to the completed audit findings processed in S104 to build and optimize a topic model.

Step 7 of the modeling step, S106: the topic model conversion module maps the completed audit findings to a topic model probability distribution of the topic model, that is, D=Σ φ T, where D is a completed audit finding, T is the topic model, and φ is the probability of T in D.

Step 8 of the modeling step, S107: a feature vector module extracts φ, reads in quantitative information from the storage module, namely the historical supplier business data V, and combines the two to generate a feature vector F=V+=φ; all the feature vectors F together form a feature vector set Fⁿ.
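As a rough illustration of S107, the sketch below assumes that the combining operation is a concatenation of the business data V with the topic probabilities φ, and that the business data are min-max scaled first so that large raw values such as turnover do not dominate later distance calculations. Both the concatenation and the scaling are assumptions; the description does not specify either, and all figures are invented.

```python
def scale_columns(rows):
    """Min-max scale each column of `rows` to [0, 1]; constant columns map to 0."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def build_feature_vectors(business_rows, topic_rows):
    """Concatenate scaled business data V with topic probabilities phi, per supplier."""
    scaled = scale_columns(business_rows)
    return [v + list(phi) for v, phi in zip(scaled, topic_rows)]


# Hypothetical data: (headcount, turnover, years in operation) per supplier.
business = [[120, 3.5e6, 8], [40, 0.9e6, 3], [300, 12.0e6, 15]]
# Hypothetical topic probability distributions over three topics.
topics = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]

feature_set = build_feature_vectors(business, topics)
```

Each resulting vector has m = 6 dimensions here: three from the business data and three from the topic distribution.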

Step 9 of the modeling step, S108: a classification module analyzes the feature vector set Fⁿ with the K-means algorithm and obtains an optimal number of clusters k from the minimum within-cluster sum of squares (WCSS).
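The following sketch shows how WCSS could be computed for several candidate values of k, using a plain Lloyd-style K-means written with the standard library. The initialization (random sample with a fixed seed), the iteration cap and the toy data are assumptions; the description does not state how the optimal k is read off the WCSS values, so the sketch only tabulates them.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda l: squared_distance(p, centroids[l]))
        for p in points
    ]


def k_means(points, k, iterations=100, seed=0):
    """Plain Lloyd iteration; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for l in range(k):
            members = [p for p, lab in zip(points, labels) if lab == l]
            # Keep the previous centroid if a cluster ends up empty.
            updated.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[l])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def wcss(points, centroids, labels):
    """Within-cluster sum of squares for a given clustering."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical two-dimensional feature vectors forming two obvious groups.
feature_set = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
wcss_by_k = {}
for k in (1, 2, 3):
    centroids, labels = k_means(feature_set, k)
    wcss_by_k[k] = wcss(feature_set, centroids, labels)
```

On this toy data the WCSS drops sharply from k = 1 to k = 2 and changes little afterwards, which is the kind of pattern commonly used to pick k.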

Step 10 of the modeling step, S109: for the m-dimensional feature vector F, arbitrarily assign weights $w_j$ subject to $\sum_{j=1}^{m} w_j = 1$, where $w_j$ is the set of weights corresponding to the m features.

Step 11 of the modeling step, S110: given β (β>1) and k, arbitrarily assign cluster centroids $Z_k$ and, holding two of the three unknowns fixed in turn, iteratively solve for the (C, Z, w) that minimizes $\sum_{l=1}^{k}\sum_{i=1}^{n}\sum_{j=1}^{m} c_{il}\, w_j^{\beta}\, d(x_{ij}, z_{lj})$, where $c_{il}$ is an entry of the matrix C (described as an orthogonal matrix) that equals 1 only when i = l; that is, only the distances between the n vectors x and the centroid $Z_k$ of the cluster to which each belongs are calculated.
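The objective in S110 resembles the commonly cited weighted K-means (W-k-means) formulation, in which each feature weight enters as $w_j^{\beta}$. The sketch below evaluates that objective for a fixed assignment and applies the weight update usually given for that formulation, $w_j = 1 / \sum_t (D_j / D_t)^{1/(\beta-1)}$, where $D_j$ is the dispersion of feature j around the assigned centroids. It assumes d is the squared difference; whether the patent uses exactly this update is not stated, and the data are invented.

```python
def dispersion_per_feature(points, centroids, labels):
    """D_j: squared deviation along feature j of each point from its own centroid, summed."""
    m = len(points[0])
    return [
        sum((p[j] - centroids[lab][j]) ** 2 for p, lab in zip(points, labels))
        for j in range(m)
    ]


def weighted_objective(points, centroids, labels, weights, beta):
    """Sum over features of w_j**beta * D_j for the given assignment."""
    d = dispersion_per_feature(points, centroids, labels)
    return sum((w ** beta) * dj for w, dj in zip(weights, d))


def update_weights(points, centroids, labels, beta):
    """Weight update with assignment and centroids held fixed (assumed W-k-means rule)."""
    d = dispersion_per_feature(points, centroids, labels)
    exponent = 1.0 / (beta - 1.0)
    weights = []
    for dj in d:
        if dj == 0:
            # Convention of the assumed formulation: zero-dispersion features get weight 0.
            weights.append(0.0)
        else:
            weights.append(1.0 / sum((dj / dt) ** exponent for dt in d if dt != 0))
    return weights


# Hypothetical data: feature 0 separates the clusters cleanly, feature 1 is noisy.
points = [[0.0, 5.0], [0.2, 1.0], [10.0, 4.0], [10.2, 0.0]]
labels = [0, 0, 1, 1]
centroids = [[0.1, 3.0], [10.1, 2.0]]
beta = 2.0

weights = update_weights(points, centroids, labels, beta)
```

The update gives the compact feature a weight close to 1 and the noisy feature a weight close to 0, which is what makes the weights usable for feature selection afterwards.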

Step 12 of the modeling step, S111: m=p+q; p features are selected from the m features according to the magnitude of w_j, with q≧0, and r of the p features are selected φ. In more detail, S111 selects, from the m features, the p features with the largest weights w; the remaining q features are not selected; and, of the p selected features, r come from the topic model probability distribution.
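The selection in S111 amounts to keeping the p highest-weighted features and noting which of them are topic dimensions. A minimal sketch follows; the weights, the value of p and the positions of the topic features are hypothetical.

```python
def select_top_features(weights, p, topic_feature_indices):
    """Return (indices of the p largest weights, those among them that are topic features)."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    selected = sorted(ranked[:p])
    selected_topics = [j for j in selected if j in topic_feature_indices]
    return selected, selected_topics


# Hypothetical weights for m = 6 features: indices 0-2 are business data,
# indices 3-5 are topic probabilities.
weights = [0.05, 0.30, 0.02, 0.25, 0.08, 0.30]
topic_feature_indices = {3, 4, 5}

selected, selected_topics = select_top_features(weights, p=3, topic_feature_indices=topic_feature_indices)
r = len(selected_topics)
```

With these values p = 3 features are kept, q = 3 are discarded, and r = 2 of the kept features are topics.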

Step 13 of the modeling step, S112: using the r topics, query the audit criteria list by tf-idf and return the audit criteria items corresponding to each topic, sorted by relevance.

The audit criteria recommendation step S20 includes:

Step 1 of the audit criteria recommendation step, S200: the storage module receives in-progress analysis data from a client device (for example a smartphone, notebook computer or tablet computer); the in-progress analysis data comprises the in-progress supplier business data and the in-progress audit finding.

Step 2 of the audit criteria recommendation step, S201: the topic model conversion module maps the in-progress audit finding of the in-progress analysis data using the established topic model to obtain a topic model probability distribution of the in-progress audit finding, D_A=Σ φ_A T_A, where A denotes the audit event. The feature vector module reads in the topic model probability distribution of the in-progress audit finding and combines it with the in-progress supplier business data in the storage module to generate a feature vector value.

Step 3 of the audit criteria recommendation step, S202: the classification module uses the p_A features of the feature vector value to find the minimum distance to the cluster centroids Z_k, and thereby determines the current cluster C_A.

Step 4 of the audit criteria recommendation step, S203: starting from the centroid of the cluster C_A, the recommendation module recommends to the client device, in order of relevance to the topics, the corresponding audit criteria items generated in S112, that is, the recommended audit criteria items.

In this way, a user can upload supplier business data and audit findings in real time at the audit site. The audit findings are converted by the topic model into a topic distribution, which is integrated with the supplier business data and assigned to one of the clusters previously established by unsupervised learning (such as the K-means algorithm). The topics with higher probability in that cluster are then ranked, and the recommended audit criteria items corresponding to each topic are returned in order as a reference for audit opportunities.

From the description of the above embodiments, the operation, use and effects of the present invention can be fully understood. The embodiments described above are, however, only preferred embodiments of the present invention and do not limit the scope of its implementation; simple equivalent changes and modifications made according to the claims and the description of the present invention all fall within the scope covered by the present invention.

Claims

A system for integrating qualitative data and quantitative data to recommend audit criteria, comprising: a storage module for receiving in-progress analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past, wherein both the in-progress analysis data and the historical analysis data include qualitative data in the form of audit findings and quantitative data in the form of supplier operation data; a topic model conversion module, connected to the storage module, for analyzing the audit findings of the historical analysis data to establish or update a topic model and to obtain a topic model probability distribution of the audit findings of the historical analysis data, wherein the topic model conversion module converts the audit findings of the in-progress analysis data according to the topic model to obtain a topic model probability distribution of the audit findings of the in-progress analysis data; a feature vector module, connected to the topic model conversion module and the storage module, for generating a corresponding feature vector set according to the topic model probability distribution of the historical analysis data and the supplier operation data of the historical analysis data, and for generating a feature vector value corresponding to the in-progress analysis data according to the topic model probability distribution of the in-progress analysis data and the supplier operation data of the in-progress analysis data; a classification module, connected to the feature vector module, for performing cluster analysis to divide the feature vector set into a plurality of clusters and for determining, among the plurality of clusters, a cluster to which the feature vector value belongs; and a recommendation module, connected to the classification module and the topic model conversion module, for receiving an audit criteria list used for supplier audits and for generating, according to the cluster to which the feature vector value belongs, corresponding recommended audit criteria items for a related topic.
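
As an illustration of how the feature vector module might combine the two kinds of data, the following minimal sketch appends scaled supplier operation data to a topic model probability distribution. The field names (`supplier_number`, `turnover`, `operating_years`), the min-max scaling, and the value ranges are assumptions made for this example and are not specified by the claim.

```python
# Hypothetical field order for the quantitative supplier operation data.
FIELDS = ("supplier_number", "turnover", "operating_years")


def min_max_scale(value, low, high):
    """Scale a value into [0, 1]; assumed here so fields are comparable."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def build_feature_vector(topic_distribution, operation_data, ranges):
    """Concatenate topic probabilities with scaled operation data."""
    scaled = [
        min_max_scale(operation_data[name], *ranges[name]) for name in FIELDS
    ]
    return list(topic_distribution) + scaled


# Example with made-up numbers: two topics plus three quantitative fields.
ranges = {
    "supplier_number": (0, 100),
    "turnover": (0.0, 4.0),
    "operating_years": (0, 20),
}
vector = build_feature_vector(
    [0.7, 0.3],
    {"supplier_number": 50, "turnover": 2.0, "operating_years": 10},
    ranges,
)
print(vector)  # [0.7, 0.3, 0.5, 0.5, 0.5]
```

The same function would be applied to every historical record to form the feature vector set, and once to the in-progress record to form its feature vector value.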

The system for integrating qualitative data and quantitative data to recommend audit criteria according to claim 1, wherein the classification module calculates a distance value between the feature vector value and the centroid of each of the clusters, and takes the cluster with the smallest distance value as the cluster to which the feature vector value belongs.
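
The nearest-centroid assignment described here can be sketched as follows. The claim only requires "a distance value"; Euclidean distance is assumed for this example, and the function name `assign_cluster` is hypothetical.

```python
import math


def assign_cluster(feature_vector, centroids):
    """Return (index, distance) of the centroid closest to feature_vector.

    Euclidean distance is an assumption; any distance measure could be used.
    """
    distances = [math.dist(feature_vector, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


# Example with made-up centroids for two clusters.
centroids = [[0.8, 0.2, 0.1], [0.2, 0.8, 0.9]]
cluster_index, distance = assign_cluster([0.7, 0.3, 0.2], centroids)
print(cluster_index)  # 0
```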

The system for integrating qualitative data and quantitative data to recommend audit criteria according to claim 1, wherein the quantitative data of supplier operation data includes at least one of supplier number data, turnover data, and operating time data, or a combination thereof.

The system for integrating qualitative data and quantitative data to recommend audit criteria according to claim 1, wherein the topic model is established using at least one of Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF), and the topic model conversion module obtains the topic model probability distribution of the audit findings of the historical analysis data by mapping the audit findings of the historical analysis data according to the topic model, and obtains the topic model probability distribution of the audit findings of the in-progress analysis data by mapping the audit findings of the in-progress analysis data according to the topic model.
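
To make the mapping step concrete, the sketch below fits a small Non-Negative Matrix Factorization on term counts of historical audit findings, then maps a new finding onto the fitted topics by holding the topic matrix fixed. It uses the standard multiplicative update rules in plain Python. The term-count matrix, the number of topics, and the row normalization used to turn NMF weights into a probability-like distribution are all assumptions for this example; the claim does not prescribe them, and a production system would more likely use an established library implementation of LDA or NMF.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=200, seed=0, eps=1e-9, h_fixed=None):
    """Factor v (docs x terms) into w (docs x k) and h (k x terms).

    If h_fixed is given, only w is updated, which maps new documents
    onto an already established topic model.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    if h_fixed is not None:
        h = h_fixed
    else:
        h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update for w: w *= (v h^T) / (w h h^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(matmul(w, h), ht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
        if h_fixed is None:
            # Multiplicative update for h: h *= (w^T v) / (w^T w h)
            wt = transpose(w)
            num = matmul(wt, v)
            den = matmul(matmul(wt, w), h)
            h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
                 for i in range(k)]
    return w, h


def to_distribution(row):
    """Normalize non-negative topic weights so they sum to 1 (assumed step)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


# Made-up term counts: four historical findings over a four-term vocabulary.
historical_counts = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
w_hist, topics = nmf(historical_counts, k=2)
historical_dists = [to_distribution(row) for row in w_hist]

# Map a new (in-progress) finding using the established topics.
w_new, _ = nmf([[1, 1, 0, 0]], k=2, h_fixed=topics)
in_progress_dist = to_distribution(w_new[0])
```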

A method for integrating qualitative data and quantitative data to recommend audit criteria, comprising: receiving, by a storage module, in-progress analysis data of a supplier audit and storing historical analysis data of supplier audits completed in the past, wherein both the in-progress analysis data and the historical analysis data include qualitative data in the form of audit findings and quantitative data in the form of supplier operation data; analyzing, by a topic model conversion module, the audit findings of the historical analysis data to establish or update a topic model and to obtain a topic model probability distribution of the audit findings of the historical analysis data, and converting, by the topic model conversion module, the audit findings of the in-progress analysis data according to the topic model to obtain a topic model probability distribution of the audit findings of the in-progress analysis data, so that a feature vector module generates a corresponding feature vector set according to the topic model probability distribution and the supplier operation data of the historical analysis data, and generates a feature vector value corresponding to the in-progress analysis data according to the topic model probability distribution of the in-progress analysis data and the supplier operation data of the in-progress analysis data; and dividing, by a classification module, the feature vector set into a plurality of clusters by cluster analysis and determining, among the plurality of clusters, a cluster to which the feature vector value belongs, so that a recommendation module receives an audit criteria list for supplier audits and generates, according to the cluster to which the feature vector value belongs, corresponding recommended audit criteria items for a related topic.

The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 5, wherein the feature vector set is clustered using K-means clustering.
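
For reference, K-means clustering of a feature vector set can be sketched with Lloyd's algorithm as below. The random initialization from the data points, the fixed seed, the Euclidean distance, and the example points are assumptions for this illustration only.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centroids = kmeans(points, k=2)
```

The resulting centroids are what a nearest-centroid step would compare against when assigning the feature vector value of an in-progress audit to a cluster.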

The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 5, wherein the classification module calculates a distance value between the feature vector value and the centroid of each of the clusters, and takes the cluster with the smallest distance value as the cluster to which the feature vector value belongs.

The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 5, wherein the cluster analysis uses a weighted K-means feature selection algorithm to reduce the dimension of the feature vectors used to establish the cluster analysis.

The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 5, wherein the topic model is established using at least one of Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF), and the topic model conversion module obtains the topic model probability distribution of the audit findings of the historical analysis data by mapping the audit findings of the historical analysis data according to the topic model, and obtains the topic model probability distribution of the audit findings of the in-progress analysis data by mapping the audit findings of the in-progress analysis data according to the topic model.

The method for integrating qualitative data and quantitative data to recommend audit criteria according to claim 5, wherein the quantitative data of supplier operation data includes at least one of supplier number data, turnover data, and operating time data, or a combination thereof.