TWI549008B

TWI549008B - System and method for importing, screening, and managing large amounts of data

Info

Publication number: TWI549008B
Application number: TW103125935A
Authority: TW
Inventors: Hui Hung Chien; Tsu Chun Chen; Chia Yu Guo
Original assignee: Chunghwa Telecom Co Ltd
Priority date: 2014-07-30
Filing date: 2014-07-30
Publication date: 2016-09-11
Also published as: TW201604694A

Description

System and method for importing, screening, and managing large amounts of data

The present invention relates to a system and method for importing data, and in particular to a system and method for importing, screening, and managing large amounts of data in telecommunication services, in which imported data is screened and managed according to character count, continuity, and ordering.

A prior patent application, "Method for classifying multimedia files" (filing date: 2008/12/09, application number: 097147879), automatically classifies large numbers of multimedia files using parameters such as EXIF information, image features, and file attributes. It displays on the browsing screen a classification label containing the category name, file name, date parameter, and file location, so that users can conveniently manage and browse large numbers of multimedia files.

Another prior patent application, "System and method for full-text retrieval by comparing the positional relationship of individual words" (filing date: 2006/10/04, application number: 095136960), is applied to a handheld data processing device with a dictionary function. It establishes an index relationship through a dictionary database holding document numbers and position numbers. When a search term is looked up, the document numbers containing each word of the search term are matched first, and the position numbers of the words are then compared to find the dictionary documents that satisfy the positional relationship among the words of the search term. A search result list is generated at the same time to give the user the relevant search information, meeting the requirement for precise retrieval.

As can be seen, the conventional approaches described above still have many shortcomings; they are not sound designs and are in urgent need of improvement.

In view of the shortcomings of the conventional approaches described above, the inventors sought to improve and innovate on them, and after years of dedicated research successfully completed the present invention.

The purpose of the present invention is as follows. When large amounts of product data are imported from different sources, and the data contains format errors or symbols and phrases that cannot be interpreted, presenting that data on the display interface without the screening method of the present invention tends to produce too many successful matches that are not fully accurate, so manual corrections on the back-end platform are required. The present invention therefore uses an index classification module to distinguish the types of data and build index values; a data screening module to remove invalid data content; a data comparison module to distinguish the categories of data; and a data management module to crawl the data content and write it into the databases. Finally, a data extraction module reads data from the host database according to the needs of the display interface and returns it to the display interface in order. This can substantially reduce the probability of customer complaints caused by erroneous data content on the display interface.

In the past, importing product data into the display interface relied on manual import and comparison, and a great deal of time and labor went into consolidating, storing, and maintaining product data of all kinds, such as cloud, mobile phone, and fixed-network products, so that the product pages offered for sale showed correct product information. The present invention provides a self-service capability through mapping tables defined by the requester, and its data comparison module computes weight values using a weighted formula. The requester only needs to modify the mapping tables, without changing the program directly, to flexibly add and update product data content and categories, which greatly improves the convenience of operating and maintaining telecommunication services.

The system and method of the present invention for importing, screening, and managing large amounts of data provide graphical and automated functions and offer timeliness, correctness, integration, efficiency, and convenience. They improve the accuracy of data identification, reduce the labor needed for editing large numbers of products, and greatly improve the convenience of operating and maintaining telecommunication services. They can also be applied to other systems that import large amounts of data containing duplicates.

Compared with other conventional techniques, the technical features provided by the present invention have the following advantages:

1. An index classification module performs intelligent classification analysis to distinguish the types of data and improve data crawling efficiency.

2. A data screening module automatically filters out invalid words or symbols according to a user-defined configuration, reducing the cost of manual filtering.

3. A data comparison module computes weight values using a weighted formula that considers three conditions for the compared terms: the number of identical characters, the order of identical characters, and the continuity of identical characters. Matching results can thus be found quickly, and the source product data can be classified more accurately.

4. A setting mapping module provides a self-service capability: product data content and categories can be flexibly added and updated without changing the program directly, greatly improving the convenience of operating and maintaining telecommunication services.

5. A data management module reduces the labor cost of filing large amounts of data, and therefore also reduces the number of reported incidents caused by human error.

6. A data extraction module can be given an automated schedule. The display interface sends its requirements directly to the module, which crawls the data from the host database and returns it, reducing the labor needed for front-end product editing and effectively lowering the company's labor costs.

100‧‧‧System for importing, screening, and managing large amounts of data

110‧‧‧Index classification module

120‧‧‧Data screening module

130‧‧‧Data comparison module

140‧‧‧Setting mapping module

150‧‧‧Data management module

160‧‧‧Data extraction module

170‧‧‧Data import module

180‧‧‧Data review module

S210~S270‧‧‧Process steps for importing, screening, and managing large amounts of data

210, 220‧‧‧Database

230‧‧‧Host database

240‧‧‧Display interface

The technical content, purpose, and effects of the present invention can be further understood from the detailed description and the accompanying drawings, in which: Figure 1 is a schematic diagram of the system of the present invention for importing, screening, and managing large amounts of data.

Figure 2 is a flow chart of the method of the present invention for importing, screening, and managing large amounts of data.

Figure 3 is a diagram of an embodiment of the system of the present invention for importing, screening, and managing large amounts of data.

Figure 4 is a data calculation illustration for the method of the present invention for importing, screening, and managing large amounts of data.

Figure 5 is a data calculation illustration for the method of the present invention for importing, screening, and managing large amounts of data.

Figure 6 is a data calculation illustration for the method of the present invention for importing, screening, and managing large amounts of data.

Figure 7 is a sequence diagram of the system and method of the present invention for importing, screening, and managing large amounts of data.

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

The present invention is further described below with reference to the drawings. Referring to Figure 1, a schematic diagram of the system of the present invention for importing, screening, and managing large amounts of data, the system includes an index classification module 110, a data screening module 120, a data review module 180, a data comparison module 130, a setting mapping module 140, a data management module 150, and a data extraction module 160. The index classification module 110 receives data and determines whether each index target has an index value; the index targets and index values are built in advance and stored in a database. The data screening module 120 sets a configuration and receives the data from the index classification module 110 for filtering. The data review module 180 receives from the data screening module 120 the data that cannot be interpreted or is garbled, and actively creates a review interface. The data comparison module 130 receives data from the data screening module 120 or the data review module 180 and cross-matches it against the data tables of the setting mapping module 140. The setting mapping module 140 uses program logic to perform multi-level group consolidation, defines the data tables used by each module in advance, and converts them into the system's standard data format. The data management module 150 then receives the data tables of the setting mapping module 140 and the data from the data comparison module 130, and classifies feature names according to the fields of each data table. Finally, the data extraction module 160 calls the data management module 150, receives its data, and returns the data to the display interface 240.

The index values can be further classified as single-valued, multi-valued, or null. A single value means the target needs no further parsing; a multi-value means the target contains other index values and must be parsed further; a null value means the target has no other index values and needs no further parsing.
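The three index value classes can be illustrated with a minimal Python sketch. It assumes that an index target corresponds to an XML element, that "multi-valued" means the element has child elements, "single-valued" means it is a leaf with text, and "null" means it is empty. That mapping, the function names, and the sample document are assumptions made for illustration and are not taken from the patent.

```python
import xml.etree.ElementTree as ET


def classify_index_value(element):
    """Classify one index target (assumed here to be an XML element)."""
    if len(element) > 0:
        # The element has child elements, so it must be parsed further.
        return "multi"
    if element.text and element.text.strip():
        # Leaf element with text: nothing further to parse.
        return "single"
    # Empty element: no value and nothing further to parse.
    return "null"


def build_index(root):
    """Map each top-level index target name to its index value class."""
    return {child.tag: classify_index_value(child) for child in root}


# Hypothetical product record used only for this sketch.
SAMPLE = """<product>
  <name>平板+mPro750</name>
  <plans><plan>mPro750</plan><plan>mPro450</plan></plans>
  <note/>
</product>"""

index = build_index(ET.fromstring(SAMPLE))
print(index)
```

Building this map once lets later stages look up a target by name instead of walking the whole document again, which is the efficiency gain the description attributes to the index classification module.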

Referring to Figure 2, a flow chart of the method of the present invention for importing, screening, and managing large amounts of data, and to Figure 1, the process may include the following steps:

S210: The index classification module 110 receives data, builds index values and index targets, and classifies the data by index target name.

S220: The data screening module 120 sets a configuration and receives the data from the index classification module 110 for filtering. If the data can be analyzed, the process proceeds to step S240; otherwise it proceeds to step S230.

S230: The data review module 180 receives from the data screening module 120 the data that cannot be analyzed or is garbled, and actively creates a review interface for correcting it.

S240: The data comparison module 130 receives data from the data screening module 120 or the data review module 180 and cross-matches it against the data tables of the setting mapping module 140.

S250: The setting mapping module 140 uses program logic to perform multi-level group consolidation, defines the data tables used by each module in advance, and converts them into the system's standard data format.

S260: The data management module 150 receives the data tables of the setting mapping module 140 and the data from the data comparison module 130, and classifies feature names according to the fields of each data table.

S270: The data extraction module 160 receives the requirements of the display interface 240 and consolidates them for batch delivery; it receives the data from the data management module 150 and returns the data to the display interface 240.
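The branching between S220, S230, and S240 can be sketched as a small control-flow skeleton in Python. Every name here (`run_pipeline` and the four callables) is hypothetical, and the toy callables stand in for modules whose internals the patent does not specify; the sketch shows only the routing, not the modules themselves.

```python
def run_pipeline(records, can_analyze, review, match, store):
    """Route already-indexed records (S210) through S220, S230, S240 and S260."""
    for record in records:
        if not can_analyze(record):       # S220: screening decides
            record = review(record)       # S230: corrected through the review interface
        category = match(record)          # S240: cross-match against the mapping tables
        store(category, record)           # S260: filed under the matched category


# Toy stand-ins, assumed for illustration only.
stored = {}


def is_readable(text):
    return "\ufffd" not in text           # U+FFFD marks an undecodable character


def manual_review(text):
    return text.replace("\ufffd", "")     # pretend a reviewer removed the garbage


def match_category(text):
    return "tablet" if "平板" in text else "other"


def store_record(category, text):
    stored.setdefault(category, []).append(text)


run_pipeline(["平板+mPro750", "月繳\ufffd750"],
             is_readable, manual_review, match_category, store_record)
print(stored)
```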

The index classification module 110 builds the index targets in advance so that the data comparison module 130 and the data management module 150 can query them; a target string can then be found without first crawling the entire XML document. The index classification module 110 thereby improves data crawling efficiency, and it stores the results in the database. The data screening module 120 filters data according to a user-defined configuration. It filters specific strings or symbols out of the data sent by the index classification module 110, using a filter table built in advance that lists irrelevant content such as symbols or phrases that cannot be interpreted, or a table of words or symbols that need to be replaced, which resolves data errors caused by human factors. If data content cannot be interpreted and is mostly garbled, the data screening module 120 sends it directly to the data review module 180 for review, reducing the workload of customer service staff.
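The screening behaviour described above can be sketched as follows. The filter table entries, the replacement table, the 50% garbled threshold, and the function name `screen` are all assumptions chosen for illustration; the patent says only that such tables are user-defined and that mostly garbled content goes to review.

```python
# Hypothetical filter table: strings removed outright.
REMOVE = ["★", "【限時】"]
# Hypothetical replacement table: full-width plus sign normalised to ASCII.
REPLACE = {"＋": "+"}


def screen(text, garbled_ratio=0.5):
    """Return cleaned text, or None when the record should go to manual review."""
    # Count replacement characters (U+FFFD) and non-printable characters.
    bad = sum(1 for ch in text if ch == "\ufffd" or not ch.isprintable())
    if text and bad / len(text) > garbled_ratio:
        return None                      # mostly garbled: hand over to review
    for token in REMOVE:
        text = text.replace(token, "")
    for old, new in REPLACE.items():
        text = text.replace(old, new)
    return text.strip()


print(screen("★asus平板電腦【限時】"))
print(screen("平板＋mPro750"))
print(screen("\ufffd\ufffd\ufffdA"))
```

Keeping both tables as data rather than code means new noise tokens can be added without touching the function.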

The data comparison module 130 mainly performs the data comparison job. It generates the corresponding data on instruction from the data screening module 120 or the data review module 180, and cross-matches that data against the "product data type" tables defined by the setting mapping module 140. Its data sources are the data screening module 120, the data review module 180, and the setting mapping module 140. It sends the compared data directly to the data management module 150, which consolidates it into the source type field so that the display interface 240 can later distinguish the types directly, and it notifies the data management module 150. The comparison method is a weight value formula devised on the basis of product characteristics and the comparison process: a product is assigned to the type for which the sum of the weighted terms is highest, as shown in the following formula:

α + β + γ = 10

where, for two of the terms, a computed value below 0.1 is replaced by 0.1; Ws = sum of the weighted scores; α, β, γ = weight factors; Q = order quality, computed from the position order of the name being compared in the data content of the index target and the position order of the keyword in the definition table of the setting mapping module; Nd = number of identical characters in the name being compared in the data content of the index target; Ns = number of characters in the keyword in the definition table of the setting mapping module; S(Pd) = continuity of identical characters in the name being compared in the data content of the index target; and Ps = continuity of the keyword in the definition table of the setting mapping module.

The setting mapping module 140 uses program logic to perform multi-level group consolidation. Its main function is to define in advance the "product data type" tables and the "index target processing" tables, which the data comparison module and the data management module use to crawl the intended data and determine where to place it, and which are in a standard data format accepted by the system.

In addition, the data management module 150 receives the data tables sent by the setting mapping module 140 and classifies feature names according to the fields of each data table. It cross-matches against the setting mapping module's "index target processing" tables to process the content of each product record. With fields such as source type, destination name, destination table, destination field, and extraction processing set in advance, the XML product data to be crawled can be written into the appropriate databases and tables. Finally, the data extraction module 160 reads the standardized product data from the host database according to the needs of the display interface 240 and, based on the "source type" field of the host database, returns the data to the corresponding product section of the display interface 240, thereby distinguishing the types of products.
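A mapping-table-driven write of this kind might look like the sketch below. The rule keys mirror the fields named above (source type, destination table, destination field, extraction processing), but the concrete key names, table names, XML tags, and the function `apply_mapping` are hypothetical. The result is returned as nested dictionaries rather than written to a real database.

```python
import xml.etree.ElementTree as ET

# Hypothetical "index target processing" table: one rule per destination field.
MAPPING = [
    {"source_type": "tablet", "index_target": "name",
     "dest_table": "products", "dest_field": "title", "extract": str.strip},
    {"source_type": "tablet", "index_target": "fee",
     "dest_table": "plans", "dest_field": "monthly_fee", "extract": int},
]


def apply_mapping(xml_text, source_type, mapping=MAPPING):
    """Return {dest_table: {dest_field: value}} for one XML product record."""
    root = ET.fromstring(xml_text)
    rows = {}
    for rule in mapping:
        if rule["source_type"] != source_type:
            continue                      # rule belongs to another product type
        node = root.find(rule["index_target"])
        if node is None or node.text is None:
            continue                      # index target missing or empty
        value = rule["extract"](node.text)
        rows.setdefault(rule["dest_table"], {})[rule["dest_field"]] = value
    return rows


record = "<product><name> 平板+mPro750 </name><fee>750</fee></product>"
print(apply_mapping(record, "tablet"))
```

Adding a field or a product type then means adding a row to `MAPPING`, which matches the stated goal of changing the mapping tables instead of the program.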

An embodiment is presented here to make the present invention easier to understand. Referring to Figure 3, a diagram of an embodiment of the system of the present invention for importing, screening, and managing large amounts of data, and to Figures 1 and 2, the following steps extract data from the host database in an automated manner according to the needs of the display interface 240:

1. The data import module 170 runs a scheduled job to fetch XML data, files it to a physical location or into temporary database storage, and holds it in the database 210 as product data awaiting classification.

2. The index classification module 110 receives new data from the designated database 220 and notifies the data screening module 120 to start processing the files. Its operation includes:

(a) If the XML data sent by customer service staff has a format problem, an error message is generated.

(b) If the network is disconnected, or the database is down and does not respond, an alarm message is generated.

(c) If new data arrives but the designated database does not exist, or the designated index target is invalid, an error message is displayed on screen.

(d) If receiving or updating data fails or produces an error, an error message is displayed on screen.

(e) When receipt of data from the data import module 170 is complete, the data is classified according to its characteristics and imported into the relevant tables of the database 220. Its type determines whether further extraction is needed, and its index values are built.

(f) When receiving or updating data is complete, the received data is checked for completeness; if it is incomplete, an error message is generated.

(g) When a message is received from another functional module, it is displayed directly on screen.

(h) Each day, the scheduled job function provided by Structured Query Language (SQL) Integration Services is used to receive data and output it to the database 220.

3. The data screening module 120 screens the product data that has been crawled and stored in the database 220. Its operation includes:

(a) Its main function is to screen the crawled product data stored in the database 220. A filter table must be built in advance to remove irrelevant content such as symbols or phrases that cannot be interpreted, or a table of words or symbols that need to be replaced must be built; the data is then passed automatically to the next module for processing.

(b) If data content cannot be interpreted and is mostly garbled, it is sent directly to the data review module 180 for review; such data accounts for less than about 1% of the total.

(c) A scheduled check function is provided. If processing requires the cooperation of other functional modules, a processing message is generated and an instruction is issued to the relevant functional module.

4. The data review module 180 receives product data content from the data screening module 120. Most of this content cannot be interpreted and is garbled, and the number of records needing review is very small. The module actively generates a review interface for customer service staff to carry out the review.

5. The data comparison module 130 performs the data comparison job. It generates the corresponding data on instruction from the data screening module 120 or the data review module 180, cross-matches it against the "product data type" tables of the setting mapping module 140, and sends the compared data directly to the data management module 150 to be consolidated into the source type field. The comparison uses the weight value formula.

Figures 4 to 6 are data calculation illustrations for the method of the present invention for importing, screening, and managing large amounts of data. As shown in Figures 4 to 6, the comparison method uses the weight value formula. It is described through an embodiment, with the following steps:

i. An input product in XML form is cross-matched against the "data type table" defined by the setting mapping module 140. For example, take "asus平板電腦+mPro最新校園專案優惠月繳750" as the search string for comparison, and suppose that the entries in the comparison table retrieved as matching characters of the search string are "iPad min平板", "輕鬆FUN月繳750", "大家講月租方案750", "mPro450+3G183", and "平板+mPro750". The ratio Nd/Ns is one component of the weight value. Breaking down "asus平板電腦+mPro最新校園專案優惠月繳750" according to this rule and counting the characters it shares with each entry gives the following ratios: "iPad min平板": 5/10; "輕鬆FUN月繳750": 6/10; "大家講月租方案750": 5/10; "mPro450+3G183": 7/13; and "平板+mPro750": 10/10. In this example "平板+mPro750" has the highest Nd/Ns value.

ii. In the second stage, the search string "asus平板電腦+mPro最新校園專案優惠月繳750" is scored on the position order of the characters it shares with each entry. The order quality Q is one component of the weight value. The scores are: "iPad min平板": 0.09; "輕鬆FUN月繳750": 0.18; "大家講月租方案750": 0.12; "mPro450+3G183": 0.138; and "平板+mPro750": 0.3. In this example "平板+mPro750" has the highest Q value.

iii. In the third stage, the search string "asus平板電腦+mPro最新校園專案優惠月繳750" is scored on the continuity of the characters it shares with each entry. The ratio S(Pd)/Ps is one component of the weight value. The scores are: "iPad min平板": 2/10; "輕鬆FUN月繳750": 5/10; "大家講月租方案750": 3/10; "mPro450+3G183": 4/13; and "平板+mPro750": 4/10. In this example "輕鬆FUN月繳750" has the highest S(Pd)/Ps value.

iv. Finally, when the computed values are returned, each ratio is multiplied by its weight factor, and the entry with the highest sum is chosen as the classification. The classification finally returned is "平板+mPro750".
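The four stages above can be traced with a short Python sketch. Several parts are assumptions and not the patent's stated method: Nd is counted as the number of keyword characters that occur anywhere in the search string, ignoring case (a rule that happens to reproduce the five quoted Nd/Ns ratios); the Q and S(Pd)/Ps figures are copied from the example instead of being computed; the 0.1 floor is applied to every term; and the weight factors 4, 3, 3 are hypothetical values that merely satisfy α + β + γ = 10.

```python
QUERY = "asus平板電腦+mPro最新校園專案優惠月繳750"

# Figures quoted in stages ii and iii: (order quality Q, continuity S(Pd)/Ps).
QUOTED = {
    "iPad min平板": (0.09, 2 / 10),
    "輕鬆FUN月繳750": (0.18, 5 / 10),
    "大家講月租方案750": (0.12, 3 / 10),
    "mPro450+3G183": (0.138, 4 / 13),
    "平板+mPro750": (0.3, 4 / 10),
}

ALPHA, BETA, GAMMA = 4, 3, 3   # hypothetical weight factors; they sum to 10
FLOOR = 0.1                    # assumed to apply to every term


def char_overlap(query, keyword):
    """Return (Nd, Ns) under the assumed case-insensitive counting rule."""
    query_chars = set(query.lower())
    nd = sum(1 for ch in keyword.lower() if ch in query_chars)
    return nd, len(keyword)


def weighted_score(query, keyword, order_quality, continuity):
    """Weighted sum of the three terms, each floored at FLOOR."""
    nd, ns = char_overlap(query, keyword)
    count_term, order_term, cont_term = (
        max(term, FLOOR) for term in (nd / ns, order_quality, continuity)
    )
    return ALPHA * count_term + BETA * order_term + GAMMA * cont_term


scores = {kw: weighted_score(QUERY, kw, q, s) for kw, (q, s) in QUOTED.items()}
best = max(scores, key=scores.get)
print(best)
```

With these assumed weights the winner is "平板+mPro750", in line with stage iv. A weighting dominated by the continuity term would instead favour "輕鬆FUN月繳750", so the outcome depends on how the weight factors are chosen.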

As described above, the weight value calculation proceeds as follows:

(a) Its main function is to perform the comparison job: it generates the corresponding data on instruction from the data screening module 120 or the data review module 180, and then cross-matches it against the "product data type" tables of the setting mapping module 140.

(b) Each ratio received is multiplied by its weight factor, and the highest sum is used as the basis for choosing the classification, so the type of the product data can be determined more effectively and precisely.

(c) The data is then returned to the data management module 150 and consolidated into the source type field, and the data management module 150 is notified that processing is complete.

(d) A scheduled check function is provided. If processing requires the cooperation of other functional modules, a processing message is generated and an instruction is issued to the relevant functional module.

(e) At the request of other functional modules, a specified message is displayed on screen, or customer service staff are notified by email.

(f) At the request of other functional modules, their error messages are displayed on screen, printed in reports, and recorded in the system event log database.

(g) Customer service staff can, as often as needed, query the system's event records and call records and see which XML product data is currently being processed, and can generate reports at any time.

6. The setting mapping module 140 predefines the "product data types" data table and the "index target end definition processing" data table, which the data comparison module 130 and the data management module 150 use to compare data and to designate the placement location written to the host database. Its operations include:

(a) It predefines the "product data types" data table and the "index target end definition processing" data table and makes them available for the data comparison module 130 and the data management module 150 to read so that the next stage can proceed. If a data format error or any other abnormal error occurs during writing, the erroneous data is written to the data record.

(b) If customer service staff later need to add fields or change product names, the number and names of fields can be flexibly extended in this module without modifying the program directly.
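One way to picture the "no program change" property in item (b) is to hold the mapping as data rather than code. The sketch below is only an illustration under that assumption; the row layout, the table and field names, and the function `route_record` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping rows: which index target of which source type goes to
# which destination table and field. All names are illustrative only.
FIELD_MAP: List[Dict[str, str]] = [
    {"source_type": "mobile", "index_target": "name",
     "dest_table": "mobile_products", "dest_field": "product_name"},
    {"source_type": "mobile", "index_target": "price",
     "dest_table": "mobile_products", "dest_field": "monthly_fee"},
]


def route_record(
    record: Dict[str, str],
    source_type: str,
    field_map: List[Dict[str, str]],
) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Place record values into destination tables/fields; collect errors."""
    placed: Dict[str, Dict[str, str]] = {}
    errors: List[str] = []
    for row in field_map:
        if row["source_type"] != source_type:
            continue
        target = row["index_target"]
        if target not in record:
            # Erroneous data is recorded instead of stopping the run.
            errors.append(f"missing index target: {target}")
            continue
        placed.setdefault(row["dest_table"], {})[row["dest_field"]] = record[target]
    return placed, errors


record = {"name": "Plan A", "price": "499", "color": "black"}
placed, errors = route_record(record, "mobile", FIELD_MAP)

# Adding a field is a change to the mapping data, not to route_record.
extended_map = FIELD_MAP + [
    {"source_type": "mobile", "index_target": "color",
     "dest_table": "mobile_products", "dest_field": "color"},
]
placed_extended, _ = route_record(record, "mobile", extended_map)
print(placed, placed_extended)
```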

7. The data management module 150 retrieves product data into the host database 230 and performs the required management actions: it determines the characteristics of the imported data, crawls the XML-type product data, and writes it to the respective databases and data tables. Its operations include:

(a) It can manage all product data in the host database according to the data table sent by the setting mapping module 140.

(b) After receiving data from the database, it can decide on its own, according to the data's characteristics, which type of data table in the host database the record is stored in.

(c) It can provide the data extraction module 160 with the large amount of correct data that has been screened and compared, and lets the data extraction module 160 search the data tables of the host database 230 directly.

(d) It generates an alarm message when the network is disconnected or the database has crashed and does not respond.

(e) Each day it uses Structured Query Language (SQL) Integration Services to provide a timed scheduling function for receiving and outputting data.

8. The data extraction module 160 reads the standardized product data from the host database. Its operations include:

(a) It collects the requests that requesters submit through the display interface 240 and delivers them in batches so that the requested data can subsequently be extracted from the host database; it outputs the query results in a standard format and returns the data to the display interface 240.

(b) It automatically sends a mail notifying the person that processing of the application has started.

(c) Once processing is complete, it automatically notifies the applicant.
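The batch delivery and notification steps of the data extraction module can be sketched as follows. This is an illustration only: the request fields, the in-memory `HOST_DB` stand-in for the host database, the message wording, and the function names are all assumptions.

```python
from typing import Callable, Dict, Iterator, List


def in_batches(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def deliver_requests(
    requests: List[dict],
    query: Callable[[str], List[str]],
    notify: Callable[[str, str], None],
    batch_size: int = 2,
) -> List[Dict[str, object]]:
    """Process collected requests batch by batch and notify each applicant."""
    results: List[Dict[str, object]] = []
    for batch in in_batches(requests, batch_size):
        for request in batch:
            notify(request["applicant"], "processing started")
        for request in batch:
            rows = query(request["product_type"])
            # One uniform result shape stands in for the "standard format".
            results.append({"request_id": request["id"], "rows": rows})
            notify(request["applicant"], "processing complete")
    return results


# Hypothetical stand-ins for the host database and the mail notification.
HOST_DB = {"mobile": ["Plan A", "Plan B"], "broadband": ["Fiber 100M"]}
sent: List[tuple] = []
requests = [
    {"id": 1, "applicant": "amy", "product_type": "mobile"},
    {"id": 2, "applicant": "bob", "product_type": "broadband"},
    {"id": 3, "applicant": "amy", "product_type": "tv"},
]
results = deliver_requests(
    requests,
    query=lambda product_type: HOST_DB.get(product_type, []),
    notify=lambda who, message: sent.append((who, message)),
)
print(results)
```

Passing `query` and `notify` in as callables keeps the sketch independent of any particular database or mail system.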

Please refer to FIG. 7, which is a sequence diagram of the system and method for importing a large amount of data into screening management according to the present invention. As shown in FIG. 7, the data flow sequence diagram illustrates the order of calls between modules, sorted by call time from top to bottom and from left to right. A large amount of XML product data is sent to the data import module 170, which saves a copy as a file at a physical location and begins crawling the whole document into temporary storage in the database. The data import module 170 then calls the index classification module 110, which returns a message asking the data import module 170 to download the data to the database and hand it to the index classification module 110 for processing. The index classification module 110 determines whether each index target end has a value, classifying it as single-valued, multi-valued, or null, builds a complete index table, and stores the result in the database. It then calls the data screening module 120 to continue processing; the data screening module 120 replaces or filters out irrelevant content such as symbols or phrases that cannot be judged. If ordinary data passes the data screening module 120, the data comparison module 130 is started next: the correct data is sent to the data comparison module 130 and compared against the setting mapping module 140. If the data content is wrong and needs modification, the data review module 180 is notified and the data editing flow is entered; once the edit is submitted, the edited data is sent on to the data comparison module 130 for processing.
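The single/multi/null classification performed by the index classification module 110 can be sketched for XML input as shown here. The sketch assumes that a target end with child elements counts as multi-valued, one with only text counts as single-valued, and an empty one counts as null; that reading, the element names, and the function names are illustrative assumptions rather than details given in the patent.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def classify_target(element: ET.Element) -> str:
    """Classify one index target end as 'single', 'multi', or 'null'."""
    if len(element) > 0:
        # Child elements carry further index values, so more parsing is needed.
        return "multi"
    if element.text and element.text.strip():
        return "single"
    return "null"


def build_index(xml_text: str) -> Dict[str, str]:
    """Build a simple index table: target end name -> value class."""
    root = ET.fromstring(xml_text)
    return {child.tag: classify_target(child) for child in root}


# Hypothetical product record (illustrative only).
sample = """
<product>
  <name>Plan A</name>
  <specs><speed>100M</speed><term>24</term></specs>
  <note/>
</product>
"""
index_table = build_index(sample)
print(index_table)
```

A fuller version would recurse into the multi-valued targets, since those are the ones described as needing further parsing.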

The requester first goes to the setting mapping module 140 to set up the correspondence tables for "product data types" and "index target end definition processing". The setting mapping module 140 then waits for calls from the data comparison module 130 and the data management module 150 and returns the correspondence tables to them.

When the display interface 240 submits a request, the data extraction module 160 calls the data management module 150, which returns a message to the data extraction module 160. The data extraction module 160 then sends a short message notifying the person that processing of the application has started. After processing is complete, the data management module 150 sends the data directly to the data extraction module 160, which returns it to the display interface 240.

The above detailed description is a specific description of one feasible embodiment of the present invention. The embodiment is not intended to limit the patent scope of the invention, and any equivalent implementation or modification that does not depart from the spirit of the invention shall fall within the patent scope of this case.

In summary, this case is not only innovative in its technical concept but also provides the above-mentioned effects that conventional methods cannot achieve. It fully satisfies the statutory requirements of novelty and inventive step for an invention patent, and the application is therefore filed in accordance with the law. We respectfully request that your office approve this invention patent application to encourage invention.

120‧‧‧Data screening module

130‧‧‧Data comparison module

140‧‧‧Setting mapping module

150‧‧‧Data management module

160‧‧‧Data extraction module

170‧‧‧Data import module

180‧‧‧Data review module

Claims

A system for importing a large amount of data into screening management, comprising at least: an index classification module that receives data, determines whether each target end has an index value, and builds index target ends and index values in advance and stores them in a database; a data screening module that sets a configuration and receives the data of the index classification module for filtering; a data review module that receives the data of the data screening module and, when the data cannot be judged or parsed or is garbled, actively creates a review interface; a data comparison module that receives the data of the data screening module or the data review module and performs cross-comparison according to a data table of a setting mapping module; the setting mapping module, which uses program logic to perform multi-level group integration, predefines the data tables of each module, and converts them into the system's standard-format data; a data management module that receives the data table of the setting mapping module and the data of the data comparison module, and classifies by feature name according to each data table field; and a data extraction module that performs batch delivery according to requests collected from a display interface, receives the data of the data management module, and returns the data to the display interface.

The system for importing a large amount of data into screening management according to claim 1, wherein the index values are classified as single-valued, multi-valued, and null: a single value means that the target end needs no further parsing; a multi-value means that the target end contains other index values and needs further parsing; and a null value means that the target end has no other index values and needs no further parsing.

The system for importing a large amount of data into screening management according to claim 1, further comprising a data import module that transfers external data to the index classification module, wherein the data import module runs a timed scheduling job to receive the data and files it to a physical location or a database for temporary storage.

The system for importing a large amount of data into screening management according to claim 1, wherein the data comparison module and the data management module receive the index values and store the returned data in the database.

The system for importing a large amount of data into screening management according to claim 1, wherein the data tables are a data table of data types and a data table of target end definition processing, and are provided for the data comparison module and the data management module to read the predefined data and the designated placement location.

The system for importing a large amount of data into screening management according to claim 5, wherein the data management module cross-references the target end processing data table of the setting mapping module to preset the source type, destination name, destination data table, destination field, or extraction processing field of the data, and stores the data in the respective databases and data tables.

A method for importing a large amount of data into screening management, comprising at least the following steps: (1) an index classification module receives data, classifies it by index target end name, and establishes index values and index target ends; (2) a data screening module sets a configuration and receives the data of the index classification module for filtering; if the data can be judged and parsed, step (4) is performed, otherwise step (3) is performed; (3) a data review module receives the data of the data screening module and, because the data cannot be judged or parsed or is garbled, actively creates a review interface for correction; (4) a data comparison module receives the data of the data screening module or the data review module and performs cross-comparison according to a data table of a setting mapping module; (5) the setting mapping module uses program logic to perform multi-level group integration, predefines the data tables of each module, and converts them into the system's standard-format data; (6) a data management module receives the data table of the setting mapping module and the data of the data comparison module, and classifies by feature name according to each data table field; and (7) a data extraction module performs batch delivery according to requests collected from a display interface, receives the data of the data management module, and returns the data to the display interface.

The method for importing a large amount of data into screening management according to claim 7, wherein the index values of step (1) are classified as single-valued, multi-valued, and null: a single value means that the target end has only a single index value and needs no further parsing; a multi-value means that the target end contains other index values and needs further parsing; and a null value means that the target end has no other index values and needs no further parsing.

The method for importing a large amount of data into screening management according to claim 7, wherein the data comparison module and the data management module receive the index values of the index target ends and, after analysis and comparison, store the results in the database.

The method for importing a large amount of data into screening management according to claim 7, wherein step (4) further comprises: the data comparison module receives the data of the data screening module, the data review module, and the setting mapping module, and sends the compared data to the data management module, which classifies and categorizes it.

The method for importing a large amount of data into screening management according to claim 7, wherein the comparison of step (4) calculates a weight value from the data content and the comparison process and classifies according to the weight value, the weight value formula being as follows: [formula images missing in source; surviving fragment: "++β+γ=10"], where [symbol missing] is taken as 0.1 if its calculated value is less than 0.1; [symbol missing] is likewise taken as 0.1 if its calculated value is less than 0.1; Ws = sum of the weight value scores; α, β, γ = weight factors; Q = order quality; Nd = number of identical words matched in the data content of the index target end to be compared; Ns = number of keywords in the definition data table of the setting mapping module; [symbol missing] = word position order of the data content of the index target end to be compared; [symbol missing] = keyword position order of the definition data table of the setting mapping module; S(Pd) = word continuity of the data content of the index target end to be compared; and Ps = keyword continuity of the definition data table of the setting mapping module.

The method for importing a large amount of data into screening management according to claim 7, wherein the data management module classifies by feature name according to the setting mapping module, analyzes and determines the data delivery process, and stores the data records in the database.

The method for importing a large amount of data into screening management according to claim 7, wherein in step (7) the data extraction module receives external instructions and, according to automatic scheduling, collects them into batches for data delivery.

The method for importing a large amount of data into screening management according to claim 12, wherein the data delivery process is determined according to the content set for specific field names of the setting mapping module.

The method for importing a large amount of data into screening management according to claim 13, wherein the automatic scheduling means automatically executing the next process according to the planned flow without manual intervention, including the processes of collecting requests, batch delivery, extracting data, identifying data content and type, and formatting data.