WO2016197852A1 - Data processing method and device - Google Patents

Info

Publication number
WO2016197852A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
classified
original data
classification
original
Application number
PCT/CN2016/084442
Other languages
French (fr)
Chinese (zh)
Inventor
戢洋
甘云锋
肖禹
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date
2015-06-09
Application filed by
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

A data processing method and device. The method comprises: acquiring raw data (101); classifying the acquired raw data (102); and, when a service to be processed is received, extracting the required data from the classified data according to the needs of that service (103). The method and device automate data processing without manual work, so that computation results can be reused across services, which improves efficiency.

Description

Data processing method and device
This application claims priority to Chinese Patent Application No. 201510312912.7, filed on June 9, 2015 and entitled "Data processing method and device", the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of this application relate to the field of communications technologies, and in particular, to a data processing method and device.
Background
In the traditional approach to data modeling, data is extracted from source systems, and SQL (Structured Query Language) is then written by hand to integrate the extracted data into the standard dimension-table structure of a data warehouse, which completes the modeling of the warehouse. Internet business models then typically give rise to two kinds of downstream requirements:
First, integrating the standard dimension tables of the data warehouse into a wide business table by hand-written SQL;
Second, integrating multiple standard dimension tables of the data warehouse into the input sample set required by an algorithm model by hand-written SQL.
As can be seen, in the prior art either kind of requirement must be met by manual integration tailored to that requirement. As a result, computation results cannot be reused across requirements, efficiency is low, and manual maintenance costs are high.
Summary of the invention
To address these deficiencies of the prior art, this application provides a data processing method, including:
acquiring raw data;
classifying the acquired raw data; and
when a service to be processed is received, extracting the required data from the classified data according to the needs of the service to be processed.
Optionally, the raw data includes newly added data, updated data, and data from a specific domain;
the acquiring of raw data includes:
periodically acquiring newly added data from a plurality of preset databases;
periodically acquiring updated data from the plurality of preset databases; and
periodically acquiring data from a predetermined domain based on keywords.
Optionally, before the acquired raw data is classified, the method further includes:
storing the acquired raw data in an operational data store (ODS), and integrating the data already in the ODS with the acquired raw data.
Optionally, the classifying of the acquired raw data includes:
setting classification configuration parameters according to preset classification rules and classification requirements;
integrating all the classification configuration parameters to generate classification integration template data;
generating SQL code based on the classification integration template data and a multi-source data integration framework;
obtaining the raw data from the ODS by means of the SQL code, and classifying the obtained raw data by object; and
storing the classified data in a data warehouse (DW), and integrating the data already in the DW with the newly classified data;
where the objects include time, place, event, person, and relationship.
Optionally, extracting the required data from the classified data according to the needs of the service to be processed, when that service is received, specifically includes:
after the service to be processed is received, analyzing its needs based on preset rules to determine the data required to process it; and
extracting the determined data from the classified data and storing the extracted data in a data mart (DM).
This application further provides a data processing device, including:
an acquisition module, configured to acquire raw data;
a classification module, configured to classify the acquired raw data; and
an extraction module, configured to: when a service to be processed is received, extract the required data from the classified data according to the needs of the service to be processed.
Optionally, the raw data includes newly added data, updated data, and data from a specific domain;
the acquisition module is specifically configured to:
periodically acquire newly added data from a plurality of preset databases;
periodically acquire updated data from the plurality of preset databases; and
periodically acquire data from a predetermined domain based on keywords.
Optionally, the device further includes:
an integration module, configured to store the acquired raw data in the operational data store (ODS), and integrate the data already in the ODS with the acquired raw data.
Optionally, the classification module is specifically configured to:
set classification configuration parameters according to preset classification rules and classification requirements;
integrate all the classification configuration parameters to generate classification integration template data;
generate SQL code based on the classification integration template data and a multi-source data integration framework;
obtain the raw data from the ODS by means of the SQL code, and classify the obtained raw data by object; and
store the classified data in the data warehouse (DW), and integrate the data already in the DW with the newly classified data;
where the objects include time, place, event, person, and relationship.
Optionally, the extraction module is specifically configured to:
after the service to be processed is received, analyze its needs based on preset rules to determine the data required to process it; and
extract the determined data from the classified data and store the extracted data in the data mart (DM).
Compared with the prior art, this application classifies the acquired raw data so that, when a service to be processed is received, the data it requires can be extracted from the classified data according to its needs. Data processing is thereby automated without manual work, computation results can be reused across services, and efficiency is improved.
Description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 2 is a schematic diagram of a data processing method according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a data processing device according to an embodiment of this application.
Detailed description
As described in the background, to address the deficiencies of the prior art, this application proposes a data processing method which, as shown in FIG. 1, includes the following steps:
Step 101: Acquire raw data.
Specifically, the raw data may be of any kind and can be selected as needed; the raw data is then acquired from the corresponding databases. For example, as shown in FIG. 2, it may be acquired from the following databases: hotel accommodation booking records, railway ticket purchase and travel records, civil aviation booking and flight records, census records, criminal records, and so on. The set of databases can be configured as needed, and raw data can also be acquired from other databases.
Over time, new data is continually generated and existing data is continually updated; in addition, some needs may call for data from a specific domain. The raw data may therefore include newly added data, updated data, and data from a specific domain, and the acquisition process may include:
periodically acquiring newly added data from a plurality of preset databases;
periodically acquiring updated data from the plurality of preset databases; and
periodically acquiring data from a predetermined domain based on keywords.
The plurality of preset databases may include the databases listed above, and data may also be acquired from other databases as needed. For example, to look into the online shopping activity of a person (say, A), the online shopping record database is queried for user A's account records on Taobao, which reveal A's shopping activity on Taobao; shopping records on other websites, such as Tmall, are handled similarly.
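Step 101 leaves the mechanics of the periodic pull open. The sketch below is one possible reading, not the patent's implementation. It assumes each preset database exposes a table with hypothetical columns id, payload, created_at and updated_at, tracks a per-source watermark, and treats keyword-based domain acquisition as plain LIKE matching. Watermark, pull_increment and pull_domain are illustrative names. A scheduler such as cron would call these functions on each tick, and an in-memory SQLite table stands in for one database.

```python
import sqlite3
from dataclasses import dataclass

EPOCH = "1970-01-01T00:00:00"


@dataclass
class Watermark:
    """Latest change timestamp already pulled from one source (hypothetical bookkeeping)."""
    last_pull: str = EPOCH


def pull_increment(conn: sqlite3.Connection, table: str, mark: Watermark):
    """Return (new_rows, updated_rows) changed since the watermark, then advance it.

    Assumes a hypothetical schema (id, payload, created_at, updated_at) with
    ISO-8601 timestamps, which compare correctly as strings. `table` must come
    from trusted configuration because it is interpolated into the SQL text.
    """
    since = mark.last_pull
    new_rows = conn.execute(
        f"SELECT id, payload FROM {table} WHERE created_at > ? ORDER BY id",
        (since,),
    ).fetchall()
    # Rows that already existed at the last pull but have changed since then.
    updated_rows = conn.execute(
        f"SELECT id, payload FROM {table} "
        "WHERE created_at <= ? AND updated_at > ? ORDER BY id",
        (since, since),
    ).fetchall()
    latest = conn.execute(
        f"SELECT MAX(created_at), MAX(updated_at) FROM {table}"
    ).fetchone()
    mark.last_pull = max([since, *(t for t in latest if t is not None)])
    return new_rows, updated_rows


def pull_domain(conn: sqlite3.Connection, table: str, keywords):
    """Return rows whose payload mentions any keyword (plain LIKE matching)."""
    hits = {}
    for kw in keywords:
        rows = conn.execute(
            f"SELECT id, payload FROM {table} WHERE payload LIKE ?", (f"%{kw}%",)
        )
        for row in rows:
            hits[row[0]] = row  # a row matching several keywords is kept once
    return [hits[k] for k in sorted(hits)]


# Usage: one in-memory table stands in for one of the preset databases.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel (id INTEGER PRIMARY KEY, payload TEXT, created_at TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO hotel VALUES (?, ?, ?, ?)", [
    (1, "guest A booked room 101", "2015-06-01T10:00:00", "2015-06-01T10:00:00"),
    (2, "guest B booked room 102", "2015-06-02T09:00:00", "2015-06-02T09:00:00"),
])
mark = Watermark()
first_new, first_updated = pull_increment(conn, "hotel", mark)  # rows 1 and 2 are new
conn.execute("UPDATE hotel SET payload = 'guest A moved to room 201', "
             "updated_at = '2015-06-03T08:00:00' WHERE id = 1")
conn.execute("INSERT INTO hotel VALUES (3, 'guest C booked room 103', "
             "'2015-06-03T09:00:00', '2015-06-03T09:00:00')")
second_new, second_updated = pull_increment(conn, "hotel", mark)  # row 3 new, row 1 updated
```

A production pull would also read inside one transaction or snapshot, so that rows changing between the three queries are not missed; the sketch omits this for brevity.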
After the raw data is acquired, it needs to be processed. Specifically, the acquired raw data may be stored in an ODS (Operational Data Store), and the data already in the ODS is integrated with the acquired raw data. For example, suppose the acquired raw data contains data 1, data 2, and data 3, and the ODS already contains data 3. The two copies of data 3 are duplicates, so either one may be deleted; for instance, the existing data 3 in the ODS may be kept and data 3 in the acquired raw data deleted. This keeps the data complete while avoiding redundant duplicates.
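The de-duplication example can be expressed as a small merge routine. This is a sketch under stated assumptions: the patent does not say how duplicates are detected, so content_key (a hash of the record's canonical JSON) is a hypothetical choice, and the ODS is modeled as a Python dict. As in the example, the copy already in the ODS is kept and the incoming duplicate is dropped.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def content_key(record: Record) -> str:
    """Hypothetical duplicate test: records are duplicates if their canonical JSON matches."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def integrate_into_ods(ods: Dict[str, Record], incoming: Iterable[Record],
                       key: Callable[[Record], str] = content_key) -> List[Record]:
    """Add incoming records to the ODS, keeping the ODS's existing copy of any duplicate.

    Returns the records that were actually added.
    """
    added = []
    for record in incoming:
        k = key(record)
        if k in ods:
            continue  # duplicate: keep the existing copy and drop the incoming one
        ods[k] = record
        added.append(record)
    return added


# Usage mirroring the example: the ODS already holds data 3.
ods: Dict[str, Record] = {}
existing = {"id": 3, "value": "data 3"}
integrate_into_ods(ods, [existing])
added = integrate_into_ods(ods, [
    {"id": 1, "value": "data 1"},
    {"id": 2, "value": "data 2"},
    {"id": 3, "value": "data 3"},
])
```

Passing a different key function, for example one that returns a business identifier, changes what counts as a duplicate without touching the merge logic.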
Step 102: Classify the acquired raw data.
Specifically, step 101 only acquires the data, and there is a large amount of it, so this application classifies the acquired data. The process includes: setting classification configuration parameters according to preset classification rules and classification requirements; integrating all the classification configuration parameters to generate classification integration template data; generating SQL code based on the classification integration template data and a multi-source data integration framework; obtaining the raw data from the ODS by means of the SQL code and classifying it by object; and storing the classified data in the data warehouse DW and integrating the data already in the DW with the newly classified data, where the objects include time, place, event, person, and relationship. This allows the data to be extracted quickly whenever it is needed later. The classification process may be as follows:
Classification configuration parameters are set based on the preset classification rules and classification requirements. The classification rules contain the individual steps of classification, for example: extracting the raw data; scanning the raw data to determine the multi-dimensional features of each record; and selecting, according to the classification requirements, specific features by which to classify and integrate the records. A corresponding classification configuration parameter is configured for each step, and all of the parameters taken together define one classification procedure, which corresponds to the classification integration template data. The template data can then be fed into the multi-source data integration framework (which generates SQL code) to produce the corresponding SQL code. When the same classification requirement arises again, the generated SQL code can be reused directly; to meet a different requirement, only the classification configuration parameters need to be adjusted.
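The configuration-to-SQL chain can be pictured with a toy generator. Everything named here is hypothetical: ClassificationConfig stands in for the classification configuration parameters, build_template for the classification integration template data, and render_sql for the multi-source data integration framework, whose real interface the patent does not describe. The usage runs the generated SQL against in-memory SQLite tables to classify records by year.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ClassificationConfig:
    """Hypothetical per-step classification configuration parameters."""
    sources: Tuple[str, ...]   # step "extract": ODS tables to read
    features: Tuple[str, ...]  # step "scan": feature columns every source exposes
    object_type: str           # step "classify": object category, e.g. "time"
    key_expr: str              # step "classify": SQL expression yielding the category key
    target: str                # DW table that receives the classified rows


def build_template(cfg: ClassificationConfig) -> Dict[str, object]:
    """Integrate the per-step parameters into one template (a plain dict here)."""
    return {"sources": cfg.sources, "columns": cfg.features,
            "object_type": cfg.object_type, "key": cfg.key_expr, "target": cfg.target}


def render_sql(template: Dict[str, object]) -> str:
    """Stand-in for the multi-source data integration framework: template -> SQL text.

    Identifiers and expressions are interpolated directly, so the configuration
    must come from a trusted source.
    """
    cols = ", ".join(template["columns"])
    union = "\nUNION ALL\n".join(f"SELECT {cols} FROM {src}" for src in template["sources"])
    return (f"INSERT INTO {template['target']} (object_type, object_key, {cols})\n"
            f"SELECT '{template['object_type']}', {template['key']}, {cols}\n"
            f"FROM (\n{union}\n) AS merged")


# Usage: classify hotel and railway records by year (object type "time").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ods_hotel (person_id TEXT, event_time TEXT);
CREATE TABLE ods_rail  (person_id TEXT, event_time TEXT);
CREATE TABLE dw_time (object_type TEXT, object_key TEXT, person_id TEXT, event_time TEXT);
INSERT INTO ods_hotel VALUES ('A', '2013-02-04'), ('B', '2014-06-03');
INSERT INTO ods_rail  VALUES ('A', '2013-03-05');
""")
cfg = ClassificationConfig(("ods_hotel", "ods_rail"), ("person_id", "event_time"),
                           "time", "substr(event_time, 1, 4)", "dw_time")
sql = render_sql(build_template(cfg))
conn.execute(sql)
by_year = conn.execute(
    "SELECT object_key, COUNT(*) FROM dw_time GROUP BY object_key ORDER BY object_key"
).fetchall()
```

Because the SQL is a pure function of the configuration, the same configuration regenerates the same SQL, while a different need only requires changing a parameter such as key_expr.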
When the raw data is stored in the ODS, the SQL code is used to obtain the raw data from the ODS and classify it;
the acquired raw data is classified by object, where the objects include time, place, event, person, and relationship. Classifying by object presents events along these different dimensions and therefore better serves downstream needs. The classified data is then stored in a DW (Data Warehouse), and the data already in the DW is integrated with the newly classified data.
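One way to classify records by object while preserving the links between their parts is an inverted index keyed by (object type, value), plus a per-record map of its parts. The sketch below is an illustration, not the patent's storage design; ObjectIndex and its methods are hypothetical names, and the usage reuses the document's own example of a transaction between user 1 and user 2.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

OBJECT_TYPES = ("time", "place", "event", "person", "relationship")


class ObjectIndex:
    """Hypothetical store that classifies records by object while keeping them linked."""

    def __init__(self) -> None:
        # (object type, value) -> ids of the records that contain it
        self.by_object: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        # record id -> {object type: [values]}, so any part leads back to the others
        self.parts: Dict[str, Dict[str, List[str]]] = {}

    def add(self, record_id: str, record: Dict[str, object]) -> None:
        parts: Dict[str, List[str]] = {}
        for obj in OBJECT_TYPES:
            value = record.get(obj)
            if value is None:
                continue
            values = value if isinstance(value, list) else [value]
            parts[obj] = values
            for v in values:
                self.by_object[(obj, v)].add(record_id)
        self.parts[record_id] = parts

    def related(self, obj: str, value: str) -> List[Dict[str, List[str]]]:
        """Starting from one classified part, return all parts of every matching record."""
        return [self.parts[rid] for rid in sorted(self.by_object.get((obj, value), set()))]


# Usage with the example: at time 1, user 1 sold goods 1 to user 2.
index = ObjectIndex()
index.add("record 1", {
    "time": "time 1",
    "person": ["user 1", "user 2"],
    "relationship": "transaction: user 1 sold goods 1 to user 2",
})
from_person = index.related("person", "user 2")
```

Looking up the record by its time, by either person, or by its relationship returns the same set of parts, which is the property the classification is meant to preserve.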
The classification process is shown in FIG. 2. The raw data is obtained with the SQL code and classified by time, place, event, person, and relationship. For time, the data involving time can be sorted chronologically and divided into time intervals. For example, given the dates 2012.03.06, 2015.05.04, 2013.03.05, 2014.06.03, and 2013.02.04 and an interval of one year, the dates fall into interval 1 (2012.03.06), interval 2 (2013.02.04, 2013.03.05), interval 3 (2014.06.03), and interval 4 (2015.05.04). Places can be divided by country, province, city, county, and so on, or by latitude and longitude. Events can be divided as needed into transactions, transfers, crimes, travel, and so on. Persons can be divided by person-related identifiers such as ID card number, name, mobile phone number, and e-mail address; for example, if there are three persons A, B, and C, category A can be set up to include A's ID card number, name, mobile phone number, and e-mail address, and likewise for B and C. Relationships can include interpersonal relationships such as friends, classmates, and people from the same hometown, as well as relationships such as traveling in the same vehicle or committing crimes as a gang. The links between the raw data still exist after classification; only the data has been classified. For example, if a raw record states that at time 1 user 1 traded with user 2, selling goods 1 to user 2, then after classification the time is time 1, the persons are user 1 and user 2, and the relationship is a transaction (user 1 sold goods 1 to user 2). The record is thus split into three parts, yet from any one part the other parts can still be found.
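The time-interval example can be reproduced with a short bucketing function. It assumes dates in the YYYY.MM.DD form used above and calendar-year intervals counted from the earliest year present; bucket_by_years is a hypothetical helper. With a one-year interval it yields the four intervals listed in the example.

```python
from datetime import datetime
from itertools import groupby
from typing import Iterable, List


def bucket_by_years(stamps: Iterable[str], years: int = 1) -> List[List[str]]:
    """Sort 'YYYY.MM.DD' dates and group them into intervals of `years` calendar years.

    Intervals are counted from the earliest year present; intervals that would
    contain no dates are simply not emitted.
    """
    parsed = sorted(datetime.strptime(s, "%Y.%m.%d").date() for s in stamps)
    if not parsed:
        return []
    first_year = parsed[0].year
    return [
        [d.strftime("%Y.%m.%d") for d in group]
        for _, group in groupby(parsed, key=lambda d: (d.year - first_year) // years)
    ]


intervals = bucket_by_years(
    ["2012.03.06", "2015.05.04", "2013.03.05", "2014.06.03", "2013.02.04"]
)
# intervals == [['2012.03.06'], ['2013.02.04', '2013.03.05'], ['2014.06.03'], ['2015.05.04']]
```

itertools.groupby only merges adjacent items, which is why the dates are sorted first; widening the interval is a matter of passing a larger `years` value.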
Step 103: When a service to be processed is received, extract the required data from the classified data according to the needs of the service.
The extraction specifically includes:
After the service to be processed is received, its needs are analyzed based on preset rules to determine the data required to process it; the determined data is then extracted from the classified data and stored in a DM (Data Mart).
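The rule-based analysis and the copy into the data mart might look like the following. The preset rules, the DataNeed record and the dictionary layout of the classified data are all assumptions made for illustration; the patent leaves both the rule format and the DM schema unspecified, and the data mart here is just a list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical preset rules: service type -> object types and event kind it requires.
PRESET_RULES: Dict[str, Dict[str, object]] = {
    "merchant_evaluation": {"objects": ("person", "relationship", "time"), "event": "transaction"},
    "travel_history": {"objects": ("person", "place", "time"), "event": "travel"},
}


@dataclass(frozen=True)
class DataNeed:
    """What a service to be processed requires from the classified data."""
    objects: Tuple[str, ...]
    event: str
    subject: str
    start: date
    end: date


def analyze_need(service: Dict[str, object]) -> DataNeed:
    """Apply the preset rule for the service type (KeyError if no rule exists)."""
    rule = PRESET_RULES[service["type"]]
    return DataNeed(rule["objects"], rule["event"], service["subject"],
                    service["start"], service["end"])


def extract_to_mart(classified: List[Dict[str, object]], need: DataNeed) -> List[Dict[str, object]]:
    """Copy only the required parts of matching records into a data mart (a list here)."""
    mart = []
    for rec in classified:
        if rec["event"] != need.event or need.subject not in rec["person"]:
            continue
        if not need.start <= rec["time"] <= need.end:
            continue
        mart.append({obj: rec[obj] for obj in need.objects if obj in rec})
    return mart


classified = [
    {"time": date(2014, 3, 1), "event": "transaction", "person": ["merchant A", "buyer 1"],
     "relationship": "buyer 1 bought item 1 from merchant A", "place": "city X"},
    {"time": date(2013, 5, 1), "event": "transaction", "person": ["merchant A", "buyer 2"],
     "relationship": "buyer 2 bought item 2 from merchant A"},
    {"time": date(2014, 7, 1), "event": "travel", "person": ["buyer 3"], "place": "city Y"},
]
need = analyze_need({"type": "merchant_evaluation", "subject": "merchant A",
                     "start": date(2014, 1, 1), "end": date(2014, 12, 1)})
mart = extract_to_mart(classified, need)
```

Only the object types named by the rule are copied, so the "place" part of the matching record stays out of the mart.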
For example, suppose the 2014 performance of merchant A on Taobao is to be evaluated and scored. First, the data the service needs is determined based on the preset rules, for example: the items merchant A sells and their prices; merchant A's 2014 sales of each item; whether each item sold received a review, and the proportion of items reviewed; the number and proportion of positive, neutral, and negative ratings among the reviews; and the number and proportion of reviews that include pictures. The corresponding data can then be obtained from the classified data: the person data includes each buyer's account, mobile phone number, and so on; the relationship data covers transactions with merchant A, including the specific transaction data and the buyers' reviews of the items merchant A sold; and the time range is January 1, 2014 to December 1, 2014. This data is then used together to evaluate merchant A's 2014 performance on Taobao.
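The indicators listed for merchant A can be computed from the extracted transactions as sketched below. The transaction fields (seller, item, price, quantity, date, review) are hypothetical. Because the patent names the indicators but not how they combine into a score, the sketch stops at the indicators rather than inventing a scoring formula. The window follows the example's January 1 to December 1, 2014.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List

RATINGS = ("positive", "neutral", "negative")


def share(part: int, whole: int) -> float:
    """Proportion that is 0.0 instead of failing when the denominator is 0."""
    return part / whole if whole else 0.0


def evaluate_merchant(transactions: List[Dict], merchant: str, start: date, end: date) -> Dict:
    """Compute the indicators listed for the merchant-evaluation example (no scoring formula)."""
    rows = [t for t in transactions if t["seller"] == merchant and start <= t["date"] <= end]
    sales: Dict[str, float] = defaultdict(float)
    for t in rows:
        sales[t["item"]] += t["price"] * t["quantity"]
    reviews = [t["review"] for t in rows if t["review"] is not None]
    ratings = Counter(r["rating"] for r in reviews)
    with_picture = sum(1 for r in reviews if r["has_picture"])
    return {
        "sales_by_item": dict(sales),
        "reviewed_ratio": share(len(reviews), len(rows)),
        "rating_counts": {k: ratings[k] for k in RATINGS},
        "rating_ratios": {k: share(ratings[k], len(reviews)) for k in RATINGS},
        "picture_count": with_picture,
        "picture_ratio": share(with_picture, len(reviews)),
    }


transactions = [
    {"seller": "merchant A", "item": "item 1", "price": 10, "quantity": 2,
     "date": date(2014, 3, 1), "review": {"rating": "positive", "has_picture": True}},
    {"seller": "merchant A", "item": "item 2", "price": 5, "quantity": 1,
     "date": date(2014, 6, 1), "review": {"rating": "negative", "has_picture": False}},
    {"seller": "merchant A", "item": "item 1", "price": 10, "quantity": 1,
     "date": date(2014, 9, 1), "review": None},
    {"seller": "merchant A", "item": "item 1", "price": 10, "quantity": 5,
     "date": date(2013, 9, 1), "review": None},  # outside the window
    {"seller": "merchant B", "item": "item 3", "price": 7, "quantity": 1,
     "date": date(2014, 2, 1), "review": None},  # another merchant
]
report = evaluate_merchant(transactions, "merchant A", date(2014, 1, 1), date(2014, 12, 1))
```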
To further illustrate this application, a data processing device is also disclosed which, as shown in FIG. 3, includes:
an acquisition module 301, configured to acquire raw data;
a classification module 302, configured to classify the acquired raw data; and
an extraction module 303, configured to: when a service to be processed is received, extract the required data from the classified data according to the needs of the service to be processed.
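The three-module structure of FIG. 3 can be mirrored by composing interchangeable components. This is a structural sketch only: the method names acquire, classify and extract, and the toy modules in the usage, are hypothetical, and the toy classifier groups records by event alone rather than by all five object types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol

Record = Dict[str, object]
Classified = Dict[str, List[Record]]


class AcquisitionModule(Protocol):      # module 301
    def acquire(self) -> List[Record]: ...


class ClassificationModule(Protocol):   # module 302
    def classify(self, raw: List[Record]) -> Classified: ...


class ExtractionModule(Protocol):       # module 303
    def extract(self, classified: Classified, service: Record) -> List[Record]: ...


@dataclass
class DataProcessingDevice:
    acquisition: AcquisitionModule
    classification: ClassificationModule
    extraction: ExtractionModule
    classified: Classified = field(default_factory=dict)

    def refresh(self) -> None:
        """Acquire raw data and classify it before any service request arrives."""
        for obj, records in self.classification.classify(self.acquisition.acquire()).items():
            self.classified.setdefault(obj, []).extend(records)

    def handle(self, service: Record) -> List[Record]:
        """When a service to be processed arrives, extract only the data it needs."""
        return self.extraction.extract(self.classified, service)


# Toy module implementations, for illustration only.
class StaticSource:
    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def acquire(self) -> List[Record]:
        return list(self.rows)


class ByEvent:
    def classify(self, raw: List[Record]) -> Classified:
        out: Classified = {}
        for rec in raw:
            out.setdefault(str(rec["event"]), []).append(rec)
        return out


class EventLookup:
    def extract(self, classified: Classified, service: Record) -> List[Record]:
        return list(classified.get(str(service["event"]), []))


device = DataProcessingDevice(
    StaticSource([{"event": "travel", "person": "A"}, {"event": "transaction", "person": "B"}]),
    ByEvent(),
    EventLookup(),
)
device.refresh()
result = device.handle({"event": "transaction"})
```

Keeping the modules behind small interfaces matches the document's remark that modules may be merged or split: any module can be replaced without changing the device that composes them.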
Optionally, the raw data includes newly added data, updated data, and data from a specific domain;
the acquisition module 301 is specifically configured to:
periodically acquire newly added data from a plurality of preset databases;
periodically acquire updated data from the plurality of preset databases; and
periodically acquire data from a predetermined domain based on keywords.
Optionally, the data processing device further includes:
an integration module, configured to store the acquired raw data in the operational data store (ODS), and integrate the data already in the ODS with the acquired raw data.
Optionally, the classification module 302 is specifically configured to:
set classification configuration parameters according to preset classification rules and classification requirements;
integrate all the classification configuration parameters to generate classification integration template data;
generate SQL code based on the classification integration template data and a multi-source data integration framework;
obtain the raw data from the ODS by means of the SQL code, and classify the obtained raw data by object; and
store the classified data in the data warehouse (DW), and integrate the data already in the DW with the newly classified data;
where the objects include time, place, event, person, and relationship.
Optionally, the extraction module 303 is specifically configured to:
after the service to be processed is received, analyze its needs based on preset rules to determine the data required to process it; and
extract the determined data from the classified data and store the extracted data in the data mart (DM).
Compared with the prior art, this application classifies the acquired raw data so that, when a service to be processed is received, the data it requires can be extracted from the classified data according to its needs. Data processing is thereby automated without manual work, computation results can be reused across services, and efficiency is improved.
From the description of the foregoing embodiments, a person skilled in the art can clearly understand that this application may be implemented by hardware, or by software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of this application may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of this application.
A person skilled in the art will understand that the drawings are merely schematic diagrams of a preferred implementation scenario, and the modules or processes in the drawings are not necessarily required for implementing this application.
A person skilled in the art will understand that the modules of the apparatus in an implementation scenario may be distributed in that apparatus as described, or may, with corresponding changes, be located in one or more apparatuses different from the one in this scenario. The modules of the foregoing implementation scenarios may be combined into one module or further split into multiple sub-modules.
The serial numbers used in this application are for description only and do not imply that any implementation scenario is better or worse than another.
What is disclosed above is merely several specific implementation scenarios of this application, but this application is not limited thereto; any variation that a person skilled in the art can conceive of shall fall within the protection scope of this application.

Claims (10)

  1. A data processing method, comprising:
    acquiring raw data;
    classifying the acquired raw data; and
    when a service to be processed is received, extracting required data from the classified data according to the needs of the service to be processed.
  2. The method according to claim 1, wherein the raw data comprises newly added data, updated data, and data of a specific domain; and
    the acquiring of raw data comprises:
    periodically acquiring newly added data from a plurality of preset databases;
    periodically acquiring updated data from the plurality of preset databases; and
    periodically acquiring data of a predetermined domain based on keywords.
  3. The method according to claim 1, further comprising, before the classifying of the acquired raw data:
    storing the acquired raw data in an operational data store (ODS), and integrating the data already in the ODS with the acquired raw data.
  4. The method according to claim 3, wherein the classifying of the acquired raw data comprises:
    setting classification configuration parameters according to preset classification rules and classification requirements;
    integrating all the classification configuration parameters to generate classification integration template data;
    generating SQL code based on the classification integration template data and a multi-source data integration framework;
    obtaining the raw data from the ODS by means of the SQL code, and classifying the obtained raw data by object; and
    storing the classified data in a data warehouse (DW), and integrating the data already in the DW with the newly classified data;
    wherein the objects comprise time, place, event, person, and relationship.
  5. The method according to claim 1, wherein the extracting, when a service to be processed is received, of required data from the classified data according to the needs of the service to be processed specifically comprises:
    after the service to be processed is received, analyzing its needs based on preset rules to determine the data required to process it; and
    extracting the determined data from the classified data, and storing the extracted data in a data mart (DM).
  6. A data processing device, comprising:
    an acquisition module, configured to acquire original data;
    a classification module, configured to classify the acquired original data; and
    an extraction module, configured to extract, when a service to be processed is received, the required data from the classified data according to the needs of the service to be processed.
  7. The device according to claim 6, wherein the original data comprises newly added data, updated data, and data of a specific domain; and
    the acquisition module is specifically configured to:
    periodically acquire newly added data from a plurality of preset databases;
    periodically acquire updated data from the plurality of preset databases; and
    periodically acquire data of a predetermined domain based on keywords.
  8. The device according to claim 6, further comprising:
    an integration module, configured to store the acquired original data in an operational data store (ODS), and to integrate the data already present in the ODS with the acquired original data.
  9. The device according to claim 8, wherein the classification module is specifically configured to:
    set classification configuration parameters according to preset classification rules and classification requirements;
    integrate all the classification configuration parameters to generate classification integration template data;
    generate SQL code based on the classification integration template data and a multi-source data integration framework;
    obtain the original data from the ODS by means of the SQL code, and classify the obtained original data by object;
    store the classified data in a data warehouse (DW), and integrate the data already present in the DW with the obtained classified data;
    wherein the objects comprise: time, place, event, person, and relationship.
  10. The device according to claim 6, wherein the extraction module is specifically configured to:
    after the service to be processed is received, analyze the needs of the service to be processed based on preset rules to determine the data required to process it; and
    store, in a data mart (DM), the data extracted from the classified data based on the determined data.
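As an illustration only, the pipeline recited in claims 3 to 5 and 8 to 10 (raw data integrated into an ODS, classified by object into a DW through generated SQL, then extracted into a DM for a given service) can be sketched in Python. The claims do not disclose table schemas, the format of the classification configuration parameters or template, or the multi-source data integration framework, so all of these are assumptions here: SQLite stands in for the ODS, DW and DM; the tables `ods_raw`, `dw_classified` and `dm_extract`, the `ClassificationParam` class, and the LIKE-pattern rules are hypothetical; "integration" is modeled as an upsert keyed on a record id; and the periodic, keyword-based acquisition of claim 7 is not modeled.

```python
import sqlite3
from dataclasses import dataclass

# The five object types listed in claims 4 and 9.
OBJECT_TYPES = {"time", "place", "event", "person", "relationship"}
# Hypothetical ODS columns that a classification rule may test.
ODS_FIELDS = {"title", "content"}


@dataclass
class ClassificationParam:
    """Hypothetical classification configuration parameter."""
    object_type: str  # object assigned to matching records
    field: str        # ODS column to test
    pattern: str      # SQL LIKE pattern


def create_schema(conn):
    conn.executescript("""
        CREATE TABLE ods_raw (id TEXT PRIMARY KEY, title TEXT, content TEXT);
        CREATE TABLE dw_classified (id TEXT, object_type TEXT, value TEXT,
                                    PRIMARY KEY (id, object_type));
        CREATE TABLE dm_extract (service TEXT, id TEXT, object_type TEXT, value TEXT);
    """)


def integrate_into_ods(conn, records):
    """Store acquired (id, title, content) records; a newer record replaces an older one with the same id."""
    conn.executemany(
        "INSERT OR REPLACE INTO ods_raw (id, title, content) VALUES (?, ?, ?)", records)


def build_template(params):
    """Validate all parameters and merge them into one template (here, a list of dicts)."""
    template = []
    for p in params:
        if p.object_type not in OBJECT_TYPES or p.field not in ODS_FIELDS:
            raise ValueError(f"invalid classification parameter: {p}")
        template.append({"object_type": p.object_type, "field": p.field, "pattern": p.pattern})
    return template


def generate_sql(template):
    """Generate one parameterized INSERT ... SELECT statement per template entry."""
    # Column names are interpolated only after build_template() checked them against ODS_FIELDS.
    return [
        (f"INSERT OR REPLACE INTO dw_classified (id, object_type, value) "
         f"SELECT id, ?, {e['field']} FROM ods_raw WHERE {e['field']} LIKE ?",
         (e["object_type"], e["pattern"]))
        for e in template
    ]


def classify_into_dw(conn, params):
    """Classify ODS records by object and upsert them into the DW."""
    for sql, args in generate_sql(build_template(params)):
        conn.execute(sql, args)


def extract_to_dm(conn, service, rules):
    """Look up the object types a service needs (hypothetical preset rules) and copy them to the DM."""
    needed = sorted(rules[service])
    placeholders = ",".join("?" * len(needed))
    conn.execute(
        "INSERT INTO dm_extract (service, id, object_type, value) "
        f"SELECT ?, id, object_type, value FROM dw_classified WHERE object_type IN ({placeholders})",
        (service, *needed))


conn = sqlite3.connect(":memory:")
create_schema(conn)
integrate_into_ods(conn, [("r1", "Meeting in Hangzhou", "2015-06-09 product launch"),
                          ("r2", "Profile", "Alice, engineer")])
# A later acquisition updates r2; the ODS keeps only the newer version.
integrate_into_ods(conn, [("r2", "Profile", "Alice, senior engineer")])
classify_into_dw(conn, [ClassificationParam("place", "title", "%Hangzhou%"),
                        ClassificationParam("event", "content", "%launch%"),
                        ClassificationParam("person", "content", "%Alice%")])
extract_to_dm(conn, "launch_report", {"launch_report": {"event", "place"}})
rows = conn.execute("SELECT id, object_type, value FROM dm_extract ORDER BY id, object_type").fetchall()
print(rows)
# [('r1', 'event', '2015-06-09 product launch'), ('r1', 'place', 'Meeting in Hangzhou')]
```

Interpolating a column name into the generated SQL is only safe here because `build_template` first checks it against a whitelist; all values are passed as bound parameters.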

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510312912.7 2015-06-09
CN201510312912.7A CN106294498A (en) 2015-06-09 2015-06-09 A data processing method and device

Publications (1)

Publication Number Publication Date
WO2016197852A1 true WO2016197852A1 (en) 2016-12-15

Family

ID=57502989

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/084442 WO2016197852A1 (en) 2015-06-09 2016-06-02 Data processing method and device

Country Status (2)

Country Link
CN (1) CN106294498A (en)
WO (1) WO2016197852A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280141A (en) * 2017-12-29 2018-07-13 金螳螂家装电子商务(苏州)有限公司 A kind of quote data Fast Classification storage method for house ornamentation e-commerce platform
CN112069215A (en) * 2020-09-17 2020-12-11 国电龙源电气有限公司 Data query method and device based on integrated data
CN113362018A (en) * 2021-05-25 2021-09-07 北京明略软件系统有限公司 Conference time processing method and system

Citations (5)

Publication number Priority date Publication date Assignee Title
JP2000035967A (en) * 1998-07-21 2000-02-02 Sumitomo Metal Ind Ltd Database retrieval system and record medium
CN101281525A (en) * 2007-11-23 2008-10-08 北京九城网络软件有限公司 System and method for searching based on knowledge base on internet
CN102542071A (en) * 2012-01-17 2012-07-04 深圳市同洲视讯传媒有限公司 Distributed data processing system and method
CN102722482A (en) * 2011-03-29 2012-10-10 鸿富锦精密工业(深圳)有限公司 Data classification system and method
CN104462089A (en) * 2013-09-13 2015-03-25 北大方正集团有限公司 Data processing method and device

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
GB8815987D0 (en) * 1988-04-08 1988-08-10 Ibm Relational databases
AU729275B2 (en) * 1995-12-30 2001-02-01 Tmln Royalty, Llc Data retrieval method and apparatus with multiple source capability
CN100337235C (en) * 2003-06-23 2007-09-12 华为技术有限公司 Method and apparatus for accessing database
CN102202173B (en) * 2010-03-23 2013-01-16 三星电子(中国)研发中心 Photo automatically naming method and device thereof

Cited By (3)

Publication number Priority date Publication date Assignee Title
US10607271B1 (en) * 2017-03-16 2020-03-31 Walgreen Co. Search platform with data driven search relevancy management
CN111061795A (en) * 2019-12-19 2020-04-24 新奥数能科技有限公司 Data processing method and device, intelligent terminal and storage medium
CN111061795B (en) * 2019-12-19 2024-03-08 新奥数能科技有限公司 Data processing method and device, intelligent terminal and storage medium

Also Published As

Publication number Publication date
CN106294498A (en) 2017-01-04

Similar Documents

Publication Publication Date Title
WO2016197852A1 (en) Data processing method and device
US10726063B2 (en) Topic profile query creation
US9218364B1 (en) Monitoring an any-image labeling engine
US9037600B1 (en) Any-image labeling engine
US8321398B2 (en) Method and system for determining relevance of terms in text documents
WO2017000513A1 (en) Information pushing method and apparatus based on user search behavior, storage medium, and device
US9336286B2 (en) Graphical record matching process replay for a data quality user interface
WO2023273686A1 (en) Information search method and apparatus, computer device, and storage medium
CN106991175B (en) Customer information mining method, device, equipment and storage medium
WO2019109698A1 (en) Method and apparatus for determining target user group
US9336245B2 (en) Systems and methods providing master data management statistics
WO2022223024A1 (en) Data processing method and apparatus, device, and storage medium
US20150261837A1 (en) Querying Structured And Unstructured Databases
US20120239657A1 (en) Category classification processing device and method
US10430730B2 (en) Determining descriptive attributes for listing locations
TWI575391B (en) Social data filtering system, method and non-transitory computer readable storage medium of the same
WO2008025291A1 (en) Article exhibition system and method of exhibiting article
JP6509590B2 (en) User's emotion analysis device and program for goods
EP3408797A1 (en) Image-based quality control
WO2019012781A1 (en) Information processing device and program
CN114443727A (en) Human vein data processing method, device, equipment and storage medium
CN115640464B (en) Shared project data pushing method and device based on shared promotion management
CN115098596B (en) Government affair related data carding method, government affair related data carding device, government affair related data carding equipment and readable storage medium
US11308941B2 (en) Natural language processing apparatus and program
KR102267068B1 (en) System and method extracting information from time series database according to natural language queries

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (ref document number: 16806750; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 16806750; country of ref document: EP; kind code of ref document: A1)