WO2016184192A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2016184192A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
module
unified
granularity
model
Prior art date
Application number
PCT/CN2016/073956
Other languages
English (en)
French (fr)
Inventor
程希
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016184192A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Definitions

  • the present invention relates to the field of data processing, and in particular to a data processing method and apparatus.
  • Big data refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that supports business decisions. Compared with traditional data, big data is characterized by large data volume (Volume), diverse data sources and formats (Variety), rapid data growth (Velocity), low value density (Value), and high complexity (Complexity).
  • the present invention provides a data processing method and apparatus.
  • a data processing method includes: collecting original data from a data source; converting the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and storing the first data.
  • collecting the original data from the data source comprises: periodically collecting the original data from the data source; or instantly collecting the original data from the data source according to a set collection condition.
  • before converting the original data into the first data, the method further includes: culling, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
  • after converting the original data into the first data, the method further comprises: performing data aggregation on the first data, wherein the data aggregation comprises at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
  • storing the first data comprises storing the first data using redundant storage.
  • after storing the first data, the method further comprises: acquiring a data model established by the user; extracting, from the first data, the data required by the data model; and outputting a calculation result of the data model.
  • a data processing apparatus includes: an acquisition module configured to collect original data from a data source; a conversion module configured to convert the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and a storage module configured to store the first data.
  • the acquisition module is configured to: periodically collect the original data from the data source; or instantly collect the original data from the data source according to a set collection condition.
  • the apparatus further comprises: a culling module configured to cull, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
  • the apparatus further includes: a summary module configured to perform data aggregation on the first data, wherein the data aggregation includes at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
  • the storage module is configured to store the first data using redundant storage.
  • the apparatus further includes: an obtaining module configured to acquire a data model established by the user; an extraction module configured to extract, from the first data, the data required by the data model; and an output module configured to output the calculation result of the data model.
  • through the embodiments of the present invention, original data is collected from a data source; the original data is converted into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and the first data is stored. This solves the problem of low data processing efficiency caused by inconsistent storage types of big data, and improves processing efficiency.
  • FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 3 is a first preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a second preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a third preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an educational big data application system according to a preferred embodiment of the present invention.
  • FIG. 7 is a flow chart showing an application method of educational big data according to a preferred embodiment of the present invention.
  • an embodiment of the present invention provides a data processing method. FIG. 1 is a flowchart of the data processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: collect original data from a data source;
  • Step S104: convert the original data into first data conforming to a target data model, where the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format;
  • Step S106: store the first data.
  • through the above steps, data is uniformly processed into data conforming to the target data model before it is stored, so that the data is stored in a unified way. The above steps thus allow large and complex data to be processed uniformly, solve the problem of low data processing efficiency caused by inconsistent storage types of big data, and improve data processing efficiency.
  • the data source includes at least one of the following: an informationized classroom system, an examination system, and a school logistics management system.
  • in step S102, the original data may be collected periodically or instantly.
  • the period of the periodic collection can be set according to the needs of the user.
  • big data is very large and complex, and mixes valid and invalid data of many kinds. To save storage space, avoid unnecessary resource consumption, and convert data efficiently, irregular data and/or data that does not conform to the facts may be culled from the original data, according to a preset policy, after the data is collected. The original data from which such data has been culled is then stored.
  • after step S104, the method further includes: performing data aggregation on the first data, wherein the data aggregation includes at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
  • the aggregated data helps improve access efficiency.
  • because the volume of the collected first data may be very large, the first data may be stored redundantly in step S106 to improve access performance; for example, the first data is split into blocks, the blocks are replicated into multiple copies, and the copies are stored in a distributed storage network.
  • the user can establish a corresponding data model according to requirements.
  • the embodiment may further acquire a data model established by the user; extract data required by the data model in the first data; and output a calculation result of the data model.
  • the decision result may also be output according to the calculation result and the preset policy.
  • this embodiment further provides a data processing apparatus for implementing the above embodiments and preferred implementations; what has already been described is not repeated.
  • the modules involved in the apparatus are described below.
  • as used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: an acquisition module 22, a conversion module 24, and a storage module 26, wherein the acquisition module 22 is configured to collect original data from a data source.
  • the conversion module 24 is coupled to the acquisition module 22 and configured to convert the original data into first data conforming to the target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; the storage module 26 is coupled to the conversion module 24 and configured to store the first data.
  • the acquisition module 22 is configured to periodically collect original data from a data source, or to instantly collect original data from a data source according to a set collection condition.
  • FIG. 3 is a first preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a culling module 32 coupled between the acquisition module 22 and the conversion module 24 and configured to cull, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
  • FIG. 4 is a second preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • as shown in FIG. 4, the apparatus further includes: a summary module 42 coupled between the conversion module 24 and the storage module 26 and configured to perform data aggregation on the first data, where the data aggregation includes at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
  • the storage module 26 is configured to store the first data using redundant storage.
  • FIG. 5 is a third preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
  • as shown in FIG. 5, the apparatus further includes: an obtaining module 52 configured to acquire a data model established by a user;
  • an extraction module 54 coupled to the storage module 26 and the obtaining module 52 and configured to extract, from the first data, the data required by the data model;
  • and an output module 56 coupled to the extraction module 54 and configured to output a calculation result of the data model.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the preferred embodiment of the present invention provides an educational big data application method for collecting, storing, managing, analyzing, querying, and presenting massive education-related data. Its ultimate aims are to help students draw up learning plans and improve their grades; to help teachers understand each student's situation precisely and teach according to aptitude; to help school leaders improve management and make intelligent decisions; and to help education-related industries respond promptly to market changes and market precisely.
  • an educational big data application system including:
  • 1 data acquisition module: this module obtains the original data from different data sources according to the specified interface type and characteristic requirements.
  • the collection can be performed through a file interface, a database interface, a message interface, and the like.
  • Data collection usually supports two methods: periodic collection and instant collection.
  • Periodic collection extracts data at specified times, following the extraction period defined for each kind of data content.
  • Instant collection is a one-off operation that the system performs immediately according to the set collection conditions. The operation is not repeated once it completes.
  • preferably, instant collection is applied to historical data and to re-collected data.
  • 2 data processing module: this module is mainly responsible for data cleaning, conversion, loading, rule management, and transmission.
  • Data cleaning removes "dirty data" and eliminates data inconsistency.
  • "Dirty data" includes irregular data and data that does not conform to the facts.
  • Data conversion mainly includes conversion to a unified format encoding, a unified data type, and a unified data format. In addition, data conversion supports the most common kinds of data aggregation, for example aggregation by time granularity, by network element granularity, by spatial granularity, and by service granularity. Loading covers the cleaned and converted data that conforms to the target data model, as well as "clean" data that needs no further processing.
  • 3 data storage module: this module acts as the carrier of the data, providing stable and efficient mass data storage and a data interface for upper-layer access.
  • The data includes real-time and non-real-time data, and structured and unstructured data. Redundant storage ensures the reliability of the stored data by keeping multiple copies of the same data. All of the massive data is stored on different nodes by means of distributed storage, and redundant storage also makes it possible to provide highly concurrent access with high throughput and high transfer rates.
  • 4 data application module: configured to complete data analysis and mining and to generate the final result data. For example, it analyzes and processes data according to specific business needs, which includes data modeling and providing service capabilities externally.
  • the data application module provides visual modeling tools and application development tools, supports packaging various components and integrating them into the development tools, and offers upper-layer applications a unified application programming interface (API) to call. This shields applications from the complex underlying implementation details and improves application development efficiency.
  • a preferred embodiment of the present invention further provides an educational big data application method, including the following steps:
  • Step 1: The data acquisition module obtains data from each data source according to rules negotiated in advance with each education-related application system. This includes, but is not limited to, obtaining students' grades, analysis of wrong answers, and the distribution of time spent on tests from the student examination system; obtaining data on students raising hands, answering questions, and interacting with teachers in the informationized classroom; obtaining student attendance; and obtaining students' various life and consumption data, covering libraries, canteens, electronic classrooms, supermarkets, and so on.
  • Step 2: The data processing module cleans and converts the data according to the defined rules, so that the data conforms to the target data model.
  • Step 3: The processed data is stored in the data storage module.
  • Step 4: Modeling is performed in the data application module, which uses various data for comprehensive calculation and intelligent analysis to obtain result data and decisions.
  • the resulting data is used by specific educational applications, with the ultimate aim of improving education and achieving smart education. This includes, but is not limited to, predicting student test scores; predicting the rate of admission to higher-level schools; advising students on how to improve their learning; advising teachers on how to improve their teaching; and advising schools on how to improve their management and service levels.
  • FIG. 6 is a schematic structural diagram of an educational big data application system according to a preferred embodiment of the present invention; FIG. 6 is a variant of FIG. 5.
  • as shown in FIG. 6, the system includes: a data acquisition module, a data processing module, a data storage module, and a data application module, wherein:
  • 1) Data acquisition module: configured to obtain original data from different data sources according to the specified interface type and characteristic requirements.
  • 2) Data processing module: responsible for data cleaning, conversion, loading, rule management, and transmission.
  • it converts the collected source data into data that conforms to the target data model.
  • 3) Data storage module: configured to provide mass data storage.
  • 4) Data application module: configured to perform data mining and analysis and to provide intelligent decisions for end users.
  • FIG. 7 is a schematic flowchart of an educational big data application method according to a preferred embodiment of the present invention. As shown in FIG. 7, the process includes the following steps:
  • Step S701: The data acquisition module collects data from education-related application systems (for example, an informationized classroom system, an examination system, a logistics system, faculty performance management, etc.).
  • the interfaces between the data acquisition module and each educational application system include, but are not limited to, File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), and the like.
  • Step S702: The data processing module cleans and converts the data according to the defined rules, so that the data conforms to the target data model and meets the requirements of subsequent storage and application.
  • Step S703: The processed data is stored in the data storage module.
  • the data storage module can adopt cloud storage technology, including distributed file storage, distributed database storage, and the like.
  • Step S704: The application developer (i.e., the user) uses the modeling tool provided by the data application module to build a model. Modeling consists of designing a calculation formula and specifying which data is substituted into the formula for calculation.
  • Application developers use the application development tools provided by the data application module to develop specific educational applications; the applications use the data calculated with the formula to obtain the final analysis results.
  • Step S705: The intelligent analysis results are used to serve students, teachers, schools, parents, and other education-related users, including but not limited to: predicting students' test scores; predicting the rate of admission to higher-level schools; advising students on how to improve their learning; advising teachers on how to improve their teaching; and advising schools on how to improve their management and service levels. The ultimate aim is to improve education and achieve smart education.
  • smart education can be realized by using big data technology.
  • for example, the educational big data application system and method provided by the above preferred embodiments run through the entire process of "teaching," "learning," and "managing," and can simultaneously satisfy the various needs of schools, teachers, parents, and students.
  • in another embodiment, software is further provided for executing the technical solutions described in the above embodiments and preferred implementations.
  • in another embodiment, a storage medium is further provided in which the above software is stored; the storage medium includes, but is not limited to, an optical disk, a floppy disk, a hard disk, an erasable memory, and the like.
  • the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module.
  • the invention is therefore not limited to any specific combination of hardware and software.
  • through the embodiments of the present invention, original data is collected from a data source; the original data is converted into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and the first data is stored. This solves the problem of low data processing efficiency caused by inconsistent storage types of big data, and improves processing efficiency.

Landscapes

  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Educational Technology (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

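The present invention provides a data processing method and apparatus. The method includes: collecting original data from a data source; converting the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and storing the first data. The invention solves the problem of low data processing efficiency caused by inconsistent storage types of big data, and improves processing efficiency.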

Description

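Data Processing Method and Apparatus
Technical Field
The present invention relates to the field of data processing, and in particular to a data processing method and apparatus.
Background
At present, education informatization is being promoted throughout the country: public platforms for education information services are being built, digital campus experiments are being carried out, various "digital learning" pilot schools are being set up, "micro-courses" are being developed, teaching research on the "flipped classroom" is under way, and one-to-one "E-classroom" teaching is being practiced. To truly raise the level of education, what matters most is top-level design and forward-looking concepts.
Big data refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that supports business decisions. Compared with traditional data, big data is characterized by large data volume (Volume), diverse data sources and formats (Variety), rapid data growth (Velocity), low value density (Value), and high complexity (Complexity).
In the field of education, it is urgent to study how to introduce big data technology and use data about people (students, parents, teachers), schools, education bureaus, and other education-related matters in order to design educational environments, arrange educational experiment scenarios, change the time and space of education, transform learning scenarios, and collect educational management data and make decisions. How to use advanced information technology and the data support of big data to replace the past practice of deciding on a whim, or on ideas, inspiration, and experience, is currently a hot research topic.
In the course of research, the inventor found that as data grows ever larger in scale and ever more complex in type and format, the inability to make efficient use of large amounts of data has become a new problem of the big data era.
No effective solution has yet been proposed for the problem in the related art of low data processing efficiency caused by inconsistent storage types of big data.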
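Summary of the Invention
To solve the above technical problem, the present invention provides a data processing method and apparatus.
According to one aspect of the embodiments of the present invention, a data processing method is provided, including: collecting original data from a data source; converting the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and storing the first data.
Preferably, collecting the original data from the data source includes: periodically collecting the original data from the data source; or instantly collecting the original data from the data source according to a set collection condition.
Preferably, before converting the original data into the first data, the method further includes: culling, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
Preferably, after converting the original data into the first data, the method further includes: performing data aggregation on the first data, wherein the data aggregation includes at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
Preferably, storing the first data includes: storing the first data using redundant storage.
Preferably, after storing the first data, the method further includes: acquiring a data model established by a user; extracting, from the first data, the data required by the data model; and outputting a calculation result of the data model.
According to another aspect of the embodiments of the present invention, a data processing apparatus is further provided, including: an acquisition module configured to collect original data from a data source; a conversion module configured to convert the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and a storage module configured to store the first data.
Preferably, the acquisition module is configured to: periodically collect the original data from the data source; or instantly collect the original data from the data source according to a set collection condition.
Preferably, the apparatus further includes: a culling module configured to cull, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
Preferably, the apparatus further includes: a summary module configured to perform data aggregation on the first data, wherein the data aggregation includes at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
Preferably, the storage module is configured to store the first data using redundant storage.
Preferably, the apparatus further includes: an obtaining module configured to acquire a data model established by a user; an extraction module configured to extract, from the first data, the data required by the data model; and an output module configured to output a calculation result of the data model.
Through the embodiments of the present invention, original data is collected from a data source; the original data is converted into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and the first data is stored. This solves the problem of low data processing efficiency caused by inconsistent storage types of big data, and improves processing efficiency.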
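Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 3 is a first preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 4 is a second preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a third preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an educational big data application system according to a preferred embodiment of the present invention;
FIG. 7 is a schematic flowchart of an educational big data application method according to a preferred embodiment of the present invention.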
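Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be learned by practicing the invention. The objects and other advantages of the invention may be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
An embodiment of the present invention provides a data processing method. FIG. 1 is a flowchart of the data processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
Step S102: collect original data from a data source;
Step S104: convert the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format;
Step S106: store the first data.
Through the above steps, data is uniformly processed into data conforming to the target data model before it is stored, so that the data is stored in a unified way. The above steps thus allow large and complex data to be processed uniformly, solve the problem of low data processing efficiency caused by inconsistent storage types of big data, and improve data processing efficiency.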
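As a minimal illustration, steps S102 to S106 could be sketched in Python as follows. The source does not define the target data model, so the field names (`student_id`, `score`, `taken_at`), the UTF-8 encoding, the ISO 8601 timestamp format, and the use of a plain list as storage are all assumptions made for this sketch.

```python
from datetime import datetime, timezone

# Hypothetical target data model: field name -> required Python type.
TARGET_MODEL = {"student_id": str, "score": float, "taken_at": str}


def collect(source):
    """Step S102: gather raw records from a data source (here, any iterable)."""
    return list(source)


def convert(raw):
    """Step S104: map one raw record onto the target data model."""
    # Unified encoding: decode byte strings as UTF-8 (assumed source encoding).
    text = {k: v.decode("utf-8") if isinstance(v, bytes) else v for k, v in raw.items()}
    # Unified data format: render the Unix timestamp as ISO 8601 in UTC.
    taken_at = datetime.fromtimestamp(int(text["ts"]), tz=timezone.utc).isoformat()
    record = {"student_id": text["id"], "score": text["score"], "taken_at": taken_at}
    # Unified data type: coerce every field to the type the model requires.
    return {name: kind(record[name]) for name, kind in TARGET_MODEL.items()}


def store(records, storage):
    """Step S106: append converted records to the storage (a list stands in)."""
    storage.extend(records)
    return storage


# Two raw rows that disagree on encoding and types before conversion.
raw_rows = [
    {"id": b"s001", "score": "87.5", "ts": "0"},
    {"id": 17, "score": 90, "ts": 60},
]
stored = store([convert(row) for row in collect(raw_rows)], [])
```

After conversion both rows share the same field names, types, and timestamp format, which is the property the method relies on for uniform storage.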
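Preferably, the data source includes at least one of the following: an informationized classroom system, an examination system, and a school logistics management system.
Preferably, in step S102, the original data may be collected periodically or instantly. Preferably, the period of the periodic collection can be set according to the needs of the user.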
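The two collection modes could be sketched as below. This is a hypothetical sketch: the record field `t` (a production time), the integer time units, and the function names are assumptions, and the periodic schedule is computed rather than run with a real timer.

```python
def periodic_due_times(start, period, until):
    """Return the times at which a periodic collection would run."""
    if period <= 0:
        raise ValueError("period must be positive")
    return list(range(start, until, period))


def collect_periodic(source, start, period, until):
    """At each due time, collect the records produced since the previous run."""
    batches = []
    previous = start - period
    for now in periodic_due_times(start, period, until):
        batches.append([row for row in source if previous < row["t"] <= now])
        previous = now
    return batches


def collect_instant(source, condition):
    """One-off collection: return the records matching the set condition."""
    return [row for row in source if condition(row)]


# Hypothetical records, each tagged with the time it was produced.
source = [{"t": 5, "v": "a"}, {"t": 12, "v": "b"}, {"t": 25, "v": "c"}]
batches = collect_periodic(source, start=10, period=10, until=31)
recollected = collect_instant(source, lambda row: row["t"] >= 12)
```

Here the user-settable period is the `period` argument, and the instant mode runs once for a given condition, which suits the historical or re-collected data mentioned later in the description.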
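Preferably, because big data is very large and complex and mixes valid and invalid data of many kinds, irregular data and/or data that does not conform to the facts may be culled from the original data, according to a preset policy, after the data is collected. This saves storage space, avoids unnecessary resource consumption, and allows efficient data conversion. The original data from which such data has been culled is then stored.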
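One way to picture the preset policy is as an ordered list of checks that every record must pass. The sketch below assumes exam-score records; the two checks and the 0 to 100 score range are illustrative assumptions, not rules given in the source.

```python
def is_regular(row):
    """Irregular data: a missing or empty ID, or a non-numeric score."""
    return (
        isinstance(row.get("student_id"), str)
        and row["student_id"] != ""
        and isinstance(row.get("score"), (int, float))
    )


def is_factual(row):
    """Non-factual data: a score outside the assumed 0-100 range."""
    return 0 <= row["score"] <= 100


# Hypothetical preset policy; is_regular runs first so that is_factual
# only ever sees numeric scores.
PRESET_POLICY = (is_regular, is_factual)


def cull(rows, policy=PRESET_POLICY):
    """Keep only the rows that pass every check in the policy."""
    return [row for row in rows if all(check(row) for check in policy)]


rows = [
    {"student_id": "s1", "score": 88},
    {"student_id": "", "score": 70},      # irregular: empty ID
    {"student_id": "s2", "score": "n/a"}, # irregular: non-numeric score
    {"student_id": "s3", "score": 140},   # does not conform to the facts
]
clean = cull(rows)
```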
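Preferably, after step S104, the method further includes: performing data aggregation on the first data, wherein the data aggregation includes at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity. The aggregated data helps improve access efficiency.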
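Aggregation at a chosen granularity can be sketched as grouping by a key function. In this hypothetical example the time granularity is one day and a classroom stands in for the spatial granularity; the field names and the count/mean statistics are assumptions.

```python
from collections import defaultdict


def summarize(rows, key):
    """Group rows by key(row) and summarize the scores in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["score"])
    return {
        group: {"count": len(scores), "mean": sum(scores) / len(scores)}
        for group, scores in groups.items()
    }


rows = [
    {"taken_at": "2016-02-17T09:00", "room": "A", "score": 80},
    {"taken_at": "2016-02-17T15:00", "room": "B", "score": 90},
    {"taken_at": "2016-02-18T09:00", "room": "A", "score": 70},
]
# Time granularity: the date part of the ISO timestamp.
by_day = summarize(rows, lambda row: row["taken_at"][:10])
# Spatial granularity (assumed): the classroom.
by_room = summarize(rows, lambda row: row["room"])
```

Queries that only need per-day or per-room figures can then read the small summary instead of scanning every record, which is the access-efficiency benefit the text describes.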
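Because the volume of the collected first data may be very large, the first data may preferably be stored redundantly in step S106 to improve access performance; for example, the first data is split into blocks, the blocks are replicated into multiple copies, and the copies are stored in a distributed storage network.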
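The block-and-replicate example can be sketched with in-memory dictionaries standing in for storage nodes. The round-robin placement, the node names, and the block size are assumptions chosen for illustration; the source does not prescribe a placement scheme.

```python
def split_blocks(data, block_size):
    """Split a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(blocks, nodes, copies):
    """Place `copies` replicas of each block on distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    placement = {node: {} for node in nodes}
    for index, block in enumerate(blocks):
        for offset in range(copies):
            node = nodes[(index + offset) % len(nodes)]
            placement[node][index] = block
    return placement


def read_back(placement, block_count, failed=()):
    """Reassemble the data from any surviving replica of each block."""
    parts = []
    for index in range(block_count):
        holders = [n for n, held in placement.items() if index in held and n not in failed]
        if not holders:
            raise IOError(f"block {index} has no surviving replica")
        parts.append(placement[holders[0]][index])
    return b"".join(parts)


data = b"abcdefghij"
blocks = split_blocks(data, 4)
placement = place_replicas(blocks, ["n1", "n2", "n3"], copies=2)
recovered = read_back(placement, len(blocks), failed=("n2",))
```

With two copies per block, the data remains readable after any single node fails, and reads of different blocks can be served by different nodes.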
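Preferably, after the first data is stored, the user can establish a corresponding data model as needed in order to put the data to use. In this case, after step S106, this embodiment may further acquire the data model established by the user, extract from the first data the data required by the data model, and output a calculation result of the data model. Preferably, a decision result may also be output according to the calculation result and a preset policy.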
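A user-defined data model can be pictured as a list of required fields plus a formula. The sketch below is hypothetical: the weights, the field names `exam` and `attendance`, and the threshold used as the preset policy are assumptions, not values from the source.

```python
def extract(first_data, fields):
    """Extract only the fields the data model requires from the first data."""
    return [{field: row[field] for field in fields} for row in first_data]


def run_model(model, first_data):
    """Substitute the extracted data into the model's formula."""
    inputs = extract(first_data, model["fields"])
    return [model["formula"](**row) for row in inputs]


def decide(result, threshold=60.0):
    """Assumed preset policy: flag any result below the threshold."""
    return "needs support" if result < threshold else "on track"


# Hypothetical user-built model: predicted score from exam and attendance.
model = {
    "fields": ("exam", "attendance"),
    "formula": lambda exam, attendance: 0.5 * exam + 50 * attendance,
}
first_data = [
    {"student_id": "s1", "exam": 90.0, "attendance": 1.0},
    {"student_id": "s2", "exam": 50.0, "attendance": 0.5},
]
results = run_model(model, first_data)
decisions = [decide(value) for value in results]
```

Keeping the formula and its input fields together in one model object mirrors the later description of modeling as designing a formula and specifying which data is substituted into it.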
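This embodiment further provides a data processing apparatus for implementing the above embodiments and preferred implementations; what has already been described is not repeated, and the modules involved in the apparatus are described below. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an acquisition module 22, a conversion module 24, and a storage module 26. The acquisition module 22 is configured to collect original data from a data source. The conversion module 24 is coupled to the acquisition module 22 and configured to convert the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format. The storage module 26 is coupled to the conversion module 24 and configured to store the first data.
Preferably, the acquisition module 22 is configured to periodically collect original data from a data source, or to instantly collect original data from a data source according to a set collection condition.
FIG. 3 is a first preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus preferably further includes: a culling module 32 coupled between the acquisition module 22 and the conversion module 24 and configured to cull, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
FIG. 4 is a second preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus preferably further includes: a summary module 42 coupled between the conversion module 24 and the storage module 26 and configured to perform data aggregation on the first data, wherein the data aggregation includes at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
Preferably, the storage module 26 is configured to store the first data using redundant storage.
FIG. 5 is a third preferred schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the apparatus preferably further includes: an obtaining module 52 configured to acquire a data model established by a user; an extraction module 54 coupled to the storage module 26 and the obtaining module 52 and configured to extract, from the first data, the data required by the data model; and an output module 56 coupled to the extraction module 54 and configured to output a calculation result of the data model.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.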
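To make the description of the embodiments clearer, preferred embodiments are described and explained below.
A preferred embodiment of the present invention provides an educational big data application method for collecting, storing, managing, analyzing, querying, and presenting massive education-related data. Its ultimate aims are to help students draw up learning plans and improve their grades; to help teachers understand each student's situation precisely and teach according to aptitude; to help school leaders improve management and make intelligent decisions; and to help education-related industries respond promptly to market changes and market precisely.
To achieve the above aims, a preferred embodiment of the present invention provides an educational big data application system, including:
1 Data acquisition module: this module obtains original data from different data sources according to the specified interface type and characteristic requirements. Collection can be performed through a file interface, a database interface, a message interface, and the like. Data collection usually supports two methods: periodic collection and instant collection. Periodic collection extracts data at specified times, following the extraction period defined for each kind of data content. Instant collection is a one-off operation that the system performs immediately according to the set collection conditions; the operation is not repeated once it completes. Preferably, instant collection is applied to historical data and to re-collected data.
2 Data processing module: this module is mainly responsible for data cleaning, conversion, loading, rule management, and transmission. Data cleaning removes "dirty data" and eliminates data inconsistency. "Dirty data" includes irregular data and data that does not conform to the facts. Data conversion mainly includes conversion to a unified format encoding, a unified data type, and a unified data format. In addition, data conversion supports the most common kinds of data aggregation, for example aggregation by time granularity, by network element granularity, by spatial granularity, and by service granularity. Loading covers the cleaned and converted data that conforms to the target data model, as well as "clean" data that needs no further processing.
3 Data storage module: this module acts as the carrier of the data, providing stable and efficient mass data storage and a data interface for upper-layer access. The data includes real-time and non-real-time data, and structured and unstructured data. Redundant storage ensures the reliability of the stored data by keeping multiple copies of the same data. All of the massive data is stored on different nodes by means of distributed storage, and redundant storage also makes it possible to provide highly concurrent access with high throughput and high transfer rates.
4 Data application module: configured to complete data analysis and mining and to generate the final result data. For example, it analyzes and processes data according to specific business needs, which includes data modeling and providing service capabilities externally. The data application module provides visual modeling tools and application development tools, supports packaging various components and integrating them into the development tools, and offers upper-layer applications a unified Application Programming Interface (API) to call. This shields applications from the complex underlying implementation details and improves application development efficiency.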
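To achieve the above aims, a preferred embodiment of the present invention further provides an educational big data application method, including the following steps:
Step 1: The data acquisition module obtains data from each data source according to rules negotiated in advance with each education-related application system. This includes, but is not limited to, obtaining students' grades, analysis of wrong answers, and the distribution of time spent on tests from the student examination system; obtaining data on students raising hands, answering questions, and interacting with teachers in the informationized classroom; obtaining student attendance; and obtaining students' various life and consumption data, covering libraries, canteens, electronic classrooms, supermarkets, and so on.
Step 2: The data processing module cleans and converts the data according to the defined rules, so that the data conforms to the target data model.
Step 3: The processed data is stored in the data storage module.
Step 4: Modeling is performed in the data application module, which uses various data for comprehensive calculation and intelligent analysis to obtain result data and decisions. The resulting data is used by specific educational applications, with the ultimate aim of improving education and achieving smart education. This includes, but is not limited to, predicting students' test scores; predicting the rate of admission to higher-level schools; advising students on how to improve their learning; advising teachers on how to improve their teaching; and advising schools on how to improve their management and service levels.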
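FIG. 6 is a schematic structural diagram of an educational big data application system according to a preferred embodiment of the present invention; FIG. 6 is a variant of FIG. 5. As shown in FIG. 6, the system includes a data acquisition module, a data processing module, a data storage module, and a data application module, wherein:
1) Data acquisition module: configured to obtain original data from different data sources according to the specified interface type and characteristic requirements.
2) Data processing module: responsible for data cleaning, conversion, loading, rule management, and transmission. It converts the collected source data into data that conforms to the target data model.
3) Data storage module: configured to provide mass data storage.
4) Data application module: configured to perform data mining and analysis and to provide intelligent decisions for end users.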
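FIG. 7 is a schematic flowchart of an educational big data application method according to a preferred embodiment of the present invention. As shown in FIG. 7, the process includes the following steps:
Step S701: The data acquisition module collects data from education-related application systems (for example, an informationized classroom system, an examination system, a logistics system, faculty performance management, etc.). The interfaces between the data acquisition module and each educational application system include, but are not limited to, File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), and the like.
Step S702: The data processing module cleans and converts the data according to the defined rules, so that the data conforms to the target data model and meets the requirements of subsequent storage and application.
Step S703: The processed data is stored in the data storage module. The data storage module can adopt cloud storage technology, including distributed file storage, distributed database storage, and the like.
Step S704: The application developer (i.e., the user) uses the modeling tool provided by the data application module to build a model. Modeling consists of designing a calculation formula and specifying which data is substituted into the formula for calculation. Application developers use the application development tools provided by the data application module to develop specific educational applications; the applications use the data calculated with the formula to obtain the final analysis results.
Step S705: The intelligent analysis results are used to serve students, teachers, schools, parents, and other education-related users, including but not limited to: predicting students' test scores; predicting the rate of admission to higher-level schools; advising students on how to improve their learning; advising teachers on how to improve their teaching; and advising schools on how to improve their management and service levels. The ultimate aim is to improve education and achieve smart education.
In summary, through the above embodiments and preferred embodiments of the present invention, smart education can be achieved using big data technology. For example, the educational big data application system and method provided by the above preferred embodiments run through the entire process of "teaching," "learning," and "managing," and can simultaneously satisfy the various needs of schools, teachers, parents, and students.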
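In another embodiment, software is further provided for executing the technical solutions described in the above embodiments and preferred implementations.
In another embodiment, a storage medium is further provided in which the above software is stored; the storage medium includes, but is not limited to, an optical disk, a floppy disk, a hard disk, an erasable memory, and the like.
Note that the terms "first," "second," and the like in the description, the claims, and the drawings of the present invention distinguish similar objects and do not necessarily describe a particular order or sequence. Objects so described are interchangeable where appropriate, so that the embodiments of the invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have," and any variants of them, are intended to cover non-exclusive inclusion: a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Those skilled in the art will understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. The invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and do not limit it; those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Industrial Applicability
Through the embodiments of the present invention, original data is collected from a data source; the original data is converted into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and the first data is stored. This solves the problem of low data processing efficiency caused by inconsistent storage types of big data, and improves processing efficiency.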

Claims (12)

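  1. A data processing method, comprising:
    collecting original data from a data source;
    converting the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and
    storing the first data.
  2. The method according to claim 1, wherein collecting the original data from the data source comprises:
    periodically collecting the original data from the data source; or
    instantly collecting the original data from the data source according to a set collection condition.
  3. The method according to claim 1, wherein, before converting the original data into the first data, the method further comprises:
    culling, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
  4. The method according to claim 1, wherein, after converting the original data into the first data, the method further comprises:
    performing data aggregation on the first data, wherein the data aggregation comprises at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
  5. The method according to claim 1, wherein storing the first data comprises:
    storing the first data using redundant storage.
  6. The method according to any one of claims 1 to 5, wherein, after storing the first data, the method further comprises:
    acquiring a data model established by a user;
    extracting, from the first data, the data required by the data model; and
    outputting a calculation result of the data model.
  7. A data processing apparatus, comprising:
    an acquisition module configured to collect original data from a data source;
    a conversion module configured to convert the original data into first data conforming to a target data model, wherein the first data has at least one of the following features: a unified format encoding, a unified data type, and a unified data format; and
    a storage module configured to store the first data.
  8. The apparatus according to claim 7, wherein the acquisition module is configured to:
    periodically collect the original data from the data source; or
    instantly collect the original data from the data source according to a set collection condition.
  9. The apparatus according to claim 7, wherein the apparatus further comprises:
    a culling module configured to cull, according to a preset policy, irregular data and/or data that does not conform to the facts from the original data.
  10. The apparatus according to claim 7, wherein the apparatus further comprises:
    a summary module configured to perform data aggregation on the first data, wherein the data aggregation comprises at least one of the following: aggregation by time granularity, aggregation by network element granularity, aggregation by spatial granularity, and aggregation by service granularity.
  11. The apparatus according to claim 7, wherein
    the storage module is configured to store the first data using redundant storage.
  12. The apparatus according to any one of claims 7 to 11, wherein the apparatus further comprises:
    an obtaining module configured to acquire a data model established by a user;
    an extraction module configured to extract, from the first data, the data required by the data model; and
    an output module configured to output a calculation result of the data model.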
PCT/CN2016/073956 2015-05-21 2016-02-17 Data processing method and apparatus WO2016184192A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510263271.0 2015-05-21
CN201510263271.0A CN106296498A (zh) 2015-05-21 2015-05-21 Data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2016184192A1 (zh) 2016-11-24

Family

ID=57319343

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/073956 WO2016184192A1 (zh) 2015-05-21 2016-02-17 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN106296498A (zh)
WO (1) WO2016184192A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111427946A (zh) * 2020-04-16 2020-07-17 北京搜狐互联网信息服务有限公司 Data processing method and apparatus

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
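CN108121508A (zh) * 2017-12-15 2018-06-05 华中师范大学 Multi-source heterogeneous data acquisition system and processing method based on educational big data
CN108268645A (zh) * 2018-01-23 2018-07-10 广州南方人才资讯科技有限公司 Big data processing method and system
CN108416506B (zh) * 2018-02-07 2022-08-02 平安科技(深圳)有限公司 Customer risk level management method, server, and computer-readable storage medium
CN108921747A (zh) * 2018-07-06 2018-11-30 重庆和贯科技有限公司 Smart education system for creating a sense of immersion for students
CN109597846B (zh) * 2018-10-22 2024-05-07 平安科技(深圳)有限公司 Data processing method, apparatus, and computer device for a big data platform data warehouse
CN109558400B (zh) * 2018-11-28 2021-04-27 北京锐安科技有限公司 Data processing method, apparatus, device, and storage medium
CN110069553A (zh) * 2019-04-28 2019-07-30 中国疾病预防控制中心 Data acquisition and processing method and device for public health emergencies
CN112947263A (zh) * 2021-04-20 2021-06-11 南京云玑信息科技有限公司 Management and control system based on data acquisition and encoding
CN113190608A (zh) * 2021-05-28 2021-07-30 北京红山信息科技研究院有限公司 Standardized data acquisition method, apparatus, device, and storage medium
CN117407381A (zh) * 2023-09-26 2024-01-16 陕西小保当矿业有限公司 Method and apparatus for real-time processing of mining industry big data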

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052102A1 (en) * 2006-08-02 2008-02-28 Aveksa, Inc. System and method for collecting and normalizing entitlement data within an enterprise
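CN103473719A (zh) * 2013-09-26 2013-12-25 杭州意能软件有限公司 Data acquisition method, apparatus, and system
CN103676798A (zh) * 2012-09-10 2014-03-26 任伟 Unified monitoring and inspection platform
CN104134100A (zh) * 2014-07-22 2014-11-05 香港佳能通节能科技有限公司 Cloud-computing-based energy-saving management system
CN104462604A (zh) * 2014-12-31 2015-03-25 成都市卓睿科技有限公司 Data processing method and system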

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663659A (zh) * 2012-03-27 2012-09-12 上海爱友科技有限公司 Education system based on an academic achievement development index

Also Published As

Publication number Publication date
CN106296498A (zh) 2017-01-04

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16795679

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16795679

Country of ref document: EP

Kind code of ref document: A1