CN109977271B - Big data processing system and processing method thereof - Google Patents

Info

Publication number
CN109977271B
Authority
CN
China
Prior art keywords
data, mapping, processing, nodes, original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910354112.XA
Other languages
Chinese (zh)
Other versions
CN109977271A (en)
Inventor
宋顶利
张昕
周建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Kailin Jianguan Technology Co.,Ltd.
Original Assignee
Chongqing Hanniu Technology Innovation Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Hanniu Technology Innovation Service Co ltd
Priority to CN201910354112.XA
Publication of CN109977271A
Application granted
Publication of CN109977271B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a big data processing system comprising: a data format conversion module for converting the data format of original data to form data to be processed; a data mapping module for establishing a mapping set of the data to be processed; a mapping data processing module for processing the data of the mapping set; and an original data processing module for processing the original data according to the processing result of the mapping set data. The invention overcomes defects of the prior art and improves data processing efficiency.

Description

Big data processing system and processing method thereof
Technical Field
The invention relates to the technical field of big data, and in particular to a big data processing system and a processing method thereof.
Background
In recent years, big data technology has developed rapidly and is widely applied in many fields. Although big data technology does not demand high data processing precision, big data comes from numerous sources and in large volumes, so the hardware requirements for data processing remain high, which limits the further adoption of big data technology.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a big data processing system and a processing method thereof that overcome the defects of the prior art and improve data processing efficiency.
To solve the above technical problem, the invention adopts the following technical solution.
A big data processing system, comprising:
a data format conversion module for converting the data format of the original data to form data to be processed;
a data mapping module for establishing a mapping set of the data to be processed;
a mapping data processing module for processing the data of the mapping set; and
an original data processing module for processing the original data according to the processing result of the mapping set data.
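The four modules form a linear pipeline, which can be sketched in Python as follows. This is a minimal illustration only: the function names, the string-to-float conversion, the assumed rule `y = scale * x`, and the toy processing step (centring on the mean) are all assumptions, since the patent names the modules but does not prescribe their internals.

```python
from typing import List

# Hypothetical stand-ins for the four modules named in the text.

def convert_format(raw: List[str]) -> List[float]:
    """Data format conversion module: raw strings -> numeric data to be processed."""
    return [float(item) for item in raw]

def build_mapping_set(pending: List[float], scale: float = 2.0) -> List[float]:
    """Data mapping module: assumed linear rule y = scale * x."""
    return [scale * x for x in pending]

def process_mapping_set(mapped: List[float]) -> List[float]:
    """Mapping data processing module: toy processing, centre values on their mean."""
    mean = sum(mapped) / len(mapped)
    return [y - mean for y in mapped]

def process_original(processed: List[float], scale: float = 2.0) -> List[float]:
    """Original data processing module: undo the assumed rule on the processed result."""
    return [y / scale for y in processed]

def run_pipeline(raw: List[str]) -> List[float]:
    pending = convert_format(raw)
    mapped = build_mapping_set(pending)
    processed = process_mapping_set(mapped)
    return process_original(processed)

result = run_pipeline(["1", "2", "3"])
```

Because the toy rule is a pure scaling, the result equals what centring the original values directly would give, which is the sense in which the original data are processed indirectly here.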
A processing method of the big data processing system comprises the following steps:
A. the data format conversion module converts the data format to form the data to be processed; the converted data to be processed comprises a data table header section, a data feature section, a mapping rule section and a data content section;
B. the data mapping module establishes a mapping set of the data to be processed and records the mapping rule in the mapping rule section;
C. the mapping data processing module processes the data of the mapping set;
D. the original data processing module processes the original data according to the processing result of the mapping set data.
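The four-section record produced in step A can be pictured as a small data structure. The sketch below is hypothetical: the class name `PendingRecord`, the field types, the two toy features (mean and range), and the `(scale, offset)` encoding of a mapping rule are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PendingRecord:
    """One converted record with the four sections named in step A."""
    header: Dict[str, str] = field(default_factory=dict)   # data table header section
    features: List[float] = field(default_factory=list)    # data feature section
    mapping_rule: Optional[Dict[str, float]] = None         # mapping rule section (filled in step B)
    content: List[float] = field(default_factory=list)     # data content section

def convert(raw_values: List[float], source: str) -> PendingRecord:
    """Hypothetical step A: wrap raw values and derive two toy features (mean, range)."""
    mean = sum(raw_values) / len(raw_values)
    spread = max(raw_values) - min(raw_values)
    return PendingRecord(
        header={"source": source},
        features=[mean, spread],
        content=list(raw_values),
    )

def mark_rule(record: PendingRecord, scale: float, offset: float) -> PendingRecord:
    """Hypothetical step B: record an assumed linear rule y = scale * x + offset."""
    record.mapping_rule = {"scale": scale, "offset": offset}
    return record

record = mark_rule(convert([2.0, 4.0, 6.0], source="sensor-a"), scale=0.5, offset=1.0)
```

Keeping the rule inside the record means that step D can later invert the mapping using only what the record itself carries.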
Preferably, in step B, the data in the mapping set are linearly related; the similarity between mapping rules is calculated, and data to be processed whose mapping rules have a similarity greater than a set value are placed in the same data cluster.
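The clustering rule in step B can be sketched as a threshold test on pairwise rule similarity. The patent does not say how similarity is measured or how clusters are formed, so the following is an assumption throughout: rules are encoded as `(a, b)` for `y = a * x + b`, similarity is cosine similarity of those vectors, and clusters are built greedily against each cluster's first member.

```python
import math
from typing import List, Tuple

Rule = Tuple[float, float]  # assumed encoding of a linear rule y = a * x + b as (a, b)

def rule_similarity(r1: Rule, r2: Rule) -> float:
    """Cosine similarity between two rule vectors (an assumed measure)."""
    dot = r1[0] * r2[0] + r1[1] * r2[1]
    norm = math.hypot(*r1) * math.hypot(*r2)
    return dot / norm if norm else 0.0

def cluster_by_rule(rules: List[Rule], threshold: float) -> List[List[int]]:
    """Greedy clustering: a record joins the first cluster whose first member's
    rule is more similar than the threshold, otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for idx, rule in enumerate(rules):
        for members in clusters:
            if rule_similarity(rules[members[0]], rule) > threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

rules = [(1.0, 0.0), (2.0, 0.1), (0.0, 1.0), (0.1, 3.0)]
clusters = cluster_by_rule(rules, threshold=0.9)  # indices of records per cluster
```

With these sample rules the first two records land in one cluster and the last two in another. Cosine similarity ignores the magnitude of a rule, so a real system might prefer a distance-based measure; the threshold plays the role of the "set value" in the text.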
Preferably, in step C, processing the data of the mapping set comprises the following steps:
C1. establishing a data tree for the data in each data cluster, with data sharing the same mapping rule taken as one node;
C2. processing the data with each node as a starting point of data processing, wherein the data between two nodes are processed in the same processing mode, and that processing mode is determined by a linear combination of the data processing modes of the nodes at the two ends;
C3. establishing a correlation matrix among the different data clusters, and establishing a multi-dimensional correlation tree of the data according to the correlation matrix.
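Steps C1 and C2 can be illustrated with a small sketch. It assumes, hypothetically, that a "processing mode" can be represented as a vector of numeric parameters and that the linear combination is a convex blend controlled by a single weight; the patent states only that the mode between two nodes is a linear combination of the modes at the two ends.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rule = Tuple[float, float]  # assumed (scale, offset) encoding of a mapping rule

def build_nodes(records: List[Tuple[Rule, float]]) -> Dict[Rule, List[float]]:
    """C1 (sketch): records sharing an identical mapping rule form one node."""
    nodes: Dict[Rule, List[float]] = defaultdict(list)
    for rule, value in records:
        nodes[rule].append(value)
    return dict(nodes)

def blend_modes(mode_a: List[float], mode_b: List[float], weight: float) -> List[float]:
    """C2 (sketch): linear combination of the two end nodes' processing parameters.

    weight = 0 reproduces node A's mode, weight = 1 reproduces node B's mode.
    """
    return [(1.0 - weight) * a + weight * b for a, b in zip(mode_a, mode_b)]

records = [((1.0, 0.0), 10.0), ((1.0, 0.0), 12.0), ((2.0, 1.0), 7.0)]
nodes = build_nodes(records)

# Hypothetical processing modes, e.g. [gain, smoothing window] at each end node.
mode_between = blend_modes([1.0, 4.0], [3.0, 8.0], weight=0.25)
```

In this example the data lying between the two nodes would be processed with parameters `[1.5, 5.0]`, a quarter of the way from the first node's mode to the second's.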
Preferably, in step D, processing the original data comprises the following steps:
D1. marking the original address of the data in the data table header section;
D2. traversing the nodes of the multi-dimensional correlation tree and comparing them with the data features in the data feature section, and taking the node with the highest similarity to the data features as the mapping node;
D3. performing an inverse operation on the mapping node using the mapping rule recorded in the mapping rule section to obtain the processing result of the original data.
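Steps D1 to D3 amount to a nearest-node lookup followed by an inverse mapping. The sketch below is hypothetical: it treats the tree as a flat dictionary of node feature vectors, uses negative Euclidean distance as the similarity measure, and assumes a linear rule `y = scale * x + offset`. None of these choices is specified in the patent, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List

def nearest_node(features: List[float], nodes: Dict[str, List[float]]) -> str:
    """D2 (sketch): traverse all nodes and return the one closest to the features.

    Highest similarity is taken to mean smallest Euclidean distance (an assumption).
    """
    return min(nodes, key=lambda name: math.dist(features, nodes[name]))

def invert_rule(mapped_value: float, scale: float, offset: float) -> float:
    """D3 (sketch): undo the assumed linear rule y = scale * x + offset."""
    return (mapped_value - offset) / scale

# Hypothetical tree nodes (name -> feature vector) and their processed results.
node_features = {"n1": [0.0, 0.0], "n2": [5.0, 5.0], "n3": [9.0, 1.0]}
node_results = {"n1": 3.0, "n2": 11.0, "n3": 21.0}

record = {
    "header": {"original_address": "table_a/row_17"},  # D1: address kept in the header
    "features": [4.0, 6.0],
    "mapping_rule": {"scale": 2.0, "offset": 1.0},
}

mapping_node = nearest_node(record["features"], node_features)
original_result = invert_rule(node_results[mapping_node], **record["mapping_rule"])
```

Here the record's features are closest to node `n2`, whose processed value 11.0 is mapped back to 5.0 for the original data; the header's address tells the caller where that result belongs.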
The beneficial effects of the above technical solution are as follows: the invention establishes a mapping data set and exploits the fact that mapping data are convenient to process, so that the original data are processed indirectly; the mapping relationship is finally fed back to the original data, achieving rapid processing of the original data. The processing result of the mapping data is represented as a multi-dimensional correlation tree structure, which strengthens the association between mapping data processing and original data processing and ensures the accuracy of original data processing.
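The indirect route can be written down under one simple reading of the text. Assume (this is not stated in the patent) that each mapping rule is an invertible linear map with scale a and offset b, and that P stands for whatever processing is applied to the mapped data:

```latex
\begin{align*}
y  &= a\,x + b, \qquad a \neq 0 && \text{(mapping rule, step B)}\\
y' &= P(y)                      && \text{(processing of the mapped data, step C)}\\
x' &= \frac{y' - b}{a}          && \text{(inverse operation, step D3)}
\end{align*}
```

Here x is an original value, y its mapped value, and x' the result fed back to the original data. The symbols a, b and P are illustrative and do not appear in the source.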
Drawings
FIG. 1 is a block diagram of one embodiment of the present invention.
In the figure: 1. data format conversion module; 2. data mapping module; 3. mapping data processing module; 4. original data processing module.
Detailed Description
Referring to FIG. 1, one embodiment of the present invention comprises:
a data format conversion module 1 for converting the data format of the original data to form data to be processed;
a data mapping module 2 for establishing a mapping set of the data to be processed;
a mapping data processing module 3 for processing the data of the mapping set; and
an original data processing module 4 for processing the original data according to the processing result of the mapping set data.
A processing method of the big data processing system comprises the following steps:
A. the data format conversion module 1 converts the data format to form the data to be processed; the converted data to be processed comprises a data table header section, a data feature section, a mapping rule section and a data content section;
B. the data mapping module 2 establishes a mapping set of the data to be processed and records the mapping rule in the mapping rule section;
C. the mapping data processing module 3 processes the data of the mapping set;
D. the original data processing module 4 processes the original data according to the processing result of the mapping set data.
In step B, the data in the mapping set are linearly related; the similarity between mapping rules is calculated, and data to be processed whose mapping rules have a similarity greater than a set value are placed in the same data cluster.
In step C, processing the data of the mapping set comprises the following steps:
C1. establishing a data tree for the data in each data cluster, with data sharing the same mapping rule taken as one node;
C2. processing the data with each node as a starting point of data processing, wherein the data between two nodes are processed in the same processing mode, and that processing mode is determined by a linear combination of the data processing modes of the nodes at the two ends;
C3. establishing a correlation matrix among the different data clusters, and establishing a multi-dimensional correlation tree of the data according to the correlation matrix.
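Step C3 does not say how the correlation matrix is computed or how a tree is derived from it. One plausible reading, sketched below purely as an assumption, summarises each cluster as a numeric series, uses the Pearson coefficient for the matrix, and keeps the strongest links as a maximum spanning tree built with Prim's algorithm. The function names and sample data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_matrix(clusters: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlation between cluster summary series, keyed by name pair."""
    names = list(clusters)
    return {(a, b): pearson(clusters[a], clusters[b]) for a in names for b in names}

def correlation_tree(names: List[str], corr: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Maximum spanning tree over |correlation| via Prim's algorithm (assumed construction)."""
    in_tree = [names[0]]
    edges: List[Tuple[str, str]] = []
    while len(in_tree) < len(names):
        # Pick the strongest link from the tree built so far to a cluster outside it.
        a, b = max(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: abs(corr[e]),
        )
        edges.append((a, b))
        in_tree.append(b)
    return edges

clusters = {
    "c1": [1.0, 2.0, 3.0, 4.0],
    "c2": [1.0, 3.0, 2.0, 5.0],
    "c3": [4.0, 1.0, 3.0, 2.0],
}
corr = correlation_matrix(clusters)
tree = correlation_tree(list(clusters), corr)
```

For this sample, `c1` links to `c2` and `c2` links to `c3`, because those are the two strongest correlations in absolute value. A spanning tree is only one way to turn a matrix into a tree; the "multi-dimensional" structure in the text may well be richer.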
In step D, processing the original data comprises the following steps:
D1. marking the original address of the data in the data table header section;
D2. traversing the nodes of the multi-dimensional correlation tree and comparing them with the data features in the data feature section, and taking the node with the highest similarity to the data features as the mapping node;
D3. performing an inverse operation on the mapping node using the mapping rule recorded in the mapping rule section to obtain the processing result of the original data.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (2)

1. A big data processing system, characterized by comprising:
a data format conversion module for converting the data format of the original data to form data to be processed;
a data mapping module for establishing a mapping set of the data to be processed, wherein the data in the mapping set are linearly related, the similarity between mapping rules is calculated, and data to be processed whose mapping rules have a similarity greater than a set value are placed in the same data cluster;
a mapping data processing module for processing the data of the mapping set, wherein processing the data of the mapping set comprises the following steps:
C1. establishing a data tree for the data in each data cluster, with data sharing the same mapping rule taken as one node;
C2. processing the data with each node as a starting point of data processing, wherein the data between two nodes are processed in the same processing mode, and that processing mode is determined by a linear combination of the data processing modes of the nodes at the two ends;
C3. establishing a correlation matrix among the different data clusters, and establishing a multi-dimensional correlation tree of the data according to the correlation matrix;
an original data processing module for processing the original data according to the processing result of the mapping set data, wherein processing the original data comprises the following steps:
D1. marking the original address of the data in the data table header section;
D2. traversing the nodes of the multi-dimensional correlation tree and comparing them with the data features in the data feature section, and taking the node with the highest similarity to the data features as the mapping node;
D3. performing an inverse operation on the mapping node using the mapping rule recorded in the mapping rule section to obtain the processing result of the original data.
2. A processing method of the big data processing system according to claim 1, comprising the following steps:
A. the data format conversion module converts the data format of the original data, and the converted original data comprises a data table header section, a data feature section, a mapping rule section and a data content section;
B. the data mapping module establishes a mapping set of the data to be processed and records the mapping rule in the mapping rule section; the data in the mapping set are linearly related; the similarity between mapping rules is calculated, and data to be processed whose mapping rules have a similarity greater than a set value are placed in the same data cluster;
C. the mapping data processing module processes the data of the mapping set, wherein processing the data of the mapping set comprises the following steps:
C1. establishing a data tree for the data in each data cluster, with data sharing the same mapping rule taken as one node;
C2. processing the data with each node as a starting point of data processing, wherein the data between two nodes are processed in the same processing mode, and that processing mode is determined by a linear combination of the data processing modes of the nodes at the two ends;
C3. establishing a correlation matrix among the different data clusters, and establishing a multi-dimensional correlation tree of the data according to the correlation matrix;
D. the original data processing module processes the original data according to the processing result of the mapping set data, wherein processing the original data comprises the following steps:
D1. marking the original address of the data in the data table header section;
D2. traversing the nodes of the multi-dimensional correlation tree and comparing them with the data features in the data feature section, and taking the node with the highest similarity to the data features as the mapping node;
D3. performing an inverse operation on the mapping node using the mapping rule recorded in the mapping rule section to obtain the processing result of the original data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910354112.XA CN109977271B (en) 2019-04-29 2019-04-29 Big data processing system and processing method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910354112.XA CN109977271B (en) 2019-04-29 2019-04-29 Big data processing system and processing method thereof

Publications (2)

Publication Number Publication Date
CN109977271A (en) 2019-07-05
CN109977271B (en) 2022-12-20

Family

ID=67087062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910354112.XA Active CN109977271B (en) 2019-04-29 2019-04-29 Big data processing system and processing method thereof

Country Status (1)

Country Link
CN (1) CN109977271B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7158990B1 (en) * 2002-05-31 2007-01-02 Oracle International Corporation Methods and apparatus for data conversion
CN101283350A (en) * 2005-07-15 2008-10-08 思索软件有限公司 Method and apparatus for providing structured data for free text messages
CN106407392A (en) * 2016-09-19 2017-02-15 北京集奥聚合科技有限公司 A marking language-based node mapping relationship extracting method and system
CN106682235A (en) * 2017-01-18 2017-05-17 济南浪潮高新科技投资发展有限公司 System and method for isomerous data mapping
WO2017107453A1 (en) * 2015-12-23 2017-06-29 乐视控股(北京)有限公司 Video content recommendation method, device, and system
CN106959948A (en) * 2016-01-08 2017-07-18 普华诚信信息技术有限公司 The system and its preprocess method pre-processed for distributed nature to big data
CN107315768A (en) * 2017-05-17 2017-11-03 上海交通大学 The distribution information interacting method and system mapped based on Heterogeneous Information model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
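Extended Subtree: A New Similarity Function for Tree Structured Data; A. Shahbazi et al.; 《IEEE Transactions on Knowledge and Data Engineering》; 20130527; vol. 26 (no. 4); 864-877 *
Research on rule mapping in SDN network virtualization; 李佟 et al.; 《计算机系统应用》; 20170915 (no. 09); 242-249 *
Metadata-driven bidirectional mapping strategy for heterogeneous data models; 黄刚 et al.; 《科学技术与工程》; 20121118 (no. 32); 274-280 *
Application of association rules in enterprise power consumption data analysis; 刘美 et al.; 《微计算机信息》; 20091125; vol. 25 (no. 33); 55-57 *
Research on signal system equipment maintenance strategy based on big data risk analysis; 秦晓光; 《中国优秀硕士学位论文全文数据库工程科技Ⅱ辑》; 20180715 (no. 7); C033-84 *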

Also Published As

Publication number Publication date
CN109977271A (en) 2019-07-05

Similar Documents

Publication Publication Date Title
CN103761318B (en) A kind of method and system of relationship type synchronization of data in heterogeneous database
CN107341178B (en) Data retrieval method based on self-adaptive binary quantization Hash coding
WO2019109732A1 (en) Distributed storage method and architecture for gene variation data
CN102737108B (en) Method and device for processing flow diagram
US7199729B2 (en) Character code conversion methods and systems
WO2003069554A3 (en) Method and system for interactive ground-truthing of document images
CN110704649B (en) Method and system for constructing flow image data set
CN104123375B (en) Data search method and system
CN109766337B (en) Tree structure data storage method, electronic device, storage medium and system
WO2013083067A1 (en) Method and device for acquiring structured information in layout file
US20220005546A1 (en) Non-redundant gene set clustering method and system, and electronic device
CN109977271B (en) Big data processing system and processing method thereof
KR101255639B1 (en) Column-oriented database system and join process method using join index thereof
CN101030230A (en) Image searching method and system
CN104123527A (en) Mask-based image table document identification method
CN112905642A (en) Method for storing IEC61850 report data into relational database based on CSV mapping file
CN116579319A (en) Text similarity analysis method and system
CN109213751B (en) Spark platform based Oracle database parallel migration method
CN115858855A (en) Video data query method based on scene characteristics
CN111091003A (en) Parallel extraction method based on knowledge graph query
CN110389953A (en) Date storage method, storage medium, storage device and server based on compression figure
CN113538474A (en) 3D point cloud segmentation target detection system based on edge feature fusion
CN110867214B (en) DNA sequence query system based on shared data outline
CN107544090B (en) Seismic data analyzing and storing method based on MapReduce
CN110134692B (en) Time-space index establishing method based on frequency attribute and PCA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221130

Address after: 7-1 #, No. 7, Huasheng Road, Yuzhong District, Chongqing 400000

Applicant after: Chongqing Hanniu Technology Innovation Service Co.,Ltd.

Address before: 063210 Tangshan City Caofeidian District, Hebei Province, Tangshan Bay eco Town, Bohai Road, 21

Applicant before: North China University of Science and Technology

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 7-1 #, No. 7, Huasheng Road, Yuzhong District, Chongqing 400000

Patentee after: Chongqing Kailin Jianguan Technology Co.,Ltd.

Address before: 7-1 #, No. 7, Huasheng Road, Yuzhong District, Chongqing 400000

Patentee before: Chongqing Hanniu Technology Innovation Service Co.,Ltd.