CN109522348A - Data processing system and method fusing multiple intelligent analysis languages - Google Patents


Info

Publication number
CN109522348A
Authority
CN
China
Prior art keywords
language
data
processing
task
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811119149.6A
Other languages
Chinese (zh)
Inventor
何海峰
王文志
谢东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Sinovatio Technology LLC
Original Assignee
Nanjing Sinovatio Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Sinovatio Technology LLC
Priority to CN201811119149.6A
Publication of CN109522348A
Legal status: Pending

Abstract

The invention discloses a data processing system and method that fuse multiple intelligent analysis languages. The system comprises three basic logic modules: a data access module, an analysis and mining module, and a data storage module. The method comprises the following steps: the user starts a flow; the user configures a multi-language processing flow by dragging operators on the OceanMind platform; the system generates a task execution tree from the configured flow; the tasks are executed, with data sources processed uniformly using the data access and cleaning/conversion functions provided by OceanMind, and requests are sent for processing according to the intelligent processing language involved; each processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected; the results from the different languages are integrated and persisted for use by subsequent flows. The invention provides a unified data cleaning and conversion function and performs distributed analysis on all data sources.

Description

Data processing system and method fusing multiple intelligent analysis languages
Technical field
The present invention relates to the technical field of intelligent analysis languages, and in particular to a data processing system and method fusing multiple intelligent analysis languages.
Background art
In recent years, the rapid development of computer technology and the Internet has ushered in an era of information explosion. Society is flooded with more data than ever before, which pushes people to seek more suitable ways of analyzing data intelligently. Existing intelligent analysis languages, such as Python, R and TensorFlow, have matured over time and can analyze data effectively in specific domains.
Under the general trend toward informatization, the problems of processing with a single intelligent language have also gradually become apparent, mainly in the following respects: (1) a single intelligent processing language mostly targets a specific domain and cannot meet processing needs across all domains; (2) each language has its own standard for data sources, which places very high demands on data cleaning and conversion; (3) as data volumes grow, intelligent languages that can only process on a single machine cannot meet the need for parallel, distributed processing. Given the above, establishing a system that fuses multiple intelligent analysis languages, provides a unified data cleaning and conversion function, and performs distributed analysis on all data sources has become an urgent task.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data processing system and method fusing multiple intelligent analysis languages that provide a unified data cleaning and conversion function and perform distributed analysis on all data sources.
To solve the above technical problem, the present invention provides a data processing system fusing multiple intelligent analysis languages, comprising three basic logic modules: a data access module, an analysis and mining module, and a data storage module. The data access module is responsible for accessing different kinds of data sources and normalizing them into the format required for subsequent processing. The analysis and mining module is responsible for taking the accessed data, scheduling tasks according to the configured task flow, and submitting each task to the specific intelligent language system for processing. The data storage module is responsible for receiving the results of the analysis and mining module and storing them in the specified destination.
Preferably, the different kinds of data sources are relational databases, big data storage systems and streaming data.
Preferably, when a task is submitted to a specific intelligent language system for processing, the Spark-based processing mode is invoked if the user requires parallel processing.
Correspondingly, a data processing method fusing multiple intelligent analysis languages comprises the following steps:
(1) the user starts a flow;
(2) the user configures a multi-language processing flow by dragging operators on the OceanMind platform;
(3) the system generates a task execution tree from the configured flow;
(4) the tasks are executed: data sources are processed uniformly using the data access and cleaning/conversion functions provided by OceanMind, and requests are sent for processing according to the intelligent processing language involved;
(5) each processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected;
(6) the results from the different languages are integrated and persisted for use by subsequent flows.
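Steps (3) and (4) can be illustrated with a minimal sketch, assuming the configured flow is represented as a mapping from each operator to the operators it depends on. The operator names, the `build_execution_order` helper and the use of Python's standard `graphlib` module are illustrative assumptions; the patent does not specify how OceanMind represents or traverses the task execution tree.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical flow configuration: operator -> set of upstream operators.
# In the patent this would come from operators dragged onto the platform canvas.
flow = {
    "load_source": set(),
    "clean_convert": {"load_source"},
    "r_analysis": {"clean_convert"},
    "python_analysis": {"clean_convert"},
    "merge_results": {"r_analysis", "python_analysis"},
    "persist": {"merge_results"},
}

def build_execution_order(flow):
    """Return operators so that each one comes after all of its upstream operators."""
    return list(TopologicalSorter(flow).static_order())

order = build_execution_order(flow)
print(order)
```

The two language-specific operators have no dependency on each other, so a scheduler built on this ordering would be free to run them side by side before the merge step.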
Preferably, in step (5), receiving the processing task and selecting native-language processing or Spark-based parallel processing according to the language and whether parallelization is needed specifically comprises the following steps:
(51) for an R request, the user chooses whether to process in parallel; if parallel processing is needed, the system calls SparkR, otherwise native R is used;
(52) for a Python request, the user chooses whether to process in parallel; if parallel processing is needed, the system calls PySpark, otherwise native Python is used;
(53) for a TensorFlow request, the user chooses whether to process in parallel; if parallel processing is needed, the system uses the Xlearning framework, otherwise native TensorFlow is used;
(54) for a Java request, parallel processing is used by default;
(55) the results of steps (51)-(54) are stored in a DataFrame and then returned.
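The selection rule in steps (51)-(54) amounts to a small lookup. The sketch below is one hedged way to express it; the function name `select_backend` and the returned label strings are hypothetical, and the patent does not name the engine used for the default-parallel Java case, so that label is a placeholder.

```python
def select_backend(language, parallel=False):
    """Map a language request and the user's parallel choice to a backend label."""
    lang = language.strip().lower()
    if lang == "java":
        # Step (54): Java requests are processed in parallel by default,
        # regardless of the user's choice. The label is a placeholder.
        return "parallel (Java)"
    # (native backend, parallel backend) per steps (51)-(53)
    backends = {
        "r": ("native R", "SparkR"),
        "python": ("native Python", "PySpark"),
        "tensorflow": ("native TensorFlow", "Xlearning"),
    }
    if lang not in backends:
        raise ValueError(f"unsupported language: {language}")
    native, distributed = backends[lang]
    return distributed if parallel else native

print(select_backend("R", parallel=True))
print(select_backend("Python"))
```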
The beneficial effects of the invention are as follows: users can conveniently use multiple intelligent analysis languages, alone or in combination, to process big data, for example applying TensorFlow and R processing to big data within a data flow, so that the strengths of each intelligent analysis language are fully exploited to complete the analysis task; users can also apply distributed computing to data analysis and mining, making full use of each language's performance and improving analysis efficiency; in addition, CPU modules can be added to the system, so the system can be extended effectively and its analysis capability increased.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system structure of the invention.
Fig. 2 is a schematic flow diagram of the method of the invention.
Fig. 3 is a schematic diagram of the base big data platform of the invention.
Fig. 4 is a schematic diagram of the task distribution and processing flow of the invention.
Detailed description of the embodiments
Fig. 1 is a schematic structural diagram of a data processing system fusing multiple intelligent analysis languages according to an embodiment of the invention. It comprises three basic logic modules: a data access module, an analysis and mining module, and a data storage module.
The data access module is responsible for accessing different kinds of data sources, such as relational databases, big data storage systems and streaming data, and normalizing them into the format required for subsequent processing.
The analysis and mining module is responsible for taking the accessed data, scheduling tasks according to the configured task flow, and submitting each task to the specific intelligent language system for processing. In particular, if the user requires parallel processing, the Spark-based processing mode can be invoked.
The data storage module is responsible for receiving the results of the analysis and mining module and storing them in the specified destination.
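The division of responsibilities among the three modules can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: the class names, the source dictionary layout, the in-memory "destination" and the handler callables are all hypothetical, and only two source kinds are modeled.

```python
class DataAccess:
    """Reads different kinds of sources and normalizes them to a list of row dicts."""

    def read(self, source):
        kind = source["kind"]
        if kind == "relational":
            # Rows arrive as tuples alongside a column list.
            return [dict(zip(source["columns"], row)) for row in source["rows"]]
        if kind == "stream":
            # Each event is already a mapping; materialize the iterable.
            return [dict(event) for event in source["events"]]
        raise ValueError(f"unknown source kind: {kind}")


class AnalysisMining:
    """Schedules the configured task flow and hands rows to each language handler."""

    def __init__(self, handlers):
        self.handlers = handlers  # language name -> callable(rows) -> rows

    def run(self, rows, task_flow):
        for language in task_flow:
            rows = self.handlers[language](rows)
        return rows


class DataStorage:
    """Receives analysis results and keeps them under a named destination."""

    def __init__(self):
        self.destinations = {}

    def save(self, destination, rows):
        self.destinations[destination] = list(rows)


source = {"kind": "relational", "columns": ["id", "value"], "rows": [(1, 10), (2, 20)]}
handlers = {"python": lambda rows: [{**r, "value": r["value"] * 2} for r in rows]}

rows = DataAccess().read(source)
result = AnalysisMining(handlers).run(rows, ["python"])
storage = DataStorage()
storage.save("results_table", result)
print(storage.destinations["results_table"])
```

Because every source is normalized to the same row format before analysis, the handlers never need to know which kind of source the data came from.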
Fig. 2 is a schematic flow diagram of a data processing method fusing multiple intelligent analysis languages according to an embodiment of the invention, comprising the following steps:
(1) the user starts a flow;
(2) the user configures a multi-language processing flow by dragging operators;
(3) the system generates a task execution tree from the configured flow;
(4) the tasks are executed: data sources are processed uniformly, and requests are sent for processing according to the intelligent processing language involved;
(5) each processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected;
(6) the results are integrated and persisted.
Fig. 3 shows an example of using the base platform of the invention. The right side of the figure shows the main way operators are dragged. On the left, the platform offers users a range of functions, such as data import, data export, data cleaning and conversion, data analysis, and machine learning.
Fig. 4 is a schematic diagram of how the user selects the processing mode, comprising the following:
(1) for an R request, the user chooses whether to process in parallel; if parallel processing is needed, the system calls SparkR, otherwise native R is used;
(2) for a Python request, the user chooses whether to process in parallel; if parallel processing is needed, the system calls PySpark, otherwise native Python is used;
(3) for a TensorFlow request, the user chooses whether to process in parallel; if parallel processing is needed, the system uses the Xlearning framework, otherwise native TensorFlow is used;
(4) for a Java request, parallel processing is used by default;
(5) the results of steps (1)-(4) are stored in a DataFrame and then returned.
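Storing the branch results in a DataFrame, as the last item describes, can be pictured with a minimal sketch. It assumes each language branch returns its rows as dictionaries and uses a plain column-oriented dictionary as a stand-in for a real DataFrame; the patent does not say whether a Spark, pandas or R data frame is meant, and the `to_frame` helper and the added `language` column are hypothetical.

```python
def to_frame(results_by_language):
    """Combine per-language row lists into one column-oriented table."""
    # Collect the union of column names, keeping first-seen order.
    columns = ["language"]
    for rows in results_by_language.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)

    frame = {name: [] for name in columns}
    for language, rows in results_by_language.items():
        for row in rows:
            frame["language"].append(language)
            for name in columns[1:]:
                # Columns a branch did not produce are filled with None.
                frame[name].append(row.get(name))
    return frame


frame = to_frame({
    "R": [{"id": 1, "score": 0.9}],
    "Python": [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}],
})
print(frame)
```

Tagging each row with the language that produced it keeps the combined table traceable when later steps integrate and persist the results.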
With the present invention, users can conveniently use multiple intelligent analysis languages, alone or in combination, to process big data, for example applying TensorFlow and R processing to big data within a data flow, so that the strengths of each intelligent analysis language are fully exploited to complete the analysis task. Users can also apply distributed computing to data analysis and mining, making full use of each language's performance and improving analysis efficiency. In addition, CPU modules can be added to the system, so the system can be extended effectively and its analysis capability increased.

Claims (5)

1. A data processing system fusing multiple intelligent analysis languages, characterized by comprising three basic logic modules: a data access module, an analysis and mining module, and a data storage module; the data access module is responsible for accessing different kinds of data sources and normalizing them into the format required for subsequent processing; the analysis and mining module is responsible for taking the accessed data, scheduling tasks according to the configured task flow, and submitting each task to the specific intelligent language system for processing; the data storage module is responsible for receiving the results of the analysis and mining module and storing them in the specified destination.
2. The data processing system fusing multiple intelligent analysis languages according to claim 1, characterized in that the different kinds of data sources are relational databases, big data storage systems and streaming data.
3. The data processing system fusing multiple intelligent analysis languages according to claim 1, characterized in that, when a task is submitted to a specific intelligent language system for processing, the Spark-based processing mode is invoked if the user requires parallel processing.
4. A data processing method fusing multiple intelligent analysis languages, characterized by comprising the following steps:
(1) the user starts a flow;
(2) the user configures a multi-language processing flow by dragging operators on the OceanMind platform;
(3) the system generates a task execution tree from the configured flow;
(4) the tasks are executed: data sources are processed uniformly using the data access and cleaning/conversion functions provided by OceanMind, and requests are sent for processing according to the intelligent processing language involved;
(5) each processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected;
(6) the results from the different languages are integrated and persisted for use by subsequent flows.
5. The data processing method fusing multiple intelligent analysis languages according to claim 4, characterized in that, in step (5), receiving the processing task and selecting native-language processing or Spark-based parallel processing according to the language and whether parallelization is needed specifically comprises the following steps:
(51) for an R request, the user chooses whether to process in parallel; if parallel processing is needed, the system calls SparkR, otherwise native R is used;
(52) for a Python request, the user chooses whether to process in parallel; if parallel processing is needed, the system calls PySpark, otherwise native Python is used;
(53) for a TensorFlow request, the user chooses whether to process in parallel; if parallel processing is needed, the system uses the Xlearning framework, otherwise native TensorFlow is used;
(54) for a Java request, parallel processing is used by default;
(55) the results of steps (51)-(54) are stored in a DataFrame and then returned.
CN201811119149.6A 2018-09-25 2018-09-25 A kind of data processing system and method merging multiple intellectual analysis language Pending CN109522348A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811119149.6A CN109522348A (en) 2018-09-25 2018-09-25 A kind of data processing system and method merging multiple intellectual analysis language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811119149.6A CN109522348A (en) 2018-09-25 2018-09-25 A kind of data processing system and method merging multiple intellectual analysis language

Publications (1)

Publication Number Publication Date
CN109522348A true CN109522348A (en) 2019-03-26

Family

ID=65772213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811119149.6A Pending CN109522348A (en) 2018-09-25 2018-09-25 A kind of data processing system and method merging multiple intellectual analysis language

Country Status (1)

Country Link
CN (1) CN109522348A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008007A (en) * 2014-06-12 2014-08-27 深圳先进技术研究院 Interoperability data processing system and method based on streaming calculation and batch processing calculation
CN106844585A (en) * 2017-01-10 2017-06-13 广东精规划信息科技股份有限公司 A kind of time-space relationship analysis system based on multi-source Internet of Things location aware
US20170263255A1 (en) * 2016-03-10 2017-09-14 Microsoft Technology Licensing, Llc Scalable Endpoint-Dependent Natural Language Understanding

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110780978A (en) * 2019-10-25 2020-02-11 下一代互联网重大应用技术(北京)工程研究中心有限公司 Data processing method, system, device and medium
CN110780978B (en) * 2019-10-25 2022-06-24 赛尔网络有限公司 Data processing method, system, device and medium
CN110968620A (en) * 2019-12-10 2020-04-07 国网信通亿力科技有限责任公司 Agile data analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190326
