CN109522348A - Data processing system and method integrating multiple intelligent analysis languages

Info

Publication number: CN109522348A
Application number: CN201811119149.6A
Authority: CN
Inventors: 何海峰; 王文志; 谢东
Original assignee: Nanjing Sinovatio Technology LLC
Current assignee: Nanjing Sinovatio Technology LLC
Priority date: 2018-09-25
Filing date: 2018-09-25
Publication date: 2019-03-26

Abstract

The invention discloses a data processing system and method integrating multiple intelligent analysis languages. The system comprises three basic logic modules: a data access module, an analysis mining module and a data storage module. The method comprises the following steps: the user starts a flow; the user configures a multi-intelligent-language flow by dragging operators on the OceanMind platform; the system generates a task execution tree according to the configured flow; the task is executed, the data sources are processed uniformly using the data access and cleaning-conversion functions provided by OceanMind, and requests are sent for processing according to the different intelligent processing languages; the processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected; the processing results from the different languages are integrated and persisted for use by subsequent flows. The invention provides a unified data cleaning and conversion function and performs distributed analysis for all data sources.

Description

A data processing system and method integrating multiple intelligent analysis languages

Technical field

The present invention relates to the technical field of intelligent analysis languages, and in particular to a data processing system and method integrating multiple intelligent analysis languages.

Background art

In recent years, the rapid development of computer technology and the Internet has ushered in an era of information explosion. Society is flooded with more data than ever before, which drives people to seek more suitable ways of analyzing data intelligently. Existing intelligent analysis languages, such as Python, R and TensorFlow, have matured to the point where they can analyze data effectively in specific fields.

As informatization advances, the problems of processing with a single intelligent language have gradually become apparent, mainly in the following respects: (1) a single intelligent processing language mostly targets a specific field and cannot meet the processing needs of all fields; (2) the analysis languages have no unified standard for data sources, which places very high demands on data cleaning and conversion; (3) as data volumes grow, an intelligent language that can only run on a single machine cannot meet the need for parallel, distributed processing. In view of the above, establishing a system that integrates multiple intelligent analysis languages, provides a unified data cleaning and conversion function, and performs distributed analysis for all data sources has become an urgent task.

Summary of the invention

The technical problem to be solved by the present invention is to provide a data processing system and method integrating multiple intelligent analysis languages, which provides a unified data cleaning and conversion function and performs distributed analysis for all data sources.

To solve the above technical problem, the present invention provides a data processing system integrating multiple intelligent analysis languages, comprising three basic logic modules: a data access module, an analysis mining module and a data storage module. The data access module is responsible for accessing different types of data sources and normalizing them into the format required for subsequent processing. The analysis mining module is responsible for using the accessed data, scheduling tasks according to the configured task flow, and submitting each task to the specific intelligent language system for processing. The data storage module is responsible for receiving the results of the analysis mining module and storing them in a specific destination.

Preferably, the different types of data sources are relational databases, big data storage systems and streaming data.

Preferably, when a task is submitted to the specific intelligent language system for processing, the Spark-based processing mode is invoked if the user requires parallel processing.

Correspondingly, a data processing method integrating multiple intelligent analysis languages comprises the following steps:

(1) the user starts a flow;

(2) the user configures a multi-intelligent-language flow by dragging operators on the OceanMind platform;

(3) the system generates a task execution tree according to the configured flow;

(4) the task is executed: the data sources are processed uniformly using the data access and cleaning-conversion functions provided by OceanMind, and requests are sent for processing according to the different intelligent processing languages;

(5) the processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected;

(6) the processing results from the different languages are integrated and persisted for use by subsequent flows.
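Step (3) can be illustrated with a minimal sketch. The patent does not specify how a configured flow is represented or how the task execution tree is built, so the `flow` dictionary, its field names and the `build_execution_order` helper below are hypothetical. The sketch treats the configured operators as a dependency graph and derives an order in which every operator runs after the operators that feed it, using Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical flow configuration: each dragged operator records the language
# it runs in and the operators whose output it consumes. This layout is an
# assumption made for illustration, not the OceanMind format.
flow = {
    "load_orders": {"language": "java", "inputs": []},
    "clean_orders": {"language": "python", "inputs": ["load_orders"]},
    "fit_model": {"language": "r", "inputs": ["clean_orders"]},
    "score": {"language": "tensorflow", "inputs": ["clean_orders"]},
    "merge_results": {"language": "python", "inputs": ["fit_model", "score"]},
}


def build_execution_order(flow):
    """Return operator names so that every operator follows its inputs."""
    graph = {name: set(op["inputs"]) for name, op in flow.items()}
    # static_order() raises graphlib.CycleError if the flow contains a cycle.
    return list(TopologicalSorter(graph).static_order())


order = build_execution_order(flow)
print(order)
```

A scheduler built on such an ordering could then hand each operator to the backend matching its `language` field, which corresponds to steps (4) and (5).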

Preferably, in step (5), receiving the processing task and selecting, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing specifically comprises the following steps:

(51) for an R language request, the user can choose whether to process in parallel; if parallel processing is required, the system calls SparkR, otherwise native R is used;

(52) for a Python request, the user can choose whether to process in parallel; if parallel processing is required, the system calls PySpark, otherwise native Python is used;

(53) for a TensorFlow request, the user can choose whether to process in parallel; if parallel processing is required, the system uses the XLearning framework, otherwise native TensorFlow is used;

(54) for a Java language request, parallel processing is the default;

(55) the processing results of steps (51)-(54) are stored in a DataFrame and then returned.
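The selection rules in steps (51)-(54) amount to a lookup on the request language and the user's parallel-processing choice. The following sketch shows one way to express that lookup. The backend names come from the text above, while the `BACKENDS` table, the `select_backend` function and the returned label strings are hypothetical and do not launch any of the named systems.

```python
# Backend names follow steps (51)-(53); the table layout is an assumption.
BACKENDS = {
    "r": {"native": "R", "parallel": "SparkR"},
    "python": {"native": "Python", "parallel": "PySpark"},
    "tensorflow": {"native": "TensorFlow", "parallel": "XLearning"},
}


def select_backend(language, parallel=False):
    """Return a label for the backend that should handle the request."""
    lang = language.lower()
    if lang == "java":
        # Step (54): Java requests are processed in parallel by default,
        # so the user's choice is ignored here.
        return "Java (parallel)"
    try:
        modes = BACKENDS[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {language}") from None
    return modes["parallel" if parallel else "native"]


print(select_backend("R", parallel=True))
print(select_backend("python"))
```

Keeping the rules in a table rather than in nested conditionals means a further language could be supported by adding one entry.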

The beneficial effects of the invention are as follows: the user can conveniently use multiple intelligent analysis languages, alone or in combination, for big data processing, for example applying TensorFlow and R processing to big data within a data-flow pipeline, so that the strengths of each intelligent analysis language are fully exploited to complete the analysis task; the user can further use distributed computing technology for data analysis and mining, making full use of the performance of each language and effectively improving analysis efficiency; in addition, CPU modules can be installed in the system, so the system can be effectively extended and its analysis capability improved.

Brief description of the drawings

Fig. 1 is a schematic diagram of the system structure of the invention.

Fig. 2 is a schematic flow diagram of the method of the invention.

Fig. 3 is a schematic diagram of the basic big data platform of the invention.

Fig. 4 is a schematic flow diagram of task distribution and processing in the invention.

Detailed description of the embodiments

Fig. 1 is a schematic structural diagram of a data processing system integrating multiple intelligent analysis languages implemented according to the present invention. It comprises three basic logic modules: a data access module, an analysis mining module and a data storage module.

The data access module is responsible for accessing different types of data sources, such as relational databases, big data storage systems and streaming data, and normalizing them into the format required for subsequent processing.

The analysis mining module is responsible for using the accessed data, scheduling tasks according to the configured task flow, and submitting each task to the specific intelligent language system for processing. In particular, if the user requires parallel processing, the Spark-based processing mode can be invoked.

The data storage module is responsible for receiving the results of the analysis mining module and storing them in a specific destination.
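The division of labor among the three modules can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class names, method names, the `"csv"` and `"table"` source kinds and the in-memory destination store are all hypothetical, and a list of dictionaries stands in for the normalized format.

```python
import csv
import io


class DataAccess:
    """Reads a source and normalizes it into a list of dict rows."""

    def read(self, kind, payload):
        if kind == "csv":
            return list(csv.DictReader(io.StringIO(payload)))
        if kind == "table":  # e.g. rows already fetched from a relational database
            columns, records = payload
            return [dict(zip(columns, record)) for record in records]
        raise ValueError(f"unsupported source kind: {kind}")


class AnalysisMining:
    """Runs the configured steps, in order, over the normalized rows."""

    def __init__(self, steps):
        self.steps = steps

    def run(self, rows):
        for step in self.steps:
            rows = step(rows)
        return rows


class DataStorage:
    """Keeps results under a named destination (in memory for this sketch)."""

    def __init__(self):
        self.destinations = {}

    def save(self, destination, rows):
        self.destinations[destination] = rows


access, storage = DataAccess(), DataStorage()
rows = access.read("csv", "id,amount\n1,10\n2,25\n")
mining = AnalysisMining([lambda rs: [r for r in rs if int(r["amount"]) > 15]])
storage.save("big_orders", mining.run(rows))
print(storage.destinations["big_orders"])
```

Because every source is reduced to the same row format before analysis, the analysis steps do not need to know where the data came from.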

Fig. 2 is a schematic flow diagram of a data processing method integrating multiple intelligent analysis languages implemented according to the present invention, which comprises the following steps:

(1) the user starts a flow;

(2) the user configures a multi-intelligent-language flow by dragging operators;

(3) the system generates a task execution tree according to the configured flow;

(4) the task is executed: the data sources are processed uniformly, and requests are sent for processing according to the different intelligent processing languages;

(5) the processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected;

(6) the results are integrated and persisted.
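Step (6) can be illustrated with a small sketch that merges per-language results and writes them to disk. The patent does not describe the integration rule or the storage format, so the following choices are assumptions made for illustration: joining rows on a shared `id` key, prefixing each column with its language name, and persisting as JSON. The `integrate` and `persist` helpers are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def integrate(results_by_language, key="id"):
    """Merge per-language result rows that share the same key value."""
    merged = {}
    for language, rows in results_by_language.items():
        for row in rows:
            target = merged.setdefault(row[key], {key: row[key]})
            for column, value in row.items():
                if column != key:
                    # Prefix with the language to avoid column name clashes.
                    target[f"{language}.{column}"] = value
    return [merged[k] for k in sorted(merged)]


def persist(rows, path):
    """Write the integrated rows as JSON so a later flow can reload them."""
    Path(path).write_text(json.dumps(rows), encoding="utf-8")


results = {
    "r": [{"id": 1, "cluster": "a"}, {"id": 2, "cluster": "b"}],
    "tensorflow": [{"id": 1, "score": 0.9}],
}
combined = integrate(results)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "result.json"
    persist(combined, out_path)
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))

print(combined)
```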

Fig. 3 is an example of the basic platform of the invention in use. In the figure, the right side shows the main way operators are dragged. On the left side, the platform provides multiple functions for the user, such as data import, data export, data cleaning and conversion, data analysis and machine learning.

Fig. 4 is a schematic diagram of how the user selects the processing mode, including the following:

(1) for an R language request, the user can choose whether to process in parallel; if parallel processing is required, the system calls SparkR, otherwise native R is used;

(2) for a Python request, the user can choose whether to process in parallel; if parallel processing is required, the system calls PySpark, otherwise native Python is used;

(3) for a TensorFlow request, the user can choose whether to process in parallel; if parallel processing is required, the system uses the XLearning framework, otherwise native TensorFlow is used;

(4) for a Java language request, parallel processing is the default;

(5) the processing results of steps (1)-(4) are stored in a DataFrame and then returned.

In the present invention, the user can conveniently use multiple intelligent analysis languages, alone or in combination, for big data processing, for example applying TensorFlow and R processing to big data within a data-flow pipeline, so that the strengths of each intelligent analysis language are fully exploited to complete the analysis task; the user can further use distributed computing technology for data analysis and mining, making full use of the performance of each language and effectively improving analysis efficiency; in addition, CPU modules can be installed in the system, so the system can be effectively extended and its analysis capability improved.

Claims

1. A data processing system integrating multiple intelligent analysis languages, characterized by comprising three basic logic modules: a data access module, an analysis mining module and a data storage module; the data access module is responsible for accessing different types of data sources and normalizing them into the format required for subsequent processing; the analysis mining module is responsible for using the accessed data, scheduling tasks according to the configured task flow, and submitting each task to the specific intelligent language system for processing; the data storage module is responsible for receiving the results of the analysis mining module and storing them in a specific destination.

2. The data processing system integrating multiple intelligent analysis languages according to claim 1, characterized in that the different types of data sources are relational databases, big data storage systems and streaming data.

3. The data processing system integrating multiple intelligent analysis languages according to claim 1, characterized in that, when a task is submitted to the specific intelligent language system for processing, the Spark-based processing mode is invoked if the user requires parallel processing.

4. A data processing method integrating multiple intelligent analysis languages, characterized by comprising the following steps:

(1) the user starts a flow;

(2) the user configures a multi-intelligent-language flow by dragging operators on the OceanMind platform;

(3) the system generates a task execution tree according to the configured flow;

(4) the task is executed: the data sources are processed uniformly using the data access and cleaning-conversion functions provided by OceanMind, and requests are sent for processing according to the different intelligent processing languages;

(5) the processing task is received and, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing is selected;

(6) the processing results from the different languages are integrated and persisted for use by subsequent flows.

5. The data processing method integrating multiple intelligent analysis languages according to claim 4, characterized in that, in step (5), receiving the processing task and selecting, depending on the language and on whether parallelization is needed, either native-language processing or Spark-based parallel processing specifically comprises the following steps:

(51) for an R language request, the user can choose whether to process in parallel; if parallel processing is required, the system calls SparkR, otherwise native R is used;

(52) for a Python request, the user can choose whether to process in parallel; if parallel processing is required, the system calls PySpark, otherwise native Python is used;

(53) for a TensorFlow request, the user can choose whether to process in parallel; if parallel processing is required, the system uses the XLearning framework, otherwise native TensorFlow is used;

(54) for a Java language request, parallel processing is the default;