CN109522348A - Data processing system and method fusing multiple intelligent analysis languages - Google Patents
- Publication number: CN109522348A
- Authority: CN (China)
- Prior art keywords: language, data, processing, task, analysis
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a data processing system and method that fuse multiple intelligent analysis languages. The system comprises three basic logic modules: a data access module, an analysis mining module and a data storage module. The method comprises the following steps: the user starts a flow; the user configures processing in multiple intelligent languages by dragging operators on the OceanMind platform; the system generates a task execution tree from the configured flow; the task is executed, with data sources processed uniformly using the data access and cleansing/conversion functions provided by OceanMind, and requests are sent for processing according to the intelligent processing language involved; the processing task is received and, depending on the language and on whether parallelization is required, either native-language processing or Spark-based parallel processing is selected; finally, the processing results from the different languages are integrated and persisted for use by subsequent flows. The invention provides a unified data cleansing and conversion function and performs distributed analysis across all data sources.
Description
Technical field
The present invention relates to the field of intelligent analysis languages, and in particular to a data processing system and method that fuse multiple intelligent analysis languages.
Background art
In recent years, the rapid development of computer technology and the Internet has brought about an era of information explosion. Society holds more data than ever before, which pushes people to seek better-suited methods of intelligent data analysis. Existing intelligent analysis languages such as Python, R and TensorFlow have matured over time and can already analyze data effectively within specific domains.
As informatization advances, the problems of relying on a single intelligent language have become increasingly prominent. They fall into three main areas. (1) A single intelligent processing language mostly targets a specific domain and cannot meet processing needs across all domains. (2) Data sources follow no common standard, so each language places high demands on data cleansing and conversion. (3) As data volumes grow, a single intelligent language can only process data on one machine and cannot meet the need for parallel, distributed processing. Given these problems, it has become an urgent task to build a system that fuses multiple intelligent analysis languages, provides a unified data cleansing and conversion function, and performs distributed analysis across all data sources.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data processing system and method that fuse multiple intelligent analysis languages, provide a unified data cleansing and conversion function, and perform distributed analysis across all data sources.
To solve the above technical problem, the present invention provides a data processing system that fuses multiple intelligent analysis languages. The system comprises three basic logic modules: a data access module, an analysis mining module and a data storage module. The data access module accesses different types of data sources and normalizes them into the format required for subsequent processing. The analysis mining module uses the accessed data to schedule tasks according to the configured task flow, and submits each task to a specific intelligent language system for processing. The data storage module receives the results of the analysis mining module and stores them at a specific destination.
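The three-module division can be illustrated with a minimal Python sketch. It assumes that records are plain dictionaries, that each "intelligent language system" is represented by an ordinary Python callable, and that the storage destination is an in-memory list. The class and method names (`DataAccessModule`, `AnalysisMiningModule`, `DataStorageModule`, `load`, `run`, `store`) are hypothetical and are not taken from OceanMind or from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Handler = Callable[[List[Record]], List[Record]]


class DataAccessModule:
    """Hypothetical: reads a source and normalizes it into a common record format."""

    def load(self, source: Callable[[], List[Record]]) -> List[Record]:
        # Lower-case column names so downstream operators see one schema.
        return [{str(k).lower(): v for k, v in row.items()} for row in source()]


class AnalysisMiningModule:
    """Hypothetical: runs the configured tasks in order, one handler per language."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self.handlers = handlers

    def run(self, tasks: List[str], data: List[Record]) -> List[Record]:
        for language in tasks:
            # Each handler stands in for submitting work to a language system.
            data = self.handlers[language](data)
        return data


@dataclass
class DataStorageModule:
    """Hypothetical: receives results and keeps them at a destination (a list here)."""

    destination: List[Record] = field(default_factory=list)

    def store(self, results: List[Record]) -> None:
        self.destination.extend(results)


# Usage: one "python" task that keeps only rows with a positive value.
access = DataAccessModule()
mining = AnalysisMiningModule({"python": lambda rows: [r for r in rows if r["value"] > 0]})
storage = DataStorageModule()
rows = access.load(lambda: [{"ID": 1, "Value": 5}, {"ID": 2, "Value": -3}])
storage.store(mining.run(["python"], rows))
print(storage.destination)  # [{'id': 1, 'value': 5}]
```

Keeping each module behind a narrow interface means a real database reader, language gateway or storage sink could replace the stand-ins without changing the other two modules.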
Preferably, the different types of data sources are relational databases, big data storage systems and streaming data.
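The following hypothetical adapters illustrate how heterogeneous sources can be normalized into one record format. They convert three kinds of input into lists of dictionaries: rows from a relational database (here SQLite, through Python's standard `sqlite3` module), a delimited text export of the kind a big data store might produce, and a stream of JSON messages. Using CSV and JSON as the exchange formats is an assumption; the patent does not specify them, and the function names are illustrative only.

```python
import csv
import io
import json
import sqlite3
from typing import Any, Dict, Iterable, List, Sequence

Record = Dict[str, Any]


def from_relational(cursor: sqlite3.Cursor) -> List[Record]:
    # cursor.description holds one 7-tuple per column; item 0 is the column name.
    columns: Sequence[str] = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def from_delimited_text(text: str) -> List[Record]:
    # Values read from CSV arrive as strings; type conversion is left to cleansing.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_stream(messages: Iterable[str]) -> List[Record]:
    # Each message is assumed to be one JSON object.
    return [json.loads(message) for message in messages]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")
records = (
    from_relational(conn.execute("SELECT id, name FROM t"))
    + from_delimited_text("id,name\n2,b\n")
    + from_stream(['{"id": 3, "name": "c"}'])
)
print(records)  # [{'id': 1, 'name': 'a'}, {'id': '2', 'name': 'b'}, {'id': 3, 'name': 'c'}]
```

The `'2'` from the CSV source is still a string. This is the kind of inconsistency that the unified cleansing and conversion step is meant to resolve.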
Preferably, when a task is submitted to a specific intelligent language system for processing and the user requires parallel processing, a Spark-based processing mode is invoked.
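The choice between single-machine and parallel execution can be sketched as partition, map and collect. The sketch below is not Spark. It uses Python's `concurrent.futures.ThreadPoolExecutor` as a local stand-in so that it runs without a cluster, and the function names `partition` and `run_task` are hypothetical. In an actual Spark deployment, executors across the cluster would process the partitions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data into at most num_partitions (>= 1) contiguous chunks, preserving order."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(data: List[T], fn: Callable[[T], R], parallel: bool,
             num_partitions: int = 4) -> List[R]:
    if not parallel:
        # Single-machine path, analogous to running the native language.
        return [fn(x) for x in data]
    chunks = partition(data, num_partitions)
    # Local stand-in for distributed execution: one worker per chunk.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        mapped = list(pool.map(lambda chunk: [fn(x) for x in chunk], chunks))
    # Collect: concatenate chunk results in their original order.
    return [y for chunk in mapped for y in chunk]


print(partition(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(run_task(list(range(5)), lambda x: x * x, parallel=True))  # [0, 1, 4, 9, 16]
```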
Correspondingly, a data processing method that fuses multiple intelligent analysis languages comprises the following steps:
(1) the user starts a flow;
(2) the user configures processing in multiple intelligent languages by dragging operators on the OceanMind platform;
(3) the system generates a task execution tree from the configured flow;
(4) the task is executed: data sources are processed uniformly using the data access and cleansing/conversion functions provided by OceanMind, and requests are sent for processing according to the intelligent processing language involved;
(5) the processing task is received and, depending on the language and on whether parallelization is required, either native-language processing or Spark-based parallel processing is selected;
(6) the processing results from the different languages are integrated and persisted for use by subsequent flows.
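Step (3) can be illustrated by turning the operator connections drawn on the canvas into an execution order. The sketch assumes that the configured flow is given as a mapping from each operator to the operators it depends on; the operator names are hypothetical. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group operators into stages. Operators within the same stage do not depend on each other and could run concurrently. Treating the "task execution tree" as a dependency graph in this way is an assumption.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_stages(flow: Dict[str, Set[str]]) -> List[List[str]]:
    """Group operators into stages; flow maps each operator to its upstream operators."""
    sorter = TopologicalSorter(flow)
    sorter.prepare()  # raises graphlib.CycleError if the flow contains a cycle
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # operators whose inputs are all available
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical flow: import -> clean -> (R model, TensorFlow model) -> merge -> export
flow = {
    "import": set(),
    "clean": {"import"},
    "r_model": {"clean"},
    "tf_model": {"clean"},
    "merge": {"r_model", "tf_model"},
    "export": {"merge"},
}
print(execution_stages(flow))
# [['import'], ['clean'], ['r_model', 'tf_model'], ['merge'], ['export']]
```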
Preferably, in step (5), receiving the processing task and selecting native-language processing or Spark-based parallel processing, according to the language and to whether parallelization is required, specifically comprises the following steps:
(51) for an R language request, the user chooses whether to process in parallel; if parallel processing is required, the system calls SparkR, otherwise native R is used;
(52) for a Python request, the user chooses whether to process in parallel; if parallel processing is required, the system calls PySpark, otherwise native Python is used;
(53) for a TensorFlow request, the user chooses whether to process in parallel; if parallel processing is required, the system uses the XLearning framework, otherwise native TensorFlow is used;
(54) Java requests are parallelized by default;
(55) the processing results of steps (51)-(54) are stored in a DataFrame, and the result is then returned.
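Steps (51) to (54) amount to a lookup from the pair (language, parallel flag) to a processing engine. A minimal sketch follows. The function `select_backend` and the returned engine labels are hypothetical, and the sketch only selects an engine name; it does not invoke SparkR, PySpark, XLearning or TensorFlow.

```python
from typing import Tuple

# Hypothetical engine labels for steps (51)-(53): language -> (native, parallel).
ENGINES = {
    "r": ("R", "SparkR"),
    "python": ("Python", "PySpark"),
    "tensorflow": ("TensorFlow", "XLearning"),
}


def select_backend(language: str, parallel_requested: bool) -> Tuple[str, bool]:
    """Return (engine label, runs in parallel) for a processing request."""
    lang = language.strip().lower()
    if lang == "java":
        return ("Java", True)  # step (54): Java requests are parallelized by default
    if lang not in ENGINES:
        raise ValueError(f"unsupported language: {language!r}")
    native_engine, parallel_engine = ENGINES[lang]
    return (parallel_engine, True) if parallel_requested else (native_engine, False)


print(select_backend("R", True))           # ('SparkR', True)
print(select_backend("Python", False))     # ('Python', False)
print(select_backend("TensorFlow", True))  # ('XLearning', True)
print(select_backend("Java", False))       # ('Java', True)
```

A table-driven lookup makes it easy to add a language later by adding one entry. Rejecting unknown languages explicitly also avoids silently falling back to a default engine.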
The invention has the following beneficial effects. Users can conveniently use multiple intelligent analysis languages, alone or in combination, for big data processing, for example by processing big data with TensorFlow and R in a data-flow pipeline; this makes full use of each language's strengths when completing analysis tasks. Users can also apply distributed computing to data analysis and mining, which makes full use of each language's capabilities and effectively improves analysis efficiency. In addition, CPU modules can be mounted in the system, so that the system can be extended effectively and its analysis capability increased.
Description of the drawings
Fig. 1 is a schematic structural diagram of the system of the invention.
Fig. 2 is a schematic flow diagram of the method of the invention.
Fig. 3 is a schematic diagram of the underlying big data platform of the invention.
Fig. 4 is a schematic diagram of the task distribution and processing flow of the invention.
Detailed description of the embodiments
Fig. 1 is a schematic structural diagram of a data processing system, implemented according to the present invention, that fuses multiple intelligent analysis languages. The system comprises three basic logic modules: a data access module, an analysis mining module and a data storage module.
The data access module accesses different types of data sources, such as relational databases, big data storage systems and streaming data, and normalizes them into the format required for subsequent processing.
The analysis mining module uses the accessed data to schedule tasks according to the configured task flow, and submits each task to a specific intelligent language system for processing. In particular, if the user requires parallel processing, a Spark-based processing mode can be invoked.
The data storage module receives the results of the analysis mining module and stores them at a specific destination.
Fig. 2 is a schematic flow diagram of a data processing method, implemented according to the present invention, that fuses multiple intelligent analysis languages. The method comprises the following steps:
(1) the user starts a flow;
(2) the user configures processing in multiple intelligent languages by dragging operators;
(3) the system generates a task execution tree from the configured flow;
(4) the task is executed: data sources are processed uniformly, and requests are sent for processing according to the intelligent processing language involved;
(5) the processing task is received and, depending on the language and on whether parallelization is required, either native-language processing or Spark-based parallel processing is selected;
(6) the results are integrated and persisted.
Fig. 3 shows an example of using the underlying platform of the invention. The right side of the figure shows the main ways operators are dragged. On the left, the platform offers users multiple functions, such as data import, data export, data cleansing and conversion, data analysis and machine learning.
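A data cleansing and conversion operator of the kind listed for Fig. 3 might, for example, trim whitespace, treat empty strings as missing, cast columns to declared types and drop invalid rows. These rules are assumptions chosen for illustration; the patent does not define the operator's behavior, and `clean_and_convert` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def clean_and_convert(rows: Iterable[Record], schema: Dict[str, Callable[[Any], Any]],
                      required: Iterable[str]) -> List[Record]:
    """Trim strings, map empty strings to None, cast per schema, drop invalid rows."""
    required = list(required)
    cleaned: List[Record] = []
    for row in rows:
        out: Record = {}
        valid = True
        for column, cast in schema.items():
            value = row.get(column)
            if isinstance(value, str):
                value = value.strip() or None  # empty after trimming -> missing
            if value is not None:
                try:
                    value = cast(value)
                except (TypeError, ValueError):
                    valid = False  # value cannot be converted: drop the row
                    break
            out[column] = value
        # Keep the row only if every required column is present after conversion.
        if valid and all(out.get(c) is not None for c in required):
            cleaned.append(out)
    return cleaned


rows = [
    {"id": " 1 ", "score": "3.5"},
    {"id": "2", "score": ""},
    {"id": "x", "score": "1"},
    {"id": None, "score": "2"},
]
print(clean_and_convert(rows, {"id": int, "score": float}, ["id"]))
# [{'id': 1, 'score': 3.5}, {'id': 2, 'score': None}]
```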
Fig. 4 is a schematic diagram of how the user selects the processing mode. It covers the following:
(1) for an R language request, the user chooses whether to process in parallel; if parallel processing is required, the system calls SparkR, otherwise native R is used;
(2) for a Python request, the user chooses whether to process in parallel; if parallel processing is required, the system calls PySpark, otherwise native Python is used;
(3) for a TensorFlow request, the user chooses whether to process in parallel; if parallel processing is required, the system uses the XLearning framework, otherwise native TensorFlow is used;
(4) Java requests are parallelized by default;
(5) the processing results of steps (1)-(4) are stored in a DataFrame, and the result is then returned.
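Collecting the results and persisting them can be sketched in two steps: join the per-language results on a shared key, then write the combined table out. In this sketch a list of dictionaries stands in for the DataFrame, and CSV written to a text stream stands in for the persistence target. Both choices, and the function names `integrate` and `persist`, are assumptions made for illustration.

```python
import csv
import io
from typing import Any, Dict, List, TextIO

Record = Dict[str, Any]


def integrate(results: Dict[str, List[Record]], key: str) -> List[Record]:
    """Join per-language result lists on a shared key; prefix columns with the language."""
    merged: Dict[Any, Record] = {}
    for language, rows in results.items():
        for row in rows:
            target = merged.setdefault(row[key], {key: row[key]})
            for column, value in row.items():
                if column != key:
                    target[f"{language}_{column}"] = value
    return [merged[k] for k in sorted(merged)]  # assumes keys are mutually comparable


def persist(rows: List[Record], sink: TextIO) -> None:
    """Write rows as CSV; columns missing from a row are written as empty fields."""
    columns = sorted({column for row in rows for column in row})
    writer = csv.DictWriter(sink, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


results = {
    "r": [{"id": 1, "pred": 0.2}, {"id": 2, "pred": 0.7}],
    "tensorflow": [{"id": 1, "pred": 0.3}],
}
table = integrate(results, key="id")
# table == [{'id': 1, 'r_pred': 0.2, 'tensorflow_pred': 0.3}, {'id': 2, 'r_pred': 0.7}]
sink = io.StringIO()
persist(table, sink)
print(sink.getvalue())
```

Prefixing each column with its language keeps same-named outputs from different engines apart in the combined table.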
In the present invention, users can conveniently use multiple intelligent analysis languages, alone or in combination, for big data processing, for example by processing big data with TensorFlow and R in a data-flow pipeline; this makes full use of each language's strengths when completing analysis tasks. Users can also apply distributed computing to data analysis and mining, which makes full use of each language's capabilities and effectively improves analysis efficiency. In addition, CPU modules can be mounted in the system, so that the system can be extended effectively and its analysis capability increased.
Claims (5)
1. A data processing system that fuses multiple intelligent analysis languages, characterized by comprising three basic logic modules: a data access module, an analysis mining module and a data storage module; the data access module accesses different types of data sources and normalizes them into the format required for subsequent processing; the analysis mining module uses the accessed data to schedule tasks according to the configured task flow, and submits each task to a specific intelligent language system for processing; the data storage module receives the results of the analysis mining module and stores them at a specific destination.
2. The data processing system that fuses multiple intelligent analysis languages according to claim 1, characterized in that the different types of data sources are relational databases, big data storage systems and streaming data.
3. The data processing system that fuses multiple intelligent analysis languages according to claim 1, characterized in that, when a task is submitted to a specific intelligent language system for processing, a Spark-based processing mode is invoked if the user requires parallel processing.
4. A data processing method that fuses multiple intelligent analysis languages, characterized by comprising the following steps:
(1) the user starts a flow;
(2) the user configures processing in multiple intelligent languages by dragging operators on the OceanMind platform;
(3) the system generates a task execution tree from the configured flow;
(4) the task is executed: data sources are processed uniformly using the data access and cleansing/conversion functions provided by OceanMind, and requests are sent for processing according to the intelligent processing language involved;
(5) the processing task is received and, depending on the language and on whether parallelization is required, either native-language processing or Spark-based parallel processing is selected;
(6) the processing results from the different languages are integrated and persisted for use by subsequent flows.
5. The data processing method that fuses multiple intelligent analysis languages according to claim 4, characterized in that, in step (5), receiving the processing task and selecting native-language processing or Spark-based parallel processing, according to the language and to whether parallelization is required, specifically comprises the following steps:
(51) for an R language request, the user chooses whether to process in parallel; if parallel processing is required, the system calls SparkR, otherwise native R is used;
(52) for a Python request, the user chooses whether to process in parallel; if parallel processing is required, the system calls PySpark, otherwise native Python is used;
(53) for a TensorFlow request, the user chooses whether to process in parallel; if parallel processing is required, the system uses the XLearning framework, otherwise native TensorFlow is used;
(54) Java requests are parallelized by default;
(55) the processing results of steps (51)-(54) are stored in a DataFrame, and the result is then returned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811119149.6A CN109522348A (en) | 2018-09-25 | 2018-09-25 | Data processing system and method fusing multiple intelligent analysis languages |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811119149.6A CN109522348A (en) | 2018-09-25 | 2018-09-25 | Data processing system and method fusing multiple intelligent analysis languages |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109522348A (en) | 2019-03-26 |
Family ID: 65772213
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811119149.6A Pending CN109522348A (en) | 2018-09-25 | 2018-09-25 | Data processing system and method fusing multiple intelligent analysis languages |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522348A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104008007A (en) * | 2014-06-12 | 2014-08-27 | 深圳先进技术研究院 | Interoperability data processing system and method based on streaming calculation and batch processing calculation |
US20170263255A1 (en) * | 2016-03-10 | 2017-09-14 | Microsoft Technology Licensing, Llc | Scalable Endpoint-Dependent Natural Language Understanding |
CN106844585A (en) * | 2017-01-10 | 2017-06-13 | 广东精规划信息科技股份有限公司 | A kind of time-space relationship analysis system based on multi-source Internet of Things location aware |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110780978A (en) * | 2019-10-25 | 2020-02-11 | 下一代互联网重大应用技术(北京)工程研究中心有限公司 | Data processing method, system, device and medium |
CN110780978B (en) * | 2019-10-25 | 2022-06-24 | 赛尔网络有限公司 | Data processing method, system, device and medium |
CN110968620A (en) * | 2019-12-10 | 2020-04-07 | 国网信通亿力科技有限责任公司 | Agile data analysis method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190326 |