CN109086038B - Spark-based big data development method and device, and terminal - Google Patents

Publication number
CN109086038B
Authority
CN
China
Prior art keywords
development
template
big data
data
spark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810755408.8A
Other languages
Chinese (zh)
Other versions
CN109086038A (en)
Inventor
刘霄峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxun Spatial Intelligence Inc
Original Assignee
Qianxun Spatial Intelligence Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianxun Spatial Intelligence Inc filed Critical Qianxun Spatial Intelligence Inc
Priority to CN201810755408.8A priority Critical patent/CN109086038B/en
Publication of CN109086038A publication Critical patent/CN109086038A/en
Application granted granted Critical
Publication of CN109086038B publication Critical patent/CN109086038B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/33: Intelligent editors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/20: Software design
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention belongs to the technical field of big data development and provides a Spark-based big data development method, device, and terminal. The method comprises the following steps: installing an integrated development environment so that a template project can be imported; downloading the latest template project, then compiling and packaging it to generate a software development kit; adding the software development kit to the integrated development environment to form a development template; and creating a new big data development project and developing big data by applying the development template. This template-based approach provides not only encapsulated classes and methods but also directly runnable development templates, which improves development efficiency, lowers the barrier to entry, and speeds up development in a simple and effective way.

Description

Spark-based big data development method and device, and terminal
Technical Field
The invention belongs to the technical field of big data development, and particularly relates to a big data development method and device based on Spark, and a terminal.
Background
In recent years, a growing number of tool development kits have made development tasks much more convenient: a developer packages dependencies and utility methods into a kit, and others use them by reference. This is currently the most common way to share techniques and functionality, but it has drawbacks. It is unfriendly to beginners, it does not encapsulate Spark development thoroughly, and it cannot be picked up quickly by many people.
Existing tool development kits provide only encapsulated methods or parent classes, which are used through inheritance and reference. A user must read the internal methods to some extent before using them well, and must learn Spark development from other sources before actually starting task development. This leads to a slow start and difficult development, and it adds development cost.
Disclosure of Invention
The embodiment of the invention provides a Spark-based big data development method, a Spark-based big data development device and a Spark-based big data development terminal, and aims to solve the problems that the development mode in the prior art is not completely encapsulated and cannot be used quickly.
A Spark-based big data development method comprises the following steps:
installing an integrated development environment so that a template project can be imported;
downloading the latest template project, compiling and packaging it, and generating a software development kit;
adding the software development kit to the integrated development environment to form a development template;
and creating a new big data development project and developing big data by applying the development template.
Preferably, after the integrated development environment is installed, the method further includes: installing a Maven repository and the Maven plug-in of the IDE.
Preferably, the development template includes at least one of a general template, a data cleansing template, and a Spark operator template.
Preferably, the development template covers the reading and normalization of input parameters, the input and output of data, and the selection of intermediate cleaning methods.
Preferably, the step of creating a new big data development project and developing big data by applying the development template includes:
modifying the code of the development template as required to complete the big data development, or
continuing to extend the development template, simplifying the development process, and sharing the code architecture.
Preferably, the development template is quickly runnable code with detailed comments, and the step of applying the development template to develop big data includes:
selecting the required data source writing method according to the comments, selecting a suitable RDD operator, and selecting the required data source input method;
modifying or deleting the code as needed.
The invention also provides a Spark-based big data development device, comprising:
an installation unit, configured to install an integrated development environment so that a template project can be imported;
a compiling unit, configured to download the latest template project, compile and package it, and generate a software development kit;
an adding unit, configured to add the software development kit to the integrated development environment to form a development template;
and a development unit, configured to create a new big data development project and develop big data by applying the development template.
Preferably, the installation unit is further configured to install a Maven repository and the Maven plug-in of the IDE.
The invention also provides a memory storing a computer program that, when executed by a processor, performs the following steps:
installing an integrated development environment so that a template project can be imported;
downloading the latest template project, compiling and packaging it, and generating a software development kit;
adding the software development kit to the integrated development environment to form a development template;
and creating a new big data development project and developing big data by applying the development template.
The invention also provides a terminal, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the following steps:
installing an integrated development environment so that a template project can be imported;
downloading the latest template project, compiling and packaging it, and generating a software development kit;
adding the software development kit to the integrated development environment to form a development template;
and creating a new big data development project and developing big data by applying the development template.
In the embodiments of the invention, the template-based development approach provides not only encapsulated classes and methods but also directly runnable development templates, which improves development efficiency, lowers the barrier to entry, and speeds up development in a simple and effective way.
Drawings
Fig. 1 is a flowchart of a Spark-based big data development method according to a first embodiment of the present invention;
fig. 2 is a flowchart of a preferred mode of a Spark-based big data development method according to a first embodiment of the present invention;
fig. 3 is a structural diagram of a Spark-based big data development device according to a second embodiment of the present invention;
fig. 4 is a structural diagram of a terminal according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In an embodiment of the present invention, a Spark-based big data development method includes: installing an integrated development environment so that a template project can be imported; downloading the latest template project, compiling and packaging it, and generating a software development kit; adding the software development kit to the integrated development environment to form a development template; and creating a new big data development project and developing big data by applying the development template.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
The first embodiment is as follows:
Fig. 1 is a flowchart of a Spark-based big data development method according to the first embodiment of the present invention. The method includes:
Step S1, installing an Integrated Development Environment (IDE) so that a template project can be imported;
An IDE typically includes a code editor, a compiler, a debugger, and graphical user interface tools. The IDE in this embodiment may be a development environment such as IDEA or Eclipse. In step S1, after the IDE is installed, a Maven repository and the IDE's Maven plug-in also need to be installed so that the development tool can import the template project.
Step S2, downloading the latest template project, compiling and packaging it, and generating a Software Development Kit (SDK);
The SDK in this embodiment supports multiple versions. It encapsulates the dependencies for various Spark versions as well as general methods, such as access to various databases and preprocessing of data. Take a simple Spark task as an example: ordinarily, the developer must first learn Spark, find the dependencies, build a development environment, and become familiar with the interfaces, and only then can carry out custom development according to the Spark programming conventions, which also requires knowing Spark's existing RDD-based methods.
Step S3, adding the software development kit into the integrated development environment to form a development template;
Specifically, the development template includes at least one of a general template, a data cleaning template, and a Spark operator template. The development template depends on existing development tools and related plug-ins: it must be used together with a dependency management plug-in, and it relies on the template-writing function of the existing development tool.
All template-related dependencies are integrated into the SDK. Importing the SDK brings in every dependency of the template classes, so the dependency configuration is completed in one step. The dependencies are delivered through the SDK, while the development template itself is not a compiled tool kit but quickly runnable code with detailed comments. There are three types of development template: the general template, the data cleaning template, and the Spark operator template. The general template includes code for reading a data source, simple data processing, and data storage. The data cleaning template builds on the general template by adding ETL steps such as filtering, de-duplication, and merging. The Spark operator template builds on the data cleaning template by adding usage examples of complex Spark operators such as Aggregate.
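The ETL steps named for the data cleaning template (filtering, de-duplication, merging) can be illustrated with a minimal sketch. This is not the patent's template code: the class name `CleaningTemplateSketch` and its methods are hypothetical, and plain Java streams stand in for Spark RDDs so that the example runs without any Spark dependency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for a data cleaning template; Java streams replace Spark RDDs. */
final class CleaningTemplateSketch {

    /** Filter: drop null or blank records and trim the rest. */
    static List<String> filterValid(List<String> records) {
        return records.stream()
                .filter(Objects::nonNull)
                .map(String::trim)
                .filter(r -> !r.isEmpty())
                .collect(Collectors.toList());
    }

    /** De-duplicate: keep the first occurrence of each record, preserving order. */
    static List<String> deduplicate(List<String> records) {
        return records.stream().distinct().collect(Collectors.toList());
    }

    /** Merge: concatenate two data sets (comparable in spirit to an RDD union). */
    static List<String> merge(List<String> left, List<String> right) {
        return Stream.concat(left.stream(), right.stream()).collect(Collectors.toList());
    }

    /** Full cleaning pass: merge, then filter, then de-duplicate. */
    static List<String> clean(List<String> left, List<String> right) {
        return deduplicate(filterValid(merge(left, right)));
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", " b ", "", null);
        List<String> second = Arrays.asList("b", "c", "a");
        System.out.println(clean(first, second));
    }
}
```

In a Spark-based template the same three steps would typically map onto RDD operations such as `filter`, `distinct`, and `union`; the order of the steps here is an illustrative choice.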
Step S4, creating a big data development project, and developing big data by applying the development template;
Specifically, a new big data development project is created; once the SDK has been imported, a development template is created and all functions in the SDK become available. The template supports secondary development and code contributions, so a Spark task can be developed at minimal time cost. Importing the SDK completes the dependency setup, and importing the template class completes the deployment of the development template. These two steps are all that is needed to configure a Spark task development project, which can then be compiled and run directly and is friendly to new users.
In this embodiment, the template-based development approach provides not only encapsulated classes and methods but also directly runnable development templates, which improves development efficiency, lowers the barrier to entry, and speeds up development in a simple and effective way.
The overall big data development flow is shown in Table 1 below:
[Table 1: provided as image BDA0001725874570000051 in the original publication; not reproduced in this text]
First, install an IDE, together with a Maven repository and the IDE's Maven plug-in, so that the template project can be imported. Next, use the dependency management tool to download the latest template project, then compile and package it to generate the SDK. Add the SDK to the IDE, create a new Spark task project, and create a development template after importing the SDK; development can then begin, with all functions in the SDK available. The various development templates provide executable and varied sample programs. A developer can modify the template code directly to complete the development task, or continue to extend the development template, simplify the development process, and share the code architecture.
The SDK includes data source adaptation, general methods, configuration methods, and templates. Data source adaptation covers input and output for data sources such as MongoDB, HDFS, Hive, HBase, and MySQL. The general methods include time normalization helpers, such as the start-of-day, start-of-week, and start-of-five-minute-window timestamps, as well as regular-expression checks on strings, NULL checks, and dynamic switching of data sources. Configuration management is also provided for dynamic parameters, including association with local configuration, HDFS configuration, and key-value store configuration. All three templates cover the reading and normalization of input parameters, the input and output of data, and the selection of intermediate cleaning methods.
The invention provides a template-based big data development approach that offers not only encapsulated classes and methods but also directly runnable development templates. The templates are customized for different scenarios, so a user only needs to import a template to run it, can modify the available methods directly from the template sample, and can adjust parameters by following the comments. The development template gives users ready-to-use code with detailed, readable comments. Each time a Spark task is created, it can be created directly from a template; users can also improve and extend an existing template to create their own.
The method provided by this embodiment further improves developer efficiency, lowers the barrier to entry, sets up a Spark big data development environment in one stop, provides easy-to-use methods, and adds support for various data sources.
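The general methods listed above (time-window start timestamps, string checks, NULL checks) can be sketched with the standard `java.time` and `java.util.regex` packages. The class and method names below are hypothetical and are not taken from the SDK; the week is assumed to start on Monday, and five-minute windows are aligned to the epoch, which matches local clock time only for zones whose offset is a multiple of five minutes.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;
import java.util.regex.Pattern;

/** Hypothetical sketch of general-purpose helpers; names are illustrative only. */
final class GeneralMethodsSketch {

    /** Timestamp (epoch millis) of the start of the day containing the given instant. */
    static long dayStartMillis(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone)
                .truncatedTo(ChronoUnit.DAYS)
                .toInstant().toEpochMilli();
    }

    /** Timestamp of the start of the week; assumes the week starts on Monday. */
    static long weekStartMillis(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone)
                .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY))
                .truncatedTo(ChronoUnit.DAYS)
                .toInstant().toEpochMilli();
    }

    /** Timestamp of the start of the five-minute window, aligned to the epoch. */
    static long fiveMinuteStartMillis(long epochMillis) {
        long window = 5 * 60 * 1000L;
        return Math.floorDiv(epochMillis, window) * window;
    }

    /** NULL check that also treats blank strings as missing. */
    static boolean isNullOrBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    /** Regular-expression check that requires the whole string to match. */
    static boolean matches(String value, String regex) {
        return value != null && Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        ZoneId zone = ZoneId.systemDefault();
        System.out.println("day start:         " + dayStartMillis(now, zone));
        System.out.println("week start:        " + weekStartMillis(now, zone));
        System.out.println("five-minute start: " + fiveMinuteStartMillis(now));
    }
}
```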
In a preferred embodiment of this embodiment (see fig. 2), the step of developing big data by applying the development template includes:
Step S5, selecting the required data source writing method according to the comments, selecting a suitable RDD operator, and selecting the required data source input method;
Take a simple Spark task as an example: ordinarily, the developer must first learn Spark, find the dependencies, build a development environment, and become familiar with the interfaces, and only then can carry out custom development according to the Spark programming conventions, which also requires knowing Spark's existing RDD-based methods. With the development method provided here, a developer only needs to create a Maven project, download and import the SDK, and import the template in order to start debugging a Spark task. The developer picks a template class from the three categories to create, then follows the comments to select the required data source writing method, a suitable RDD operator, and the required data source input method.
Step S6, modifying or deleting the code as needed.
The developer does not need to worry about the details: all data source operations and RDD operator operations are presented as code in the template class, and development is completed simply by modifying or deleting code as needed. An experienced developer can choose the general template, which saves the cost of setting up code conventions and programming data input and output. Users can also customize their own development templates by building new ones from the three template types, and when several people develop collaboratively they only need to share the template.
The functions of the entire SDK are shown in Table 2 below:
[Table 2: provided as image BDA0001725874570000071 in the original publication; not reproduced in this text]
The above is only a simple example. In practice, a high barrier to entry and repetitive work have long been major problems for developers, and the template-based development approach addresses exactly these problems.
The big data development method of this embodiment builds on an existing IDE and only requires importing the SDK, so it is convenient to use. The template classes can be run directly and are presented as code, which makes them easy to modify and extend, and users can build their own template classes. The invention improves development efficiency, lowers the barrier to entry, suits both solo development and multi-person collaboration, and speeds up development in a simple and effective way. The big data development method and platform provided by the invention therefore have broad application prospects in big data development and related fields. Note that implementing the invention requires the support of existing development tools. The invention covers a wide range of data sources, including MongoDB, HDFS, Hive, HBase, MySQL, and Kafka, and the supported operators include all RDD operators listed on the official Spark website, with their method categories and usage examples.
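The overall shape of a general template (read and normalize input parameters, read data, process it, write it out) might look like the sketch below. Everything in it is an assumption made for illustration: the class name, the `key=value` argument convention, and the in-memory input and output that stand in for real data sources. The point is only that each step is visible as editable code, which is what allows a developer to modify or delete steps as needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical general-template skeleton; in-memory data replaces real data sources. */
final class GeneralTemplateSketch {

    /** Step 1: read "key=value" parameters; keys are lower-cased, values trimmed. */
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                params.put(arg.substring(0, eq).trim().toLowerCase(Locale.ROOT),
                           arg.substring(eq + 1).trim());
            }
        }
        return params;
    }

    /** Step 2: data input. Reads a comma-separated "input" parameter as the record set. */
    static List<String> readInput(Map<String, String> params) {
        List<String> records = new ArrayList<>();
        for (String record : params.getOrDefault("input", "").split(",")) {
            if (!record.trim().isEmpty()) {
                records.add(record.trim());
            }
        }
        return records;
    }

    /** Step 3: simple processing. Replace or delete this step as the task requires. */
    static List<String> process(List<String> records) {
        return records.stream()
                .map(r -> r.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    /** Step 4: data output. Returns a joined string instead of writing to a sink. */
    static String writeOutput(List<String> records) {
        return String.join("|", records);
    }

    /** Runs the four steps in order. */
    static String run(String[] args) {
        return writeOutput(process(readInput(parseArgs(args))));
    }

    public static void main(String[] args) {
        System.out.println(run(new String[] {"Input=a, b ,c"}));
    }
}
```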
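Dynamic switching between the listed data sources could be driven by a single parameter, as in the minimal sketch below. The enum values mirror the data sources named in the text, but the class, the enum, and the `resolve` method are hypothetical and do not describe the SDK's actual interface.

```java
import java.util.Locale;

/** Hypothetical sketch of selecting a data source by name at run time. */
final class DataSourceSwitchSketch {

    /** Data sources named in the description. */
    enum SourceType { MONGODB, HDFS, HIVE, HBASE, MYSQL, KAFKA }

    /** Resolves a case-insensitive name; rejects missing or unknown names. */
    static SourceType resolve(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("data source name is required");
        }
        // Enum.valueOf throws IllegalArgumentException for unknown names.
        return SourceType.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String requested = args.length > 0 ? args[0] : "hive";
        System.out.println("Selected data source: " + resolve(requested));
    }
}
```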
The second embodiment is as follows:
Fig. 3 is a structural diagram of a Spark-based big data development device according to the second embodiment of the present invention. The device includes an installation unit 1, a compiling unit 2 connected with the installation unit 1, an adding unit 3 connected with the compiling unit 2, and a development unit 4 connected with the adding unit 3, wherein:
The installation unit 1 is configured to install an integrated development environment so that a template project can be imported;
An IDE typically includes a code editor, a compiler, a debugger, and graphical user interface tools. The IDE in this embodiment may be a development environment such as IDEA or Eclipse. After the IDE is installed, a Maven repository and the IDE's Maven plug-in also need to be installed so that the development tool can import the template project.
The compiling unit 2 is configured to download the latest template project, compile and package it, and generate a software development kit;
The SDK in this embodiment supports multiple versions. It encapsulates the dependencies for various Spark versions as well as general methods, such as access to various databases and preprocessing of data. Take a simple Spark task as an example: ordinarily, the developer must first learn Spark, find the dependencies, build a development environment, and become familiar with the interfaces, and only then can carry out custom development according to the Spark programming conventions, which also requires knowing Spark's existing RDD-based methods.
The adding unit 3 is configured to add the software development kit to the integrated development environment to form a development template;
Specifically, the development template includes at least one of a general template, a data cleaning template, and a Spark operator template. The development template depends on existing development tools and related plug-ins: it must be used together with a dependency management plug-in, and it relies on the template-writing function of the existing development tool.
All template-related dependencies are integrated into the SDK. Importing the SDK brings in every dependency of the template classes, so the dependency configuration is completed in one step. The dependencies are delivered through the SDK, while the development template itself is not a compiled tool kit but quickly runnable code with detailed comments. There are three types of development template: the general template, the data cleaning template, and the Spark operator template. The general template includes code for reading a data source, simple data processing, and data storage. The data cleaning template builds on the general template by adding ETL steps such as filtering, de-duplication, and merging. The Spark operator template builds on the data cleaning template by adding usage examples of complex Spark operators such as Aggregate.
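As an illustration of the kind of usage example a Spark operator template might contain for Aggregate, the sketch below reproduces the semantics of Spark's RDD `aggregate` (a zero value, a per-partition `seqOp`, and a cross-partition `combOp`) in plain Java. Partitions are modelled as a list of lists so that the example runs without Spark; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

/** Plain-Java imitation of RDD aggregate semantics; partitions are lists of lists. */
final class AggregateSketch {

    /**
     * Folds each partition with seqOp starting from the zero value, then merges
     * the per-partition results with combOp, again starting from the zero value.
     * The operators must not mutate the zero value, since it is reused.
     */
    static <T, U> U aggregate(List<List<T>> partitions, U zero,
                              BiFunction<U, T, U> seqOp, BinaryOperator<U> combOp) {
        U result = zero;
        for (List<T> partition : partitions) {
            U partial = zero;
            for (T element : partition) {
                partial = seqOp.apply(partial, element);
            }
            result = combOp.apply(result, partial);
        }
        return result;
    }

    /** Mean of all elements, using a {sum, count} accumulator. */
    static double mean(List<List<Integer>> partitions) {
        long[] acc = aggregate(partitions, new long[] {0L, 0L},
                (a, x) -> new long[] {a[0] + x, a[1] + 1},
                (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]});
        return acc[1] == 0 ? Double.NaN : (double) acc[0] / acc[1];
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4, 5));
        System.out.println("mean = " + mean(partitions));
    }
}
```

The accumulator type (sum and count) differs from the element type (integer), which is the situation where `aggregate` is useful and a plain `reduce` is not enough.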
The development unit 4 is configured to create a new big data development project and develop big data by applying the development template;
Specifically, a new big data development project is created; once the SDK has been imported, a development template is created and all functions in the SDK become available. The template supports secondary development and code contributions, so a Spark task can be developed at minimal time cost.
The SDK includes data source adaptation, general methods, configuration methods, and templates. Data source adaptation covers input and output for data sources such as MongoDB, HDFS, Hive, HBase, and MySQL. The general methods include time normalization helpers, such as the start-of-day, start-of-week, and start-of-five-minute-window timestamps, as well as regular-expression checks on strings, NULL checks, and dynamic switching of data sources. Configuration management is also provided for dynamic parameters, including association with local configuration, HDFS configuration, and key-value store configuration. All three templates cover the reading and normalization of input parameters, the input and output of data, and the selection of intermediate cleaning methods.
Importing the SDK completes the dependency setup, and importing the template class completes the deployment of the development template. These two steps are all that is needed to configure a Spark task development project, which can then be compiled and run directly and is friendly to new users.
The invention provides a template-based big data development approach that offers not only encapsulated classes and methods but also directly runnable development templates. The templates are customized for different scenarios, so a user only needs to import a template to run it, can modify the available methods directly from the template sample, and can adjust parameters by following the comments. The development template gives users ready-to-use code with detailed, readable comments. Each time a Spark task is created, it can be created directly from a template; users can also improve and extend an existing template to create their own.
In this embodiment, the template-based development approach provides not only encapsulated classes and methods but also directly runnable development templates, which improves development efficiency, lowers the barrier to entry, and speeds up development in a simple and effective way.
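Configuration management that associates local, HDFS, and key-value store configuration could be organised as a layered lookup, sketched below. The class name and the precedence rule (earlier layers override later ones) are assumptions made for illustration; the text does not state how the SDK resolves conflicts between configuration sources, and in-memory maps stand in for the real stores.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical layered configuration lookup; earlier layers take precedence. */
final class LayeredConfigSketch {

    private final List<Map<String, String>> layers;

    /** Layers are given in priority order, for example local, then HDFS, then KV store. */
    LayeredConfigSketch(List<Map<String, String>> layers) {
        this.layers = layers;
    }

    /** Returns the value from the first layer that defines the key, if any. */
    Optional<String> get(String key) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> local = Collections.singletonMap("sink", "mysql");
        Map<String, String> hdfs = new HashMap<>();
        hdfs.put("sink", "hive");
        hdfs.put("source", "hdfs");
        LayeredConfigSketch config = new LayeredConfigSketch(Arrays.asList(local, hdfs));
        System.out.println("sink   = " + config.get("sink").orElse("<unset>"));
        System.out.println("source = " + config.get("source").orElse("<unset>"));
    }
}
```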
In a preferred implementation of this embodiment, the development unit 4 is further configured to:
select the required data source writing method according to the comments, select a suitable RDD operator, and select the required data source input method;
modify or delete the code as needed.
Take a simple Spark task as an example: ordinarily, the developer must first learn Spark, find the dependencies, build a development environment, and become familiar with the interfaces, and only then can carry out custom development according to the Spark programming conventions, which also requires knowing Spark's existing RDD-based methods. With the development method provided here, a developer only needs to create a Maven project, download and import the SDK, and import the template in order to start debugging a Spark task. The developer picks a template class from the three categories to create, then follows the comments to select the required data source writing method, a suitable RDD operator, and the required data source input method.
The developer does not need to worry about the details: all data source operations and RDD operator operations are presented as code in the template class, and development is completed simply by modifying or deleting code as needed. An experienced developer can choose the general template, which saves the cost of setting up code conventions and programming data input and output. Users can also customize their own development templates by building new ones from the three template types, and when several people develop collaboratively they only need to share the template.
The big data development method of this embodiment builds on an existing IDE and only requires importing the SDK, so it is convenient to use. The template classes can be run directly and are presented as code, which makes them easy to modify and extend, and users can build their own template classes. The invention improves development efficiency, lowers the barrier to entry, suits both solo development and multi-person collaboration, and speeds up development in a simple and effective way. The big data development method and platform provided by the invention therefore have broad application prospects in big data development and related fields. Note that implementing the invention requires the support of existing development tools. The invention covers a wide range of data sources, including MongoDB, HDFS, Hive, HBase, MySQL, and Kafka, and the supported operators include all RDD operators listed on the official Spark website, with their method categories and usage examples. Although others have encapsulated similar method packages, none has so far provided templates, supported this many data sources, or offered a one-stop, template-based rapid development method like that of the present invention.
The third embodiment is as follows:
Fig. 4 shows a structural diagram of a terminal according to the third embodiment of the present invention. The terminal includes a memory 41, a processor 42, and a bus 43, wherein the processor 42 and the memory 41 communicate with each other via the bus 43.
The memory 41 is configured to store various data;
Specifically, the memory 41 stores various data, such as the parameters and code used during big data development, without limitation; the memory also stores a plurality of computer programs.
The processor 42 is configured to call the computer programs in the memory 41 to execute the Spark-based big data development method provided in the first embodiment, for example:
installing an integrated development environment into which the template project can be imported;
downloading the latest template project, and compiling and packaging it to generate a software development kit;
adding the software development kit to the integrated development environment to form a development template;
creating a new big data development project, and developing big data by applying the development template.
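The download and packaging steps above could be scripted roughly as follows. This is a sketch under assumptions: the patent does not say where the template project is hosted, so Git hosting, the example URL and the function name `sdk_build_plan` are illustrative choices, and a Maven build is assumed only because the claims mention a Maven plug-in. The function returns the commands without running them.

```python
from typing import List


def sdk_build_plan(template_url: str, workdir: str) -> List[List[str]]:
    """Return the commands that fetch and package the template project.

    Assumptions of this sketch: the template is hosted in Git and built
    with Maven. Nothing is executed here.
    """
    return [
        # Fetch only the most recent snapshot of the template project.
        ["git", "clone", "--depth", "1", template_url, workdir],
        # Compile, package and install the SDK jar into the local Maven
        # repository so that a new IDE project can depend on it.
        ["mvn", "-f", f"{workdir}/pom.xml", "clean", "install"],
    ]
```

Each returned command could then be passed to `subprocess.run(cmd, check=True)`; keeping planning separate from execution makes the steps easy to inspect before anything is downloaded.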
The present invention further provides a memory that stores a plurality of computer programs, which are called by the processor to execute the Spark-based big data development method of the first embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution.
Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be regarded as departing from the scope of the present invention. The above describes only specific embodiments of the present invention, and the scope of protection is not limited to them; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within its scope of protection. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (6)

1. A Spark-based big data development method, characterized by comprising the following steps:
installing an integrated development environment for importing a template project;
downloading the latest template project, and compiling and packaging it to generate a software development kit;
adding the software development kit to the integrated development environment to form a development template;
creating a new big data development project, and developing big data by applying the development template;
wherein the step of creating a new big data development project and developing big data by applying the development template comprises:
making corresponding changes to the code of the development template to complete the big data development;
continuing to extend the development template, so as to simplify the development process and share a code architecture;
wherein the development template is runnable, commented code, and the step of developing big data by applying the development template further comprises:
selecting, according to the comments, the required data source write method, a suitable RDD operator, and the required data source input method;
modifying or deleting code as required;
wherein the big data development further comprises: data source adaptation, namely regular-expression checks on character strings, NULL-value checks, dynamic switching of data sources, configuration management of dynamic parameters, local configuration association, HDFS configuration association and key-value (KV) store configuration association; the dependencies of the development template are provided as an SDK, and the development template comprises at least one of a general template, a data cleaning template and a Spark operator template; the development template comprises reading and normalizing input parameters, data input and output, and selection of an intermediate cleaning method.
2. The big data development method according to claim 1, wherein after installing the integrated development environment, the method further comprises: installing a Maven repository and the Maven plug-in of the IDE.
3. A Spark-based big data development device, characterized by comprising:
an installation unit, configured to install an integrated development environment for importing a template project;
a compiling unit, configured to download the latest template project, and to compile and package it to generate a software development kit;
an adding unit, configured to add the software development kit to the integrated development environment to form a development template;
a development unit, configured to create a new big data development project and to develop big data by applying the development template;
wherein the development unit is further configured to:
make corresponding changes to the code of the development template to complete the big data development;
continue to extend the development template, so as to simplify the development process and share a code architecture;
wherein the development template is runnable, commented code, and the development unit is further configured to:
select, according to the comments, the required data source write method, a suitable RDD operator, and the required data source input method;
modify or delete code as required;
wherein the big data development further comprises: data source adaptation, namely regular-expression checks on character strings, NULL-value checks, dynamic switching of data sources, configuration management of dynamic parameters, local configuration association, HDFS configuration association and key-value (KV) store configuration association; the dependencies of the development template are provided as an SDK, and the development template comprises at least one of a general template, a data cleaning template and a Spark operator template; the development template comprises reading and normalizing input parameters, data input and output, and selection of an intermediate cleaning method.
4. The big data development device according to claim 3, wherein the installation unit is further configured to install a Maven repository and the Maven plug-in of the IDE.
5. A memory for Spark-based big data development, the memory storing a computer program, the computer program being executable by a processor to perform the following steps:
installing an integrated development environment for importing a template project;
downloading the latest template project, and compiling and packaging it to generate a software development kit;
adding the software development kit to the integrated development environment to form a development template;
creating a new big data development project, and developing big data by applying the development template;
wherein the step of creating a new big data development project and developing big data by applying the development template comprises:
making corresponding changes to the code of the development template to complete the big data development;
continuing to extend the development template, so as to simplify the development process and share a code architecture;
wherein the development template is runnable, commented code, and the step of developing big data by applying the development template further comprises:
selecting, according to the comments, the required data source write method, a suitable RDD operator, and the required data source input method;
modifying or deleting code as required;
wherein the big data development further comprises: data source adaptation, namely regular-expression checks on character strings, NULL-value checks, dynamic switching of data sources, configuration management of dynamic parameters, local configuration association, HDFS configuration association and key-value (KV) store configuration association; the dependencies of the development template are provided as an SDK, and the development template comprises at least one of a general template, a data cleaning template and a Spark operator template; the development template comprises reading and normalizing input parameters, data input and output, and selection of an intermediate cleaning method.
6. A terminal for Spark-based big data development, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the Spark-based big data development method according to claim 1 or 2 when executing the computer program.
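The string checks, NULL-value checks and configuration association named in the claims can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the date format, the treatment of the literal text `NULL` as missing, the precedence order of the three configuration sources, and the names `is_null`, `valid_date_arg` and `resolve_config` do not come from the patent.

```python
import re
from typing import Dict, Mapping, Optional

DATE_ARG = re.compile(r"\d{4}-\d{2}-\d{2}")  # hypothetical parameter format


def is_null(value: Optional[str]) -> bool:
    """Treat None, blank strings and the literal text 'NULL' as missing."""
    return value is None or value.strip() == "" or value.strip().upper() == "NULL"


def valid_date_arg(value: Optional[str]) -> bool:
    """Regular-expression check for one input parameter."""
    return not is_null(value) and DATE_ARG.fullmatch(value.strip()) is not None


def resolve_config(local: Mapping[str, str],
                   hdfs: Mapping[str, str],
                   kv: Mapping[str, str]) -> Dict[str, str]:
    """Merge three configuration sources; later sources override earlier ones.

    The precedence (local < HDFS < KV store) is an assumption of this sketch.
    Missing values never override a value that is already set.
    """
    merged: Dict[str, str] = {}
    for layer in (local, hdfs, kv):
        merged.update({k: v for k, v in layer.items() if not is_null(v)})
    return merged
```

With these helpers, changing the `source` key in the highest-priority configuration layer is one way a job could switch data sources without a code change.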
Application number: CN201810755408.8A
Priority date: 2018-07-10
Filing date: 2018-07-10
Title: Spark-based big data development method and device, and terminal
Legal status: Active
Publication: CN109086038B (en)

Publications (2)

Publication Number    Publication Date
CN109086038A (en)     2018-12-25
CN109086038B (en)     2022-05-31

Family

ID=64837591

Country Status (1)

Country Link
CN (1) CN109086038B (en)

Legal Events

Code    Description
PB01    Publication
SE01    Entry into force of request for substantive examination
GR01    Patent grant