CN109408537A - Spark SQL-based data processing method and device, storage medium and computing device - Google Patents
- Publication number
- CN109408537A (application CN201811214789.5A)
- Authority
- CN
- China
- Prior art keywords
- user
- session
- spark
- user name
- context variable
- Legal status
- Pending
Abstract
Embodiments of the present invention provide a data processing method based on Spark SQL. The method comprises: in response to the initiation of a session, looking up, in a preset relation set, the Spark context (SparkContext) instance corresponding to the user name of the proxy user that initiated the session; if none is found, creating and instantiating a corresponding SparkContext, and adding to the preset relation set a correspondence between the user name and at least the corresponding SparkContext instance; and creating, according to the SparkContext instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to execute the corresponding data processing. The method allows a single application instance running on one server to serve multiple tenants, thereby realizing multi-tenancy. In addition, embodiments of the present invention provide a Spark SQL-based data processing device, a storage medium and a computing device.
Description
Technical field
Embodiments of the present invention relate to the field of data processing and, more specifically, to a Spark SQL-based data processing method and device, a storage medium and a computing device.
Background art
Big data technology, which refers to techniques for querying, analyzing and processing huge volumes of data, is currently a popular technology. With the arrival of the big data era, big-data applications such as data warehousing, data security, data analysis and data mining have increasingly become research hotspots in the IT industry.
For example, Apache Spark, which originated at the AMPLab of the University of California, Berkeley, is an in-memory big data computing framework. Spark is an alternative to MapReduce (MR) intended to provide more efficient data processing; it is compatible with the HDFS distributed storage layer and the Apache Hive metastore, and can be integrated into the Hadoop ecosystem to make up for the shortcomings of MapReduce. In general, a Spark program has a master/slave architecture: the Driver acts as the master (the side that actively initiates requests) and is responsible for scheduling tasks, the smallest unit of computation, while the Executors run the tasks. MapReduce, by contrast, cannot satisfy the ad hoc queries required in most big data scenarios.
As another example, Spark SQL is one of the SQL-on-Hadoop technologies. It uses its built-in query optimizer to translate SQL query statements into Spark's underlying computation logic, thereby providing efficient SQL query capability. Products such as Apache Hive that implement their computation logic on Spark SQL can achieve better processing performance than with MapReduce.
Summary of the invention
However, the above big data computing frameworks cannot serve multiple tenants by running a single application instance on one server; that is, they lack multi-tenancy (Multi Tenancy/Tenant).
For example, HiveServer2 (hereinafter technique one), shown in Figure 1A, provides a SQL-on-Hadoop multi-tenant scheme based on the Hive query engine. For each request from a user's client (Client), HiveServer2 creates a session (Session) and allocates an execution context environment, which corresponds to one round of MR tasks. In this scheme, the execution environments started by the computation layer correspond one-to-one to the Clients and cannot be reused, which hurts efficiency. The goal of serving multiple tenants with a single application instance running on one server is therefore not achieved, and the scheme does not provide true multi-tenancy.
As another example, SparkThriftServer (hereinafter technique two), shown in Figure 1B, provides a SQL-on-Hadoop scheme based on the Spark SQL query engine. Because a single SparkThriftServer is not multi-tenant, a separate server must be started for each user so that the user can access its own data stored in HDFS; that is, user User2 cannot access its own resources through User1's server. The scheme therefore also lacks multi-tenancy. Moreover, presetting one server per specific user increases the complexity of system maintenance and reduces the concurrency and utilization of server resources.
Therefore, the prior art often deploys technique one and technique two side by side; however, the two cannot be made seamlessly compatible, which makes this a very troublesome process.
There is thus a strong need for an improved Spark SQL-based data processing method that can serve multiple tenants by running a single application instance on one server.
In this context, embodiments of the present invention are intended to provide a Spark SQL-based data processing method and device, a storage medium and a computing device.
In a first aspect of embodiments of the present invention, a Spark SQL-based data processing method is provided, comprising: in response to the initiation of a session, looking up, in a preset relation set, the Spark context (SparkContext) instance corresponding to the user name of the proxy user that initiated the session; if no SparkContext instance corresponding to the user name is found, creating a SparkContext corresponding to the user name and instantiating it to form the SparkContext instance corresponding to the user name, and adding to the preset relation set a correspondence between the user name and at least the corresponding SparkContext instance; and creating, according to the SparkContext instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to execute the corresponding data processing.
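The three steps of the first aspect can be pictured as a per-user registry with lookup-or-create semantics. The following is a minimal Python sketch under stated assumptions, not the patented implementation: `SparkContextStub` is a hypothetical stand-in for a real SparkContext, a lock-guarded dict stands in for the preset relation set, and `open_session` returns a plain dict in place of the real runtime environment.

```python
import threading


class SparkContextStub:
    """Hypothetical stand-in for a per-user SparkContext instance."""

    def __init__(self, user):
        self.user = user


class ContextRegistry:
    """Sketch of the preset relation set: proxy user name -> context instance."""

    def __init__(self, factory=SparkContextStub):
        self._factory = factory
        self._contexts = {}
        self._lock = threading.Lock()

    def get_or_create(self, user):
        with self._lock:
            # Look up the instance registered for this proxy user name.
            ctx = self._contexts.get(user)
            if ctx is None:
                # Not found: create and instantiate, then record user -> instance.
                ctx = self._factory(user)
                self._contexts[user] = ctx
            return ctx


def open_session(registry, user):
    """Hypothetical session entry point; the dict stands in for the runtime environment."""
    ctx = registry.get_or_create(user)
    return {"user": user, "context": ctx}


registry = ContextRegistry()
s1 = open_session(registry, "alice")
s2 = open_session(registry, "alice")
s3 = open_session(registry, "bob")
print(s1["context"] is s2["context"], s1["context"] is s3["context"])  # True False
```

Because the registry starts empty, no context exists for a user until that user's first session arrives; later sessions of the same user reuse the registered instance.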
In one embodiment of the invention, the preset relation set comprises a one-to-one mapping from a first set, consisting of the user names of one or more proxy users, to a second set, consisting of one or more SparkContext instances.
In another embodiment of the present invention, the preset relation set comprises a one-to-one mapping from the first set, consisting of the user names of one or more proxy users, to a third set; the third set contains one or more elements, each of which comprises a SparkContext instance and the connection count corresponding to that SparkContext instance.
In yet another embodiment of the present invention, the one-to-one mapping is built on a thread-safe HashMap.
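The need for a thread-safe map shows up when several sessions of the same proxy user arrive at once. In the minimal Python sketch below, 16 concurrent session openings for one user race on the map; it assumes the third-set form (instance plus connection count), uses a lock-protected dict in place of the thread-safe HashMap (in Java, `ConcurrentHashMap` would be the usual choice), and `make_context` is a hypothetical placeholder for an expensive SparkContext instantiation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Entry:
    """Element of the third set: a context instance plus its connection count."""

    def __init__(self, context):
        self.context = context
        self.connections = 0


class ThreadSafeContextMap:
    def __init__(self, factory):
        self._factory = factory
        self._entries = {}
        self._lock = threading.Lock()
        self.created = 0  # how many contexts were actually instantiated

    def open_session(self, user):
        # The check-create-increment sequence runs under one lock, so concurrent
        # sessions for the same user cannot instantiate duplicate contexts.
        with self._lock:
            entry = self._entries.get(user)
            if entry is None:
                entry = Entry(self._factory(user))
                self._entries[user] = entry
                self.created += 1
            entry.connections += 1
            return entry.context

    def connections(self, user):
        with self._lock:
            return self._entries[user].connections


def make_context(user):
    return object()  # hypothetical placeholder for a SparkContext


cmap = ThreadSafeContextMap(make_context)
with ThreadPoolExecutor(max_workers=8) as pool:
    contexts = list(pool.map(cmap.open_session, ["alice"] * 16))

print(cmap.created, cmap.connections("alice"))  # 1 16
```

Holding one lock across the whole sequence is the simplest correct choice for a sketch; a finer-grained design could lock per key instead.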
In yet another embodiment of the present invention, the correspondence between the user name and at least the corresponding SparkContext instance comprises the correspondence between the user name and the corresponding SparkContext instance.
In yet another embodiment of the present invention, the correspondence between the user name and at least the corresponding SparkContext instance comprises the correspondence among the user name, the corresponding SparkContext instance and the connection count corresponding to that SparkContext instance.
In yet another embodiment of the present invention, the method further comprises: if the SparkContext instance corresponding to the user name of the proxy user that initiated the session is found, incrementing by 1 the connection count corresponding to that SparkContext instance in the preset relation set.
In yet another embodiment of the present invention, the method further comprises: when the session is closed, decrementing by 1 the connection count corresponding to the SparkContext instance of the proxy user that initiated the session.
In yet another embodiment of the present invention, the method further comprises: periodically, or in response to the closing of the session, reclaiming the resources occupied by SparkContext instances according to the LRU (least recently used) principle.
In yet another embodiment of the present invention, the step of reclaiming the resources occupied by SparkContext instances according to the LRU principle comprises: determining whether the preset relation set contains a correspondence whose SparkContext instance has a connection count of 0; and, if such a correspondence exists, deleting it from the preset relation set and releasing the resources occupied by the SparkContext of that correspondence.
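Connection counting on session close and the reclamation rule can be sketched together. In this minimal Python sketch, `on_release` is a hypothetical hook where a real implementation would stop the SparkContext and free its resources, an `OrderedDict` keeps entries in least-recently-used order, and `max_idle` is an assumed knob: with its default of 0, every instance whose connection count is 0 is reclaimed, as described above; a larger value keeps the most recently used idle instances warm.

```python
import threading
from collections import OrderedDict


class Entry:
    def __init__(self, context):
        self.context = context
        self.connections = 0


class LruContextPool:
    def __init__(self, factory, on_release, max_idle=0):
        self._factory = factory
        self._on_release = on_release  # hypothetical hook, e.g. stop the context
        self._max_idle = max_idle      # assumed: idle instances allowed to survive
        self._entries = OrderedDict()  # user -> Entry, least recently used first
        self._lock = threading.Lock()

    def open_session(self, user):
        with self._lock:
            entry = self._entries.get(user)
            if entry is None:
                entry = Entry(self._factory(user))
                self._entries[user] = entry
            entry.connections += 1
            self._entries.move_to_end(user)  # mark as most recently used
            return entry.context

    def close_session(self, user):
        with self._lock:
            self._entries[user].connections -= 1  # decrement on close
        self.reclaim()

    def reclaim(self):
        """Remove idle (0-connection) entries, least recently used first."""
        with self._lock:
            idle = [u for u, e in self._entries.items() if e.connections == 0]
            for user in idle[: max(len(idle) - self._max_idle, 0)]:
                entry = self._entries.pop(user)
                self._on_release(user, entry.context)


released = []
pool = LruContextPool(lambda u: f"ctx-{u}", lambda u, c: released.append(u))
pool.open_session("alice")
pool.open_session("alice")  # a second client of the same proxy user shares the context
pool.close_session("alice")
pool.close_session("alice")
print(released)  # ['alice']
```

Calling `reclaim` from a timer instead of from `close_session` would give the periodic variant.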
In yet another embodiment of the present invention, the sessions initiated by the same proxy user from different clients share the same SparkContext instance.
In yet another embodiment of the present invention, executing the corresponding data processing comprises executing the corresponding data query processing.
In yet another embodiment of the present invention, the step of creating the corresponding runtime environment comprises creating a Driver RPC communication environment.
In yet another embodiment of the present invention, the step of creating the corresponding runtime environment comprises: submitting a resource request to a resource manager, so that the resource manager obtains the corresponding computing resources from the queue corresponding to the proxy user that initiated the session; and starting executors bound to those computing resources.
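In stock Spark on YARN, the queue and executor count of an application are taken from configuration properties such as `spark.yarn.queue` and `spark.executor.instances`. The sketch below shows, hypothetically, how per-user values for such a request might be assembled. The `USER_QUEUES` table and `build_resource_request` function are illustrative assumptions; the text does not state how the modified kernel passes the queue to the resource manager.

```python
# Hypothetical proxy-user -> YARN queue assignment; a real deployment would
# derive this from the cluster's scheduler configuration instead.
USER_QUEUES = {"alice": "analytics", "bob": "etl"}


def build_resource_request(user, executors=2, default_queue="default"):
    """Return Spark configuration entries for one user's resource request."""
    return {
        "spark.app.name": f"thrift-server-{user}",
        # Resources come from the queue that belongs to the proxy user.
        "spark.yarn.queue": USER_QUEUES.get(user, default_queue),
        # Number of executors to start and bind to those resources.
        "spark.executor.instances": str(executors),
    }


print(build_resource_request("alice")["spark.yarn.queue"])  # analytics
```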
In yet another embodiment of the present invention, before the step of looking up the SparkContext instance corresponding to the user name of the proxy user that initiated the session, the method further comprises: if the authentication information of the proxy user that initiated the session is invalid, terminating the processing of the session.
In yet another embodiment of the present invention, before the step of looking up the SparkContext instance corresponding to the user name of the proxy user that initiated the session, the method further comprises: if the proxy user that initiated the session is not a trusted party of the process user that started the server, terminating the processing of the session.
In a second aspect of embodiments of the present invention, a storage medium storing a program is provided; when executed by a processor, the program implements the above Spark SQL-based data processing method.
In a third aspect of embodiments of the present invention, a Spark SQL-based data processing device is provided, comprising: a lookup unit adapted to, in response to the initiation of a session, look up in a preset relation set the SparkContext instance corresponding to the user name of the proxy user that initiated the session; a processing unit adapted to, if no SparkContext instance corresponding to the user name is found, create a SparkContext corresponding to the user name, instantiate it to form the SparkContext instance corresponding to the user name, and add to the preset relation set a correspondence between the user name and at least the corresponding SparkContext instance; and an execution unit adapted to create, according to the SparkContext instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to execute the corresponding data processing.
In one embodiment of the invention, the preset relation set comprises a one-to-one mapping from a first set, consisting of the user names of one or more proxy users, to a second set, consisting of one or more SparkContext instances.
In another embodiment of the present invention, the preset relation set comprises a one-to-one mapping from the first set, consisting of the user names of one or more proxy users, to a third set; the third set contains one or more elements, each of which comprises a SparkContext instance and the connection count corresponding to that SparkContext instance.
In yet another embodiment of the present invention, the one-to-one mapping is built on a thread-safe HashMap.
In yet another embodiment of the present invention, the correspondence between the user name and at least the corresponding SparkContext instance comprises the correspondence between the user name and the corresponding SparkContext instance.
In yet another embodiment of the present invention, the correspondence between the user name and at least the corresponding SparkContext instance comprises the correspondence among the user name, the corresponding SparkContext instance and the connection count corresponding to that SparkContext instance.
In yet another embodiment of the present invention, the processing unit is further adapted to: if the SparkContext instance corresponding to the user name of the proxy user that initiated the session is found, increment by 1 the connection count corresponding to that SparkContext instance in the preset relation set.
In yet another embodiment of the present invention, the processing unit is further adapted to: when the session is closed, decrement by 1 the connection count corresponding to the SparkContext instance of the proxy user that initiated the session.
In yet another embodiment of the present invention, the processing unit is further adapted to: periodically, or in response to the closing of the session, reclaim the resources occupied by SparkContext instances according to the LRU principle.
In yet another embodiment of the present invention, the processing unit is adapted to: determine whether the preset relation set contains a correspondence whose SparkContext instance has a connection count of 0; and, if such a correspondence exists, delete it from the preset relation set and release the resources occupied by the SparkContext of that correspondence.
In yet another embodiment of the present invention, the lookup unit is adapted such that the sessions initiated by the same proxy user from different clients share the same SparkContext instance.
In yet another embodiment of the present invention, the corresponding data processing performed by the execution unit comprises the corresponding data query processing.
In yet another embodiment of the present invention, the corresponding runtime environment created by the execution unit comprises a Driver RPC communication environment.
In yet another embodiment of the present invention, the execution unit is adapted to create the corresponding runtime environment by: submitting a resource request to a resource manager, so that the resource manager obtains the corresponding computing resources from the queue corresponding to the proxy user that initiated the session; and starting executors bound to those computing resources.
In yet another embodiment of the present invention, the lookup unit is further adapted to, before looking up the SparkContext instance corresponding to the user name of the proxy user that initiated the session, determine whether the authentication information of that proxy user is valid and, if it is invalid, terminate the processing of the session.
In yet another embodiment of the present invention, the lookup unit is further adapted to, before the step of looking up the SparkContext instance corresponding to the user name of the proxy user that initiated the session, determine whether that proxy user is a trusted party of the process user that started the server and, if it is not, terminate the processing of the session.
In a fourth aspect of embodiments of the present invention, a computing device comprising the above storage medium is provided.
The Spark SQL-based data processing method and device, storage medium and computing device according to embodiments of the present invention can serve multiple tenants by running a single application instance on one server, thereby realizing multi-tenancy.
Brief description of the drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily understood by reading the following detailed description with reference to the accompanying drawings. Several embodiments of the present invention are shown in the drawings by way of example rather than limitation, in which:
Fig. 1A is a schematic diagram showing the existing HiveServer2 scheme;
Fig. 1B is a schematic diagram showing the existing SparkThriftServer scheme;
Fig. 1C is a schematic diagram showing the principle of an existing native Spark program;
Fig. 1D is a schematic framework diagram of the Spark SQL-based data processing method and device according to embodiments of the present invention;
Fig. 2 is a flowchart schematically showing an exemplary process of the Spark SQL-based data processing method according to embodiments of the present invention;
Fig. 3A is a UML sequence diagram schematically showing the working principle of an existing Spark program;
Fig. 3B is a UML sequence diagram schematically showing the working principle of a preferred application example of the Spark SQL-based data processing method according to an embodiment of the present invention;
Fig. 4 is a structural block diagram schematically showing an example of the Spark SQL-based data processing device according to embodiments of the present invention;
Fig. 5 is a schematic structural diagram of a computer according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
In the accompanying drawings, identical or corresponding label indicates identical or corresponding part.
Detailed description of embodiments
The principle and spirit of the present invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are provided only so that those skilled in the art can better understand and thereby implement the present invention, and do not limit the scope of the invention in any way. Rather, they are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present invention may be implemented as a system, apparatus, device, method or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
According to embodiments of the present invention, a Spark SQL-based data processing method and device, a storage medium and a computing device are proposed.
It should be understood that any number of elements in the drawings is for illustration rather than limitation, and any naming is used only for distinction and carries no limiting meaning.
The principle and spirit of the present invention are explained in detail below with reference to several representative embodiments.
Overview of the invention
The inventors found that the Thrift Server module shipped with Apache Spark simply inherits the HiveServer2 module of Apache Hive. Apart from offering similar functionality and better performance, however, it has had many practically important features stripped out because of the limitations of Spark's overall architecture, such as multi-tenancy and high availability. Enterprise-level scenarios generally require both resource sharing and data isolation, which cannot be achieved without multi-tenancy. Likewise, a resident service without high availability is far less robust. On the basis of modifying the Spark kernel so that it can run multiple instances, the present invention implements a highly available Thrift Server system that provides multi-tenant service, achieving resource sharing together with isolation of user data.
Fig. 1 C shows the framework of a primary Spark program.As shown in Figure 1, driver procedure (Driver
It Program is) Master of Spark program, in original architecture design, Spark context variable (SparkContext) example
It will create its corresponding runtime environment after change, including create Driver RPC communication environment and mentioned to resource manager (YARN)
Resource request etc. is handed over, when instantiating a SparkContext again in Driver process, which can be subsequently supplied
Once, exist in the form of globally unique variable due to the environment, because this latter can cover the former all environment, lead to the former
Environment it is actually unavailable.
It follows that, on the one hand, because of the limitations of Spark's core architecture, the Driver process of a Spark program can correspond to only one SparkContext instance, which requests computing resources from a particular queue of the resource manager (Hadoop YARN) and starts the corresponding number of Executors. These resources belong to a single user and cannot be shared by other users: they can access only that user's data, and other users can neither use them nor access their own data through them. On the other hand, because Thrift Server is essentially a Spark program, starting it also starts the corresponding SparkContext and resources; owing to permission issues, such a program cannot serve different users, and one service would have to be started for each user, an architecture that is clearly impractical.
In view of the above problems, the present invention provides a Spark SQL-based data processing method and device, a storage medium and a computing device. As shown in Fig. 1D, the multi-instance capability of SparkContext is achieved by modifying Spark's core architecture, so that multiple mutually independent SparkContexts can be instantiated in the Driver process, with their runtime environments isolated from one another at user granularity. Each SparkContext obtains the corresponding computing resources from the resource manager in the queue corresponding to its user, and starts the Executors bound to those resources.
In addition, as shown in Fig. 1D, with the approach of "implementing a Multi Spark Thrift Server on top of multi-instance SparkContext", the server is first decoupled from the SparkContext: when the server starts, it starts only the service itself, without starting a SparkContext or its corresponding computing cluster. Then, when a user initiates a session, the server checks, for example, whether that user already has a fully initialized SparkContext instance; if so, it is reused, and if not, one is created. When a user closes a session, the server may, for example, reclaim SparkContexts uniformly according to the LRU principle, balancing performance against resource occupation.
It follows that the technical solution of embodiments of the present invention can rely on Hadoop's impersonation mechanism: the process user impersonates (doAs) the proxy user while executing the instantiation of the SparkContext, so that when the SparkContext runs, the cluster executes all subsequent operations as the proxy user. In addition, a one-to-one <user, runtime environment> (<user, env>) mapping between proxy users and runtime environments is maintained, and the runtime environment variables corresponding to each user's SparkContext instance are stored in this mapping.
Here, the process user is, for example, the user that started the process, or the login (Login) user in a Kerberos environment. The proxy user may be, for example, the user name carried in the user information of a client instance, or a user name specified by a configuration item. During execution, the process user performs the relevant functions in the process under the identity of the user (the proxy user).
Having introduced the basic principles of the present invention, various non-limiting embodiments of the present invention are described in detail below.
Exemplary method
A Spark SQL-based data processing method according to exemplary embodiments of the present invention is described below with reference to Fig. 2.
Fig. 2 schematically shows an exemplary process flow 200 of the Spark SQL-based data processing method according to embodiments of the present disclosure.
As shown in Fig. 2, after process flow 200 starts, step S210 is executed first.
S210: in response to the initiation of a session, look up, in the preset relation set, the SparkContext instance corresponding to the user name of the proxy user that initiated the session.
As an example, the user name of the proxy user includes, but is not limited to, any one of a registered account, a nickname, a mobile phone number, an email address or another contact method.
As an example, the preset relation set may be empty initially, or may contain pre-stored mappings.
As an example, the preset relation set in embodiments of the present invention may comprise a one-to-one mapping from a first set to a second set, where the first set consists of the user names of one or more proxy users and the second set consists of one or more SparkContext instances. In other words, in this example the preset relation set may contain multiple correspondences, each being the correspondence between the user name of one proxy user and its SparkContext instance.
As another example, the preset relation set in embodiments of the present invention may also comprise a one-to-one mapping from the first set, consisting of the user names of one or more proxy users, to a third set; the third set contains one or more elements, each of which comprises a SparkContext instance and the connection count corresponding to that SparkContext instance. In other words, in this example the preset relation set may contain multiple correspondences, each being the correspondence among the user name of one proxy user, its SparkContext instance, and the connection count of that SparkContext instance (the number of proxy-user connections attached to it).
It should be understood that new correspondences can be added to the preset relation set as needed, i.e., a correspondence between the user name of a new proxy user and its SparkContext instance; the correspondences already contained in the set can likewise be deleted or modified as needed.
The above one-to-one mapping from the first set to the second set, and/or the above one-to-one mapping from the first set to the third set, is, for example, built on a thread-safe HashMap.
As an example, the user name corresponding SparkContext example for searching the proxy user for initiating session the step of
Before, if can also include: the proxy user for initiating session authentication information it is invalid, terminate to the processing of the session (following letter
Claim authentication information determination processing).
For example, when there is proxy user to initiate session, first determining whether to initiate the session after the beginning of process flow 200
Whether the authentication information of proxy user is effective: if effectively, it can be according to the user name for the proxy user for initiating the session, pre-
If searching the corresponding Spark context variable example of the user name in set of relations;If invalid, terminate the place to this session
Reason waits the initiation of session next time.
As an example, the user name corresponding SparkContext example for searching the proxy user for initiating session the step of
Before, if can also include: the proxy user of initiation session be the providers of credit for starting the process user of server, terminate to this
The processing (hereinafter referred to as credit determination processing) of session.
For example, after process flow 200 starts, when a proxy user initiates a session, it is first determined whether that proxy user is a trusted user of the process user that started the server. If so, the SparkContext instance corresponding to the proxy user's user name is looked up in the preset relation set; otherwise, processing of this session is terminated and the flow waits for the next session to be initiated.
In another example, both the authentication check and the trust check may be performed before the step of looking up the SparkContext instance corresponding to the user name of the proxy user that initiated the session. Their order is not limited: the authentication check may run first and the trust check second, or vice versa.
For example, after process flow 200 starts, when a proxy user initiates a session, it is first determined whether that proxy user's authentication information is valid. If it is valid, it is then determined whether the proxy user is a trusted user of the process user that started the server: if it is, the SparkContext instance corresponding to the proxy user's user name is looked up in the preset relation set; if it is not, processing of this session is terminated. If the authentication information is invalid, processing of this session is terminated.
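Both checks can be modelled as an ordered list of predicates evaluated before the lookup, which makes the freely chosen order explicit. This is a sketch under assumptions: the trust check is modelled as membership in a hypothetical allow-list of users trusted by the server's process user (similar in spirit to Hadoop proxy-user settings, though the text does not specify the mechanism), and the authentication check simply requires a non-empty token. All class and method names are illustrative.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class SessionRequest {
    final String proxyUser;
    final String token; // stands in for authentication information
    SessionRequest(String proxyUser, String token) {
        this.proxyUser = proxyUser;
        this.token = token;
    }
}

final class PreLookupChecks {
    private final List<Predicate<SessionRequest>> checks;

    PreLookupChecks(List<Predicate<SessionRequest>> checks) { this.checks = List.copyOf(checks); }

    /** True if every check passes; the first failing check terminates the session. */
    boolean admit(SessionRequest req) {
        for (Predicate<SessionRequest> check : checks) {
            if (!check.test(req)) {
                return false; // terminate processing of this session
            }
        }
        return true; // proceed to the lookup in the preset relation set
    }

    /** Hypothetical authentication check: a non-empty token counts as valid. */
    static Predicate<SessionRequest> authCheck() {
        return r -> r.token != null && !r.token.isEmpty();
    }

    /** Hypothetical trust check: the proxy user must be trusted by the server's process user. */
    static Predicate<SessionRequest> trustCheck(Set<String> trustedByProcessUser) {
        return r -> trustedByProcessUser.contains(r.proxyUser);
    }
}
```

Because the first failing predicate terminates the session, swapping the list order changes only which check rejects a request first, not whether it is rejected.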
Step S220: if no SparkContext instance corresponding to the user name of the proxy user that initiated the session is found in step S210, a SparkContext corresponding to that user name is created and instantiated to form the user name's SparkContext instance, and a correspondence between the user name and at least the corresponding SparkContext instance is added to the preset relation set.
In one example, the correspondence between the user name and at least the corresponding SparkContext instance is simply the correspondence between the user name and its SparkContext instance.
In another example, this correspondence may instead relate the user name, its SparkContext instance, and the connection count associated with that SparkContext instance.
As an example, if the SparkContext instance corresponding to the user name of the proxy user that initiated the session is found in step S210, step S220 may be skipped and step S230 executed directly.
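Steps S210 and S220 together amount to a lookup-or-create on the relation set. A minimal Java sketch, assuming `ConcurrentHashMap.computeIfAbsent` for atomicity and a hypothetical `factory` that creates and instantiates a context for a user. `ContextHandle` is only a stand-in: stock Spark's `SparkContext` documentation states that only one SparkContext should be active per JVM, so no real Spark class is used here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical stand-in for a per-user SparkContext instance. */
final class ContextHandle {
    final String owner;
    ContextHandle(String owner) { this.owner = owner; }
}

final class ContextRegistry {
    private final ConcurrentMap<String, ContextHandle> relationSet = new ConcurrentHashMap<>();
    private final Function<String, ContextHandle> factory; // hypothetical: create + instantiate for a user

    ContextRegistry(Function<String, ContextHandle> factory) { this.factory = factory; }

    /**
     * S210 + S220: return the user's context if registered (S220 skipped);
     * otherwise create it and register the user name -> instance correspondence.
     */
    ContextHandle lookupOrCreate(String userName) {
        return relationSet.computeIfAbsent(userName, factory);
    }
}
```

`computeIfAbsent` applies the factory at most once per key, so two first sessions of the same user arriving together still get one instance. The trade-off is that other updates to the map may block while the factory runs, so a slow instantiation briefly holds up writers that contend with it.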
Step S230: based on the SparkContext instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment is created to perform the requested data processing.
As an example, performing the data processing includes executing a corresponding data query.
As an example, creating the runtime environment may include creating a Driver RPC communication environment.
As an example, creating the runtime environment may also include submitting a resource request to the resource manager, so that the resource manager obtains computing resources from the queue corresponding to the proxy user that initiated the session, and starting executors bound to those computing resources.
As an example, process flow 200 may further include: if the SparkContext instance corresponding to the user name of the proxy user that initiated the session is found, updating the connection count associated with that SparkContext instance in the preset relation set to the current count plus 1.
For example, if the SparkContext instance corresponding to the user name of the proxy user that initiated the session is found and its current connection count is n₁, the updated count is n₁ + 1.
As an example, process flow 200 may further include: when the session is closed, updating the connection count associated with the SparkContext instance of the proxy user that initiated the session to the current count minus 1.
For example, when the session is closed and the current connection count of the SparkContext instance is n₂, the updated count is n₂ - 1.
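The connection-count updates can be sketched with atomic per-key operations on a `ConcurrentHashMap`, assuming immutable `(context, count)` values; `Entry`, `CountingRegistry`, and the `factory` are hypothetical names. `acquire` covers both the found case (n₁ to n₁ + 1) and first registration with a count of 1, and `release` covers n₂ to n₂ - 1.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class ContextHandle {
    final String owner;
    ContextHandle(String owner) { this.owner = owner; }
}

/** Immutable (context instance, connection count) pair. */
final class Entry {
    final ContextHandle context;
    final int connections;
    Entry(ContextHandle context, int connections) {
        this.context = context;
        this.connections = connections;
    }
}

final class CountingRegistry {
    private final ConcurrentMap<String, Entry> relationSet = new ConcurrentHashMap<>();
    private final Function<String, ContextHandle> factory; // hypothetical context factory

    CountingRegistry(Function<String, ContextHandle> factory) { this.factory = factory; }

    /** Session opened: an existing entry goes from n to n + 1; a new one is registered with 1. */
    ContextHandle acquire(String user) {
        Entry e = relationSet.compute(user, (u, cur) -> cur == null
                ? new Entry(factory.apply(u), 1)
                : new Entry(cur.context, cur.connections + 1));
        return e.context;
    }

    /** Session closed: n becomes n - 1 (floored at 0); removal is left to a cleanup step. */
    void release(String user) {
        relationSet.computeIfPresent(user, (u, cur) ->
                new Entry(cur.context, Math.max(0, cur.connections - 1)));
    }

    /** Current connection count, or 0 if the user has no entry. */
    int connections(String user) {
        Entry e = relationSet.get(user);
        return e == null ? 0 : e.connections;
    }
}
```

Using `compute` rather than a separate get-then-put avoids lost updates when two sessions of the same user open or close at the same time. Flooring at 0 is an added safeguard not stated in the text.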
As an example, process flow 200 may further include: periodically, or in response to a session being closed, reclaiming the resources occupied by SparkContext instances according to the LRU principle.
For example, the step of reclaiming the resources occupied by SparkContext instances according to the LRU principle can be implemented as follows: determine whether the preset relation set contains any correspondence whose SparkContext instance has a connection count of 0; if such a correspondence exists, delete it from the preset relation set and release the resources occupied by the SparkContext it refers to.
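The zero-connection cleanup described above can be sketched as follows, assuming `ContextHandle.stop()` stands in for `SparkContext.stop()`; `RelationEntry` and `IdleContextCleaner` are hypothetical. The sketch implements only the stated rule (delete entries whose count is 0); a strict least-recently-used ordering would additionally need a last-access timestamp, which the text does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;

final class ContextHandle {
    final String owner;
    private volatile boolean stopped;
    ContextHandle(String owner) { this.owner = owner; }
    void stop() { stopped = true; } // stands in for SparkContext.stop(), which frees its resources
    boolean isStopped() { return stopped; }
}

/** Immutable (context instance, connection count) pair. */
final class RelationEntry {
    final ContextHandle context;
    final int connections;
    RelationEntry(ContextHandle context, int connections) {
        this.context = context;
        this.connections = connections;
    }
}

final class IdleContextCleaner {
    /** Deletes every correspondence whose connection count is 0, stops its context, returns affected users. */
    static List<String> sweep(ConcurrentMap<String, RelationEntry> relationSet) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, RelationEntry> kv : relationSet.entrySet()) {
            RelationEntry e = kv.getValue();
            // remove(key, value) succeeds only if no session replaced the entry after we read it
            if (e.connections == 0 && relationSet.remove(kv.getKey(), e)) {
                e.context.stop();
                removed.add(kv.getKey());
            }
        }
        return removed;
    }
}
```

`remove(key, value)` deletes the entry only if it is still the exact value that was read, so a session that reconnects between the check and the removal keeps its context.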
As an example, sessions initiated by the same proxy user from different clients share the same SparkContext instance.
For example, proxy user U_A initiates a session S1 on client P1, and the SparkContext instance corresponding to U_A is SC_A. While S1 is still open, U_A initiates another session S2 on client P2; S2 then shares SparkContext instance SC_A with S1. Clients include, but are not limited to, mobile-phone clients and computer clients.
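The sharing rule follows from keying the relation set by proxy user name alone, so the client never enters the lookup. A minimal sketch with hypothetical `ContextHandle`, `Session`, and `SessionManager` classes, reusing the names U_A, P1, and P2 from the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ContextHandle {
    final String owner;
    ContextHandle(String owner) { this.owner = owner; }
}

final class Session {
    final String user;
    final String client;
    final ContextHandle context;
    Session(String user, String client, ContextHandle context) {
        this.user = user;
        this.client = client;
        this.context = context;
    }
}

final class SessionManager {
    private final ConcurrentMap<String, ContextHandle> byUser = new ConcurrentHashMap<>();

    /** The key is the proxy user name only, so the client (phone, PC, ...) does not affect which context is used. */
    Session open(String user, String client) {
        return new Session(user, client, byUser.computeIfAbsent(user, ContextHandle::new));
    }
}
```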
Preferred application example
Before describing this preferred application example, an application scenario of the prior art is first described with reference to Fig. 3A. In the prior art, one Spark Thrift Server can serve only one user; when requests come from a different user, certain requests involving that user's data cannot be executed normally. A user who needs to access their own data must have a server belonging to them set up in advance.
As shown in Fig. 3A, in the prior art, after user 1 starts the server, a SparkContext is started and instantiated. Once instantiation succeeds and user 1 initiates a new session, user 1's SparkContext is bound to the server, and user 1 can subsequently run queries and other operations; another user, user 2, however, cannot perform corresponding operations such as queries through that server's SparkContext.
Fig. 3 B shows one of the embodiment of the present invention preferably using example.As shown in Figure 3B, example is preferably applied at this
In, for different user's connection requests, SparkContext can be instantiated by user, the connection request of different user is real
The different SparkContext of exampleization realizes that Server concurrently responds the request from different user.Same subscriber is come from
The connection request of different clients then shares a SparkContext example, these examples all can be in respective resource manager
Effective queue in complete initialization, and the data resource that subsequent access HDFS accumulation layer has permission, to realize multi-tenant.
As shown in Fig. 3B, user 1 and user 2 act as proxy users.
The complete flow for one proxy user is described using user 1 as an example; user 2 follows the same pattern.
In Fig. 3B, the SC cache is a mapping <user name, (SparkContext instance, connection count)> (or alternatively <user name, SparkContext instance>) built on a thread-safe HashMap, supporting add, delete, update, and query operations on key-value pairs.
As shown in Fig. 3B, user 1 starts the server through the process user. After the SC cache (SparkContext Cache) is initialized (Init) successfully, user 1 initiates a new session (session 1), and the SparkContext instance corresponding to user 1's user name is looked up in the SC cache. If it is found, user 1 is bound to that SparkContext instance and the current connection count is incremented by 1, i.e., <user name of user 1, (SparkContext instance of user 1, connection count)> is updated to <user name of user 1, (SparkContext instance of user 1, connection count + 1)>. Otherwise, a SparkContext corresponding to user 1's user name is created and instantiated; if the token is valid, instantiation succeeds, the mapping <user name of user 1, (SparkContext instance of user 1, 1)> is written to the SC Cache to complete registration, the user session is created, and user 1 is then bound to the SparkContext instance. User 1 can subsequently run queries and other operations. When session 1 ends, session 1 itself is removed and the current connection count is decremented by 1, i.e., <user name of user 1, (SparkContext instance of user 1, connection count)> is updated to <user name of user 1, (SparkContext instance of user 1, connection count - 1)>.
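The session flow for user 1 can be sketched as one atomic `compute` per session open, assuming a hypothetical per-user `tokenValid` predicate in place of the real token check; `ContextHandle`, `CacheValue`, and `ScCache` are illustrative names, not part of Spark.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

final class ContextHandle {
    final String owner;
    ContextHandle(String owner) { this.owner = owner; }
}

/** Value of the SC cache: (SparkContext instance, connection count), immutable. */
final class CacheValue {
    final ContextHandle context;
    final int connections;
    CacheValue(ContextHandle context, int connections) {
        this.context = context;
        this.connections = connections;
    }
}

final class ScCache {
    private final ConcurrentMap<String, CacheValue> cache = new ConcurrentHashMap<>();
    private final Predicate<String> tokenValid; // hypothetical token check per user

    ScCache(Predicate<String> tokenValid) { this.tokenValid = tokenValid; }

    /** Opens a session and returns the context the user is bound to. */
    ContextHandle openSession(String user) {
        CacheValue v = cache.compute(user, (u, cur) -> {
            if (cur != null) {
                return new CacheValue(cur.context, cur.connections + 1); // found: count + 1
            }
            if (!tokenValid.test(u)) {
                // instantiation fails; compute rethrows and leaves the cache unchanged
                throw new IllegalStateException("invalid token for " + u);
            }
            return new CacheValue(new ContextHandle(u), 1); // register <user, (instance, 1)>
        });
        return v.context;
    }

    /** Ends a session: <user, (instance, n)> becomes <user, (instance, n - 1)>. */
    void closeSession(String user) {
        cache.computeIfPresent(user, (u, cur) ->
                new CacheValue(cur.context, Math.max(0, cur.connections - 1)));
    }

    /** Connection count for a user, or -1 if nothing is registered. */
    int connections(String user) {
        CacheValue v = cache.get(user);
        return v == null ? -1 : v.connections;
    }
}
```

Throwing from inside `compute` is what keeps a failed instantiation out of the cache: `ConcurrentHashMap.compute` rethrows the exception and leaves the existing mapping unchanged, so no `<user, (instance, 1)>` record is written.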
In addition, as shown in Fig. 3B, when user 2 initiates a new session (session 2), the SparkContext instance corresponding to user 2's user name is looked up in the preset relation set in the SC cache. If it is found, user 2 is bound to that SparkContext instance and the current connection count is incremented by 1, i.e., <user name of user 2, (SparkContext instance of user 2, connection count)> is updated to <user name of user 2, (SparkContext instance of user 2, connection count + 1)>. Otherwise, a SparkContext corresponding to user 2's user name is created and instantiated; if the token is valid, instantiation succeeds, the mapping <user name of user 2, (SparkContext instance of user 2, 1)> is written to the SC Cache to complete registration, the user session is created, and user 2 is then bound to the SparkContext instance. User 2 can subsequently run queries and other operations. When session 2 ends, session 2 itself is removed and the current connection count is decremented by 1, i.e., <user name of user 2, (SparkContext instance of user 2, connection count)> is updated to <user name of user 2, (SparkContext instance of user 2, connection count - 1)>.
In addition, the resources occupied by SparkContext instances can be reclaimed periodically according to the LRU principle.
As shown in Fig. 3B, this function can be implemented, for example, by an SC cache cleaner thread (SC Cache cleaner thread). This thread is started on the server side and executes periodically; its main job is to determine whether the SC Cache contains any mapping <user name, (SparkContext instance, 0)>, which indicates that no active user is currently connected to that SparkContext instance. If such mappings exist, the records are deleted and the corresponding SparkContexts are reclaimed.
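A periodic cleaner thread of this kind maps naturally onto a single-threaded `ScheduledExecutorService`. The sketch below is an assumption about how it could be wired: the sweep itself is passed in as a `Runnable` (for example, one that deletes `<user name, (SparkContext instance, 0)>` records and stops their contexts), and `PeriodicCleaner` is a hypothetical name.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs a cache-sweeping task periodically on one daemon thread, like the SC cache cleaner thread. */
final class PeriodicCleaner implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "sc-cache-cleaner");
        t.setDaemon(true); // cleanup alone should not keep the server JVM alive
        return t;
    });

    PeriodicCleaner(Runnable sweep, long period, TimeUnit unit) {
        timer.scheduleAtFixedRate(() -> {
            try {
                sweep.run(); // e.g. delete <user, (instance, 0)> records and stop their contexts
            } catch (RuntimeException e) {
                // Swallowed so later runs still happen; a real server would log it.
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

The try/catch matters: per the `scheduleAtFixedRate` contract, if any run throws, subsequent runs are suppressed, which would silently stop cleanup.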
As can be seen from the above description, the Spark SQL-based data processing method according to embodiments of the present invention can instantiate multiple SparkContexts, isolated per user, within the Java Virtual Machine (JVM) of a single Spark program's driver. This shows in two aspects. First, SparkContext can be scheduled at thread level, which is more efficient than the process-level scheduling of the general prior art. Second, a SparkContext can be shared by sessions of the same user, improving concurrent processing capability, whereas the prior art cannot achieve such sharing. Here, thread-level scheduling of SparkContext means that, as shown in Fig. 3B, a single process instance running in a single JVM internally completes the instantiation of SparkContexts belonging to different users, without starting another process/JVM for each SparkContext (in traditional schemes, because of limitations of the Spark architecture itself, similar functionality can only be achieved by starting another process/JVM).
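The thread-level claim can be illustrated with a small concurrency sketch: many sessions for two users are handled by worker threads of one process, and each user still gets exactly one context. `SingleJvmDriver` and `ContextHandle` are hypothetical stand-ins; no real SparkContext is created.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ContextHandle {
    final String owner;
    ContextHandle(String owner) { this.owner = owner; }
}

final class SingleJvmDriver {
    final ConcurrentMap<String, ContextHandle> contexts = new ConcurrentHashMap<>();
    final AtomicInteger instantiations = new AtomicInteger();

    ContextHandle contextFor(String user) {
        return contexts.computeIfAbsent(user, u -> {
            instantiations.incrementAndGet(); // counts real instantiations, at most one per user
            return new ContextHandle(u);
        });
    }

    /** Handles each session on a worker thread of this one process; no extra JVM is started. */
    void serve(List<String> sessionUsers, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String user : sessionUsers) {
            pool.execute(() -> contextFor(user));
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow();
        }
    }
}
```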
In some embodiments, the Spark SQL-based data processing method according to embodiments of the present invention binds SparkContext to users: an instance is created only when a user request arrives, multiple requests from the same user can share that instance, and different users are bound to different instances. SparkContext is thereby decoupled from the server itself, enabling dynamic scheduling and more efficient, reasonable use of backend cluster resources. In addition, implementing the multi-tenant engine with Spark can substantially improve SQL query performance compared with HiveServer2.
Exemplary device
Having described the Spark SQL-based data processing method of exemplary embodiments of the present invention, the Spark SQL-based data processing device of exemplary embodiments is described next with reference to Fig. 4.
Fig. 4 schematically shows the structure of a Spark SQL-based data processing device according to an embodiment of the present invention. The device may be provided in a terminal device, for example an intelligent electronic device such as a desktop computer, notebook computer, smartphone, or tablet; it may of course also be provided in a server. The device 400 of this embodiment may include the following units: a lookup unit 410, a processing unit 420, and an execution unit 430.
The lookup unit 410 is adapted to, in response to the initiation of a session, look up in the preset relation set the SparkContext instance corresponding to the user name of the proxy user that initiated the session.
The processing unit 420 is adapted to, if the SparkContext instance corresponding to the user name is not found, create a SparkContext corresponding to the user name and instantiate it to form the user name's SparkContext instance, and add to the preset relation set a correspondence between the user name and at least the corresponding SparkContext instance.
The execution unit 430 is adapted to create, based on the SparkContext instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to perform the corresponding data processing.
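The three units can be sketched as Java interfaces wired into one device. The interface and method names are hypothetical, `ContextHandle` stands in for a SparkContext instance, and the in-memory wiring only echoes the query instead of executing it.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ContextHandle {
    final String owner;
    ContextHandle(String owner) { this.owner = owner; }
}

interface LookupUnit { Optional<ContextHandle> lookup(String userName); }         // unit 410
interface ProcessingUnit { ContextHandle createAndRegister(String userName); }   // unit 420
interface ExecutionUnit { String execute(ContextHandle context, String sql); }   // unit 430

final class DataProcessingDevice {
    private final LookupUnit lookupUnit;
    private final ProcessingUnit processingUnit;
    private final ExecutionUnit executionUnit;

    DataProcessingDevice(LookupUnit l, ProcessingUnit p, ExecutionUnit e) {
        this.lookupUnit = l;
        this.processingUnit = p;
        this.executionUnit = e;
    }

    /** One session: look up (410), create if missing (420), then run the processing (430). */
    String handleSession(String userName, String sql) {
        ContextHandle ctx = lookupUnit.lookup(userName)
                .orElseGet(() -> processingUnit.createAndRegister(userName));
        return executionUnit.execute(ctx, sql);
    }

    /** In-memory wiring for illustration only. */
    static DataProcessingDevice inMemory() {
        ConcurrentMap<String, ContextHandle> relationSet = new ConcurrentHashMap<>();
        LookupUnit lookup = user -> Optional.ofNullable(relationSet.get(user));
        // computeIfAbsent re-checks, so two sessions racing past the lookup still share one instance
        ProcessingUnit processing = user -> relationSet.computeIfAbsent(user, ContextHandle::new);
        ExecutionUnit execution = (ctx, sql) -> ctx.owner + ": " + sql; // placeholder for a real query
        return new DataProcessingDevice(lookup, processing, execution);
    }
}
```

Looking up first and creating second is not atomic by itself, so the processing unit in this wiring re-checks with `computeIfAbsent`; that keeps the lookup unit's contract simple while still yielding one instance per user.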
As an example, the preset relation set in embodiments of the present invention includes a one-to-one mapping from a first set consisting of the user names of one or more proxy users to a second set consisting of one or more SparkContext instances.
As an example, the preset relation set in embodiments of the present invention may also include a one-to-one mapping from the first set consisting of the user names of one or more proxy users to a third set, where the third set includes one or more elements, each comprising a SparkContext instance and the connection count corresponding to that SparkContext instance.
As an example, the one-to-one mappings in embodiments of the present invention are built, for example, on a thread-safe HashMap.
As an example, the correspondence between the user name and at least the corresponding SparkContext instance in embodiments of the present invention is the correspondence between the user name and the corresponding SparkContext instance.
As another example, this correspondence is the correspondence among the user name, the corresponding SparkContext instance, and the connection count associated with that SparkContext instance.
As an example, the processing unit 420 in embodiments of the present invention is further adapted to: if the SparkContext instance corresponding to the user name of the proxy user that initiated the session is found, update the connection count associated with that SparkContext instance in the preset relation set to the current count plus 1.
As an example, the processing unit 420 is further adapted to: when a session is closed, update the connection count of the SparkContext instance corresponding to the proxy user that initiated the session to the current count minus 1.
As an example, the processing unit 420 is further adapted to: periodically, or in response to a session being closed, reclaim the resources occupied by SparkContext instances according to the LRU principle.
As an example, the processing unit 420 is adapted to: determine whether the preset relation set contains any correspondence whose SparkContext instance has a connection count of 0; when such a correspondence exists, delete it from the preset relation set and release the resources occupied by the SparkContext it refers to.
As an example, the lookup unit 410 in embodiments of the present invention is adapted so that sessions initiated by the same proxy user from different clients share the same SparkContext instance.
As an example, the data processing performed by the execution unit 430 includes a corresponding data query.
As an example, the runtime environment created by the execution unit 430 includes a corresponding Driver RPC communication environment.
As an example, the execution unit 430 is adapted to create the runtime environment as follows: submit a resource request to the resource manager so that the resource manager obtains computing resources from the queue corresponding to the proxy user that initiated the session, and start executors bound to those computing resources.
As an example, the lookup unit 410 is further adapted to, before looking up the SparkContext instance corresponding to the user name of the proxy user that initiated the session, determine whether that proxy user's authentication information is valid, and terminate the processing of the session if it is invalid.
As an example, the lookup unit 410 is further adapted to, before the step of looking up the SparkContext instance corresponding to the user name of the proxy user that initiated the session, determine whether that proxy user is a trusted user of the process user that started the server, and terminate the processing of the session if it is not.
It should be noted that each unit of the above Spark SQL-based data processing device can perform the same processing as the corresponding step of the Spark SQL-based data processing method described above and can achieve similar functions and technical effects, which are not repeated here.
Fig. 5 shows a block diagram of an exemplary computer system/server 50 suitable for implementing embodiments of the present invention. The computer system/server 50 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.
As shown in Fig. 5, the computer system/server 50 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors 501, a system memory 502, and a bus 503 connecting the different system components (including the system memory 502 and the processor 501).
The computer system/server 50 typically includes a variety of computer-system-readable media. These may be any available media accessible by the computer system/server 50, including volatile and non-volatile media and removable and non-removable media.
The system memory 502 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 5021 and/or cache memory 5022. The computer system/server 50 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, ROM 5023 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly called a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 503 through one or more data media interfaces. The system memory 502 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 5025 having a set of (at least one) program modules 5024 may be stored, for example, in the system memory 502. Such program modules 5024 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 5024 typically perform the functions and/or methods of the embodiments described in the present invention.
The computer system/server 50 may also communicate with one or more external devices 504 (such as a keyboard, pointing device, or display). Such communication may take place through an input/output (I/O) interface 505. The computer system/server 50 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 506. As shown in Fig. 5, the network adapter 506 communicates with the other modules of the computer system/server 50 (such as the processor 501) through the bus 503. It should be understood that, although not shown in Fig. 5, other hardware and/or software modules may be used in conjunction with the computer system/server 50.
The processor 501 runs the programs stored in the system memory 502 to perform various functional applications and data processing, for example, to execute the steps of the Spark SQL-based data processing method: in response to the initiation of a session, looking up in the preset relation set, according to the user name of the proxy user that initiated the session, the Spark context variable instance corresponding to that user name; if no such instance is found, creating a Spark context variable corresponding to the user name and instantiating it to form the corresponding Spark context variable instance, and adding to the preset relation set a correspondence between the user name and at least the corresponding Spark context variable instance; and creating, according to the Spark context variable instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to perform the corresponding data processing.
A specific example of the computer-readable storage medium of embodiments of the present invention is shown in Fig. 6.
The computer-readable storage medium of Fig. 6 is an optical disc 600 storing a computer program (i.e., a program product). When executed by a processor, the program implements the steps described in the above method embodiments, for example: in response to the initiation of a session, looking up in the preset relation set, according to the user name of the proxy user that initiated the session, the Spark context variable instance corresponding to that user name; if no such instance is found, creating a Spark context variable corresponding to the user name and instantiating it to form the corresponding Spark context variable instance, and adding to the preset relation set a correspondence between the user name and at least the corresponding Spark context variable instance; and creating, according to the Spark context variable instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to perform the corresponding data processing. The specific implementation of each step is not repeated here.
It should be noted that although several units, modules, or submodules of the Spark SQL-based data processing device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the modules described above may be embodied in one module. Conversely, the features and functions of one module described above may be further divided and embodied by multiple modules.
In addition, although the operations of the method of the present invention are described in a particular order in the drawings, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the specific embodiments disclosed, and the division into aspects does not mean that features in these aspects cannot be combined to advantage; this division is only for convenience of description. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
1. A data processing method based on Spark SQL, characterized by comprising:
in response to the initiation of a session, looking up in a preset relation set, according to the user name of the proxy user that initiated the session, the Spark context variable instance corresponding to the user name;
if the Spark context variable instance corresponding to the user name is not found, creating a Spark context variable corresponding to the user name and instantiating the Spark context variable to form the Spark context variable instance corresponding to the user name, and adding to the preset relation set a correspondence between the user name and at least the corresponding Spark context variable instance; and
creating, according to the Spark context variable instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to perform corresponding data processing.
2. The data processing method according to claim 1, characterized in that the preset relation set comprises: a one-to-one mapping from a first set consisting of the user names of one or more proxy users to a second set consisting of related information of one or more Spark context variable instances.
3. The data processing method according to claim 1, characterized in that the preset relation set comprises: a one-to-one mapping from a first set consisting of the user names of one or more proxy users to a third set;
wherein the third set comprises one or more elements, each element of the third set comprising related information of one Spark context variable instance and the connection count corresponding to that Spark context variable instance.
4. The data processing method according to any one of claims 1 to 3, characterized by further comprising:
periodically, or in response to the closing of the session, reclaiming the resources occupied by Spark context variable instances according to the LRU principle.
5. The data processing method according to any one of claims 1 to 3, characterized in that sessions initiated by the same proxy user from different clients share the same Spark context variable instance.
6. The data processing method according to any one of claims 1 to 3, characterized by further comprising, before the step of looking up the Spark context variable instance corresponding to the user name of the proxy user that initiated the session: terminating the processing of the session if the authentication information of the proxy user that initiated the session is invalid.
7. The data processing method according to any one of claims 1 to 3, characterized by further comprising, before the step of looking up the Spark context variable instance corresponding to the user name of the proxy user that initiated the session: terminating the processing of the session if the proxy user that initiated the session is not a trusted user of the process user that started the server.
8. A data processing device based on Spark SQL, characterized by comprising:
a lookup unit, adapted to, in response to the initiation of a session, look up in a preset relation set, according to the user name of the proxy user that initiated the session, the Spark context variable instance corresponding to the user name;
a processing unit, adapted to, if the Spark context variable instance corresponding to the user name is not found, create a Spark context variable corresponding to the user name and instantiate it to form the Spark context variable instance corresponding to the user name, and add to the preset relation set a correspondence between the user name and at least the corresponding Spark context variable instance; and
an execution unit, adapted to create, according to the Spark context variable instance corresponding to the user name of the proxy user that initiated the session, a corresponding runtime environment to perform corresponding data processing.
9. A storage medium storing a program, wherein the program, when executed by a processor, implements the Spark SQL-based data processing method according to any one of claims 1 to 7.
10. a kind of calculating equipment, including storage medium as claimed in claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811214789.5A CN109408537A (en) | 2018-10-18 | 2018-10-18 | Data processing method and device, storage medium and calculating equipment based on Spark SQL |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109408537A true CN109408537A (en) | 2019-03-01 |
Family
ID=65468435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811214789.5A Pending CN109408537A (en) | 2018-10-18 | 2018-10-18 | Data processing method and device, storage medium and calculating equipment based on Spark SQL |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109408537A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104360903A (en) * | 2014-11-18 | 2015-02-18 | 北京美琦华悦通讯科技有限公司 | Method for realizing task data decoupling in spark operation scheduling system |
CN106126641A (en) * | 2016-06-24 | 2016-11-16 | 中国科学技术大学 | A kind of real-time recommendation system and method based on Spark |
KR101718119B1 (en) * | 2016-04-22 | 2017-03-21 | 숭실대학교산학협력단 | System and Method for processing SPARQL queries based on Spark SQL |
CN106844546A (en) * | 2016-12-30 | 2017-06-13 | 江苏号百信息服务有限公司 | Multi-data source positional information fusion method and system based on Spark clusters |
CN107015989A (en) * | 2016-01-27 | 2017-08-04 | 博雅网络游戏开发(深圳)有限公司 | Data processing method and device |
CN107797874A (en) * | 2017-10-12 | 2018-03-13 | 南京中新赛克科技有限责任公司 | A kind of resource management-control method based on embedded jetty and spark on yarn frameworks |
CN108153859A (en) * | 2017-12-24 | 2018-06-12 | 浙江工商大学 | A kind of effectiveness order based on Hadoop and Spark determines method parallel |
CN108255619A (en) * | 2017-12-28 | 2018-07-06 | 新华三大数据技术有限公司 | A kind of data processing method and device |
CN108664662A (en) * | 2018-05-22 | 2018-10-16 | 上海交通大学 | Time travel and tense aggregate query processing method |
Non-Patent Citations (2)
Title |
---|
KENT_YAO: "A Brief Introduction of Kyuubi Architecture", 《HTTPS://WWW.JIANSHU.COM/P/B046A623F038》 * |
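缪雪峰 et al.: "Spark平台下基于上下文信息的影片混合推荐" (Hybrid movie recommendation based on context information on the Spark platform), 《计算机工程与应用》 (Computer Engineering and Applications) * |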
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110889108A (en) * | 2019-11-26 | 2020-03-17 | 网易(杭州)网络有限公司 | spark task submitting method and device and server |
CN110889108B (en) * | 2019-11-26 | 2022-02-08 | 网易(杭州)网络有限公司 | spark task submitting method and device and server |
CN111031123A (en) * | 2019-12-10 | 2020-04-17 | 中盈优创资讯科技有限公司 | Spark task submission method, system, client and server |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7349970B2 (en) | Workload management of stateful program entities | |
JP6188732B2 (en) | Computer-implemented method, computer program product, and system for managing tenant-specific data sets in a multi-tenant environment | |
US11853291B2 (en) | Privacy preserving architecture for permissioned blockchains | |
CN111901294A (en) | Method for constructing online machine learning project and machine learning system | |
CN105681104B (en) | Method, system and the computer readable storage devices of network and machine are managed for online service | |
US10164896B2 (en) | Cloud-based content management system | |
US8145593B2 (en) | Framework for web services exposing line of business applications | |
JP2012504262A (en) | Distributed cache placement | |
US10970311B2 (en) | Scalable snapshot isolation on non-transactional NoSQL | |
CN106569896A (en) | Data distribution and parallel processing method and system | |
Picco et al. | On global virtual data structures | |
CN109408537A (en) | Data processing method and device, storage medium and calculating equipment based on Spark SQL | |
US9229980B2 (en) | Composition model for cloud-hosted serving applications | |
CN112104504B (en) | Transaction management framework for large-scale resource access, design method and cloud platform | |
Wrzeszcz et al. | New approach to global data access in computational infrastructures | |
Hellings et al. | Byshard: Sharding in a byzantine environment | |
US11153388B2 (en) | Workflow engine framework for cross-domain extension | |
Carstoiu et al. | High performance eventually consistent distributed database Zatara | |
US11507512B2 (en) | Fault tolerant cluster data handling | |
CN111680069B (en) | Database access method and device | |
CN109947768A (en) | Local identifier for database object | |
CN113760822A (en) | HDFS-based distributed intelligent campus file management system optimization method and device | |
CN106878414B (en) | Data write request processing method, device and distributed data-storage system | |
Carstoiu et al. | Zatara, the Plug-in-able Eventually Consistent Distributed Database | |
US11914556B2 (en) | Lazy virtual filesystem instantiation and caching |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |