CN107547541B - spark-mllib calling method, storage medium, electronic device and system - Google Patents

spark-mllib calling method, storage medium, electronic device and system

Info

Publication number
CN107547541B
CN107547541B
Authority
CN
China
Prior art keywords
spark
learning request
mllib
program
akka
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710771009.6A
Other languages
Chinese (zh)
Other versions
CN107547541A (en)
Inventor
王毅
张文明
陈少杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201710771009.6A priority Critical patent/CN107547541B/en
Publication of CN107547541A publication Critical patent/CN107547541A/en
Application granted granted Critical
Publication of CN107547541B publication Critical patent/CN107547541B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Computer And Data Communications (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a spark-mllib calling method, a storage medium, an electronic device and a spark-mllib calling system, and relates to the field of spark-mllib calling. The method comprises the following steps: when receiving a learning request, the server sends the learning request to at least 2 akka-http instances deployed on the server; when an akka-http instance receives the learning request, it compares the url of the learning request with the routes of the preconfigured spark-mllib programs, and if a route matching the learning request exists, calls the spark-mllib program corresponding to that route to run; after the spark-mllib program trains the model, a model prediction result is output and its format is converted into a vector. The method and the device can directly return the trained model prediction result to the user, thereby achieving millisecond-level response and remarkably improving the user experience.

Description

spark-mllib calling method, storage medium, electronic device and system
Technical Field
The invention relates to the field of spark-mllib (Spark's distributed machine learning library) calling, in particular to a spark-mllib calling method, a storage medium, an electronic device and a spark-mllib calling system.
Background
Currently, the typical way for a user to perform a machine learning operation with a spark-mllib program is as follows: the user initiates a learning request to the server, and the server, through a thread call, runs the spark-mllib program corresponding to the learning request to train a model and obtain a model prediction result.
The above method has the following disadvantages:
(1) The server runs the spark-mllib program through thread calls; when the number of learning requests is large, each learning request occupies one thread, so the memory occupancy rate and load of the server increase greatly and the working efficiency of the server decreases.
(2) After the spark-mllib program trains the model, the output model prediction result is an RDD (Resilient Distributed Dataset) or a DataFrame. Neither of these two formats can be read directly by the user, who has to rely on dedicated code or a third-party tool to convert the prediction result into a directly readable form, which degrades the user experience.
Disclosure of Invention
Aiming at the defects in the prior art, the technical problem solved by the invention is: how to obtain a model prediction result that the user can read directly. The method and the device can directly return the trained model prediction result to the user, thereby achieving millisecond-level response and remarkably improving the user experience.
In order to achieve the above object, the spark-mllib calling method provided by the invention comprises the following steps:
S1: when receiving a learning request, the server sends the learning request to at least 2 akka-http instances deployed on the server, and goes to S2;
S2: when the akka-http receives the learning request, the url of the learning request is compared with the routes of the preconfigured spark-mllib programs; if a route matching the learning request exists, the spark-mllib program corresponding to that route is called to run, and the method goes to S3;
S3: after the spark-mllib program trains the model, a model prediction result is output, and the format of the model prediction result is converted into a vector.
Based on the above technical solution, in S2, the ActorSystem service created in advance in the server calls the spark-mllib program corresponding to the route matching the learning request to run.
The storage medium provided by the invention is stored with a computer program, and the computer program realizes the spark-mllib calling method when being executed by a processor.
The electronic equipment provided by the invention comprises a memory and a processor, wherein a computer program running on the processor is stored in the memory, and the spark-mllib calling method is realized when the processor executes the computer program.
The spark-mllib calling system provided by the invention comprises a learning request forwarding module, at least 2 akka-http modules and a plurality of spark-mllib program training modules, wherein the learning request forwarding module is arranged in a server;
the learning request forwarding module is used for: when a learning request is received, the learning request is sent to each akka-http module;
the akka-http module is used to: when a learning request is received, compare the url of the learning request with the routes of the preconfigured spark-mllib programs, and, if a route matching the learning request exists, call the spark-mllib program training module corresponding to that route to run;
the spark-mllib program training module is used to: after the spark-mllib program is called to train the model, output a model prediction result and convert the format of the model prediction result into a vector.
On the basis of the above technical scheme, the akka-http module calls the spark-mllib program corresponding to the route matched with the learning request to run through an ActorSystem service pre-created in the server.
Compared with the prior art, the invention has the advantages that:
(1) As can be seen from S3, the model prediction result of the present invention is in vector format, and the user can directly obtain and read it without relying on dedicated code or a third-party tool. The trained model prediction result can therefore be returned to the user directly, achieving millisecond-level response (the time required for directly returning the model prediction result is measured in milliseconds) and remarkably improving the user experience.
(2) As can be seen from S2, the invention avoids the prior-art approach of calling and running the spark-mllib program through per-request threads and instead calls it through the more advanced ActorSystem service, so the memory occupancy rate and load of the server are significantly reduced and the working efficiency of the server is greatly improved.
Drawings
FIG. 1 is a flow chart of a spark-mllib calling method according to an embodiment of the present invention;
fig. 2 is a connection block diagram of an electronic device in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to fig. 1, the spark-mllib calling method in the embodiment of the present invention includes the following steps:
s1: when receiving the learning request, the server sends the learning request to at least 2 akka-http (a toolbox for generating and providing or consuming http-based network services) installed on the server in a polling manner, and goes to S2.
The server in S1 is nginx (engine x, high-performance HTTP and reverse proxy server), and nginx can adapt to HTTP learning requests with high concurrency, thereby improving the working performance and quality of the server.
The process of sending the learning request to at least 2 akka-http instances in S1 is as follows: an upstream node is added to the configuration file of nginx, and the proxy_pass (forwarding path) of nginx is formed from the unique identifier of each akka-http instance and the upstream node, i.e. the forwarding path takes the form http:// + akka-http unique identifier + upstream; the learning request is then sent according to proxy_pass.
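For illustration only, the following is a minimal nginx configuration sketch of this forwarding step; the upstream name, addresses and ports are hypothetical placeholders and do not come from the patent:

```nginx
# Hypothetical example: two akka-http instances behind one nginx upstream node.
upstream akka_http_pool {
    server 127.0.0.1:8081;   # akka-http instance 1
    server 127.0.0.1:8082;   # akka-http instance 2
    # nginx uses round-robin (polling) load balancing by default
}

server {
    listen 80;
    location / {
        # proxy_pass = "http://" + upstream identifier; learning requests are
        # forwarded to the akka-http instances along this path
        proxy_pass http://akka_http_pool;
    }
}
```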
S2: when an akka-http instance receives the learning request, it compares the url (uniform resource locator) of the learning request with the routes of the preconfigured spark-mllib programs (the server holds multiple kinds of spark-mllib programs, each corresponding to a different service requirement); if a route matching the learning request exists, it calls the spark-mllib program corresponding to that route to run through the ActorSystem service (the ActorSystem service itself is prior art, and its specific creation and calling process is not described here), and goes to S3.
So that akka-http can receive the learning request, S2 may further include the following step: the ip and port of the server are bound to akka-http through the ActorSystem service; akka-http listens on the server port and, when it detects a learning request, confirms that the learning request has been received.
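As a rough Scala sketch of S2 (not the patent's actual code), the fragment below creates an ActorSystem, defines one akka-http route corresponding to one spark-mllib program, and binds the server's ip and port; the path name, host and port are assumptions made for illustration:

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer

object MllibGateway extends App {
  // ActorSystem created in advance in the server; akka-http runs on top of it.
  implicit val system: ActorSystem = ActorSystem("mllib-gateway")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // One route per preconfigured spark-mllib program; the url of the learning
  // request is matched against these routes.
  val route =
    path("train" / "logistic-regression") {   // hypothetical route
      get {
        // here the matching spark-mllib program would be called to run
        complete("logistic-regression training started")
      }
    }

  // Bind the server's ip and port so akka-http can listen for learning requests.
  Http().bindAndHandle(route, "0.0.0.0", 8081)
}
```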
As can be seen from S2, the embodiment of the present invention avoids the prior-art approach of calling and running the spark-mllib program through per-request threads and instead uses the more advanced ActorSystem service to call the spark-mllib program, so the memory occupancy rate and load of the server are significantly reduced and the working efficiency of the server is greatly improved.
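To illustrate the contrast with one-thread-per-request calling, here is a minimal sketch, under assumed class, message and path names, of an actor that a route like the one above could delegate to; the actor runs the spark-mllib training whenever it receives a message, instead of a new thread being created for every learning request:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

// Hypothetical message carrying the location of the training data.
final case class Train(dataPath: String)

class TrainingActor extends Actor {
  // One long-lived SparkSession reused for every learning request handled by
  // this actor, instead of one thread per request.
  private val spark = SparkSession.builder()
    .appName("spark-mllib-trainer")
    .master("local[*]")          // assumption for a local sketch
    .getOrCreate()

  override def receive: Receive = {
    case Train(dataPath) =>
      val data  = spark.read.format("libsvm").load(dataPath)
      val model = new LogisticRegression().fit(data)   // train the model
      sender() ! model.transform(data)                  // reply with the predictions
  }
}

object TrainingActorExample extends App {
  val system  = ActorSystem("mllib-training")
  val trainer = system.actorOf(Props[TrainingActor], "trainer")
  trainer ! Train("data/sample_libsvm_data.txt")        // hypothetical path
}
```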
S3: after the spark-mllib program trains the model, a model prediction result is output, and the format of the model prediction result is converted into a vector (java.util.Vector, an object array in java that grows automatically).
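The conversion in S3 can be sketched with the assumed helper below (not the patent's code); it assumes the spark.ml default output column name "prediction" for the result of model.transform():

```scala
import java.util.{Vector => JVector}
import org.apache.spark.sql.DataFrame

// Copy the "prediction" column of the DataFrame produced by model.transform()
// into a java.util.Vector, the automatically growing object array returned to the user.
def predictionsToVector(predictions: DataFrame): JVector[java.lang.Double] = {
  val result = new JVector[java.lang.Double]()
  predictions.select("prediction")
    .collect()                                       // bring the predictions to the driver
    .foreach(row => result.add(Double.box(row.getDouble(0))))
  result
}
```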
As can be seen from S3, the model prediction result in the embodiment of the present invention is in vector format, and the user can directly acquire and read it without relying on dedicated code or a third-party tool. Therefore, the trained model prediction result can be directly returned to the user, millisecond-level response (the time required for directly returning the model prediction result is measured in milliseconds) is achieved, and the user experience is remarkably improved.
The embodiment of the invention also provides a storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above spark-mllib calling method. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk, or an optical disk.
Referring to fig. 2, an embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program running on the processor, and the processor implements the spark-mllib calling method when executing the computer program.
The spark-mllib calling system in the embodiment of the invention comprises a learning request forwarding module, at least 2 akka-http modules and a plurality of spark-mllib program training modules, wherein the learning request forwarding module is arranged in an nginx server.
The learning request forwarding module is used to: when a learning request is received, send the learning request to each akka-http module. The specific flow is as follows: an upstream node is added to the configuration file of nginx, and the proxy_pass of nginx is formed according to the unique identifier of each akka-http module and the upstream node; the learning request is then sent according to proxy_pass.
The akka-http module is used to: when the learning request is received, compare the url of the learning request with the routes of the preconfigured spark-mllib programs, and, if a route matching the learning request exists, call the spark-mllib program training module corresponding to that route to run through the ActorSystem service pre-created in the server.
The spark-mllib program training module is used to: after the spark-mllib program is called to train the model, output a model prediction result and convert the format of the model prediction result into a vector.
It should be noted that: in the system provided by the embodiment of the present invention, the division into the above functional modules is only used as an example; in practical applications, the above functions may be distributed to different functional modules as needed, that is, the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above.
Further, the present invention is not limited to the above-mentioned embodiments, and it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements are also considered to be within the scope of the present invention. Those not described in detail in this specification are within the skill of the art.

Claims (8)

1. A spark-mllib calling method, comprising the following steps:
S1: when receiving a learning request, the server sends the learning request to at least 2 akka-http instances deployed on the server, and goes to S2;
S2: when the akka-http receives the learning request, the url of the learning request is compared with the routes of the preconfigured spark-mllib programs; if a route matching the learning request exists, the spark-mllib program corresponding to the route matching the learning request is called to run, and the method goes to S3;
S3: after the spark-mllib program trains the model, a model prediction result is output, and the format of the model prediction result is converted into a vector;
in S2, the ActorSystem service created in advance in the server calls the spark-mllib program corresponding to the route matching the learning request to run.
2. The spark-mllib calling method as recited in claim 1, wherein: in S1, the server is nginx.
3. The spark-mllib calling method as recited in claim 2, wherein: the process of sending the learning request to at least 2 akka-http instances in S1 comprises: adding an upstream node to the configuration file of nginx, and forming the proxy_pass of nginx according to the unique identifier of each akka-http and the upstream node; and sending the learning request according to proxy_pass.
4. A computer-readable storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, implements the method of any of claims 1 to 3.
5. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program for execution on the processor, the processor when executing the computer program implementing the method of any of claims 1 to 3.
6. A spark-mllib invocation system, characterized by: the system comprises a learning request forwarding module, at least 2 akka-http modules and a plurality of spark-mllib program training modules, wherein the learning request forwarding module is arranged in a server;
the learning request forwarding module is used for: when a learning request is received, the learning request is sent to each akka-http module;
the akka-http module is used to: when a learning request is received, compare the url of the learning request with the routes of the preconfigured spark-mllib programs, and, if a route matching the learning request exists, call the spark-mllib program corresponding to the route matching the learning request to run;
the spark-mllib program training module is used to: after the spark-mllib program is called to train the model, output a model prediction result, and convert the format of the model prediction result into a vector;
and the akka-http module calls the spark-mllib program corresponding to the route matched with the learning request to run through an ActorSystem service pre-created in the server.
7. The spark-mllib calling system as recited in claim 6, wherein: the server is nginx.
8. The spark-mllib calling system as recited in claim 7, wherein: the process in which the learning request forwarding module sends the learning request to each akka-http module comprises: adding an upstream node to the configuration file of nginx, and forming the proxy_pass of nginx according to the unique identifier of each akka-http module and the upstream node; and sending the learning request according to proxy_pass.
CN201710771009.6A 2017-08-31 2017-08-31 spark-mllib calling method, storage medium, electronic device and system Active CN107547541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710771009.6A CN107547541B (en) 2017-08-31 2017-08-31 spark-mllib calling method, storage medium, electronic device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710771009.6A CN107547541B (en) 2017-08-31 2017-08-31 spark-mllib calling method, storage medium, electronic device and system

Publications (2)

Publication Number Publication Date
CN107547541A CN107547541A (en) 2018-01-05
CN107547541B (en) 2020-07-31

Family

ID=60959148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710771009.6A Active CN107547541B (en) 2017-08-31 2017-08-31 spark-mllib calling method, storage medium, electronic device and system

Country Status (1)

Country Link
CN (1) CN107547541B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160048408A1 (en) * 2014-08-13 2016-02-18 OneCloud Labs, Inc. Replication of virtualized infrastructure within distributed computing environments
CN105975907B (en) * 2016-04-27 2019-05-21 江苏华通晟云科技有限公司 SVM model pedestrian detection method based on distributed platform
CN106228389A (en) * 2016-07-14 2016-12-14 武汉斗鱼网络科技有限公司 Network potential usage mining method and system based on random forests algorithm

Also Published As

Publication number Publication date
CN107547541A (en) 2018-01-05

Similar Documents

Publication Publication Date Title
CN112751826B (en) Method and device for forwarding flow of computing force application
CN106470123B (en) Log collecting method, client, server and electronic equipment
CN108062243B (en) Execution plan generation method, task execution method and device
CN110336848B (en) Scheduling method, scheduling system and scheduling equipment for access request
CN111966289B (en) Partition optimization method and system based on Kafka cluster
CN112838940B (en) Network controller frame and data processing method
CN105260842B (en) Communication method and system between heterogeneous ERP systems
CN109857524B (en) Stream computing method, device, equipment and computer readable storage medium
US10320616B2 (en) Method and a system for sideband server management
CN103067486A (en) Big-data processing method based on platform-as-a-service (PaaS) platform
CN107547541B (en) spark-mllib calling method, storage medium, electronic device and system
CN112817539A (en) Industrial data storage method and system, electronic device and storage medium
EP3672203A1 (en) Distribution method for distributed data computing, device, server and storage medium
CN112019604B (en) Edge data transmission method and system
CN112534399A (en) Semantic-based Internet of things equipment data processing related application program installation method and device
CN117135060A (en) Business data processing method and system based on edge calculation
CN108572863B (en) Distributed task scheduling system and method
CN114095571A (en) Data processing method, data service bus, terminal and storage medium
CN102238505B (en) Method and system for processing multi-user parallel signalling tracking at client
CN116192849A (en) Heterogeneous accelerator card calculation method, device, equipment and medium
CN106254122B (en) Simple network management protocol agent implementation method based on EOC equipment
CN114501347A (en) Information interaction method, device and system between heterogeneous systems
CN114257623A (en) Internet of things equipment communication method based on streaming processing
CN111309467B (en) Task distribution method and device, electronic equipment and storage medium
CN112637288A (en) Streaming data distribution method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant