CN111176787B - Data analysis method and device

Info

Publication number: CN111176787B
Application number: CN201911337325.8A
Authority: CN
Inventors: 李威; 覃鹏; 刘增文; 吴秦明; 叶长全
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2023-07-28
Anticipated expiration: 2039-12-23
Also published as: CN111176787A

Abstract

The invention discloses a data analysis method and device, and relates to the field of computer technology. One embodiment of the method comprises: for each server in a server set, virtualizing the local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each of the sub-memories to a local directory of the server; creating an online analytical processing (OLAP) database in each virtualized memory; and analyzing the data to be analyzed through the local directory of the server and the OLAP database to obtain an analysis result. This embodiment achieves efficient data analysis at lower cost and reduces the time consumed by analysis.

Description

Data analysis method and device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data analysis method and apparatus.

Background

There are two existing data analysis methods: one is based on ETL and a distributed database and performs the analysis as a batch run; the other implements data analysis through data cube technology.

In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:

Both of the above data analysis methods take a long time and have low analysis efficiency, while data analysis methods that take less time are costly.

Disclosure of Invention

In view of the above, the embodiments of the present invention provide a data analysis method and apparatus, which can implement efficient data analysis with low cost and reduce the time consumed for analysis.

To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a data analysis method.

The data analysis method of the embodiment of the invention comprises the following steps:

for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server;

creating an online analysis processing type database in each virtualized memory;

and analyzing the data to be analyzed through the local directory of the server and the online analytical processing database to obtain an analysis result.

In one embodiment, the method further comprises:

if a single-point failure occurs on a first server, performing an automatic sub-memory mounting operation when the first server restarts;

wherein the first server is any one server in the server set.

In one embodiment, the automatic mounting of the sub-memory includes:

virtualizing a local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;

and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.

In one embodiment, partitioning the virtualized memory into a plurality of sub-memories includes:

the virtualized memory is divided into 8 sub-memories, and the storage space of each sub-memory is the same.

In one embodiment, analyzing the data to be analyzed through the local directory of the server and the online analytical processing database to obtain an analysis result comprises:

storing the data to be analyzed into the online analytical processing database through the local directory of the server;

and calling the online analytical processing database and analyzing the data to be analyzed stored in it using an online analytical processing method to obtain the analysis result.

To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided a data analysis apparatus.

The data analysis device of the embodiment of the invention comprises:

the first processing unit is used for virtualizing the local memory of the server to obtain a virtualized memory for each server in the server set, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to the local directory of the server;

the creation unit is used for creating an online analysis processing type database in each virtualized memory;

and the second processing unit is used for analyzing the data to be analyzed through the local directory of the server and the online analytical processing database to obtain an analysis result.

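In one embodiment, the first processing unit is configured to:

if a single-point failure occurs on a first server, perform an automatic sub-memory mounting operation when the first server restarts;

wherein the first server is any one server in the server set.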

In one embodiment, the first processing unit is configured to:

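In one embodiment, the second processing unit is configured to:

store the data to be analyzed into the online analytical processing database through the local directory of the server;

and call the online analytical processing database and analyze the data to be analyzed stored in it using an online analytical processing method to obtain the analysis result.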

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus.

An electronic device according to an embodiment of the present invention includes: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the data analysis method provided by the embodiment of the invention.

To achieve the above object, according to still another aspect of an embodiment of the present invention, a computer-readable medium is provided.

A computer readable medium of an embodiment of the present invention stores a computer program thereon, which when executed by a processor implements the data analysis method provided by the embodiment of the present invention.

One embodiment of the above invention has the following advantages or benefits: an online analytical processing database is created in each virtualized memory, and the data to be analyzed is analyzed through the local directory of the server and that database. Because memory reads and writes are significantly faster than disk reads and writes, analysis efficiency and the user experience are improved. The virtualized memory is obtained by virtualizing the local memory of the server, and servers are inexpensive, so efficient data analysis is achieved at lower cost and the time consumed by analysis is reduced. Dividing the virtualized memory into a plurality of sub-memories and mounting them further increases memory read/write speed, which further improves analysis efficiency and further reduces the time consumed by analysis.

Further effects of the above non-conventional optional implementations are described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of the main flow of a data analysis method according to an embodiment of the present invention;

FIG. 2 is an application scenario of a data analysis method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the main units of a data analysis apparatus according to an embodiment of the present invention;

FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;

FIG. 5 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It is noted that embodiments of the invention and features of the embodiments may be combined with each other without conflict.

In order to solve the problems in the prior art, an embodiment of the present invention provides a data analysis method, as shown in fig. 1, including:

Step S101, for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server.

In this step, the local memory of the server is virtualized using existing virtualization technology to obtain the virtualized memory, and the sub-memories are mounted to the local directory of the server using existing mounting technology, for example the Linux mount mechanism.

As shown in fig. 2, this step is described below with a specific example:

The server set includes server 1, server 2, and server 3.

The local memory of the server 1 is virtualized to obtain a virtualized memory, the virtualized memory is divided into 8 sub-memories, and each sub-memory in the 8 sub-memories is mounted to a local directory of the server 1.

The local memory of the server 2 is virtualized to obtain a virtualized memory, the virtualized memory is divided into 8 sub-memories, and each sub-memory in the 8 sub-memories is mounted to a local directory of the server 2.

The local memory of the server 3 is virtualized to obtain a virtualized memory, the virtualized memory is divided into 8 sub-memories, and each sub-memory in the 8 sub-memories is mounted to a local directory of the server 3.

The following description will take the server 1 as an example:

1.1 Edit the sudoers file (/etc/sudoers) so that a non-root user can mount partitions (executed as root)

Run whereis mount and whereis umount to find the locations of the mount and umount commands

Run visudo and add the following configuration

Cmnd_Alias MOUNT=/bin/mount,/bin/umount

omm ALL=NOPASSWD:MOUNT

1.2 Create the ADB database installation user omm (executed as root) (standby)

useradd -d /home/omm -m omm

passwd omm

1.3 Mount the sub-memories (executed as the omm user)

1.3.1 Create the directories on which the sub-memories are mounted, taking 8 sub-memory directories as an example

mkdir -p /home/omm/gpdata/gpmaster

mkdir -p /home/omm/gpdata/gpdatap1

mkdir -p /home/omm/gpdata/gpdatap2

mkdir -p /home/omm/gpdata/gpdatap3

mkdir -p /home/omm/gpdata/gpdatap4

mkdir -p /home/omm/gpdata/gpdatap5

mkdir -p /home/omm/gpdata/gpdatap6

mkdir -p /home/omm/gpdata/gpdatap7

mkdir -p /home/omm/gpdata/gpdatap8

1.3.2 Mount the memory

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpmaster

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap1

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap2

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap3

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap4

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap5

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap6

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap7

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap8

If the sub-memories later need to be released, they can be unmounted with the corresponding umount commands:

sudo umount /home/omm/gpdata/gpmaster

sudo umount /home/omm/gpdata/gpdatap1

sudo umount /home/omm/gpdata/gpdatap2

sudo umount /home/omm/gpdata/gpdatap3

sudo umount /home/omm/gpdata/gpdatap4

sudo umount /home/omm/gpdata/gpdatap5

sudo umount /home/omm/gpdata/gpdatap6

sudo umount /home/omm/gpdata/gpdatap7

sudo umount /home/omm/gpdata/gpdatap8

Thus, the mounting of the sub-memory is completed.
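The repeated mkdir and mount commands of steps 1.3.1 and 1.3.2 can also be generated in a loop. The following is a minimal sketch, not part of the original disclosure: the variable names BASE_DIR, SIZE, COUNT, and DRY_RUN and the helper functions are hypothetical, and DRY_RUN defaults to 1 so that the commands are only printed rather than executed.

```shell
#!/bin/sh
# Sketch: create and mount the master directory plus COUNT equally sized tmpfs sub-memories.
# All variable and function names are assumptions of this sketch.
BASE_DIR="${BASE_DIR:-/home/omm/gpdata}"   # directory layout taken from step 1.3.1
SIZE="${SIZE:-20G}"                        # per-mount size used in step 1.3.2
COUNT="${COUNT:-8}"                        # number of data sub-memories
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only, 0 = execute them

# Print the mount points: gpmaster, then gpdatap1 .. gpdatapCOUNT.
list_mount_points() {
    echo "$BASE_DIR/gpmaster"
    i=1
    while [ "$i" -le "$COUNT" ]; do
        echo "$BASE_DIR/gpdatap$i"
        i=$((i + 1))
    done
}

# Execute a command, or only print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create each directory and mount a tmpfs of SIZE on it.
mount_sub_memories() {
    for dir in $(list_mount_points); do
        run mkdir -p "$dir"
        run sudo mount -t tmpfs -o "size=$SIZE" tmpfs "$dir"
    done
}

mount_sub_memories
```

Running the sketch with DRY_RUN=0 as the omm user would execute the same commands as listed above, provided the sudoers entry from step 1.1 is in place.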

Step S102, an online analysis processing type database is created in each virtualized memory.

In this step, when implemented, the method includes:

2.1 Upload the open-source MPPDB database installation package greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.zip to the GP master node server (the master node server is one of the servers in the server set).

2.2 Decompress the installation package greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.zip to obtain greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.bin and README_INSTALL

The system parameters are set according to README_INSTALL.

2.3 Modify the hosts file, adding the node IP configuration so that the GP nodes can communicate with each other (executed as root, on all nodes)

vi /etc/hosts

IP1 ADB01

IP2 ADB02

IP3 ADB03

2.4 Modify the system kernel parameter file (executed as root, on all nodes)

vi /etc/sysctl.conf, and add or modify the following configuration

kernel.shmmax = 500000000

kernel.shmmni = 4096

kernel.shmall = 4000000000

kernel.sem = 250 512000 100 2048

kernel.sysrq = 1

kernel.core_uses_pid = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.msgmni = 2048

net.ipv4.tcp_syncookies = 1

net.ipv4.ip_forward = 0

net.ipv4.conf.default.accept_source_route = 0

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1

net.ipv4.ip_local_port_range = 1025 65535

net.core.netdev_max_backlog = 10000

net.core.rmem_max = 2097152

net.core.wmem_max = 2097152

vm.overcommit_memory = 2

Finally, run sysctl -p so that the configuration takes effect
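Because step 2.4 must be repeated on all nodes, it can be useful to confirm that a node's configuration file actually contains the intended values. The following is a minimal sketch under stated assumptions: the function names sysctl_conf_value and check_setting are hypothetical helpers, and the file is assumed to use the plain key = value layout shown above.

```shell
#!/bin/sh
# Print the value assigned to key $1 in the sysctl.conf-style file $2.
# If the key appears more than once, the last assignment is reported.
sysctl_conf_value() {
    awk -v k="$1" '
        /^[ \t]*[#;]/ { next }          # skip comment lines
        !/=/          { next }          # skip lines without an assignment
        {
            name = $0
            sub(/=.*/, "", name)        # text before the first "="
            gsub(/[ \t]/, "", name)
            if (name == k) {
                val = $0
                sub(/^[^=]*=/, "", val) # text after the first "="
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found = val
            }
        }
        END { if (found != "") print found }' "$2"
}

# Compare the configured value of key $1 in file $3 with the expected value $2.
check_setting() {
    actual=$(sysctl_conf_value "$1" "$3")
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 = $2"
    else
        echo "MISMATCH: $1 is '$actual', expected '$2'"
    fi
}

# Example: check two of the values from step 2.4 if the file is readable.
if [ -r /etc/sysctl.conf ]; then
    check_setting kernel.shmmni 4096 /etc/sysctl.conf
    check_setting vm.overcommit_memory 2 /etc/sysctl.conf
fi
```

The sketch only inspects the file; whether the running kernel has picked the values up still depends on sysctl -p having been executed.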

2.5 Modify the Linux maximum connection limits (executed as root, on all nodes)

vi /etc/security/limits.conf, and add the following

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072
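The limits from step 2.5 apply only to sessions opened after the change, so a quick check in a new login session can confirm them. The sketch below is illustrative only: limit_ok is a hypothetical helper, and only the open-file limit is checked because the option for querying the process limit differs between shells.

```shell
#!/bin/sh
# Succeed when the current limit $1 meets the required minimum $2.
# A limit reported as "unlimited" always passes.
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Check the open-file limit of the current session against the nofile value from step 2.5.
current_nofile=$(ulimit -n)
if limit_ok "$current_nofile" 65536; then
    echo "nofile: ok ($current_nofile)"
else
    echo "nofile: too low ($current_nofile), expected at least 65536"
fi
```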

2.6 Installation (executed as the omm user, on the master node)

Run the command: /bin/bash greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.bin

After pressing Enter to page through the license terms, the installation options appear

2.7 Child node installation (a child node is any server in the server set other than the master node server)

gpseginstall -f /home/omm/greenplum-db/etc/hostlist -u omm -p omm

2.8 Cluster system initialization (make sure the firewall has been turned off before installation)

gpinitsystem -c /home/omm/greenplum-db/etc/gpinitsystem_config -h /home/omm/greenplum-db/etc/seg_hosts

The cluster can then be used through psql -h IP1 -U omm sordb.
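The source does not show the contents of gpinitsystem_config. As a hedged illustration of how the mounted sub-memories could be tied to the database, the sketch below writes an example file whose data directories are the eight tmpfs mount points and whose master directory is the gpmaster mount point. The parameter names follow the sample configuration shipped with Greenplum; the values for ARRAY_NAME, PORT_BASE, MASTER_PORT, and MASTER_HOSTNAME (taken here to be ADB01) are assumptions, and OUT is a hypothetical variable that defaults to a scratch file.

```shell
#!/bin/sh
# Sketch: generate an example gpinitsystem_config that places one segment data
# directory on each tmpfs sub-memory. Values marked "assumed" are not from the source.
OUT="${OUT:-$(mktemp)}"            # scratch output file (hypothetical variable)
BASE_DIR=/home/omm/gpdata          # mount-point layout from step 1.3.1

# Build the space-separated list gpdatap1 .. gpdatap8.
data_dirs=""
i=1
while [ "$i" -le 8 ]; do
    data_dirs="$data_dirs $BASE_DIR/gpdatap$i"
    i=$((i + 1))
done

cat > "$OUT" <<EOF
ARRAY_NAME="In-memory OLAP cluster"
SEG_PREFIX=gpseg
PORT_BASE=40000
declare -a DATA_DIRECTORY=(${data_dirs# })
MASTER_HOSTNAME=ADB01
MASTER_DIRECTORY=$BASE_DIR/gpmaster
MASTER_PORT=5432
TRUSTED_SHELL=ssh
ENCODING=UNICODE
EOF

cat "$OUT"
```

With one data directory per sub-memory, each server would host eight segment instances, which matches the eight-way division described for step S101.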

It should be noted that the online analytical processing database (also called an OLAP database) may be an MPPDB, which is a distributed parallel database with a shared-nothing architecture. It offers high performance, high availability, and high scalability, so it can provide a cost-effective general-purpose computing platform for ultra-large-scale data management, and it is widely used to support data warehouse systems, BI systems, and decision support systems.

Specifically, the installation package of the MPPDB is placed on one server in the server set, and that server sends the installation package to each of the other servers in the set. Each server in the server set then installs the package, modifies its configuration files, and is initialized. The specific operations are described in steps 2.1 to 2.8 above. In addition, the number of servers in the server set may be set according to requirements.

It should be noted that an online analytical processing database is created in the virtualized memory of each server in the server set.

In addition, databases include OLTP databases (e.g., GemFire (Geode)) and OLAP databases. The embodiment of the invention is suitable for OLAP databases and is not suitable for OLTP databases.

Step S103, analyzing the data to be analyzed through the local directory of the server and the online analytical processing database to obtain an analysis result.

In a specific implementation, a CREATE TABLE statement is used to create the view table tbl_cure_360; the data to be analyzed is stored in the online analytical processing database through the local directory of the server; and the database is then called to analyze the stored data using an online analytical processing method, yielding the analysis result.
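As a hedged illustration of this step, the sketch below writes a small SQL file containing a table definition and an ad hoc query with several LIKE predicates. Only the table name tbl_cure_360 comes from the text; the column names, the distribution key, and the query are hypothetical, and SQL_FILE is a variable introduced for this sketch. The psql invocation is shown as a comment and is not executed.

```shell
#!/bin/sh
# Sketch: prepare SQL for a 360-view table and an ad hoc analytical query.
# Column names and the query are hypothetical; only tbl_cure_360 is from the text.
SQL_FILE="${SQL_FILE:-$(mktemp)}"

cat > "$SQL_FILE" <<'EOF'
-- Hypothetical columns for customer 360-view data.
CREATE TABLE tbl_cure_360 (
    cust_id    bigint,
    cust_name  varchar(128),
    branch     varchar(64),
    product    varchar(64)
) DISTRIBUTED BY (cust_id);

-- Ad hoc query: no dimensions need to be configured in advance.
SELECT branch, count(*) AS customers
FROM tbl_cure_360
WHERE cust_name LIKE '%Li%'
  AND (product LIKE '%loan%' OR product LIKE '%card%')
GROUP BY branch;
EOF

echo "SQL written to $SQL_FILE"
# To run it against the cluster (not executed in this sketch):
#   psql -h IP1 -U omm -d sordb -f "$SQL_FILE"
```

DISTRIBUTED BY spreads the rows across the segments, so each sub-memory holds a share of the table and the query is evaluated on all segments in parallel.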

It should be noted that the embodiment of the invention provides a distributed memory OLAP database.

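In an embodiment of the present invention, the method further includes:

if a single-point failure occurs on a first server, performing an automatic sub-memory mounting operation when the first server restarts;

wherein the first server is any one server in the server set.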

In this embodiment, it should be noted that when a single-point failure occurs on the first server (e.g., the server goes down), the first server must be restarted, and the restart causes all sub-memories of the first server to disappear. The automatic sub-memory mounting operation is therefore performed when the first server restarts, which ensures that data analysis continues to run normally.

In particular, the servers in the server set may be x86 devices.

In this embodiment, performing the automatic sub-memory mounting operation whenever a server restarts after a single-point failure restores the sub-memories that the restart removed and ensures the normal operation of data analysis.

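In the embodiment of the invention, the automatic sub-memory mounting comprises the following steps:

virtualizing the local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;

and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.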

In this embodiment, in implementation, the following command is added to the server's boot-time configuration file (/etc/rc.local), so that it is executed automatically when the server starts:

sh /home/omm/memPartition_start.sh

The contents of memPartition_start.sh are as follows:

mkdir -p /home/omm/gpdata/gpmaster

mkdir -p /home/omm/gpdata/gpdatap1

mkdir -p /home/omm/gpdata/gpdatap2

mkdir -p /home/omm/gpdata/gpdatap3

mkdir -p /home/omm/gpdata/gpdatap4

mkdir -p /home/omm/gpdata/gpdatap5

mkdir -p /home/omm/gpdata/gpdatap6

mkdir -p /home/omm/gpdata/gpdatap7

mkdir -p /home/omm/gpdata/gpdatap8

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpmaster

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap1

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap2

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap3

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap4

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap5

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap6

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap7

sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap8
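If the startup script is ever run while some sub-memories are already mounted (for example, when it is invoked by hand), a second tmpfs would be stacked on the same directory and hide the first. The following variant is a minimal sketch, not the script of the disclosure, that skips directories already listed in the mounts table. MOUNTS_FILE, DRY_RUN, is_mounted, and ensure_tmpfs are hypothetical names, and DRY_RUN defaults to 1 so that nothing is mounted unless it is set to 0.

```shell
#!/bin/sh
# Sketch: mount each tmpfs sub-memory only if its directory is not mounted yet.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"  # mounts table to consult (hypothetical knob)
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands only, 0 = execute them

# Succeed if directory $1 appears as a mount point (field 2) in the mounts table.
is_mounted() {
    awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' "$MOUNTS_FILE" 2>/dev/null
}

# Mount a tmpfs of size $2 on directory $1 unless it is already mounted.
ensure_tmpfs() {
    dir=$1
    size=$2
    if is_mounted "$dir"; then
        echo "skip: $dir already mounted"
        return 0
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "mkdir -p $dir && sudo mount -t tmpfs -o size=$size tmpfs $dir"
    else
        mkdir -p "$dir" && sudo mount -t tmpfs -o "size=$size" tmpfs "$dir"
    fi
}

for name in gpmaster gpdatap1 gpdatap2 gpdatap3 gpdatap4 gpdatap5 gpdatap6 gpdatap7 gpdatap8; do
    ensure_tmpfs "/home/omm/gpdata/$name" 20G
done
```

After a restart none of the directories is mounted, so the variant behaves like the script above; the guard only matters on repeated runs.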

In an embodiment of the present invention, dividing the virtualized memory into a plurality of sub-memories includes:

dividing the virtualized memory into 8 sub-memories, each with the same storage space.

In this embodiment, a specific example is as follows: the local memory of the server is 256 GB; because the server also needs some local memory for routine operations, 240 GB of the local memory is virtualized to obtain a 240 GB virtualized memory, which is divided into 8 sub-memories of 30 GB each. Similarly, if the local memory of the server is 192 GB, the number of sub-memories is 8 and the storage space of each sub-memory is 20 GB.
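The sizing in this example is a simple division of the virtualized memory by the number of sub-memories. The sketch below reproduces the arithmetic; sub_memory_size_gb is a hypothetical helper. The 16 GB reserve for the 256 GB server follows from the text (256 GB minus 240 GB), whereas the 32 GB reserve for the 192 GB server is only inferred from 8 x 20 GB = 160 GB and is not stated in the source.

```shell
#!/bin/sh
# Size of each sub-memory in whole GB:
# (total local memory - memory reserved for routine operations) / number of sub-memories.
sub_memory_size_gb() {
    total_gb=$1
    reserved_gb=$2
    parts=$3
    echo $(( (total_gb - reserved_gb) / parts ))
}

sub_memory_size_gb 256 16 8   # 240 GB virtualized, prints 30
sub_memory_size_gb 192 32 8   # 160 GB virtualized (inferred reserve), prints 20
```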

It should be noted that dividing the virtualized memory into a plurality of sub-memories allows the database to reach its maximum read-write performance.

Different numbers of sub-memories produce different effects, as shown in Table 1, Table 2, and Table 3:

TABLE 1

TABLE 2

TABLE 3

Query 1:

Query 2: 2, 3 complex like

Query 3: 2, 5 complex like

Query 4: 2, 5 complex like, 2 or

In this embodiment, the virtualized memory is divided into 8 sub memories, and the storage space of each sub memory is the same, so that all aspects are optimized, and the user experience is further improved.

In the embodiment of the invention, analyzing the data to be analyzed through the local directory of the server and the online analytical processing database to obtain an analysis result comprises:

storing the data to be analyzed into the online analytical processing database through the local directory of the server;

and calling the online analytical processing database and analyzing the data to be analyzed stored in it using an online analytical processing method to obtain the analysis result.

In this embodiment, in implementation, the data to be analyzed is saved to the open-source MPPDB through the local directory of the server. An OLAP system emphasizes data analysis, SQL execution time, disk I/O, partitioning, and the like; the data to be analyzed stored in the MPPDB is analyzed to obtain the analysis result.

In addition, the data to be analyzed may be view data (e.g., customer 360-view data), report data, or the like. The data to be analyzed may be saved to at least one online analytical processing database.

Specifically, an online analysis processing method is adopted to analyze the data to be analyzed stored in an online analysis processing database in a local memory of a server, and the analysis result is obtained.

In the financial industry, and especially in banking as its representative sector, business categories are highly complex, customers have many index dimensions, and the total customer population is large; for 360-view or large-capacity reports there can be thousands of dimensions and hundreds of millions of customers. The embodiment of the invention can be applied to such scenarios.

There are two existing data analysis methods. One is based on an ETL framework (for example, Azkaban, Oozie, or Kettle) and a distributed database such as MPPDB or Hive, and analyzes customer 360-view or large-capacity reports as a batch run. Because the volume of data to be analyzed is huge, efficiency is low, and the analysis usually takes tens of minutes or even hours. Sampling analysis is also possible, but, compared with full analysis, the accuracy of its results cannot be guaranteed.

The other is implemented with data cube technology (such as Kylin). Specifically, the dimension parameters of the cube must be configured in advance; secondary processed data is produced by a Hive batch run and stored in HBase; and HBase-based queries combine data of different dimensions to obtain the analysis result. This method has the following problems: dimension parameters and batch runs must be configured before analysis; the batch runs take a long time, usually tens of minutes or even several hours; the analysis result is limited by the configured dimension parameters; and real-time analysis cannot be achieved.

Other existing data analysis methods require high costs.

According to the embodiment of the invention, the running environment of the MPPDB is moved from disk to virtualized memory. Because memory read/write (I/O) is much faster than disk read/write, analysis efficiency improves, the time consumed by analysis drops, and high availability is achieved. The virtualized memory is obtained by virtualizing the local memory of the server, and servers are inexpensive, so efficient analysis is achieved at lower cost. Furthermore, the data analysis does not require dimension parameters to be configured and is not limited by them, which improves the user experience.

To implement the embodiments of the present invention, a developer needs the following capabilities:

1 Familiarity with the shell language and the Linux operating system.

2 Familiarity with the processes and features of ETL and data analysis techniques.

3 Familiarity with cluster high-availability design, load-balancing design, and the like.

The most important of these is the high-availability and load-balancing design of the cluster, which ensures data security, data confidentiality, and high data availability, and thus the safe and stable operation of the data analysis device.

The embodiment of the invention can be applied to scenarios with a huge number of customers and high investment in information technology, in particular the banking industry, the Internet industry, and the like.

An in-memory database is a database that places data in memory and operates on it directly. Memory reads and writes data several orders of magnitude faster than disk, so keeping data in memory rather than accessing it from disk can greatly improve application performance.

Big data analysis refers to the analysis of data at very large scale. Common technologies include the distributed platform Apache Hadoop (and the Hadoop-based Hive, Pig, HBase, etc.) and the distributed database Greenplum (such as EMC Greenplum).

In order to solve the problems existing in the prior art, an embodiment of the present invention provides a data analysis device, as shown in fig. 3, including:

the first processing unit 301 is configured to virtualize, for each server in the server set, a local memory of the server to obtain a virtualized memory, divide the virtualized memory into a plurality of sub-memories, and mount each sub-memory in the plurality of sub-memories to a local directory of the server.

The creating unit 302 is configured to create an online analysis processing database in each virtualized memory.

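The second processing unit 303 is configured to analyze the data to be analyzed through the local directory of the server and the online analytical processing database, so as to obtain an analysis result.

In the embodiment of the present invention, the first processing unit 301 is configured to:

if a single-point failure occurs on a first server, perform an automatic sub-memory mounting operation when the first server restarts;

wherein the first server is any one server in the server set.

In the embodiment of the present invention, the second processing unit 303 is configured to:

store the data to be analyzed into the online analytical processing database through the local directory of the server;

and call the online analytical processing database and analyze the data to be analyzed stored in it using an online analytical processing method to obtain the analysis result.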

It should be understood that the functions performed by the components of the data analysis device according to the embodiment of the present invention have been described in detail in the data analysis method according to the foregoing embodiment, and will not be described herein.

Fig. 4 illustrates an exemplary system architecture 400 in which the data analysis method or data analysis apparatus of embodiments of the present invention may be applied.

As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 is used as a medium to provide communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may interact with the server 405 via the network 404 using the terminal devices 401, 402, 403 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 401, 402, 403.

The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 405 may be a server providing various services, such as a background management server (by way of example only) that supports shopping websites browsed by users on the terminal devices 401, 402, 403. The background management server may analyze and process received data such as a product information query request, and feed the processing result (e.g., target push information or product information, by way of example only) back to the terminal device.

It should be noted that, the data analysis method provided in the embodiment of the present invention is generally executed by the server 405, and accordingly, the data analysis device is generally disposed in the server 405.

It should be understood that the number of terminal devices, networks and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 501.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described, for example, as follows: a processor includes a first processing unit, a creation unit, and a second processing unit. In some cases the names of these units do not limit the units themselves; for example, the second processing unit may also be described as "a unit that analyzes data to be analyzed through a local directory of a server and an online analysis processing type database to obtain an analysis result".
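As a rough illustration of how the three units could be composed in software, the following Python sketch models each unit as a callable held by one device object. The class name `DataAnalysisDevice`, the method `analyze`, the callable signatures, and the stub units are all hypothetical; the source only names the three units and does not prescribe any interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataAnalysisDevice:
    # Hypothetical signatures; the source only names the three units.
    first_processing_unit: Callable[[str], List[str]]  # server -> mounted directories
    creation_unit: Callable[[str], str]  # server -> database identifier
    second_processing_unit: Callable[[List[str], str, str], str]  # dirs, db, query -> result

    def analyze(self, servers: List[str], query: str) -> Dict[str, str]:
        """Run the three units in order for every server in the set."""
        results: Dict[str, str] = {}
        for server in servers:
            directories = self.first_processing_unit(server)
            database = self.creation_unit(server)
            results[server] = self.second_processing_unit(directories, database, query)
        return results


# Stub units that only build strings, so the sketch runs without real servers.
device = DataAnalysisDevice(
    first_processing_unit=lambda server: [f"/{server}/sub0", f"/{server}/sub1"],
    creation_unit=lambda server: f"olap@{server}",
    second_processing_unit=lambda dirs, db, query: f"{query} on {db} over {len(dirs)} directories",
)
results = device.analyze(["s1", "s2"], "count")
print(results)
```

Because each unit is passed in as a plain callable, a software unit could in principle be swapped for a wrapper around a hardware implementation without changing the device object.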

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server; creating an online analysis processing type database in each virtualized memory; and analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result.

According to the technical scheme of the embodiment of the invention, an online analysis processing type database is created in each virtualized memory, and the data to be analyzed is analyzed through the local directory of the server and the online analysis processing type database. Because the read-write speed of memory is significantly higher than that of disk, analysis efficiency and user experience are both improved. The virtualized memory is obtained by virtualizing the local memory of the server, and the cost of the server is low, so data analysis is realized efficiently at lower cost and the time consumed by the analysis is reduced. Dividing the virtualized memory into a plurality of sub-memories and mounting them further improves the read-write speed of the memory, which further improves analysis efficiency and further reduces the time consumed by analysis.
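The division and mounting step summarized above can be sketched as follows. This is only a minimal sketch under stated assumptions: the source does not say how the virtualized memory is split or which memory-backed file system is used, so the near-equal split, the use of `tmpfs`, the `/data/mem` base directory, and the helper names `plan_sub_memories` and `mount_commands` are all hypothetical. The code only builds command strings and does not execute them.

```python
from typing import List, Tuple


def plan_sub_memories(total_mib: int, count: int, base_dir: str) -> List[Tuple[str, int]]:
    """Split total_mib into `count` near-equal sub-memories, one mount point each."""
    if count <= 0 or total_mib < count:
        raise ValueError("need at least 1 MiB per sub-memory")
    share, extra = divmod(total_mib, count)
    # The first `extra` sub-memories get one additional MiB so the sizes sum to total_mib.
    return [(f"{base_dir}/sub{i}", share + (1 if i < extra else 0)) for i in range(count)]


def mount_commands(plan: List[Tuple[str, int]]) -> List[str]:
    """Build one mount command per sub-memory (tmpfs is an assumption, not from the source)."""
    return [f"mount -t tmpfs -o size={size}m tmpfs {path}" for path, size in plan]


plan = plan_sub_memories(1000, 3, "/data/mem")
for command in mount_commands(plan):
    print(command)
```

Re-issuing the same commands from a boot-time script would be one possible way to restore the mounts after a server restart, although the source does not specify the mechanism.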

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method of data analysis, comprising:

for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server; if a first server suffers a single-point failure, performing an automatic sub-memory mounting operation when the first server is restarted, wherein the first server is any one server in the server set;

creating an online analysis processing type database in each virtualized memory, wherein an installation package of the online analysis processing type database is placed on one server in the server set, and that server sends the installation package to every other server in the server set; and, for each server in the server set, the installation package of the online analysis processing type database is installed on the server, a configuration file is modified, and initialization is performed;

2. The method of claim 1, wherein automatically mounting the sub-memory comprises:

3. The method of claim 1, wherein dividing the virtualized memory into a plurality of sub-memories comprises:

4. The method of claim 1, wherein analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain the analysis result comprises:

5. A data analysis device, comprising:

a first processing unit, configured to: for each server in a server set, virtualize a local memory of the server to obtain a virtualized memory, divide the virtualized memory into a plurality of sub-memories, and mount each sub-memory in the plurality of sub-memories to a local directory of the server; and, if a first server suffers a single-point failure, perform an automatic sub-memory mounting operation when the first server is restarted, wherein the first server is any one server in the server set;

a creation unit, configured to create an online analysis processing type database in each virtualized memory, wherein an installation package of the online analysis processing type database is placed on one server in the server set, and that server sends the installation package to every other server in the server set; and, for each server in the server set, the installation package of the online analysis processing type database is installed on the server, a configuration file is modified, and initialization is performed;

6. The apparatus of claim 5, wherein the first processing unit is configured to:

7. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.

8. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-4.