CN111176787B - Data analysis method and device - Google Patents

Publication number
CN111176787B
Authority
CN
China
Prior art keywords
server
memory
sub
memories
analysis processing
Prior art date
Legal status
Active
Application number
CN201911337325.8A
Other languages
Chinese (zh)
Other versions
CN111176787A (en)
Inventor
李威
覃鹏
刘增文
吴秦明
叶长全
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN201911337325.8A
Publication of CN111176787A
Application granted
Publication of CN111176787B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/4555 Para-virtualisation, i.e. guest operating system has to be modified
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation

Abstract

The invention discloses a data analysis method and a data analysis device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server; creating an online analysis processing type database in each virtualized memory; and analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result. This embodiment enables efficient data analysis at a lower cost and reduces the time consumed by analysis.

Description

Data analysis method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data analysis method and apparatus.
Background
There are two existing data analysis methods: one is based on ETL and a distributed database and performs analysis in batch-run mode; the other implements data analysis through data cube technology.
In the process of implementing the present invention, the inventor found that at least the following problems exist in the prior art:
both of the above data analysis methods take a long time and have low analysis efficiency, while data analysis methods that take less time are costly.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a data analysis method and apparatus, which can implement efficient data analysis with low cost and reduce the time consumed for analysis.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a data analysis method.
The data analysis method of the embodiment of the invention comprises the following steps:
for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server;
creating an online analysis processing type database in each virtualized memory;
analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result.
In one embodiment, the method further comprises:
if the first server has single-point failure, executing automatic sub-memory mounting operation when the first server is restarted;
wherein the first server is any one server in the server set.
In one embodiment, the automatic mounting of the sub-memory includes:
virtualizing a local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;
and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.
In one embodiment, partitioning the virtualized memory into a plurality of sub-memories includes:
the virtualized memory is divided into 8 sub-memories, and the storage space of each sub-memory is the same.
In one embodiment, analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result comprises:
storing the data to be analyzed into the online analysis processing type database through the local directory of the server;
and analyzing the data to be analyzed stored in the online analysis processing type database by calling the online analysis processing type database and adopting an online analysis processing method to obtain the analysis result.
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided a data analysis apparatus.
The data analysis device of the embodiment of the invention comprises:
the first processing unit is used for virtualizing the local memory of the server to obtain a virtualized memory for each server in the server set, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to the local directory of the server;
the creation unit is used for creating an online analysis processing type database in each virtualized memory;
and the second processing unit is used for analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result.
In one embodiment, the first processing unit is configured to:
if the first server has single-point failure, executing automatic sub-memory mounting operation when the first server is restarted;
wherein the first server is any one server in the server set.
In one embodiment, the first processing unit is configured to:
virtualizing a local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;
and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.
In one embodiment, the first processing unit is configured to:
the virtualized memory is divided into 8 sub-memories, and the storage space of each sub-memory is the same.
In one embodiment, the second processing unit is configured to:
storing the data to be analyzed into the online analysis processing type database through the local directory of the server;
and analyzing the data to be analyzed stored in the online analysis processing type database by calling the online analysis processing type database and adopting an online analysis processing method to obtain the analysis result.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus.
An electronic device according to an embodiment of the present invention includes: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the data analysis method provided by the embodiment of the invention.
To achieve the above object, according to still another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer readable medium of an embodiment of the present invention stores a computer program thereon, which when executed by a processor implements the data analysis method provided by the embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: an online analysis processing type database is created in each virtualized memory, and the data to be analyzed is analyzed through the local directory of the server and the online analysis processing type database. Because the read-write speed of memory is significantly higher than that of disk, analysis efficiency and user experience are improved. The virtualized memory is obtained by virtualizing the local memory of the server, and the cost of the server is low, so data analysis is achieved efficiently at lower cost and the time consumed by analysis is reduced. Dividing the virtualized memory into a plurality of sub-memories and mounting them further improves the read-write speed of the memory, which further improves analysis efficiency and further reduces the time consumed by analysis.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a data analysis method according to an embodiment of the present invention;
FIG. 2 is an application scenario of a data analysis method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of main units of a data analysis apparatus according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 5 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is noted that embodiments of the invention and features of the embodiments may be combined with each other without conflict.
In order to solve the problems in the prior art, an embodiment of the present invention provides a data analysis method, as shown in fig. 1, including:
step S101, for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server.
In this step, an existing virtualization technology is used to virtualize the local memory of the server and obtain the virtualized memory, and an existing mounting technology is used to mount each sub-memory to a local directory of the server. The existing mounting technology may be the Linux mount mechanism.
As shown in fig. 2, this step is described below with a specific example:
the server set includes a server 1, a server 2, and a server 3.
The local memory of the server 1 is virtualized to obtain a virtualized memory, the virtualized memory is divided into 8 sub-memories, and each sub-memory in the 8 sub-memories is mounted to a local directory of the server 1.
The local memory of the server 2 is virtualized to obtain a virtualized memory, the virtualized memory is divided into 8 sub-memories, and each sub-memory in the 8 sub-memories is mounted to a local directory of the server 2.
The local memory of the server 3 is virtualized to obtain a virtualized memory, the virtualized memory is divided into 8 sub-memories, and each sub-memory in the 8 sub-memories is mounted to a local directory of the server 3.
The following description will take the server 1 as an example:
1.1 Edit the sudoers file (/etc/sudoers) so that a non-root user can mount partitions (executed as root)
Run whereis mount and whereis umount to view the locations of the mount and umount commands
Run the visudo command and add the following configuration
Cmnd_Alias MOUNT=/bin/mount,/bin/umount
omm ALL=NOPASSWD:MOUNT
1.2 Create the ADB database installation user omm (executed as root) (standby)
useradd -d /home/omm -m omm
passwd omm
1.3 Mount the sub-memories (executed as the omm user)
1.3.1 Create the directories on which the sub-memories are mounted, here taking 8 sub-memory directories as an example
mkdir -p /home/omm/gpdata/gpmaster
mkdir -p /home/omm/gpdata/gpdatap1
mkdir -p /home/omm/gpdata/gpdatap2
mkdir -p /home/omm/gpdata/gpdatap3
mkdir -p /home/omm/gpdata/gpdatap4
mkdir -p /home/omm/gpdata/gpdatap5
mkdir -p /home/omm/gpdata/gpdatap6
mkdir -p /home/omm/gpdata/gpdatap7
mkdir -p /home/omm/gpdata/gpdatap8
1.3.2 Mounting memory
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpmaster
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap1
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap2
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap3
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap4
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap5
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap6
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap7
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap8
sudo umount /home/omm/gpdata/gpmaster
sudo umount /home/omm/gpdata/gpdatap1
sudo umount /home/omm/gpdata/gpdatap2
sudo umount /home/omm/gpdata/gpdatap3
sudo umount /home/omm/gpdata/gpdatap4
sudo umount /home/omm/gpdata/gpdatap5
sudo umount /home/omm/gpdata/gpdatap6
sudo umount /home/omm/gpdata/gpdatap7
sudo umount /home/omm/gpdata/gpdatap8
Thus, the mounting of the sub-memory is completed.
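The directory creation and tmpfs mounting in steps 1.3.1 and 1.3.2 repeat the same two commands for nine directories, so they could be generated by a loop. The following is a minimal sketch only, not the procedure as filed: the variables BASE_DIR, SIZE and DRY_RUN and the helper functions are hypothetical names introduced for illustration, and by default the sketch only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: create and mount the tmpfs-backed sub-memory directories in a loop.
# BASE_DIR, SIZE and DRY_RUN are illustrative variables, not part of the original text.
BASE_DIR="${BASE_DIR:-/home/omm/gpdata}"
SIZE="${SIZE:-20G}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the directory names used in the example: gpmaster plus gpdatap1..gpdatap8.
sub_memory_dirs() {
    echo "gpmaster"
    for i in 1 2 3 4 5 6 7 8; do
        echo "gpdatap$i"
    done
}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create each mount directory and mount a tmpfs of SIZE on it.
mount_sub_memories() {
    for name in $(sub_memory_dirs); do
        run mkdir -p "$BASE_DIR/$name"
        run sudo mount -t tmpfs -o "size=$SIZE" tmpfs "$BASE_DIR/$name"
    done
}

mount_sub_memories
```

Running the sketch with DRY_RUN=0 as the omm user would execute the commands, which assumes the sudoers configuration from step 1.1 is in place.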
Step S102, an online analysis processing type database is created in each virtualized memory.
In a specific implementation, this step includes:
2.1 Upload the open-source MPPDB database installation package greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.zip to the GP master node server (the master node server is one of the servers in the server set).
2.2 Unzip the installation package greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.zip to obtain greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.bin and README_INSTALL
Set the system parameters according to README_INSTALL.
2.3 Modify the hosts file, adding the node IP configuration to facilitate communication between the GP nodes (executed as root, on all nodes)
vi /etc/hosts
IP1 ADB01
IP2 ADB02
IP3 ADB03
2.4 Modify the system kernel configuration file (executed as root, on all nodes)
vi /etc/sysctl.conf, adding or modifying the following configuration
kernel.shmmax=500000000
kernel.shmmni=4096
kernel.shmall=4000000000
kernel.sem=250 512000 100 2048
kernel.sysrq=1
kernel.core_uses_pid=1
kernel.msgmnb=65536
kernel.msgmax=65536
kernel.msgmni=2048
net.ipv4.tcp_syncookies=1
net.ipv4.ip_forward=0
net.ipv4.conf.default.accept_source_route=0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.conf.all.arp_filter=1
net.ipv4.ip_local_port_range=1025 65535
net.core.netdev_max_backlog=10000
net.core.rmem_max=2097152
net.core.wmem_max=2097152
vm.overcommit_memory=2
Finally, run sysctl -p so that the configuration takes effect
2.5 Modify the Linux maximum connection limits (executed as root, on all nodes)
vi /etc/security/limits.conf, adding the following
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
2.6 Installation (executed as the omm user, on the master node)
Run the command: /bin/bash greenplum-db-4.3.5.2-build-1-RHEL5-x86_64.bin
After pressing Enter to page through the license terms, the installation options appear
2.7 Child node installation (a child node is any server in the server set other than the master node server)
gpseginstall -f /home/omm/greenplum-db/etc/hostlist -u omm -p omm
2.8 Cluster system initialization (ensure that the firewall has been turned off before installation)
gpinitsystem -c /home/omm/greenplum-db/etc/gpinitsystem_config -h /home/omm/greenplum-db/etc/seg_hosts
The cluster can then be used with psql -h IP1 -U omm sordb.
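Because step 2.4 must be repeated on every node, it may be useful to check that the expected kernel parameters are actually present in /etc/sysctl.conf before initializing the cluster. The following is a sketch under assumptions: conf_value and check_conf are hypothetical helper names, not tools shipped with the database, and the sketch only reads a key=value file and does not change any setting.

```shell
#!/bin/sh
# conf_value FILE KEY: print the value assigned to KEY in a sysctl.conf-style file.
# Comment lines are skipped; if KEY appears several times, the last assignment wins.
conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key {
            v = $0
            sub(/^[^=]*=[ \t]*/, "", v)   # drop "key =" prefix
            sub(/[ \t]+$/, "", v)         # drop trailing blanks
            val = v
        }
        END { if (val != "") print val }
    ' "$1"
}

# check_conf FILE KEY EXPECTED: report whether KEY is set to EXPECTED in FILE.
# Returns 0 on a match and 1 otherwise.
check_conf() {
    actual="$(conf_value "$1" "$2")"
    if [ "$actual" = "$3" ]; then
        echo "ok       $2=$actual"
    else
        echo "MISMATCH $2: expected '$3', found '$actual'"
        return 1
    fi
}

# Example usage (not executed here):
#   check_conf /etc/sysctl.conf vm.overcommit_memory 2
#   check_conf /etc/sysctl.conf kernel.sem "250 512000 100 2048"
```

The check compares the file contents only; whether the running kernel has picked up the values still depends on sysctl -p having been run.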
It should be noted that the online analysis processing database (also called an OLAP database) may be an MPPDB, which is a distributed parallel database with a shared-nothing architecture. It offers high performance, high availability, and high scalability, so it can provide a cost-effective general-purpose computing platform for ultra-large-scale data management, and it is widely used to support data warehouse systems, BI systems, and decision support systems.
Specifically, the installation package of the MPPDB is placed on one server in the server set, and that server sends the installation package to each of the other servers in the server set. Each server in the server set then installs the MPPDB from the package, modifies the configuration files, and performs initialization. The specific operations are described in steps 2.1 to 2.8 above. In addition, the number of servers in the server set may be set as required.
It should be noted that an online analysis processing database is created in the virtualized memory of each server in the server set.
In addition, databases include OLTP databases (e.g., GemFire (Geode)) and OLAP databases. The embodiment of the invention is suitable for an OLAP database and is not suitable for an OLTP database.
Step S103, analyzing the data to be analyzed through the local directory of the server and the online analysis processing database to obtain an analysis result.
In a specific implementation, a CREATE TABLE statement is used to create the view table tbl_cure_360, the data to be analyzed is stored in the online analysis processing type database through the local directory of the server, the online analysis processing type database is called, and the data to be analyzed stored in it is analyzed by an online analysis processing method to obtain the analysis result.
It should be noted that the embodiment of the invention provides a distributed memory OLAP database.
In an embodiment of the present invention, the method further includes:
if the first server has single-point failure, executing automatic sub-memory mounting operation when the first server is restarted;
wherein the first server is any one server in the server set.
In this embodiment, it should be noted that when the first server suffers a single-point failure (e.g., goes down), it needs to be restarted, and the restart causes all of the sub-memories of the first server to disappear. Therefore, the automatic sub-memory mounting operation is executed when the first server restarts, so that data analysis can continue to run normally.
In particular, the servers in the server set may be x86 devices.
In this embodiment, when a server restarts because of a single-point failure, the automatic sub-memory mounting operation re-creates the sub-memories that were lost in the restart, which ensures the normal operation of data analysis.
In the embodiment of the invention, the automatic sub-memory mounting comprises the following steps:
virtualizing a local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;
and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.
In a specific implementation of this embodiment, the following command is added to the server's startup configuration file (presumably /etc/rc.local), so that it is executed automatically when the server starts:
sh /home/omm/memPartition_start.sh
The contents of memPartition_start.sh are as follows:
mkdir -p /home/omm/gpdata/gpmaster
mkdir -p /home/omm/gpdata/gpdatap1
mkdir -p /home/omm/gpdata/gpdatap2
mkdir -p /home/omm/gpdata/gpdatap3
mkdir -p /home/omm/gpdata/gpdatap4
mkdir -p /home/omm/gpdata/gpdatap5
mkdir -p /home/omm/gpdata/gpdatap6
mkdir -p /home/omm/gpdata/gpdatap7
mkdir -p /home/omm/gpdata/gpdatap8
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpmaster
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap1
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap2
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap3
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap4
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap5
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap6
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap7
sudo mount -t tmpfs -o size=20G tmpfs /home/omm/gpdata/gpdatap8
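A startup script of this kind may be run more than once, for example by hand after a partial failure, and mounting tmpfs again on a directory that is already mounted would stack a second, empty file system on top of it. One possible refinement, shown here only as a sketch and not as the script in the embodiment, is to skip directories that already appear as tmpfs mount points. The variable MOUNTS_FILE (defaulting to /proc/mounts), the DRY_RUN switch and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: re-mount only the sub-memories that are missing after a restart.
# All variable and function names below are illustrative assumptions.
BASE_DIR="${BASE_DIR:-/home/omm/gpdata}"
SIZE="${SIZE:-20G}"
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"   # fields: device, mount point, fs type, ...
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands

# is_tmpfs_mounted DIR: succeed if DIR is listed as a tmpfs mount point.
is_tmpfs_mounted() {
    awk -v dir="$1" '
        $2 == dir && $3 == "tmpfs" { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$MOUNTS_FILE" 2>/dev/null
}

# Create and mount every sub-memory directory that is not mounted yet.
ensure_sub_memories() {
    for name in gpmaster gpdatap1 gpdatap2 gpdatap3 gpdatap4 \
                gpdatap5 gpdatap6 gpdatap7 gpdatap8; do
        dir="$BASE_DIR/$name"
        if is_tmpfs_mounted "$dir"; then
            echo "# already mounted: $dir"
            continue
        fi
        if [ "$DRY_RUN" = "1" ]; then
            echo "mkdir -p $dir"
            echo "sudo mount -t tmpfs -o size=$SIZE tmpfs $dir"
        else
            mkdir -p "$dir" && sudo mount -t tmpfs -o "size=$SIZE" tmpfs "$dir"
        fi
    done
}

# Example usage (not executed here): DRY_RUN=0 ensure_sub_memories
```

Note that a tmpfs mount is empty after a restart, so this only restores the mount points; any data previously held in the sub-memories has to be loaded again.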
In an embodiment of the present invention, dividing a virtualized memory into a plurality of sub-memories includes:
the virtualized memory is divided into 8 sub-memories, and the storage space of each sub-memory is the same.
This embodiment is described below with a specific example: the local memory of the server is 256GB. Because the local memory also has to serve some routine operations, 240GB of it is virtualized to obtain a 240GB virtualized memory; the 240GB virtualized memory is divided into 8 sub-memories of 30GB each. In addition, if the local memory of the server is 192GB, the number of sub-memories is 8 and the storage space of each sub-memory is 20GB.
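The arithmetic in this example can be written out as a small sketch. The function names are hypothetical, and the text does not state how the 20GB figure for the 192GB server is derived, so the sketch only computes the equal share for the 240GB case and the memory left outside the sub-memories in both cases.

```shell
#!/bin/sh
# sub_memory_size_gb VIRTUALIZED_GB COUNT: equal share per sub-memory, in whole GB.
sub_memory_size_gb() {
    if [ "$2" -le 0 ]; then
        echo "count must be positive" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

# reserved_gb LOCAL_GB SUB_GB COUNT: memory left outside the sub-memories
# for the operating system and other routine operations.
reserved_gb() {
    echo $(( $1 - $2 * $3 ))
}

sub_memory_size_gb 240 8   # 240GB virtualized memory split into 8 -> prints 30
reserved_gb 256 30 8       # 256GB server with 8 x 30GB sub-memories -> prints 16
reserved_gb 192 20 8       # 192GB server with 8 x 20GB sub-memories -> prints 32
```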
It should be noted that dividing the virtualized memory into a plurality of sub-memories allows the database to reach its maximum read-write performance.
The number of the sub-memories is different, and the effects are different, specifically shown in table 1, table 2 and table 3:
TABLE 1
TABLE 2
TABLE 3
Query 1:
Query 2: 2, 3 complex like
Query 3: 2, 5 complex like
Query 4: 2, 5 complex like, 2 or
In this embodiment, the virtualized memory is divided into 8 sub memories, and the storage space of each sub memory is the same, so that all aspects are optimized, and the user experience is further improved.
In the embodiment of the invention, analyzing the data to be analyzed through the local directory of the server and the online analysis processing database to obtain an analysis result comprises the following steps:
storing the data to be analyzed into the online analysis processing database through the local directory of the server;
and analyzing the data to be analyzed stored in the online analysis processing type database by calling the online analysis processing type database and adopting an online analysis processing method to obtain the analysis result.
In a specific implementation of this embodiment, the data to be analyzed is saved to the open-source MPPDB through the local directory of the server. An OLAP system emphasizes data analysis, SQL execution time, disk I/O, partitioning, and the like; the data to be analyzed stored in the MPPDB is analyzed to obtain the analysis result.
In addition, the data to be analyzed may be view data (e.g., customer 360 view data), report data, or the like. The data to be analyzed may be saved to at least one online analytical processing type database.
Specifically, an online analysis processing method is adopted to analyze the data to be analyzed stored in an online analysis processing database in a local memory of a server, and the analysis result is obtained.
In the financial industry, and especially in banking as its representative, business categories are highly complex, customer indicators span many dimensions, and the total customer population is large; a 360 view or high-volume report can involve thousands of dimensions and hundreds of millions of customers. The embodiment of the invention can be applied to such scenarios.
There are two existing data analysis methods. The first is based on an ETL framework (for example, Azkaban, Oozie, or Kettle) and uses a distributed database such as MPPDB or Hive to analyze a customer 360 view or high-volume report in batch-run mode. Because the volume of data to be analyzed is huge, efficiency is low, and the analysis usually takes tens of minutes or even hours. Sampling analysis is also possible, but compared with full analysis the accuracy of its result cannot be guaranteed.
The second is implemented with data cube technology (such as Kylin). Specifically, the dimension parameters of the cube must be configured in advance, secondary processed data is produced by Hive batch runs and stored in HBase, and HBase-based queries combine the data of different dimensions to obtain the analysis result. This method has the following problems: the dimension parameters and batch runs must be configured before data analysis; the batch runs take a long time, usually tens of minutes or even several hours; the analysis result is limited by the configured dimension parameters; and real-time analysis cannot be achieved.
Other existing data analysis methods require high costs.
According to the embodiment of the invention, the running environment of the MPPDB is changed from disk to virtualized memory. Because the read-write (I/O) speed of memory is much faster than that of disk, analysis efficiency is improved, the time consumed by analysis is reduced, and high availability is achieved. The virtualized memory is obtained by virtualizing the local memory of the server, and the cost of the server is low, so efficient analysis is achieved at lower cost. Furthermore, the data analysis does not require dimension parameters to be configured and is not limited by them, which improves the user experience.
To implement the embodiments of the present invention, a developer is required to have the following capabilities:
1. Familiarity with the shell language and knowledge of the Linux operating system.
2. Familiarity with the processes and features of ETL and data analysis techniques.
3. Familiarity with cluster high-availability design, load-balancing design, and the like.
The most important is the design of high availability and load balancing of the clusters, so that the data security, the data confidentiality and the high availability of the data are ensured, and the safe and stable operation of the data analysis device is ensured.
The embodiment of the invention can be applied to scenes with huge customer quantity and high informatization investment, in particular to banking industry, internet and the like.
An in-memory database is a database that stores its data in memory and operates on it there directly. Memory reads and writes data several orders of magnitude faster than disk, so keeping data in memory rather than accessing it from disk can greatly improve application performance.
Big data analysis refers to the analysis of data at very large scale. Common technologies include the distributed platform Apache Hadoop (and the Hadoop-based Hive, Pig, HBase, etc.) and the distributed database GreenPlum (such as EMC GreenPlum).
In order to solve the problems existing in the prior art, an embodiment of the present invention provides a data analysis device, as shown in fig. 3, including:
the first processing unit 301 is configured to virtualize, for each server in the server set, a local memory of the server to obtain a virtualized memory, divide the virtualized memory into a plurality of sub-memories, and mount each sub-memory in the plurality of sub-memories to a local directory of the server.
The creating unit 302 is configured to create an online analysis processing database in each virtualized memory.
And the second processing unit 303 is configured to analyze the data to be analyzed through the local directory of the server and the online analysis processing database, so as to obtain an analysis result.
In the embodiment of the present invention, the first processing unit 301 is configured to:
if the first server has single-point failure, executing automatic sub-memory mounting operation when the first server is restarted;
wherein the first server is any one server in the server set.
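One way to make the sub-memory mounting automatic on restart is to register the mounts so that the operating system restores them at boot. The sketch below assumes Linux, tmpfs, and /etc/fstab, none of which is specified in the text above; fstab_entries and merge_fstab are hypothetical helper names. It only manipulates text and does not write to any system file.

```python
from typing import List


def fstab_entries(mount_points: List[str], size_mib: int) -> List[str]:
    """Return one fstab line per sub-memory (tmpfs is an assumption here)."""
    return [f"tmpfs {mp} tmpfs size={size_mib}m 0 0" for mp in mount_points]


def merge_fstab(existing: str, entries: List[str]) -> str:
    """Append entries that are not already present, so repeated runs are idempotent."""
    lines = existing.splitlines()
    present = {line.strip() for line in lines}
    for entry in entries:
        if entry not in present:
            lines.append(entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    current = "UUID=abc / ext4 defaults 0 1\n"  # illustrative existing content
    wanted = fstab_entries(["/mnt/olap/mem0", "/mnt/olap/mem1"], 256)
    print(merge_fstab(current, wanted), end="")
```

Note that a memory-backed file system comes back empty after a restart, so under this assumption the data to be analyzed would have to be reloaded once the mounts are restored.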
In the embodiment of the present invention, the first processing unit 301 is configured to:
virtualizing a local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;
and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.
In the embodiment of the present invention, the first processing unit 301 is configured to:
dividing the virtualized memory into 8 sub-memories, each with the same storage space.
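The arithmetic of an equal eight-way split can be illustrated as follows. This is a sketch only: the page alignment of 4096 bytes and the name equal_split are assumptions made for the example and do not come from the text above.

```python
from typing import Tuple

SUB_MEMORY_COUNT = 8  # the number of sub-memories named in this embodiment


def equal_split(total_bytes: int,
                parts: int = SUB_MEMORY_COUNT,
                align: int = 4096) -> Tuple[int, int]:
    """Return (bytes per sub-memory, bytes left unused).

    Each sub-memory receives the same size, rounded down to a multiple of
    `align` (an assumed page size), so a small remainder may stay unallocated.
    """
    if total_bytes <= 0 or parts <= 0 or align <= 0:
        raise ValueError("total_bytes, parts and align must be positive")
    per_part = (total_bytes // parts) // align * align
    unused = total_bytes - per_part * parts
    return per_part, unused


if __name__ == "__main__":
    per_part, unused = equal_split(64 * 1024 ** 3)  # 64 GiB of virtualized memory
    print(per_part, unused)
```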
In the embodiment of the present invention, the second processing unit 303 is configured to:
storing the data to be analyzed in the online analysis processing type database through the local directory of the server;
and calling the online analysis processing type database and applying an online analysis processing method to the data to be analyzed stored in that database, so as to obtain the analysis result.
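The two steps above (store through a local directory, then analyze by calling the database) can be sketched as follows. The text does not name a database product at this point, so SQLite from the Python standard library is used purely as a stand-in, and a temporary directory stands in for the directory on which a sub-memory is mounted. The table txn and the functions store_rows and total_by_branch are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import Dict, Iterable, Tuple


def store_rows(db_path: str, rows: Iterable[Tuple[str, float]]) -> None:
    """Store the data to be analyzed in a database file under the local directory."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS txn (branch TEXT, amount REAL)")
        conn.executemany("INSERT INTO txn VALUES (?, ?)", rows)
        conn.commit()


def total_by_branch(db_path: str) -> Dict[str, float]:
    """Run a simple roll-up (sum per branch), a typical analytical aggregation."""
    with closing(sqlite3.connect(db_path)) as conn:
        cur = conn.execute("SELECT branch, SUM(amount) FROM txn GROUP BY branch")
        return dict(cur.fetchall())


if __name__ == "__main__":
    # The temporary directory stands in for a mounted sub-memory directory.
    with tempfile.TemporaryDirectory() as local_dir:
        path = os.path.join(local_dir, "olap.db")
        store_rows(path, [("north", 10.0), ("south", 5.0), ("north", 2.5)])
        print(total_by_branch(path))
```

Because the database file lives under the mounted directory, reads and writes issued by the database go to memory rather than disk, which is the effect the embodiment relies on.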
It should be understood that the functions performed by the components of the data analysis device according to the embodiment of the present invention have been described in detail in the data analysis method according to the foregoing embodiment, and will not be described herein.
Fig. 4 illustrates an exemplary system architecture 400 in which the data analysis method or data analysis apparatus of embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 is used as a medium to provide communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 405 via the network 404 using the terminal devices 401, 402, 403 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 401, 402, 403.
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (by way of example only) that supports shopping websites browsed by users on the terminal devices 401, 402, 403. The background management server may analyze and process received data such as a product information query request, and feed the processing result (e.g., target push information or product information, by way of example only) back to the terminal device.
It should be noted that, the data analysis method provided in the embodiment of the present invention is generally executed by the server 405, and accordingly, the data analysis device is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 501.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes a first processing unit, a creation unit, and a second processing unit. The names of these units do not limit the unit itself in some cases, and for example, the second processing unit may also be described as "a unit that analyzes data to be analyzed through a local directory of a server and an online analysis processing database to obtain an analysis result".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments or may exist alone without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server; creating an online analysis processing type database in each virtualized memory; and analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result.
According to the technical scheme of the embodiment of the invention, an online analysis processing type database is created in each virtualized memory, and the data to be analyzed is analyzed through the local directory of the server and the online analysis processing type database. Because memory reads and writes are markedly faster than disk reads and writes, analysis efficiency and user experience are improved. The virtualized memory is obtained by virtualizing the local memory of the server, and servers are inexpensive, so efficient data analysis is achieved at lower cost and the time consumed by analysis is reduced. Dividing the virtualized memory into a plurality of sub-memories and mounting them further increases the read-write speed of the memory, which further improves analysis efficiency and further reduces the time consumed by analysis.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method of data analysis, comprising:
for each server in a server set, virtualizing a local memory of the server to obtain a virtualized memory, dividing the virtualized memory into a plurality of sub-memories, and mounting each sub-memory in the plurality of sub-memories to a local directory of the server; if a first server has single-point failure, executing automatic sub-memory mounting operation when the first server is restarted, wherein the first server is any one server in the server set;
creating an online analysis processing type database in each virtualized memory; wherein an installation package of the online analysis processing type database is placed on one server in the server set, and that server sends the installation package to every other server in the server set; and, for each server in the server set, the installation package of the online analysis processing type database is installed on the server, a configuration file is modified, and initialization is performed;
analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result.
2. The method of claim 1, wherein the automatic sub-memory mounting operation comprises:
virtualizing a local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;
and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.
3. The method of claim 1, wherein dividing the virtualized memory into a plurality of sub-memories comprises:
dividing the virtualized memory into 8 sub-memories, each with the same storage space.
4. The method of claim 1, wherein analyzing the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain the analysis result comprises:
storing the data to be analyzed in the online analysis processing type database through the local directory of the server;
and calling the online analysis processing type database and applying an online analysis processing method to the data to be analyzed stored in that database, so as to obtain the analysis result.
5. A data analysis device, comprising:
a first processing unit configured to, for each server in a server set, virtualize a local memory of the server to obtain a virtualized memory, divide the virtualized memory into a plurality of sub-memories, and mount each sub-memory in the plurality of sub-memories to a local directory of the server; and, if a single-point failure occurs on a first server, execute an automatic sub-memory mounting operation when the first server is restarted, wherein the first server is any one server in the server set;
a creation unit configured to create an online analysis processing type database in each virtualized memory; wherein an installation package of the online analysis processing type database is placed on one server in the server set, and that server sends the installation package to every other server in the server set; and, for each server in the server set, the installation package of the online analysis processing type database is installed on the server, a configuration file is modified, and initialization is performed;
and a second processing unit configured to analyze the data to be analyzed through the local directory of the server and the online analysis processing type database to obtain an analysis result.
6. The apparatus of claim 5, wherein the first processing unit is configured to:
virtualizing a local memory of the first server to obtain a virtualized memory, and dividing the virtualized memory into a plurality of sub-memories;
and mounting each sub-memory in the plurality of sub-memories to a local directory of the first server.
7. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs which,
when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
8. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-4.
CN201911337325.8A 2019-12-23 2019-12-23 Data analysis method and device Active CN111176787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911337325.8A CN111176787B (en) 2019-12-23 2019-12-23 Data analysis method and device

Publications (2)

Publication Number Publication Date
CN111176787A CN111176787A (en) 2020-05-19
CN111176787B true CN111176787B (en) 2023-07-28

Family

ID=70652088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911337325.8A Active CN111176787B (en) 2019-12-23 2019-12-23 Data analysis method and device

Country Status (1)

Country Link
CN (1) CN111176787B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102710503A (en) * 2012-05-15 2012-10-03 浪潮电子信息产业股份有限公司 Network load balancing method based on cloud sea OS (operation system)
JP2017076161A (en) * 2015-10-13 2017-04-20 日本電信電話株式会社 Analyzing method, analyzing device, and analyzing program
CN108600321A (en) * 2018-03-26 2018-09-28 中国科学院计算技术研究所 A kind of diagram data storage method and system based on distributed memory cloud
CN108694014A (en) * 2017-04-06 2018-10-23 群晖科技股份有限公司 For carrying out the method and apparatus of memory headroom reservation and management
CN109977093A (en) * 2019-04-04 2019-07-05 中科创达(重庆)汽车科技有限公司 More virtual systems based on LXC check the method and device of container log
CN110008004A (en) * 2019-04-11 2019-07-12 广东电网有限责任公司 A kind of power system computation analysis application virtualization method, apparatus and equipment

Also Published As

Publication number Publication date
CN111176787A (en) 2020-05-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220921

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

GR01 Patent grant