WO2022082892A1 - Big data analysis method and system, and computer device and storage medium thereof

Info

Publication number: WO2022082892A1
Application number: PCT/CN2020/127948
Authority: WO
Inventors: 彭加山; 彭晓芳
Original assignee: 苏州莱锦机电自动化有限公司
Priority date: 2020-10-20
Filing date: 2020-11-11
Publication date: 2022-04-28
Also published as: CN112269830A

Abstract

A big data analysis method and system, and a computer device and a storage medium thereof, relating to the technical field of big data analysis. Said method comprises: acquiring big data information accessed by a user within a target time period (S100); segmenting the big data information according to time and storing same in a distributed database (S200); on the basis of a predetermined rule, executing a data analysis task on the big data information stored in the distributed database, so as to obtain analysis results (S300); configuring a data buffering server to buffer the analysis results (S400); and when a data query request of a front end is acquired, invoking, according to parameters of the data query request, the corresponding analysis result data in the data buffering server as a query result, and outputting the query result to the front end and performing visual display (S500). The present invention can reduce the burden of a processor of a big data analysis system, increase the access speed of a user, avoid the occurrence of stuck access, and ensure smooth user access.

Description

Big data analysis method, system, computer equipment and storage medium thereof

Technical field

The embodiments of the present invention relate to the technical field of big data analysis, in particular to a big data analysis method, system, computer equipment and storage medium thereof.

Background art

With the advent of the era of big data, the amount of information in the network increases exponentially, which brings the problem of information overload. Recommendation systems are one of the most effective ways to address information overload and have become a research hotspot. People are mining and using massive data, which heralds a new wave of productivity growth and consumer surplus. Big data is another major disruptive technological revolution in the IT industry after cloud computing and the Internet of Things. As people's demand for big data continues to increase, the burden on intelligent analysis systems, and in particular on their processors, also increases. When an intelligent analysis system is overloaded, the user's access speed drops, access may stall, or data may fail to load, so that the user's access is not smooth.

Technical solutions

The purpose of the embodiments of the present invention is to provide a big data analysis method, a system, a computer device and a storage medium thereof, so as to solve the problems raised in the above background art.

To achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

A big data analysis method includes the following steps:

acquiring big data information accessed by users within a target time period;

storing the big data information in a distributed database by time slices;

performing data analysis tasks on the big data information stored in the distributed database based on predetermined rules to obtain analysis results;

configuring a data cache server to cache the analysis results; and

when a front-end data query request is obtained, calling the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and outputting the query result to the front end for visual display.

As a further limitation of the technical solutions of the embodiments of the present invention, the distributed database is an HBase database.

As a further limitation of the technical solution of the embodiment of the present invention, the step of configuring a data cache server to cache the analysis results includes:

configuring a first cache server and a second cache server;

storing the analysis result data whose number of accesses is not greater than a preset threshold in the second cache server; and

when the number of accesses to the analysis result data in the second cache server is greater than the preset threshold, transferring the analysis result data to the first cache server.

As a further limitation of the technical solution of the embodiment of the present invention, the step of configuring a data cache server to cache the analysis results further includes:

eliminating the analysis result data with the earliest access time in the first cache server;

transferring the analysis result data eliminated in the first cache server to the second cache server; and

clearing the analysis result data with the earliest storage time in the second cache server.

Alternatively, the analysis result data eliminated in the first cache server is the analysis result data with the fewest accesses within a preset time.

A big data analysis system, the system including:

an acquisition unit, configured to acquire the big data information accessed by the user within the target time period;

a storage unit, configured to store the big data information in a distributed database by time slices;

an execution unit, configured to perform a data analysis task on the big data information stored in the distributed database based on a predetermined rule to obtain an analysis result;

a cache unit, configured to configure a data cache server to cache the analysis results; and

an output unit, configured to, when a front-end data query request is obtained, call the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and output the query result to the front end for visual display.

As a further limitation of the technical solutions of the embodiments of the present invention, the cache unit includes:

a configuration module, configured to configure the first cache server and the second cache server;

a storage module, configured to store the analysis result data whose number of accesses is not greater than a preset threshold in the second cache server; and

a transfer module, configured to transfer the analysis result data to the first cache server when the number of accesses to the analysis result data in the second cache server is greater than the preset threshold.

A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.

A storage medium storing a computer program, the computer program implementing the steps of the method when executed by the processor.

Beneficial effects

Compared with the prior art, the big data analysis method provided by the embodiment of the present invention obtains the big data information accessed by the user within the target time period; stores the big data information in a distributed database by time slices; performs data analysis tasks on the big data information stored in the distributed database based on predetermined rules to obtain analysis results; configures a data cache server to cache the analysis results; and, when a front-end data query request is obtained, calls the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and outputs the query result to the front end for visual display. This can reduce the burden on the processor of the big data analysis system, improve the user's access speed, avoid stalled access, and ensure smooth user access.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description show only some examples of the present invention.

FIG. 1 is a system architecture diagram to which the big data analysis method provided by an embodiment of the present invention is applicable.

FIG. 2 is an implementation flowchart of the big data analysis method provided in Embodiment 1 of the present invention.

FIG. 3 is an implementation flowchart of the big data analysis method provided in Embodiment 2 of the present invention.

FIG. 4 is an implementation flowchart of the big data analysis method provided in Embodiment 3 of the present invention.

FIG. 5 is an implementation flowchart of the big data analysis method provided in Embodiment 4 of the present invention.

FIG. 6 is a structural block diagram of a big data analysis system according to Embodiment 5 of the present invention.

FIG. 7 is a structural block diagram of a cache unit in a big data analysis system according to Embodiment 6 of the present invention.

Embodiments of the present invention

In order to make the technical problems, technical solutions and beneficial effects to be solved by the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

In the embodiment of the present invention, the big data information accessed by the user within the target time period is obtained; the big data information is stored in the distributed database by time slices; a data analysis task is then executed on the big data information stored in the distributed database based on predetermined rules to obtain the analysis result; a data cache server is configured to cache the analysis result; and, when a front-end data query request is obtained, the corresponding analysis result data is called in the data cache server as the query result according to the parameters of the data query request, and the query result is output to the front end for visual display. This can reduce the burden on the processor of the big data analysis system, improve the user's access speed, avoid stalled access, and ensure smooth user access.

FIG. 1 shows an exemplary system architecture diagram to which an embodiment of the big data analysis method of the present disclosure can be applied.

As shown in Figure 1, the system architecture may include terminals, distributed databases and cache servers.

The user can use the terminal to interact with the cache server through the network to receive or send messages and so on.

Terminals can be hardware or software. When the terminal is hardware, it can be any of various electronic devices with communication functions, including but not limited to smart phones, tablet computers, e-book readers, MP3 players, MP4 players, laptop computers and desktop computers. When the terminal is software, it can be installed in the electronic devices listed above. It can be implemented as a plurality of software programs or software modules, or as a single software program or software module. There is no specific limitation here.

It should be noted that the cache server may be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or can be implemented as a single server. When the server is software, it may be implemented as multiple software or software modules, or may be implemented as a single software or software module. There is no specific limitation here.

It should be understood that the numbers of terminals and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

Embodiment 1: The embodiment of the present invention provides a big data analysis method.

Please refer to FIG. 2, which shows the flow of an embodiment of the big data analysis method. This embodiment is mainly illustrated by applying the method to an electronic device having a certain computing capability, and the electronic device may be the terminal shown in FIG. 1. The big data analysis method includes the following steps:

Step S100: acquiring the big data information accessed by the user within the target time period;

In step S100 provided by the embodiment of the present invention, when a user uses a terminal to perform search access, the terminal acquires the data of the user's search access within the target time period, records the time node and text information of the data, and sends the time node information and text information through the network.

The network may be the medium used to provide the communication link between the terminal and the server. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, etc., without limitation.

Step S200: storing the big data information in a distributed database by time slices;

In step S200 provided by the embodiment of the present invention, the time slice may be set to one week as required, and the big data called by the server will be overwritten by the new big data after one week, thereby realizing the updating of the big data.

In the big data analysis method, integrity verification and legality verification of the big data are also performed before the data is stored in the distributed database.
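The time slicing of step S200 can be illustrated with a minimal sketch. It assumes one-week slices counted from the Unix epoch and uses a plain dictionary as a stand-in for the distributed database; the function names `slice_index`, `store` and `drop_expired` are hypothetical and do not appear in the original text.

```python
from datetime import datetime, timedelta, timezone

SLICE_LENGTH = timedelta(weeks=1)  # the one-week slice mentioned for step S200
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed origin of the slices


def slice_index(ts: datetime) -> int:
    """Return the index of the one-week slice that contains `ts`."""
    return (ts - EPOCH) // SLICE_LENGTH


def store(slices: dict, ts: datetime, text: str) -> None:
    """Append a record to its slice; `slices` stands in for the distributed database."""
    slices.setdefault(slice_index(ts), []).append((ts, text))


def drop_expired(slices: dict, now: datetime) -> None:
    """Keep only the current slice, mimicking data being overwritten after one week."""
    current = slice_index(now)
    for idx in [i for i in slices if i < current]:
        del slices[idx]
```

Records whose time nodes fall in the same week share a slice, so replacing a whole slice at once is one possible way to realize the weekly update described above.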

Step S300: performing a data analysis task on the big data information stored in the distributed database based on a predetermined rule to obtain an analysis result;

Step S400: configuring a data cache server to cache the analysis results;

Step S500: When the data query request from the front end is obtained, call the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and output the query result to the front end for visual display.

In step S500 provided by the embodiment of the present invention, the visual display uses the display screen of the terminal to display the output query result, so that the user can obtain the query result.
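The query path of step S500 can be sketched as follows, under the assumption that the query parameters are serialized into a canonical string that serves as the cache key. The class `ResultCache` and the helper `cache_key` are hypothetical names, and an in-memory dictionary stands in for the data cache server.

```python
import json


def cache_key(params: dict) -> str:
    """Canonical key: the same parameters always map to the same cache entry."""
    return json.dumps(params, sort_keys=True)


class ResultCache:
    """In-memory stand-in for the data cache server (an assumption of this sketch)."""

    def __init__(self):
        self._entries = {}

    def put(self, params: dict, result) -> None:
        """Step S400: cache an analysis result under its query parameters."""
        self._entries[cache_key(params)] = result

    def query(self, params: dict):
        """Step S500: return the cached result, or None when nothing was precomputed."""
        return self._entries.get(cache_key(params))
```

Because the front end reads only precomputed results, a query in this sketch never triggers a new analysis task, which is consistent with the stated aim of reducing the processor burden.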

Further, in a preferred embodiment provided by the present invention, the distributed database is an HBase database, and the big data is stored in the form of row keys (rowkeys) and column names.

In the big data analysis method, integrity verification and legality verification of the big data are performed before the data is stored in the distributed database, wherein the integrity verification is completed by Redis in the network system, and the big data is sent to the server so that the legality verification is completed locally.
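One possible row key layout is sketched below. HBase orders rows lexicographically by row key, so placing a zero-padded time slice index first keeps the rows of one slice adjacent. The layout itself, the column name `access:text`, and the helper names are assumptions made for illustration; the original text does not specify them, and no HBase client API is used here.

```python
from datetime import datetime, timezone


def make_rowkey(slice_idx: int, user_id: str, ts: datetime) -> bytes:
    """Compose a row key: zero-padded slice index, user id, millisecond timestamp."""
    millis = int(ts.timestamp() * 1000)
    return f"{slice_idx:08d}#{user_id}#{millis:013d}".encode("utf-8")


def make_row(text: str) -> dict:
    """Map a hypothetical 'family:qualifier' column name to its cell value."""
    return {b"access:text": text.encode("utf-8")}
```

With this layout, all rows of a given slice share a common prefix, so a range scan over that prefix would return exactly the data of one time slice.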

Among them, Redis is an open-source, network-supporting log-type, key-value database that can be memory-based or persistent.
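The text does not detail how the two verifications are carried out. The following is a minimal sketch under stated assumptions: integrity is checked by comparing a SHA-256 digest sent with the record, and legality is checked by requiring a numeric `time_node` field and a non-empty `text` field. The field names and rules are hypothetical, and the sketch does not use Redis.

```python
import hashlib
import json


def digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_intact(record: dict, expected_digest: str) -> bool:
    """Integrity check: the record still matches the digest sent with it."""
    return digest(record) == expected_digest


def is_legal(record: dict) -> bool:
    """Legality check (assumed rule): required fields are present and well-typed."""
    return (
        isinstance(record.get("time_node"), (int, float))
        and isinstance(record.get("text"), str)
        and record["text"].strip() != ""
    )
```

Only records that pass both checks would then be written to the distributed database.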

Embodiment 2: FIG. 3 discloses a schematic flowchart of step S400 of configuring a data cache server to cache the analysis results in the big data analysis method provided by the embodiment of the present invention, wherein step S400 includes:

Step S401: configure the first cache server and the second cache server;

Step S402: Store the analysis result data whose access times are not greater than a preset threshold in the second cache server;

Step S403: When the access times of the analysis result data in the second cache server is greater than a preset threshold, transfer the analysis result data to the first cache server.

In the above step S400 of configuring a data cache server to cache the analysis results, two cache servers are configured to store the cached data that is accessed more often and the cached data that is accessed less often, respectively, and the two cache servers adopt independent elimination strategies. This avoids the situation in which a single cache server judges inaccurately and eliminates data that is expected to remain cached, thereby effectively improving the accuracy of the cached data.
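Steps S401 to S403 can be sketched as a small in-process model. The class `TwoTierCache` and its attribute names are hypothetical, and two dictionaries stand in for the first and second cache servers.

```python
class TwoTierCache:
    """Hypothetical in-process model of the first and second cache servers."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.first = {}   # entries accessed more than `threshold` times
        self.second = {}  # entries accessed at most `threshold` times
        self.hits = {}    # access counter per key

    def put(self, key, value) -> None:
        """Step S402: new results start in the second cache."""
        self.hits.setdefault(key, 0)
        if key in self.first:
            self.first[key] = value
        else:
            self.second[key] = value

    def get(self, key):
        """Count the access and promote once the count exceeds the threshold (S403)."""
        if key in self.first:
            self.hits[key] += 1
            return self.first[key]
        if key not in self.second:
            return None
        self.hits[key] += 1
        if self.hits[key] > self.threshold:
            self.first[key] = self.second.pop(key)
            return self.first[key]
        return self.second[key]
```

With `threshold=2`, an entry stays in the second cache for its first two reads and moves to the first cache on the third.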

Embodiment 3: FIG. 4 discloses a schematic flowchart further describing step S400 of configuring a data cache server to cache the analysis results in the big data analysis method provided by the embodiment of the present invention, wherein step S400 further includes:

Step S4401: Eliminate the analysis result data with the earliest access time in the first cache server;

Step S4501: Transfer the analysis result data eliminated in the first cache server to the second cache server;

Step S4601: Clear the analysis result data with the earliest storage time in the second cache server.

Specifically, in a preferred embodiment, data elimination is performed after the data storage of the first cache server is full.

Among the data in the first cache server, the data whose access time within a preset period is farthest from the current time, that is, the data with the earliest access time, is identified and preferentially eliminated, and the eliminated data can be transferred to the second cache server.

As a result, data that was accessed early and has a low recent access frequency is not discarded directly. This avoids the misjudgment of a single cache server, which might eliminate data that is expected to remain cached, and thus effectively improves the accuracy of the cached data.
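The elimination order of this embodiment can be sketched with two ordered dictionaries: the first cache drops its least recently accessed entry when full and hands it to the second cache, which in turn clears its earliest-stored entry. The class `DemotingCache`, the fixed capacities, and the method names are assumptions of this sketch.

```python
from collections import OrderedDict


class DemotingCache:
    """Hypothetical model; both capacities are assumed to be at least 1."""

    def __init__(self, first_capacity: int, second_capacity: int):
        self.first_capacity = first_capacity
        self.second_capacity = second_capacity
        self.first = OrderedDict()   # ordered from least to most recently accessed
        self.second = OrderedDict()  # ordered from earliest to latest stored

    def _store_second(self, key, value) -> None:
        """Insert into the second cache; clear the earliest-stored entry when full."""
        self.second.pop(key, None)  # re-storing counts as a fresh store
        if len(self.second) >= self.second_capacity:
            self.second.popitem(last=False)
        self.second[key] = value

    def put_first(self, key, value) -> None:
        """Insert into the first cache; demote the least recently accessed when full."""
        if key not in self.first and len(self.first) >= self.first_capacity:
            old_key, old_value = self.first.popitem(last=False)
            self._store_second(old_key, old_value)
        self.first[key] = value
        self.first.move_to_end(key)

    def get_first(self, key):
        """Read from the first cache and mark the entry as most recently accessed."""
        if key not in self.first:
            return None
        self.first.move_to_end(key)
        return self.first[key]
```

An entry eliminated from the first cache therefore gets a second chance in the second cache before it is cleared for good.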

Embodiment 4: FIG. 5 discloses a schematic flowchart further describing step S400 of configuring a data cache server to cache the analysis results in the big data analysis method provided by the embodiment of the present invention, wherein step S400 further includes:

Step S4401: Eliminate the analysis result data with the least access times within the preset time in the first cache server;

Step S4502: Transfer the analysis result data eliminated in the first cache server to the second cache server;

Step S4602: Clear the analysis result data with the earliest storage time in the second cache server.

The number of times that each data item in the first cache server is accessed within a preset period of time is obtained, the data item with the fewest accesses within the preset period is preferentially eliminated, and the eliminated data can be transferred to the second cache server.

As a result, data that has a large total number of accesses but a low recent access frequency is not discarded directly. This avoids the misjudgment of a single cache server, which might eliminate data that is expected to remain cached, and thus effectively improves the accuracy of the cached data.
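The frequency-based elimination of this embodiment can be sketched by keeping access timestamps per entry and counting only those inside the preset period. The class `WindowedFrequencyCache`, the injectable `clock` parameter, and the choice to return the eliminated pair to the caller are assumptions of this sketch.

```python
import time
from collections import deque


class WindowedFrequencyCache:
    """Hypothetical model of the first cache server with a preset time window."""

    def __init__(self, capacity: int, window_seconds: float, clock=time.monotonic):
        self.capacity = capacity
        self.window = window_seconds
        self.clock = clock
        self.values = {}
        self.accesses = {}  # key -> deque of access timestamps

    def _recent_count(self, key, now: float) -> int:
        """Number of accesses to `key` within the last `window` seconds."""
        stamps = self.accesses[key]
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        return len(stamps)

    def get(self, key):
        """Return the value and record the access time, or None if absent."""
        if key not in self.values:
            return None
        self.accesses[key].append(self.clock())
        return self.values[key]

    def put(self, key, value):
        """Store a value; return the eliminated (key, value) pair, if any."""
        evicted = None
        if key not in self.values and len(self.values) >= self.capacity:
            now = self.clock()
            victim = min(self.values, key=lambda k: self._recent_count(k, now))
            evicted = (victim, self.values.pop(victim))
            del self.accesses[victim]
        self.values[key] = value
        self.accesses.setdefault(key, deque())
        return evicted
```

The pair returned by `put` is what a caller would transfer to the second cache server, so an entry with many old accesses but few recent ones leaves the first cache without being lost immediately.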

Embodiment 5: The embodiment of the present invention provides a big data analysis system 600.

Specifically, the big data analysis system 600 includes:

an acquisition unit 601, configured to acquire the big data information accessed by the user within the target time period;

a storage unit 602, configured to store the big data information in a distributed database by time slices;

In the embodiment of the present invention, the time slice can be set to one week as required, and the big data called by the server will be overwritten by the new big data after one week, so as to realize the updating of the big data. Integrity verification and legality verification of the big data are also performed before the data is stored in the distributed database.

an execution unit 603, configured to perform a data analysis task on the big data information stored in the distributed database based on a predetermined rule to obtain an analysis result;

a cache unit 604, configured to configure a data cache server to cache the analysis results; and

an output unit 605, configured to, when a front-end data query request is obtained, call the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and output the query result to the front end for visual display.
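The cooperation of units 601 to 605 can be sketched as a small pipeline. The class `BigDataAnalysisSystem`, its callable fields, and the dictionary used as the cache are assumptions made for illustration; the original text does not prescribe how the units are implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, List


@dataclass
class BigDataAnalysisSystem:
    acquire: Callable[[], List[Any]]                      # acquisition unit 601
    store: Callable[[List[Any]], List[Any]]               # storage unit 602
    analyze: Callable[[List[Any]], Dict[Hashable, Any]]   # execution unit 603

    def __post_init__(self):
        self.cache: Dict[Hashable, Any] = {}              # cache unit 604 (stand-in)

    def refresh(self) -> None:
        """Run units 601 to 604: acquire, store, analyze, then cache the results."""
        stored = self.store(self.acquire())
        self.cache.update(self.analyze(stored))

    def query(self, key: Hashable):
        """Output unit 605: answer a front-end query from the cache only."""
        return self.cache.get(key)
```

In this sketch, `refresh` would run in the background, while `query` serves front-end requests without touching the database or the analysis step.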

Embodiment 6: FIG. 7 shows a structural block diagram of the cache unit 604 in the big data analysis system provided by Embodiment 6 of the present invention. Wherein, the cache unit 604 includes:

A configuration module 6041, the configuration module is used to configure the first cache server and the second cache server;

A storage module 6042, the storage module is configured to store the analysis result data whose access times are not greater than a preset threshold in the second cache server; and

Transfer module 6043, the transfer module is configured to transfer the analysis result data to the first cache server when the access times of the analysis result data in the second cache server is greater than a preset threshold.

The cache unit 604 configures two cache servers to store the cached data that is accessed more often and the cached data that is accessed less often, respectively, and the two cache servers adopt independent elimination strategies. This avoids the situation in which a single cache server judges inaccurately and eliminates data that is expected to remain cached, thereby effectively improving the accuracy of the cached data.

Embodiment 7: Embodiment 7 of the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the big data analysis method when executing the computer program.

Embodiment 8: Embodiment 8 of the present invention further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the big data analysis method are implemented.

Exemplarily, a computer program may be divided into one or more modules, and the one or more modules are stored in a memory and executed by a processor to accomplish the present invention. One or more modules may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal device. For example, the above-mentioned computer program can be divided into the units or modules of the big data analysis system provided by each of the above-mentioned system embodiments.

Those skilled in the art can understand that the above description of the terminal device is only an example and does not constitute a limitation on the terminal device, which may include more or fewer components than described above, combine some components, or use different components; for example, it may include input and output devices, network access devices, buses, etc.

The processor may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor. The above-mentioned processor is the control center of the above-mentioned terminal device, and uses various interfaces and lines to connect the various parts of the entire user terminal.

The above-mentioned memory can be used to store computer programs and/or modules, and the above-mentioned processor implements various functions of the above-mentioned terminal device by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory can mainly include a stored program area and a stored data area, wherein the stored program area can store the operating system, application programs required for at least one function (such as information collection template display function, product information release function, etc.), etc.; the stored data area can store the data created according to the use of the big data analysis system (such as product information collection templates corresponding to different product types, product information that different product providers need to publish, etc.), etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as hard disks, internal memory, plug-in hard disks, smart memory cards, secure digital cards, flash memory cards, at least one magnetic disk storage device, flash memory devices, or other volatile solid-state storage devices.

If the modules/units integrated in the terminal device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the modules/units in the system of the above-mentioned embodiments may also be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when the computer program is executed by the processor, the functions of the above-described system embodiments can be realized.

Wherein, the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate forms, and the like. Computer readable media may include: any entity or device capable of carrying computer program code, recording media, USB flash drives, removable hard disks, magnetic disks, optical discs, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunication signals and software distribution media.

To sum up, the big data analysis method and big data analysis system provided by the embodiments of the present invention obtain the big data information accessed by users within the target time period; store the big data information in a distributed database by time slices; perform a data analysis task on the big data information stored in the distributed database based on predetermined rules to obtain the analysis result; configure a data cache server to cache the analysis result; and, when a front-end data query request is obtained, call the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and output the query result to the front end for visual display. This can reduce the burden on the processor of the big data analysis system, improve the user's access speed, avoid stalled access, and ensure smooth user access.

Alternatively, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated.

The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims

1. A big data analysis method, characterized in that the method comprises the following steps: acquiring big data information accessed by users within a target time period; storing the big data information in a distributed database by time slices; performing data analysis tasks on the big data information stored in the distributed database based on predetermined rules to obtain analysis results; configuring a data cache server to cache the analysis results; and, when a front-end data query request is obtained, calling the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and outputting the query result to the front end for visual display.
2. The big data analysis method according to claim 1, wherein the distributed database is an HBase database.
3. The big data analysis method according to claim 1 or 2, wherein the step of configuring a data cache server to cache the analysis results comprises: configuring a first cache server and a second cache server; storing the analysis result data whose number of accesses is not greater than a preset threshold in the second cache server; and, when the number of accesses to the analysis result data in the second cache server is greater than the preset threshold, transferring the analysis result data to the first cache server.
4. The big data analysis method according to claim 3, wherein the step of configuring a data cache server to cache the analysis results further comprises: eliminating the analysis result data with the earliest access time in the first cache server; transferring the analysis result data eliminated in the first cache server to the second cache server; and clearing the analysis result data with the earliest storage time in the second cache server.
5. The big data analysis method according to claim 3, wherein the step of configuring a data cache server to cache the analysis results further comprises: eliminating the analysis result data with the fewest accesses within a preset time in the first cache server; transferring the analysis result data eliminated in the first cache server to the second cache server; and clearing the analysis result data with the earliest storage time in the second cache server.
6. A big data analysis system, characterized in that the system comprises: an acquisition unit, configured to acquire big data information accessed by users within a target time period; a storage unit, configured to store the big data information in a distributed database by time slices; an execution unit, configured to perform a data analysis task on the big data information stored in the distributed database based on predetermined rules to obtain analysis results; a cache unit, configured to configure a data cache server to cache the analysis results; and an output unit, configured to, when a front-end data query request is obtained, call the corresponding analysis result data in the data cache server as the query result according to the parameters of the data query request, and output the query result to the front end for visual display.
7. The big data analysis system according to claim 6, wherein the cache unit comprises: a configuration module, configured to configure a first cache server and a second cache server; a storage module, configured to store the analysis result data whose number of accesses is not greater than a preset threshold in the second cache server; and a transfer module, configured to transfer the analysis result data to the first cache server when the number of accesses to the analysis result data in the second cache server is greater than the preset threshold.
8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the steps of the method according to any one of claims 1-5 are implemented.
9. A storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1-5 are implemented.