CN111506605B

CN111506605B - Data analysis method, device, equipment and computer readable storage medium

Info

Publication number: CN111506605B
Application number: CN202010257617.7A
Authority: CN
Inventors: 魏新宇; 王雪冬; 连丰; 陈建豪
Original assignee: Shang Yu Software Shenzhen Co ltd
Current assignee: Shang Yu Software Shenzhen Co ltd
Priority date: 2020-04-02
Filing date: 2020-04-02
Publication date: 2023-07-25
Anticipated expiration: 2040-04-02
Also published as: CN111506605A

Abstract

The invention discloses a data analysis method, apparatus, device and storage medium. The method comprises: obtaining data to be analyzed from a database and converting it into a data stream to be analyzed; performing aggregation processing on the data stream to be analyzed to generate set files for it; and performing set operations on the set files to obtain an analysis result of the data to be analyzed. Because the analysis result is obtained by set operations on the set files rather than by repeatedly querying the database, data query efficiency and resource utilization are improved.

Description

Data analysis method, device, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of data analysis technologies, and in particular to a data analysis method, apparatus, device, and computer-readable storage medium.

Background

Existing data analysis methods include querying the database directly and analyzing data on a distributed cluster. Directly querying the database for the user data to be analyzed slows down database queries when the database holds a large amount of user data, and in severe cases can crash the database system; moreover, query results are not cached, so every query consumes machine performance and wastes query resources. The distributed-cluster approach requires building a distributed data analysis system in which multiple high-performance servers run simultaneously, and requires learning additional distributed computing frameworks and programming models, which wastes resources. Conventional data analysis methods therefore suffer from low data query efficiency and low resource utilization.

Disclosure of Invention

The main object of the present invention is to provide a data analysis method, apparatus, device and storage medium, so as to solve the technical problems of low data query efficiency and low resource utilization in the prior art.

In order to achieve the above object, the present invention provides a data analysis method, including the steps of:

obtaining data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed;

performing aggregation processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed;

and performing a set operation on the set file to obtain an analysis result of the data to be analyzed.

Preferably, the step of performing a set operation on the set file to obtain an analysis result of the data to be analyzed includes:

acquiring a query condition, determining dimension information in the query condition, and acquiring the set file corresponding to a name field in the dimension information;

decompressing the set file corresponding to the name field to obtain a dimension set corresponding to the name field;

and carrying out set operation on the dimension set through statistical logic operation to obtain an analysis result of the data to be analyzed.

Preferably, the step of obtaining the analysis result of the data to be analyzed by performing a set operation on the dimension set through a statistical logic operation includes:

and performing bit operation on the dimension set according to the logic conditions in the query conditions to obtain an analysis result of the data to be analyzed.

Preferably, the step of performing aggregation processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed includes:

acquiring dimension information in the data stream to be analyzed, and creating a dimension set corresponding to the dimension information;

acquiring a data identifier corresponding to the dimension information, and querying the data bit corresponding to the data identifier in the dimension set;

and assigning values in the data bits, and compressing the assigned dimension set into a set file in a specific script format.

Preferably, the step of obtaining the dimension information in the data stream to be analyzed and creating the dimension set corresponding to the dimension information includes:

acquiring dimension information in the data stream to be analyzed, and detecting whether a dimension set corresponding to the dimension information exists or not;

if the dimension set corresponding to the dimension information is detected to exist, loading the dimension set corresponding to the dimension information as an initial dimension set;

if it is detected that the dimension set corresponding to the dimension information does not exist, creating the dimension set corresponding to the dimension information.

Preferably, the step of obtaining the data to be analyzed in the database and converting the data to be analyzed into the data stream to be analyzed includes:

and acquiring data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed carrying data identification and dimension information through a preset data reading tool.

Preferably, after the step of performing the set operation on the set file to obtain the analysis result of the data to be analyzed, the method further includes:

and caching the analysis result in the database, and, upon receiving again the query condition that produced the analysis result, sending the analysis result to a terminal device so that the terminal device outputs the analysis result after receiving it.

In order to achieve the above object, the present invention provides a data analysis device including:

the acquisition module is used for acquiring data to be analyzed in the database;

the conversion module is used for converting the data to be analyzed into a data stream to be analyzed;

the generation module is used for performing aggregation processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed;

and the operation module is used for carrying out set operation on the set files to obtain an analysis result of the data to be analyzed.

In addition, to achieve the above object, the present invention also provides a data analysis device including a memory, a processor, and a data analysis program stored in the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the data analysis method described above.

In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a data analysis program which, when executed by a processor, implements the steps of the data analysis method described above.

The invention obtains the data to be analyzed from the database, converts it into a data stream to be analyzed, performs aggregation processing on the data stream to obtain set files, and finally performs set operations on the set files to obtain the analysis result of the data to be analyzed. Set operations execute very quickly on a server, which improves data query efficiency. Moreover, the set files used by the set operations are produced in advance by the aggregation processing, so the database does not need to be queried again; this reduces the query resources consumed and improves resource utilization.

Drawings

FIG. 1 is a flow chart of a first embodiment of a method for analyzing data according to the present invention;

FIG. 2 is a schematic diagram showing a preferred structure of the data analysis device of the present invention;

FIG. 3 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;

FIG. 4 is an analytical schematic of a first embodiment of the method for analyzing data according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The invention provides a data analysis method. Referring to FIG. 1, FIG. 1 is a flowchart of a first embodiment of the data analysis method of the invention; referring to FIG. 4, FIG. 4 is an analysis schematic diagram of the first embodiment of the data analysis method of the invention.

Embodiments of the data analysis method are provided below. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that shown or described herein.

The data analysis method comprises the following steps:

step S10, data to be analyzed in a database are obtained, and the data to be analyzed are converted into data streams to be analyzed.

The server acquires the data to be analyzed from the server's database and converts it into a data stream to be analyzed. The server may obtain the data through, for example, PHP (Hypertext Preprocessor) or a SqlDataAdapter (a database adapter); this embodiment does not limit how the data is obtained. The data source may be a MySQL database, Kafka (a distributed publish-subscribe messaging system), and so on; this embodiment does not limit the form of the database. A data stream is an ordered sequence of bytes with a start point and an end point.

In this embodiment, only one server is needed for the data analysis process.

The step S10 includes:

step a, obtaining data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed carrying data identification and dimension information through a preset data reading tool.

Specifically, the server acquires the data to be analyzed from its database, reads the dimension information and the data identifier in the data through the server's preset data reading tool, and converts them into a data stream to be analyzed with a query-string-like structure. The data reading tool is a virtual tool in the server consisting of two parts: data reading, which reads out the data the server has preset in the tool, and data conversion, which converts the read data into a data stream with the specific structure. The dimension information includes channel information and version information, i.e., the channel and version under which the user's data was recorded; the data identifier is the user's unique ID (identification number). The query-string-like structure is uid=xxx&chn=xxx&ver=xxx, where uid (user identification) is the data identifier, chn (channel) is the channel information, and ver (version) is the version information.

In this embodiment, for example, if the data identifier of the data to be analyzed is 123, the channel information is 02 and the version information is 330, the corresponding query-string-like data stream to be analyzed is uid=123&chn=02&ver=330.
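Step a can be illustrated with a minimal Python sketch. It is not the patent's preset data reading tool: it assumes each database row is already available as a dict carrying `uid`, `chn` and `ver`, and the helper name `to_data_stream` is hypothetical. The standard-library `urllib.parse.urlencode` produces the `key=value&...` layout of the example.

```python
from urllib.parse import urlencode


def to_data_stream(record: dict) -> str:
    """Serialize one row into a query-string-like data stream.

    Hypothetical helper: the row is assumed to carry the data identifier (uid)
    and the dimension fields (chn, ver) used in the example above.
    """
    # Fixed field order: data identifier first, then the dimension information.
    pairs = [("uid", record["uid"]), ("chn", record["chn"]), ("ver", record["ver"])]
    return urlencode(pairs)


stream = to_data_stream({"uid": "123", "chn": "02", "ver": "330"})
print(stream)  # uid=123&chn=02&ver=330
```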

Step S20, perform aggregation processing on the data stream to be analyzed to generate set files of the data stream to be analyzed.

The server splits the query-string-like data stream to be analyzed by structure to obtain the data information of each string segment, and then performs aggregation processing on each piece of data information to generate the set files of the data stream to be analyzed. Structure splitting means splitting the data stream along its string segments: for example, the data stream uid=123&chn=02&ver=330 is split into the segments uid=123, chn=02 and ver=330. A set file stores a large number of data bits; each set file contains multiple data bits, and each string segment has corresponding data bits. A data bit stores a value, which may be binary, Boolean, character, and so on. Binary values are the most common: a 1 means the set file contains the corresponding string segment, a 0 means it does not, and every data bit defaults to 0. In this embodiment, for example, the data identifier uid=123 corresponds to data bit number 2 in the set file; if the value at bit 2 is "1", the set file contains the user uid=123, and if it is "0", the set file does not contain that user.

It should be noted that the number of data bits in each set file is determined by the requirements of the server.
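The structure splitting and the meaning of a data bit can be sketched as below. Assumptions: a set is held as a Python integer whose bit n is data bit number n, and the helper names `split_stream` and `bit_is_set` are hypothetical; the patent does not prescribe an in-memory layout.

```python
def split_stream(stream: str) -> dict:
    """Structure splitting: break the stream into its key=value segments."""
    segments = {}
    for segment in stream.split("&"):
        key, _, value = segment.partition("=")
        segments[key] = value
    return segments


def bit_is_set(bitmap: int, position: int) -> bool:
    """True if the data bit at `position` is 1, i.e. the set contains that item."""
    return (bitmap >> position) & 1 == 1


segments = split_stream("uid=123&chn=02&ver=330")
print(segments)  # {'uid': '123', 'chn': '02', 'ver': '330'}

# Hypothetical set whose only 1-bit is data bit number 2 (assumed to be uid=123).
channel_set = 1 << 2
print(bit_is_set(channel_set, 2), bit_is_set(channel_set, 3))  # True False
```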

The step S20 includes:

step b, acquiring dimension information in the data stream to be analyzed, and creating a dimension set corresponding to the dimension information;

step c, acquiring the data identifier corresponding to the dimension information, and querying the data bit corresponding to the data identifier in the dimension set;

step d, assigning values to the data bits, and compressing the assigned dimension set into a set file in a specific script format.

The server obtains the dimension information in the data stream to be analyzed and creates the corresponding dimension sets; it then obtains the data identifier corresponding to the dimension information, queries the data bit of that identifier in each dimension set, assigns a value to the data bit, and finally compresses the assigned dimension set into a set file with a specific suffix (script format). Dimension sets include channel sets, version sets, and so on. Many suffix formats are possible, such as list, bsz and set; this embodiment does not limit the suffix format. Set files correspond one-to-one to dimension sets: for example, if the dimension sets comprise a channel set and a version set, the set files comprise a channel set file and a version set file. Compression techniques include space compression, storage-system compression, Snappy (a C++ compression and decompression library), and so on; this embodiment does not limit the compression technique.

In this embodiment, for example, the data stream to be analyzed is uid=123&chn=02&ver=330, its dimension information is the channel information chn=02 and the version information ver=330, the suffix format is bsz, and the compression technique is Snappy. The server obtains chn=02 and ver=330 as the dimension information and creates a chn=02 channel set and a ver=330 version set, each with 100 data bits holding binary values. It then obtains uid=123 as the data identifier corresponding to chn=02 and ver=330, finds that this identifier maps to data bit number 2 in both sets, and assigns "1" to bit 2 in the chn=02 channel set and in the ver=330 version set. Finally, the server compresses the two assigned sets with Snappy into the channel set file chn_02.bsz and the version set file ver_330.bsz.
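Steps b to d can be sketched as follows, under several stated assumptions: each dimension set is a 100-bit `bytearray` (bit n lives in byte n // 8, bit n % 8); the mapping from uid to data bit is a hypothetical lookup table `UID_TO_BIT`, since the text does not say how bit 2 is derived from uid=123; and the standard-library `zlib` stands in for Snappy, which is a third-party package in Python. All function names are hypothetical.

```python
import os
import tempfile
import zlib

NUM_BITS = 100  # set size from the example; in practice chosen by the server

# Hypothetical uid -> data bit table; the text only says uid=123 maps to bit 2.
UID_TO_BIT = {"123": 2}


def new_dimension_set() -> bytearray:
    """An all-zero dimension set (every data bit defaults to 0)."""
    return bytearray((NUM_BITS + 7) // 8)


def set_bit(dim_set: bytearray, position: int) -> None:
    """Assign 1 to the data bit at `position`."""
    dim_set[position // 8] |= 1 << (position % 8)


def aggregate(stream: str, sets: dict) -> None:
    """Steps b-d (in memory): mark the stream's uid in each of its dimension sets."""
    fields = dict(segment.split("=", 1) for segment in stream.split("&"))
    position = UID_TO_BIT[fields.pop("uid")]
    for name, value in fields.items():           # ("chn", "02"), ("ver", "330")
        key = f"{name}_{value}"                   # set name, e.g. "chn_02"
        set_bit(sets.setdefault(key, new_dimension_set()), position)


def write_set_files(sets: dict, directory: str) -> list:
    """Compress each dimension set into <name>.bsz (zlib stands in for Snappy)."""
    paths = []
    for key, dim_set in sets.items():
        path = os.path.join(directory, f"{key}.bsz")
        with open(path, "wb") as f:
            f.write(zlib.compress(bytes(dim_set)))
        paths.append(path)
    return paths


sets = {}
aggregate("uid=123&chn=02&ver=330", sets)
with tempfile.TemporaryDirectory() as directory:
    written = [os.path.basename(p) for p in write_set_files(sets, directory)]
print(written)  # ['chn_02.bsz', 'ver_330.bsz']
```

A dimension set is created in memory the first time its name appears; because every set is a fixed-size bit array, marking a user touches exactly one byte per dimension.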

Further, the step b includes:

step e, acquiring dimension information in the data stream to be analyzed, and detecting whether a dimension set corresponding to the dimension information exists or not;

step f, if it is detected that the dimension set corresponding to the dimension information exists, loading that dimension set as the initial dimension set;

step g, if it is detected that the dimension set corresponding to the dimension information does not exist, creating the dimension set corresponding to the dimension information.

Specifically, after acquiring the dimension information in the data stream to be analyzed and before creating the corresponding dimension set, the server checks its database for a dimension set corresponding to that dimension information. If one exists, the server loads it directly as the initial dimension set instead of creating a new one; if none exists, the server creates the dimension set corresponding to the dimension information.

In this embodiment, for example, the data stream to be analyzed is uid=123&chn=02&ver=330 and the dimension information acquired by the server is the channel information chn=02 and the version information ver=330. If the server detects that the chn=02 channel set already exists in its database, it loads that channel set directly as the initial set without creating it again; if it detects that the ver=330 version set does not exist, it creates a new ver=330 version set.
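Steps e to g amount to a load-or-create check. In this hypothetical sketch the server database is modeled as a directory of `.bsz` files, and `zlib` again replaces Snappy; `load_or_create` is an illustrative name, not from the source.

```python
import os
import tempfile
import zlib

NUM_BITS = 100  # set size from the example


def load_or_create(directory: str, key: str) -> bytearray:
    """Steps e-g: load an existing dimension set as the initial set, or create it."""
    path = os.path.join(directory, f"{key}.bsz")
    if os.path.exists(path):                      # step f: the set already exists
        with open(path, "rb") as f:
            return bytearray(zlib.decompress(f.read()))
    return bytearray((NUM_BITS + 7) // 8)         # step g: new all-zero set


with tempfile.TemporaryDirectory() as directory:
    # Simulate an existing chn_02 set in which data bit 2 is already 1.
    stored = bytearray((NUM_BITS + 7) // 8)
    stored[0] = 0b100
    with open(os.path.join(directory, "chn_02.bsz"), "wb") as f:
        f.write(zlib.compress(bytes(stored)))

    channel_set = load_or_create(directory, "chn_02")   # loaded, not recreated
    version_set = load_or_create(directory, "ver_330")  # absent, so created
print(channel_set[0], any(version_set))  # 4 False
```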

Step S30, perform a set operation on the set files to obtain an analysis result of the data to be analyzed.

The server performs set operation on the set files through a preset operation method to obtain operation results, and then analyzes the operation results to obtain analysis results of data to be analyzed. The preset operation method includes a bit operation method, a boolean operation method, etc., and the present embodiment is not limited to the form of the preset operation method.

The step S30 further includes:

step h, acquiring query conditions, determining dimension information in the query conditions, and acquiring the set file corresponding to a name field in the dimension information;

step i, decompressing the set file corresponding to the name field to obtain a dimension set corresponding to the name field;

step j, performing a set operation on the dimension set through a statistical logic operation to obtain an analysis result of the data to be analyzed.

Specifically, the user first enters a query condition on the display of the terminal device. When the terminal device detects the query condition, it inserts the condition into a form and sends the form to the server. The server parses the received form to obtain the query condition, then uses a data structure filter to separate the dimension information in the query condition from other information according to the structure of the data, thereby determining the dimension information. Next, the server determines the name fields in the dimension information through a character acquirer, obtains the set files corresponding to those name fields from the server database, and decompresses the set files to obtain the dimension sets corresponding to the name fields. Finally, the server performs set operations on the dimension sets through its statistical logic operations to obtain the analysis result of the data to be analyzed.

The data structure filter distinguishes data according to its structure. For example, if the query condition contains "uid=123&chn=02&ver=330", "i=1" and "a='I want to'", the filter classifies "uid=123&chn=02&ver=330" as one type and "i=1" and "a='I want to'" as another. The character acquirer extracts character information from a string; for example, from the string "uid=123&chn=02&ver=330" it obtains "uid=123", "chn=02" and "ver=330". Forms include GET forms and POST forms, GET and POST being the methods by which the terminal device sends data to the server; this embodiment does not limit the form type. Decompression techniques include zip (a file format for data compression and archiving), gzip (a file compression program), tar (an archiving tool), and so on; this embodiment does not limit the decompression technique. Statistical logic operations include bit operations, intersection, complement, difference, and so on; this embodiment does not limit the statistical logic operation.

It should be noted that the data structure filter and the character acquirer are virtual devices in the server.

In this embodiment, for example, the query condition contains "uid=123&chn=02&ver=330", "i=1" and "a='I want to'". The server's data structure filter groups "uid=123&chn=02&ver=330" as one type by data structure and determines that its dimension information is "chn=02&ver=330"; the character acquirer then extracts the name fields "chn=02" and "ver=330". The server obtains the set files chn_02.bsz and ver_330.bsz from its database, decompresses them into the chn=02 channel set and the ver=330 version set, and finally performs the statistical logic operation on these two sets to obtain the analysis result of the data to be analyzed.
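Steps h and i can be sketched without the data structure filter and character acquirer, which the text describes only as virtual devices. This hypothetical sketch simply keeps query segments whose key is in an assumed set of dimension field names, maps each name field such as `chn=02` to a file name such as `chn_02.bsz`, and decompresses a set file with `zlib` (standing in for Snappy) into an integer bitmap. It assumes the same bit layout as when the set was written: data bit n in byte n // 8 at position n % 8.

```python
import os
import tempfile
import zlib

DIMENSION_FIELDS = {"chn", "ver"}  # assumed names of the dimension fields


def set_file_names(query: str) -> list:
    """Keep the dimension segments of a query and map each name field to a set file."""
    names = []
    for segment in query.split("&"):
        key, _, value = segment.partition("=")
        if key in DIMENSION_FIELDS:                 # chn=02 -> chn_02.bsz
            names.append(f"{key}_{value}.bsz")
    return names


def load_dimension_set(path: str) -> int:
    """Decompress a set file and return its data bits as one integer bitmap."""
    with open(path, "rb") as f:
        # Little-endian, so data bit n comes from byte n // 8, bit n % 8.
        return int.from_bytes(zlib.decompress(f.read()), "little")


names = set_file_names("uid=123&chn=02&ver=330")
print(names)  # ['chn_02.bsz', 'ver_330.bsz']

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, names[0])
    with open(path, "wb") as f:
        f.write(zlib.compress(bytes([0b100]) + bytes(12)))  # only data bit 2 set
    channel_set = load_dimension_set(path)
print(channel_set == 1 << 2)  # True
```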

Further, the step j includes:

step k, performing a bit operation on the dimension set according to the logic conditions in the query condition to obtain an analysis result of the data to be analyzed.

Specifically, the server performs bit operations on corresponding data bits of the obtained dimension sets according to the logic conditions in the query condition, counts the numbers of matching and non-matching results, and analyzes these counts to obtain the analysis result requested by the query condition. For example, if the query condition asks for the number of users under the conditions chn=02 and ver=330, that number of users is the analysis result. The logic conditions include AND, OR, NOT, and so on.

In this embodiment, for example, the query condition asks for the number of users under the conditions chn=02 and ver=330, and the data bits hold binary values. According to the logic condition of the query, the server intersects each pair of corresponding data bits in the chn=02 channel set and the ver=330 version set (a bitwise AND): if both bits are 1, the result is 1, meaning the bit matches; otherwise the result is 0, meaning it does not match. The server then counts the result bits equal to 1, which is the required analysis result. If the count of "1"s is 1, exactly one user exists under the conditions chn=02 and ver=330.
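Step k reduces to bitwise operations followed by a count of 1-bits. The sketch below assumes sets are Python integers with data bit n at bit n and uses the example data from the text (uid=123 at bit 2 in both sets); the helper names are hypothetical. A universe mask of `NUM_BITS` ones is needed for NOT, because Python's `~` on an unbounded integer would otherwise yield a negative number.

```python
from functools import reduce
from operator import and_

NUM_BITS = 100
ALL_BITS = (1 << NUM_BITS) - 1  # universe mask so NOT stays within the set size


def popcount(bitmap: int) -> int:
    """Count the 1-bits, i.e. the number of matching users."""
    return bin(bitmap).count("1")


def count_all(*dimension_sets: int) -> int:
    """AND query: users present in every listed dimension set."""
    return popcount(reduce(and_, dimension_sets))


# The example from the text: uid=123 occupies data bit 2 in both sets.
chn_02 = 1 << 2
ver_330 = 1 << 2

print(count_all(chn_02, ver_330))     # 1   (chn=02 AND ver=330)
print(popcount(chn_02 | ver_330))     # 1   (chn=02 OR ver=330)
print(popcount(ALL_BITS & ~ver_330))  # 99  (NOT ver=330, within 100 bits)
```

An AND across any number of dimensions is a `reduce` with `operator.and_`; OR and NOT queries use `|` and the masked `~` in the same way.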

In this embodiment, the data to be analyzed is obtained from the database and converted into a data stream to be analyzed, the data stream is aggregated into set files, and finally set operations are performed on the set files through the server's bit operations to obtain the analysis result of the data to be analyzed. Set operations execute very quickly on the server, which improves data query efficiency. Moreover, the set files used by the set operations are produced by the aggregation processing, so the database does not need to be queried again; this reduces the query resources consumed and improves resource utilization.

Further, a second embodiment of the method for analyzing data according to the present invention is presented.

The second embodiment of the data analysis method is different from the first embodiment of the data analysis method in that the data analysis method further includes:

step l, caching the analysis result in the database, and, upon receiving again the query condition that produced the analysis result, sending the analysis result to a terminal device so that the terminal device outputs the analysis result after receiving it.

Specifically, after obtaining the analysis result of the data to be analyzed, the server caches it in the server's database. The next time the server receives the same query condition from the terminal device, it sends the cached analysis result directly to the terminal device, which outputs it, and the user obtains the required data from the output.

In this embodiment, for example, the query condition asks for the number of users under the conditions chn=02 and ver=330, and the analysis result of the set operation is that 1 such user exists. When the server receives this query condition again, it sends the cached analysis result "1" directly to the terminal device, which outputs it, so the user obtains the required value "1".
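Step l is essentially a memoized lookup keyed by the query condition. In this hypothetical sketch an in-memory dict stands in for the database cache, and the `compute` callback (here a stub returning 1, the result from the example) stands in for the set operation; the class name `ResultCache` is illustrative.

```python
from typing import Callable, Dict


class ResultCache:
    """Step l sketch: answer a repeated query condition from cached results."""

    def __init__(self, compute: Callable[[str], int]):
        self._compute = compute              # runs the set operation for a query
        self._results: Dict[str, int] = {}   # stands in for the database cache
        self.computations = 0                # how often the set operation really ran

    def query(self, condition: str) -> int:
        if condition not in self._results:
            self._results[condition] = self._compute(condition)
            self.computations += 1
        return self._results[condition]


cache = ResultCache(lambda condition: 1)  # stub: always reports 1 matching user
first = cache.query("chn=02&ver=330")
second = cache.query("chn=02&ver=330")   # served from the cache
print(first, second, cache.computations)  # 1 1 1
```

Keying the cache on the raw query string means `chn=02&ver=330` and `ver=330&chn=02` are cached separately; normalizing the segment order before the lookup would avoid that.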

In this embodiment the analysis result is cached in the database, and when the same query condition is received again, the cached result is sent to the terminal device, which outputs it. The server therefore does not need to repeat the steps of the first embodiment, and the user obtains the required data quickly from the terminal device's output, which improves both data query efficiency and resource utilization.

In addition, the present invention also provides an apparatus for analyzing data, referring to fig. 2, the apparatus for analyzing data including:

the acquisition module 10 is used for acquiring data to be analyzed in the database;

a conversion module 20, configured to convert the data to be analyzed into a data stream to be analyzed;

the generating module 30 is configured to perform aggregation processing on the data stream to be analyzed and generate a set file of the data stream to be analyzed;

and the operation module 40 is configured to perform a set operation on the set file to obtain an analysis result of the data to be analyzed.

Further, the operation module 40 further includes:

the first acquisition unit is used for acquiring the query condition;

a determining unit, configured to determine dimension information in the query condition;

the first obtaining unit is further configured to obtain the set file corresponding to a name field in the dimension information;

the decompression unit is used for decompressing the set file corresponding to the name field to obtain a dimension set corresponding to the name field;

and the operation unit is used for carrying out set operation on the dimension set through statistical logic operation to obtain an analysis result of the data to be analyzed.

Further, the operation unit is further configured to perform a bit operation on the dimension set according to a logic condition in the query condition;

the determining unit is also used for obtaining an analysis result of the data to be analyzed.

Further, the generating module 30 further includes:

the second acquisition unit is used for acquiring dimension information in the data stream to be analyzed;

the creating unit is used for creating a dimension set corresponding to the dimension information;

the second obtaining unit is further used for obtaining a data identifier in the data stream to be analyzed;

the querying unit is used for querying the data bit corresponding to the data identifier in the dimension set;

an assignment unit configured to assign values in the data bits;

and the compression unit is used for compressing the assigned dimension set into a set file in a specific script format.

Further, the creation unit further includes:

the acquisition subunit is used for acquiring dimension information in the data stream to be analyzed;

a detection subunit, configured to detect whether a dimension set corresponding to the dimension information exists;

the loading subunit is used for loading the dimension set corresponding to the dimension information as an initial dimension set if the dimension set corresponding to the dimension information is detected to exist;

the creation subunit is configured to create a dimension set corresponding to the dimension information if it is detected that the dimension set corresponding to the dimension information does not exist.
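The detect, load-or-create behavior of these subunits lets new records be merged into an existing dimension set instead of rebuilding it from scratch. In this minimal sketch a mutable mapping from name field to compressed bytes stands in for wherever set files are actually stored (an assumption), and `load_initial_set` and `update_set` are hypothetical names.

```python
import zlib
from typing import Iterable, MutableMapping

def load_initial_set(store: MutableMapping[str, bytes], name: str) -> int:
    """Detection subunit plus loading/creation subunits for one name field."""
    if name in store:
        # the dimension set exists: load it as the initial dimension set
        return int.from_bytes(zlib.decompress(store[name]), "little")
    # the dimension set does not exist: create a new, empty one
    return 0

def update_set(store: MutableMapping[str, bytes], name: str, data_ids: Iterable[int]) -> int:
    """Set the data bits for new identifiers on top of the initial set, then save it."""
    bits = load_initial_set(store, name)
    for data_id in data_ids:
        bits |= 1 << data_id
    length = max(1, (bits.bit_length() + 7) // 8)
    store[name] = zlib.compress(bits.to_bytes(length, "little"))
    return bits
```

Calling `update_set` twice for the same name field accumulates identifiers: the second call starts from the set saved by the first.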

Further, the conversion module 20 further includes:

the third acquisition unit is used for acquiring data to be analyzed in the database;

the conversion unit is used for converting the data to be analyzed, through a preset data reading tool, into a data stream to be analyzed that carries data identifiers and dimension information.
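The conversion module can be pictured as a generator that reads rows and emits (data identifier, dimension information) pairs. The description does not name the preset data reading tool; here Python's built-in `sqlite3` module is used purely as an illustrative stand-in, and the table and column names passed in are hypothetical. Identifiers are interpolated into the SQL text, so they must come from trusted configuration, not from user input.

```python
import sqlite3
from typing import Iterator

def read_stream(conn: sqlite3.Connection, table: str, id_column: str,
                dim_columns: list[str]) -> Iterator[tuple[int, dict[str, str]]]:
    """Yield (data_identifier, {dimension: value}) for each row of `table`.

    NULL dimension values are skipped, so a record only joins the dimension
    sets for values it actually has.
    """
    cols = ", ".join([id_column, *dim_columns])
    cursor = conn.execute(f"SELECT {cols} FROM {table} ORDER BY {id_column}")
    for row in cursor:  # rows are streamed, not loaded into memory at once
        data_id, *values = row
        yield data_id, {col: str(v) for col, v in zip(dim_columns, values) if v is not None}
```

Streaming row by row keeps memory flat regardless of table size, which matters because the set files are built once in a single pass.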

Further, the data analysis device further includes:

the caching module is used for caching the analysis result into the database;

and the sending module is used for, upon obtaining a query condition identical to the one that produced the cached analysis result, sending the cached analysis result to the terminal device, so that the terminal device outputs the analysis result after receiving it.
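The caching and sending modules amount to memoizing analysis results by query condition. In this sketch a plain dict stands in for the database table mentioned above (an assumption), and the query condition is normalized with `json.dumps(..., sort_keys=True)` so that logically identical conditions with different key order hit the same cache entry. `ResultCache` is a hypothetical name.

```python
import json
from typing import Any, Callable

class ResultCache:
    """Cache analysis results keyed by a canonical form of the query condition."""

    def __init__(self, compute: Callable[[dict], Any]) -> None:
        self._compute = compute           # runs the full set-operation analysis
        self._store: dict[str, Any] = {}  # stand-in for the database cache table

    @staticmethod
    def _key(query: dict) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} share one key
        return json.dumps(query, sort_keys=True, separators=(",", ":"))

    def get(self, query: dict) -> Any:
        """Return the cached result, computing and storing it on a miss."""
        key = self._key(query)
        if key not in self._store:
            self._store[key] = self._compute(query)
        return self._store[key]
```

A real deployment would also need an invalidation rule when new data is merged into the set files; the description does not say how that is handled.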

The specific implementation of the data analysis device of the present invention is substantially the same as that of each embodiment of the data analysis method described above, and will not be repeated here.

In addition, the invention also provides a data analysis device. Fig. 3 is a schematic structural diagram of a hardware running environment according to an embodiment of the present invention.

It should be noted that fig. 3 is a schematic structural diagram of a hardware operating environment of the data analysis device.


As shown in fig. 3, the data analysis device may include a processor 1001 (such as a CPU), a memory 1005, a user interface 1003, a network interface 1004, and a communication bus 1002. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.

Optionally, the data analysis device may further include an RF (Radio Frequency) circuit, a sensor, a WiFi module, and the like.

It will be appreciated by those skilled in the art that the structure of the data analysis device shown in fig. 3 does not limit the data analysis device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

As shown in fig. 3, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data analysis program. The operating system is a program that manages and controls the hardware and software resources of the data analysis device and supports the running of the data analysis program and other software or programs.

In the data analysis device shown in fig. 3, the user interface 1003 is mainly used on the user's terminal device, so that the user can input query conditions to the server and/or view the analysis results returned by the server; the network interface 1004 is mainly used by the server to exchange data with the user terminal; and the processor 1001 may be used to call the data analysis program stored in the memory 1005 and execute the steps of the data analysis method described above.

The specific implementation manner of the data analysis device of the present invention is basically the same as that of each embodiment of the data analysis method, and will not be described herein.

In addition, the embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a data analysis program, and the data analysis program realizes the steps of the data analysis method when being executed by a processor.

The specific embodiment of the computer readable storage medium of the present invention is substantially the same as each embodiment of the data analysis method, and will not be described herein.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk), including several instructions for causing a data analysis device to perform the method according to the embodiments of the present invention.

Claims

1. A method of analyzing data, the method comprising the steps of:

performing a set operation on the set file to obtain an analysis result of the data to be analyzed;

the step of performing the set operation on the set file to obtain the analysis result of the data to be analyzed includes:

receiving and analyzing a query form sent by a user terminal to obtain query conditions;

determining dimension information in the query condition according to the structure of the data in the query condition and a preset data structure filter;

acquiring and decompressing the set file corresponding to the name field in the dimension information to obtain a dimension set corresponding to the name field;

performing a set operation on the dimension set through a statistical logic operation to obtain an analysis result of the data to be analyzed, wherein the statistical logic operation includes: a bit operation, an intersection operation, a complement operation, and a difference operation;

the step of performing aggregation processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed comprises the following steps:

assigning values in the data bits, and compressing the assigned dimension set into a set file in a specific script format;

the step of obtaining dimension information in the data stream to be analyzed and creating a dimension set corresponding to the dimension information includes:

2. The method for analyzing data according to claim 1, wherein the step of performing a set operation on the set of dimensions by a statistical logical operation to obtain an analysis result of the data to be analyzed comprises:

3. The method for analyzing data according to claim 1, wherein the step of obtaining data to be analyzed in the database and converting the data to be analyzed into a data stream to be analyzed comprises:

4. The method for analyzing data according to any one of claims 1 to 3, wherein after the step of performing a set operation on the set file to obtain the analysis result of the data to be analyzed, the method further comprises:

5. A data analysis device, characterized in that the data analysis device comprises:

the operation module is used for performing a set operation on the set file to obtain an analysis result of the data to be analyzed;

the operation module is also used for receiving and parsing the query form sent by the user terminal to obtain the query condition; determining dimension information in the query condition according to the structure of the data in the query condition and a preset data structure filter; acquiring and decompressing the set file corresponding to the name field in the dimension information to obtain a dimension set corresponding to the name field; and performing a set operation on the dimension set through a statistical logic operation to obtain an analysis result of the data to be analyzed, wherein the statistical logic operation includes: a bit operation, an intersection operation, a complement operation, and a difference operation;

the generating module is further configured to acquire dimension information in the data stream to be analyzed and create a dimension set corresponding to the dimension information; acquire a data identifier corresponding to the dimension information and query the data bit corresponding to the data identifier in the dimension set; and assign values in the data bits and compress the assigned dimension set into a set file in a specific script format;

the generating module is further used for acquiring dimension information in the data stream to be analyzed and detecting whether a dimension set corresponding to the dimension information exists; if it is detected that the dimension set corresponding to the dimension information exists, loading that dimension set as an initial dimension set; and if it is detected that the dimension set corresponding to the dimension information does not exist, creating the dimension set corresponding to the dimension information.

6. A data analysis device, characterized in that it comprises a memory, a processor, and a data analysis program stored in the memory and executable on the processor, wherein the data analysis program, when executed by the processor, implements the steps of the data analysis method according to any one of claims 1 to 4.

7. A computer-readable storage medium, on which a data analysis program is stored, wherein the data analysis program, when executed by a processor, implements the steps of the data analysis method according to any one of claims 1 to 4.