CN111506605A - Data analysis method, device, equipment and computer readable storage medium - Google Patents

Data analysis method, device, equipment and computer readable storage medium

Info

Publication number
CN111506605A
Authority
CN
China
Prior art keywords
data
analyzed
dimension
analysis result
dimension information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010257617.7A
Other languages
Chinese (zh)
Other versions
CN111506605B (en)
Inventor
魏新宇
王雪冬
连丰
陈建豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shang Yu Software Shenzhen Co ltd
Original Assignee
Shang Yu Software Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shang Yu Software Shenzhen Co ltd
Priority to CN202010257617.7A
Publication of CN111506605A
Application granted
Publication of CN111506605B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data analysis method, device, equipment and storage medium. The method comprises the following steps: acquiring data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed; performing set processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed; and performing a set operation on the set file to obtain an analysis result of the data to be analyzed. Because the analysis result is obtained by set operations on set files generated from the data stream, rather than by querying the database again, the data query efficiency and the resource utilization rate are improved.

Description

Data analysis method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of data analysis technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for data analysis.
Background
Existing data analysis methods include the direct database query method and the distributed cluster method. The direct database query method runs database queries directly against the user data to be analyzed. With a large amount of user data, the query speed drops and, in serious cases, the database system may crash. In addition, every database query is a new query and the previous query result cannot be cached, which consumes machine performance and wastes query resources. The distributed cluster method requires building a distributed cluster data analysis system that runs on multiple high-performance servers, and using such a system requires learning an additional distributed computing framework and programming model, which also wastes resources. The data query efficiency and the resource utilization rate of existing data analysis methods are therefore low.
Disclosure of Invention
The invention mainly aims to provide a data analysis method, a data analysis device, data analysis equipment and a data analysis storage medium, and aims to solve the technical problems of low data query efficiency and low resource utilization rate in the prior art.
In order to achieve the above object, the present invention provides a method for analyzing data, the method comprising the steps of:
acquiring data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed;
performing set processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed;
and performing set operation on the set file to obtain an analysis result of the data to be analyzed.
Preferably, the step of performing a set operation on the set file to obtain an analysis result of the data to be analyzed includes:
acquiring a query condition, determining dimension information in the query condition, and acquiring the set file corresponding to a name field in the dimension information;
decompressing the set file corresponding to the name field to obtain a dimension set corresponding to the name field;
and performing set operation on the dimension set through statistical logic operation to obtain an analysis result of the data to be analyzed.
Preferably, the step of performing set operation on the dimension set through a statistical logic operation to obtain an analysis result of the data to be analyzed includes:
and carrying out bit operation on the dimension set according to the logic conditions in the query conditions to obtain the analysis result of the data to be analyzed.
Preferably, the step of performing set processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed includes:
obtaining dimension information in the data stream to be analyzed, and creating a dimension set corresponding to the dimension information;
acquiring a data identifier corresponding to the dimension information, and inquiring a data bit corresponding to the data identifier in the dimension set;
and assigning values in the data bits, and compressing the assigned dimension set into a set file in a specific script format.
Preferably, the step of obtaining the dimension information in the data stream to be analyzed and creating the dimension set corresponding to the dimension information includes:
acquiring dimension information in the data stream to be analyzed, and detecting whether a dimension set corresponding to the dimension information exists or not;
if the dimension set corresponding to the dimension information is detected to exist, loading the dimension set corresponding to the dimension information as an initial dimension set;
and if the fact that the dimension set corresponding to the dimension information does not exist is detected, the dimension set corresponding to the dimension information is created.
Preferably, the step of acquiring data to be analyzed in the database and converting the data to be analyzed into a data stream to be analyzed includes:
the method comprises the steps of obtaining data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed, which carries data identification and dimension information, through a preset data reading tool.
Preferably, after the step of performing the set operation on the set file to obtain the analysis result of the data to be analyzed, the method further includes:
caching the analysis result in the database, and, after a query condition identical to the one that produced the analysis result is obtained, sending the analysis result to the terminal equipment so that the terminal equipment can receive and output the analysis result.
In order to achieve the above object, the present invention also provides a data analysis device including:
the acquisition module is used for acquiring data to be analyzed in the database;
the conversion module is used for converting the data to be analyzed into a data stream to be analyzed;
the generating module is used for performing set processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed;
and the operation module is used for carrying out set operation on the set file to obtain an analysis result of the data to be analyzed.
In addition, in order to achieve the above object, the present invention also provides an apparatus for analyzing data, which includes a memory, a processor, and a program for analyzing data stored on the memory and running on the processor, wherein the program for analyzing data implements the steps of the method for analyzing data as described above when executed by the processor.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an analysis program of data, which when executed by a processor, implements the steps of the analysis method of data as described above.
The invention acquires the data to be analyzed in the database, converts the data to be analyzed into a data stream to be analyzed, performs set processing on the data stream to obtain a set file, and finally performs a set operation on the set file to obtain the analysis result of the data to be analyzed. In the data analysis process, the analysis result is therefore obtained by operating on the set file with set operations. Because set operations run very quickly on the server, the efficiency of data query is improved. Because the set files used in the set operation are produced by set processing, the database does not need to be queried again, which reduces the query resources occupied and improves the resource utilization rate.
Drawings
FIG. 1 is a schematic flow chart of a first embodiment of a method for analyzing data according to the present invention;
FIG. 2 is a schematic diagram of a preferred structure of the data analysis device of the present invention;
FIG. 3 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 4 is an analysis diagram of a first embodiment of the method of analyzing data according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a data analysis method. Referring to fig. 1 and fig. 4, fig. 1 is a schematic flow chart of a first embodiment of the data analysis method of the invention, and fig. 4 is an analysis schematic diagram of the first embodiment of the data analysis method of the invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in a different order than presented herein.
The data analysis method comprises the following steps:
Step S10, acquiring data to be analyzed in the database, and converting the data to be analyzed into a data stream to be analyzed.
The server obtains the data to be analyzed from the server database and then converts it into a data stream to be analyzed. The server may obtain the data to be analyzed through PHP (Hypertext Preprocessor), SqlDataAdapter (a database adapter), and the like; this embodiment does not limit the manner of obtaining the data to be analyzed. The database may take various forms, including a MySQL (database management system) database, a Kafka (distributed publish-subscribe messaging system) database, and the like; this embodiment does not limit the form of the database. A data stream is an ordered sequence of byte data with a start point and an end point.
It should be noted that, in this embodiment, the process of data analysis only needs one server.
The step S10 includes:
step a, acquiring data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed, which carries data identification and dimension information, through a preset data reading tool.
Specifically, the server obtains the data to be analyzed from the server database, reads the dimension information and the data identifier in the data to be analyzed through a data reading tool preset in the server, and converts them into a data stream to be analyzed with a specific structure similar to a query string. The data reading tool is a virtual tool in the server with two parts, data reading and data conversion: data reading means that the server reads the data specified in advance in the data reading tool, and data conversion means converting the read data into a data stream with the specific structure. The dimension information includes channel information and version information, that is, the channel and the version under which the user's data to be analyzed was recorded. The data identifier is the ID (identity) of the user and is unique. The query-string-like structure is uid=xxx&chn=xxx&ver=xxx, where uid (User Identification) is the data identifier, chn (channel) is the channel information, and ver (Version) is the version information.
In this embodiment, for example, if the data identifier of the data to be analyzed is 123, the channel information is 02, and the version information is 330, the data stream to be analyzed in the query-string-like structure is uid=123&chn=02&ver=330.
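The conversion in step a can be sketched as follows. This is a minimal illustration, not the patent's data reading tool: it assumes the ordinary query-string form with "=" between key and value, and the function name `to_data_stream` and the dictionary-shaped row are hypothetical.

```python
from urllib.parse import urlencode


def to_data_stream(record: dict) -> str:
    """Serialize one record as a query-string-like data stream."""
    # Field order follows the structure quoted in the text: uid, chn, ver.
    ordered = [(key, record[key]) for key in ("uid", "chn", "ver")]
    return urlencode(ordered)


# Hypothetical row as it might be read from the database.
row = {"uid": 123, "chn": "02", "ver": "330"}
print(to_data_stream(row))  # uid=123&chn=02&ver=330
```

Keeping the channel and version as strings preserves leading zeros such as "02".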
Step S20, performing set processing on the data stream to be analyzed, and generating a set file of the data stream to be analyzed.
The server splits the query-string-like data stream to be analyzed by structure to obtain the data information of each string segment, and then performs set processing on each piece of data information to generate the set files of the data stream to be analyzed. Structure splitting means splitting the data stream by string segment; for example, the data stream uid=123&chn=02&ver=330 is split into the segments uid=123, chn=02, and ver=330. A set file is a file that stores a large number of data bits; each set file contains a plurality of data bits, and each string segment has corresponding data bits. The data bits form an ordered sequence with a start point and an end point, and each data bit stores a value such as a binary value, a Boolean value, or a character value, most commonly a binary value: "1" indicates that the set file contains the string segment, "0" indicates that it does not, and the default value of a data bit is "0". In this embodiment, for example, the data identifier uid=123 corresponds to the data bit with sequence number 2 in the set file. If the value at sequence number 2 is "1", the set file contains the user uid=123; if the value is "0", the set file does not contain the user uid=123.
It should be noted that the number of data bits of each aggregate file is determined according to the requirements of the server.
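As a rough sketch of the structure splitting and the data-bit convention described above, the snippet below splits a stream into its segments and models the data bits as a Python integer used as a bit set. The helper names are hypothetical, and the mapping of uid 123 to sequence number 2 is taken from the example in the text; how the server derives that position is not specified.

```python
from urllib.parse import parse_qsl


def split_stream(stream: str) -> dict:
    """Split a query-string-like stream into its key/value segments."""
    return dict(parse_qsl(stream))


def set_bit(bits: int, position: int) -> int:
    """Return the bit set with the data bit at `position` assigned 1."""
    return bits | (1 << position)


def has_bit(bits: int, position: int) -> bool:
    """Report whether the data bit at `position` is 1."""
    return (bits >> position) & 1 == 1


segments = split_stream("uid=123&chn=02&ver=330")
print(segments)  # {'uid': '123', 'chn': '02', 'ver': '330'}

bits = 0                 # every data bit defaults to 0
bits = set_bit(bits, 2)  # sequence number 2 stands for uid 123 in the example
print(has_bit(bits, 2), has_bit(bits, 3))  # True False
```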
The step S20 includes:
step b, obtaining dimension information in the data stream to be analyzed, and creating a dimension set corresponding to the dimension information;
step c, acquiring a data identifier corresponding to the dimension information, and inquiring a data bit corresponding to the data identifier in the dimension set;
step d, assigning values in the data bits, and compressing the assigned dimension set into a set file in a specific script format.
Specifically, the server obtains the dimension information in the data stream to be analyzed and creates a corresponding dimension set according to the dimension information. It then obtains the data identifier corresponding to the dimension information in the data stream, queries the data bit of the data identifier in the dimension set, assigns a value to that data bit, and finally compresses the assigned dimension set into a set file with a specific suffix script format. The dimension sets include a channel set, a version set, and the like, and there are many specific suffix script formats, such as list, bsz, set, and the like. Set files and dimension sets correspond to each other; for example, if the dimension sets include a channel set and a version set, the set files include a channel set file and a version set file. The compression techniques include space compression, storage system compression, Snappy (a C++ compression and decompression library), and the like; this embodiment does not limit the form of the compression technique.
In this embodiment, for example, the data stream to be analyzed is uid=123&chn=02&ver=330, the dimension information includes the channel information chn=02 and the version information ver=330, the specific suffix script format is bsz, and the compression technique is Snappy compression. The server acquires the dimension information of the data stream, namely chn=02 and ver=330, and creates a chn=02 channel set and a ver=330 version set. Each of the two sets has 100 data bits, and the data bits hold binary values. The server then acquires the data identifier uid=123 corresponding to chn=02 and ver=330, and finds that the data bit of uid=123 in both sets is sequence number 2. The server assigns "1" to the data bit with sequence number 2 in both sets. After the assignment is completed, the server compresses the two sets with Snappy into the channel set file chn=02.bsz and the version set file ver=330.bsz.
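Steps b to d of this example can be sketched as below, under several assumptions. The standard-library `zlib` module stands in for Snappy, which is a separate library, so the files written here are not Snappy-compressed. The function names, the `uid_to_position` table, the byte layout of the set, and the file names with "=" are hypothetical; the 100-bit size and sequence number 2 come from the example.

```python
import zlib
from pathlib import Path
from tempfile import TemporaryDirectory

SET_SIZE_BITS = 100  # size used in the example above


def new_dimension_set(size_bits: int = SET_SIZE_BITS) -> bytearray:
    """Create a dimension set whose data bits all default to 0."""
    return bytearray((size_bits + 7) // 8)


def assign_bit(dim_set: bytearray, position: int) -> None:
    """Assign 1 to the data bit at `position`."""
    dim_set[position // 8] |= 1 << (position % 8)


def save_set_file(dim_set: bytearray, path: Path) -> None:
    """Compress the dimension set into a set file (zlib stands in for Snappy)."""
    path.write_bytes(zlib.compress(bytes(dim_set)))


uid_to_position = {123: 2}  # hypothetical lookup from data identifier to data bit

with TemporaryDirectory() as tmp:
    for name in ("chn=02", "ver=330"):
        dim_set = new_dimension_set()
        assign_bit(dim_set, uid_to_position[123])
        save_set_file(dim_set, Path(tmp) / f"{name}.bsz")
    restored = zlib.decompress((Path(tmp) / "chn=02.bsz").read_bytes())
    print(len(restored), restored[0])  # 13 4
```

One bit per user keeps a 100-user set in 13 bytes before compression, which is what makes the later bitwise set operations cheap.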
Further, the step b comprises:
step e, obtaining the dimension information in the data stream to be analyzed, and detecting whether a dimension set corresponding to the dimension information exists;
step f, if the existence of the dimension set corresponding to the dimension information is detected, loading the dimension set corresponding to the dimension information as an initial dimension set;
step g, if it is detected that the dimension set corresponding to the dimension information does not exist, creating the dimension set corresponding to the dimension information.
Specifically, the server acquires dimension information in a data stream to be analyzed, before a dimension set corresponding to the dimension information is created, the server detects whether a dimension set corresponding to the dimension information exists in a server database according to the dimension information, if the dimension set corresponding to the dimension information is detected to exist, the server directly loads the dimension set as an initial dimension set without creating a new dimension set again, and if the dimension set corresponding to the dimension information does not exist, the server needs to create the dimension set corresponding to the dimension information.
In this embodiment, for example, the data stream to be analyzed is uid=123&chn=02&ver=330, and the dimension information obtained by the server is the channel information chn=02 and the version information ver=330. If the server detects that the server database already holds a chn=02 channel set, it loads that channel set directly as the initial dimension set without creating a chn=02 channel set again. If the server detects that the server database does not hold a ver=330 version set, it creates a new ver=330 version set.
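The load-or-create decision in steps e to g might look like the following sketch. It assumes, purely for illustration, that existing dimension sets are kept as zlib-compressed files in a directory rather than in the server database; `load_or_create` and the file naming are hypothetical.

```python
import zlib
from pathlib import Path
from tempfile import TemporaryDirectory


def load_or_create(directory: Path, name: str, size_bits: int = 100) -> tuple[bytearray, bool]:
    """Return (dimension_set, loaded); `loaded` is True when a stored set was reused."""
    path = directory / f"{name}.bsz"
    if path.exists():
        # Step f: an existing set is loaded as the initial dimension set.
        return bytearray(zlib.decompress(path.read_bytes())), True
    # Step g: no stored set exists, so a new all-zero set is created.
    return bytearray((size_bits + 7) // 8), False


with TemporaryDirectory() as tmp:
    store = Path(tmp)
    # Pretend a chn=02 channel set with data bit 2 assigned was stored earlier.
    (store / "chn=02.bsz").write_bytes(zlib.compress(bytes([0b100] + [0] * 12)))
    for name in ("chn=02", "ver=330"):
        dim_set, loaded = load_or_create(store, name)
        print(name, "loaded" if loaded else "created", dim_set[0])
# chn=02 loaded 4
# ver=330 created 0
```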
Step S30, performing set operation on the set file to obtain an analysis result of the data to be analyzed.
The server performs set operation on the set files through a preset operation method to obtain operation results, and then analyzes the operation results to obtain analysis results of the data to be analyzed. The predetermined operation method includes a bit operation method, a boolean operation, and the like, and the present embodiment does not limit the form of the predetermined operation method.
The step S30 further includes:
step h, acquiring a query condition, determining dimension information in the query condition, and acquiring the set file corresponding to a name field in the dimension information;
step i, decompressing the set file corresponding to the name field to obtain a dimension set corresponding to the name field;
step j, performing set operation on the dimension set through statistical logic operation to obtain an analysis result of the data to be analyzed.
Specifically, before the server acquires a query condition, a user inputs the query condition on the display of the terminal device. After detecting the query condition, the terminal device inserts it into a form and sends the form to the server. After receiving the form, the server parses it and acquires the query condition in the form. According to the data structure in the query condition, the server then separates the dimension information from the other information through a data structure filter, thereby determining the dimension information in the query condition. The server determines the name field in the dimension information through a character acquirer and acquires the set file corresponding to the name field from the server database. The server decompresses the set file through a decompression technology to obtain the dimension set corresponding to the name field, and performs a set operation on the dimension set through the statistical logic operation of the server, thereby obtaining the analysis result of the data to be analyzed.
The data structure filter is a device that separates data according to the structure of the data. For example, if the query condition includes data such as "uid=123&chn=02&ver=330", "I1" and "a" I wait to '", the data structure filter classifies "uid=123&chn=02&ver=330" into one type and the remaining data into another type according to the structure of the data. The character acquirer is a device that acquires character information in a character string; for example, for the string "uid=123&chn=02&ver=330", the character acquirer acquires "uid=123", "chn=02", and "ver=330". The form includes a form, a post form, and the like; both are manners for the terminal device to transmit data to the server. The decompression technology includes a zip (file format for data compression and document storage) technology, a gzip (file compression program) technology, a tar (tape archive) technology, and the like; this embodiment does not limit the form of the decompression technology. The statistical logic operation includes bit operation, intersection operation, complement operation, difference operation, and the like; this embodiment does not limit the form of the statistical logic operation.
It should be noted that the data structure filter and the character acquirer are virtual devices in the server.
In the present embodiment, for example, the query condition contains data such as "uid=123&chn=02&ver=330", "I1" and "a" iwante '". According to the structure of the data, the server classifies "uid=123&chn=02&ver=330" into one type and the remaining data into another type through the data structure filter, and determines that the dimension information in "uid=123&chn=02&ver=330" is "chn=02&ver=330". The server acquires the name fields "chn=02" and "ver=330" through the character acquirer, and acquires the chn=02.bsz channel set file and the ver=330.bsz version set file from the server database. The server then decompresses the two set files to obtain the chn=02 channel set and the ver=330 version set, and performs a set operation on the two sets through the statistical logic operation, thereby obtaining the analysis result of the data to be analyzed.
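One possible reading of the data structure filter and the character acquirer is sketched below. The patent does not say how either virtual device is implemented; the regular expression, the function names, the choice of `chn` and `ver` as dimension keys, and the sample non-matching strings are all assumptions made for illustration.

```python
import re

# Assumed shape of dimension data: key=value segments joined by "&".
QUERY_SHAPE = re.compile(r"\w+=\w+(&\w+=\w+)*")


def filter_structured(items: list[str]) -> list[str]:
    """Keep only the items that have the query-string-like structure."""
    return [item for item in items if QUERY_SHAPE.fullmatch(item)]


def name_fields(condition: str, dimensions: tuple = ("chn", "ver")) -> list[str]:
    """Return the segments whose key names a dimension, e.g. 'chn=02'."""
    fields = []
    for segment in condition.split("&"):
        key, _, _value = segment.partition("=")
        if key in dimensions:
            fields.append(segment)
    return fields


items = ["uid=123&chn=02&ver=330", "I1", "some free text"]
structured = filter_structured(items)
print(structured)                  # ['uid=123&chn=02&ver=330']
print(name_fields(structured[0]))  # ['chn=02', 'ver=330']
```

Each returned name field can then be used to look up the matching set file, such as the file for `chn=02`.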
Further, the step j includes:
step k, performing bit operation on the dimension set according to the logic conditions in the query conditions to obtain the analysis result of the data to be analyzed.
Specifically, the server performs a bit operation on each pair of corresponding data bits in the obtained dimension sets according to the logic condition in the query condition to obtain the operation results. It then counts how many operation results match and how many do not, and analyzes these counts to obtain the analysis result requested by the query condition. In this example, if the query condition asks for the number of users under the conditions chn=02 and ver=330, that number of users is the analysis result requested by the query condition. The logic conditions are AND, OR, NOT, and the like.
In this embodiment, for example, the query condition is to obtain the number of users under the conditions chn=02 and ver=330, and the data bits hold binary values. According to the logic condition of the query condition, the server performs an intersection operation on each pair of corresponding data bits in the chn=02 channel set and the ver=330 version set. The result of the intersection is "1" (the operation results match) only when both corresponding data bits are "1"; otherwise the result is "0" (the operation results do not match). Finally, the server counts the results that are "1", which gives the analysis result. If the count of "1" results is 1, the number of users under the conditions chn=02 and ver=330 is 1.
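The intersection and counting in this example can be sketched with a bitwise AND followed by a count of the 1 bits. The sets are modeled as 13-byte arrays (100 data bits rounded up), which is an assumption about layout; `intersect_count` and the extra user at position 5 are hypothetical.

```python
def intersect_count(set_a: bytes, set_b: bytes) -> int:
    """Count the data bits that are 1 in both dimension sets (AND condition)."""
    a = int.from_bytes(set_a, "little")
    b = int.from_bytes(set_b, "little")
    # Bitwise AND yields 1 only where both corresponding data bits are 1.
    return bin(a & b).count("1")


chn_02 = bytearray(13)   # chn=02 channel set, 100 data bits rounded up to 13 bytes
ver_330 = bytearray(13)  # ver=330 version set
chn_02[0] |= 1 << 2      # uid 123 (sequence number 2) is in chn=02
chn_02[0] |= 1 << 5      # a hypothetical second user, in chn=02 only
ver_330[0] |= 1 << 2     # uid 123 is also in ver=330

print(intersect_count(chn_02, ver_330))  # 1
```

Other logic conditions follow the same pattern: `a | b` for OR and `a & ~b` for a difference.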
In this embodiment, the data to be analyzed in the database is acquired and converted into a data stream to be analyzed, the data stream is subjected to set processing to obtain the set file, and finally the set file is subjected to a set operation through the bit operation of the server to obtain the analysis result of the data to be analyzed. In the data analysis process, the analysis result is therefore obtained by performing set operations on the set file with the bit operation of the server. Because set operations run very quickly on the server, the efficiency of data query is improved. Because the set files used in the set operation are produced by set processing, the database does not need to be queried again, which reduces the query resources occupied and improves the resource utilization rate.
Further, a second embodiment of the method for analyzing data of the present invention is presented.
The second embodiment of the method for analyzing data is different from the first embodiment of the method for analyzing data in that the method for analyzing data further includes:
step l, caching the analysis result in the database, and, after a query condition identical to the one that produced the analysis result is obtained, sending the analysis result to a terminal device so that the terminal device can receive and output the analysis result.
Specifically, after obtaining an analysis result of data to be analyzed, the server caches the analysis result in a database of the server, and after receiving the same query condition sent by the terminal device next time, the server directly sends the analysis result to the terminal device, and after receiving the analysis result, the terminal device outputs the analysis result, and the user obtains the required data according to the output analysis result.
In this embodiment, for example, the query condition is to obtain the number of users under the conditions chn=02 and ver=330, and the analysis result after the set operation is that the number of users under those conditions is 1. When the server again receives a query condition asking for the number of users under the conditions chn=02 and ver=330, it sends the cached analysis result "1" directly to the terminal device, and the terminal device outputs the analysis result, so that the user learns that the required data is "1".
In this embodiment, the analysis result is cached in the database, the analysis result is sent to the terminal device after the query condition identical to the analysis result is obtained, and the terminal device outputs the analysis result after receiving the analysis result. Therefore, after receiving the same query condition again, the server directly sends the analysis result cached in the database to the terminal device, and the user can quickly obtain the required data according to the analysis data output by the terminal device without going through the steps in the first embodiment again, so that the efficiency of data query is improved, and the resource utilization rate is improved.
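The caching behavior of the second embodiment can be sketched as below. The patent caches the result in the server database; this sketch substitutes an in-memory dictionary. The `ResultCache` class is hypothetical, and sorting the segments so that reordered but equivalent conditions share a cache entry is an added assumption, not something the text states.

```python
from typing import Callable


class ResultCache:
    """Cache analysis results keyed by query condition (in-memory stand-in)."""

    def __init__(self, compute: Callable[[str], int]) -> None:
        self._compute = compute
        self._results: dict[str, int] = {}
        self.computations = 0  # how often the set operation actually ran

    def query(self, condition: str) -> int:
        # Assumption: segment order does not change the meaning of a condition.
        key = "&".join(sorted(condition.split("&")))
        if key not in self._results:
            self.computations += 1
            self._results[key] = self._compute(key)
        return self._results[key]


# The lambda stands in for the set operation of the first embodiment.
cache = ResultCache(lambda condition: 1)
print(cache.query("chn=02&ver=330"))  # 1
print(cache.query("chn=02&ver=330"))  # 1
print(cache.computations)             # 1
```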
Further, the present invention provides a data analysis device. Referring to fig. 2, the data analysis device includes:
an acquisition module 10, configured to acquire data to be analyzed from a database;
a conversion module 20, configured to convert the data to be analyzed into a data stream to be analyzed;
a generating module 30, configured to perform set processing on the data stream to be analyzed and generate a set file of the data stream to be analyzed;
and an operation module 40, configured to perform a set operation on the set file to obtain an analysis result of the data to be analyzed.
Further, the operation module 40 includes:
a first obtaining unit, configured to obtain a query condition;
a determining unit, configured to determine dimension information in the query condition;
the first obtaining unit being further configured to obtain the set file corresponding to a name field in the dimension information;
a decompression unit, configured to decompress the set file corresponding to the name field to obtain a dimension set corresponding to the name field;
and an operation unit, configured to perform a set operation on the dimension set through a statistical logic operation to obtain an analysis result of the data to be analyzed.
Further, the operation unit is further configured to perform a bit operation on the dimension set according to a logic condition in the query condition;
and the determining unit is further configured to obtain the analysis result of the data to be analyzed.
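As a non-limiting illustration, the bit operation performed by the operation unit can be sketched in Python as follows. The sketch assumes that each dimension set is a bitmap in which bit position i is set when the data identifier i belongs to the dimension; a Python integer stands in for the bitmap. The function names and the sample identifiers are hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable, List


def build_dimension_set(data_ids: Iterable[int]) -> int:
    """Return a bitmap (as an int) with one bit set per data identifier."""
    bitmap = 0
    for data_id in data_ids:
        bitmap |= 1 << data_id
    return bitmap


def count_matching(sets: Dict[str, int], names: List[str], logic: str = "and") -> int:
    """Combine the named dimension sets bitwise and count the set bits."""
    result = sets[names[0]]
    for name in names[1:]:
        if logic == "and":
            result &= sets[name]   # identifiers present in every dimension
        elif logic == "or":
            result |= sets[name]   # identifiers present in any dimension
        else:
            raise ValueError(f"unsupported logic condition: {logic}")
    return bin(result).count("1")  # number of identifiers in the result set


# Illustrative data only: users 1, 3, 5 have chn = 02; users 2, 3 have ver = 330.
dimension_sets = {
    "chn=02": build_dimension_set([1, 3, 5]),
    "ver=330": build_dimension_set([2, 3]),
}
both = count_matching(dimension_sets, ["chn=02", "ver=330"], "and")
either = count_matching(dimension_sets, ["chn=02", "ver=330"], "or")
```

With this sample data only user 3 satisfies both conditions, so the "and" query yields a count of 1, in line with the example given in the method embodiment.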
Further, the generating module 30 includes:
a second obtaining unit, configured to obtain the dimension information in the data stream to be analyzed;
a creating unit, configured to create a dimension set corresponding to the dimension information;
the second obtaining unit being further configured to obtain a data identifier in the data stream to be analyzed;
a query unit, configured to query the data bit corresponding to the data identifier in the dimension set;
an assignment unit, configured to assign a value to the data bit;
and a compression unit, configured to compress the assigned dimension set into a set file in a specific script format.
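As a non-limiting illustration, the work of the generating module can be sketched in Python as follows. The sketch assumes that the data stream is a sequence of (data identifier, dimension) pairs and that a dimension set is a byte-array bitmap. The source does not specify the "specific script format", so `zlib` compression is used purely as a stand-in; all function names are hypothetical.

```python
import zlib
from typing import Dict, Iterable, Tuple


def build_dimension_sets(stream: Iterable[Tuple[int, str]]) -> Dict[str, bytearray]:
    """Create one bitmap per dimension and set the bit of each data identifier."""
    sets: Dict[str, bytearray] = {}
    for data_id, dimension in stream:
        bits = sets.setdefault(dimension, bytearray())   # create the set if missing
        byte_index, bit_index = divmod(data_id, 8)       # locate the data bit
        if byte_index >= len(bits):                      # grow the bitmap on demand
            bits.extend(bytes(byte_index - len(bits) + 1))
        bits[byte_index] |= 1 << bit_index               # assign the data bit
    return sets


def compress_set(bits: bytearray) -> bytes:
    """Compress an assigned dimension set into the content of a set file."""
    return zlib.compress(bytes(bits))


def decompress_set(blob: bytes) -> bytearray:
    """Restore a dimension set from the content of a set file."""
    return bytearray(zlib.decompress(blob))


def is_set(bits: bytearray, data_id: int) -> bool:
    """Report whether the data bit of the given identifier is assigned."""
    byte_index, bit_index = divmod(data_id, 8)
    return byte_index < len(bits) and bool(bits[byte_index] & (1 << bit_index))


# Illustrative stream: user 3 has chn = 02 and ver = 330; user 12 has chn = 02.
stream = [(3, "chn=02"), (3, "ver=330"), (12, "chn=02")]
sets = build_dimension_sets(stream)
files = {name: compress_set(bits) for name, bits in sets.items()}
restored = decompress_set(files["chn=02"])
```

Decompressing a set file returns the same bitmap that was compressed, which is what the decompression unit of the operation module relies on.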
Further, the creating unit includes:
an obtaining subunit, configured to obtain the dimension information in the data stream to be analyzed;
a detection subunit, configured to detect whether a dimension set corresponding to the dimension information exists;
a loading subunit, configured to, if it is detected that a dimension set corresponding to the dimension information exists, load that dimension set as an initial dimension set;
and a creating subunit, configured to create a dimension set corresponding to the dimension information if it is detected that no such dimension set exists.
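As a non-limiting illustration, the detect, load, or create logic of the creating unit can be sketched in Python as follows. The sketch assumes that each dimension set is persisted as one file in a directory; the `.set` file suffix, the function names, and the use of an uncompressed byte array are assumptions made only for this example.

```python
import tempfile
from pathlib import Path


def load_or_create(directory: Path, dimension: str) -> bytearray:
    """Load the existing dimension set as the initial set, or create an empty one."""
    path = directory / f"{dimension}.set"
    if path.exists():                       # detection: a set already exists
        return bytearray(path.read_bytes()) # loading: reuse it as the initial set
    return bytearray()                      # creation: start from an empty set


def save(directory: Path, dimension: str, bits: bytearray) -> None:
    """Persist a dimension set so that a later run can load it."""
    (directory / f"{dimension}.set").write_bytes(bytes(bits))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    created = load_or_create(store, "chn_02")   # nothing stored yet: created empty
    created_was_empty = len(created) == 0
    created.append(0b00001000)                  # mark data identifier 3
    save(store, "chn_02", created)
    loaded = load_or_create(store, "chn_02")    # now detected and loaded
```

Loading an existing set rather than recreating it means that identifiers recorded in earlier runs are preserved when new data arrives.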
Further, the conversion module 20 includes:
a third obtaining unit, configured to obtain the data to be analyzed from the database;
and a conversion unit, configured to convert the data to be analyzed, through a preset data reading tool, into a data stream to be analyzed that carries the data identifier and the dimension information.
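As a non-limiting illustration, the conversion of database records into a data stream carrying a data identifier and dimension information can be sketched in Python as follows. The source does not name the preset data reading tool, so the standard `sqlite3` module with an in-memory database is used as a stand-in; the table name `events` and its columns are hypothetical.

```python
import sqlite3
from typing import Iterator, Tuple


def to_stream(conn: sqlite3.Connection) -> Iterator[Tuple[int, str]]:
    """Yield (data identifier, dimension information) pairs, one per dimension."""
    cursor = conn.execute("SELECT user_id, chn, ver FROM events ORDER BY user_id")
    for user_id, chn, ver in cursor:
        yield user_id, f"chn={chn}"   # the identifier paired with its channel
        yield user_id, f"ver={ver}"   # the identifier paired with its version


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, chn TEXT, ver TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(3, "02", "330"), (5, "02", "329")],
)
records = list(to_stream(conn))
conn.close()
```

Each element of the resulting stream names one data identifier and one dimension, which is the input shape the generating module needs in order to assign a data bit in the matching dimension set.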
Further, the data analysis device further includes:
a cache module, configured to cache the analysis result in the database;
and a sending module, configured to send the analysis result to the terminal device after a query condition identical to the one that produced the analysis result is obtained, so that the terminal device receives and outputs the analysis result.
The specific implementation of the data analysis device of the present invention is substantially the same as that of each embodiment of the data analysis method, and is not described herein again.
In addition, the present invention also provides a data analysis device. Fig. 3 is a schematic structural diagram of the hardware operating environment according to an embodiment of the present invention.
As shown in fig. 3, the data analysis device may include: a processor 1001 (such as a CPU), a memory 1005, a user interface 1003, a network interface 1004, and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the data analysis device may further include RF (Radio Frequency) circuits, sensors, WiFi modules, and the like.
Those skilled in the art will appreciate that the configuration shown in fig. 3 does not limit the data analysis device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 3, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a data analysis program. The operating system is a program that manages and controls the hardware and software resources of the data analysis device and supports the running of the data analysis program and of other software or programs.
In the data analysis device shown in fig. 3, the user interface 1003 is mainly used on the user's terminal device, so that the user can input query conditions to the server and/or view the analysis results returned by the server; the network interface 1004 is mainly used on the server for data communication with the user terminal; and the processor 1001 may be used to call the data analysis program stored in the memory 1005 and execute the steps of the data analysis method described above.
The specific implementation of the data analysis device of the present invention is substantially the same as the embodiments of the data analysis method described above, and is not described herein again.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium on which a data analysis program is stored; when the data analysis program is executed by a processor, the steps of the data analysis method described above are implemented.
The specific implementation manner of the computer-readable storage medium of the present invention is substantially the same as that of the embodiments of the data analysis method described above, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware, although the former is the better implementation in many cases. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for causing a data analysis device to execute the method according to the embodiments of the present invention.

Claims (10)

1. A method for analyzing data, comprising the steps of:
acquiring data to be analyzed in a database, and converting the data to be analyzed into a data stream to be analyzed;
performing set processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed;
and performing set operation on the set file to obtain an analysis result of the data to be analyzed.
2. The method for analyzing data according to claim 1, wherein the step of performing a set operation on the set file to obtain an analysis result of the data to be analyzed comprises:
acquiring a query condition, determining dimension information in the query condition, and acquiring the set file corresponding to a name field in the dimension information;
decompressing the set file corresponding to the name field to obtain a dimension set corresponding to the name field;
and performing set operation on the dimension set through statistical logic operation to obtain an analysis result of the data to be analyzed.
3. The method for analyzing data according to claim 2, wherein the step of performing a set operation on the dimension set through a statistical logic operation to obtain an analysis result of the data to be analyzed comprises:
and carrying out bit operation on the dimension set according to the logic conditions in the query conditions to obtain the analysis result of the data to be analyzed.
4. The method for analyzing data according to claim 1, wherein the step of performing set processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed comprises:
obtaining dimension information in the data stream to be analyzed, and creating a dimension set corresponding to the dimension information;
acquiring a data identifier corresponding to the dimension information, and inquiring a data bit corresponding to the data identifier in the dimension set;
and assigning values in the data bits, and compressing the assigned dimension set into a set file in a specific script format.
5. The method for analyzing data according to claim 4, wherein the step of obtaining the dimension information in the data stream to be analyzed and creating the dimension set corresponding to the dimension information comprises:
acquiring dimension information in the data stream to be analyzed, and detecting whether a dimension set corresponding to the dimension information exists or not;
if the dimension set corresponding to the dimension information is detected to exist, loading the dimension set corresponding to the dimension information as an initial dimension set;
and if it is detected that the dimension set corresponding to the dimension information does not exist, creating the dimension set corresponding to the dimension information.
6. The method for analyzing data according to claim 1, wherein the step of obtaining the data to be analyzed in the database and converting the data to be analyzed into the data stream to be analyzed comprises:
obtaining the data to be analyzed in the database, and converting the data to be analyzed, through a preset data reading tool, into a data stream to be analyzed that carries a data identifier and dimension information.
7. The method for analyzing data according to any one of claims 1 to 6, wherein after the step of performing a set operation on the set file to obtain an analysis result of the data to be analyzed, the method further comprises:
caching the analysis result in the database and, after a query condition identical to the one that produced the analysis result is obtained, sending the analysis result to a terminal device, so that the terminal device receives and outputs the analysis result.
8. An apparatus for analyzing data, comprising:
the acquisition module is used for acquiring data to be analyzed in the database;
the conversion module is used for converting the data to be analyzed into a data stream to be analyzed;
the generating module is used for performing set processing on the data stream to be analyzed to generate a set file of the data stream to be analyzed;
and the operation module is used for carrying out set operation on the set file to obtain an analysis result of the data to be analyzed.
9. An apparatus for analyzing data, characterized in that the apparatus comprises a memory, a processor and a program for analyzing data stored on the memory and running on the processor, the program for analyzing data implementing the steps of the method for analyzing data according to any one of claims 1 to 7 when executed by the processor.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an analysis program of data, which when executed by a processor implements the steps of the analysis method of data according to any one of claims 1 to 7.
CN202010257617.7A 2020-04-02 2020-04-02 Data analysis method, device, equipment and computer readable storage medium Active CN111506605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010257617.7A CN111506605B (en) 2020-04-02 2020-04-02 Data analysis method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010257617.7A CN111506605B (en) 2020-04-02 2020-04-02 Data analysis method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111506605A true CN111506605A (en) 2020-08-07
CN111506605B CN111506605B (en) 2023-07-25

Family

ID=71871835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010257617.7A Active CN111506605B (en) 2020-04-02 2020-04-02 Data analysis method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111506605B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060282425A1 (en) * 2005-04-20 2006-12-14 International Business Machines Corporation Method and apparatus for processing data streams
US20120023380A1 (en) * 2010-07-21 2012-01-26 Fujitsu Limited Algorithmic matching of a deskew channel
CN106372240A (en) * 2016-09-14 2017-02-01 北京搜狐新动力信息技术有限公司 Method and device for data analysis
CN106407290A (en) * 2016-08-29 2017-02-15 北京首信科技股份有限公司 Method for efficiently calculating multi-dimensional user number from massive data
US20170063723A1 (en) * 2015-08-26 2017-03-02 International Business Machines Corporation Asset arrangement management for a shared pool of configurable computing resources associated with a streaming application
CN107634848A (en) * 2017-08-07 2018-01-26 上海天旦网络科技发展有限公司 A kind of system and method for collection analysis network equipment information
US9892020B1 (en) * 2016-03-11 2018-02-13 Signalfx, Inc. User interface for specifying data stream processing language programs for analyzing instrumented software
US20190007206A1 (en) * 2017-06-30 2019-01-03 Microsoft Technology Licensing, Llc Encrypting object index in a distributed storage environment

Also Published As

Publication number Publication date
CN111506605B (en) 2023-07-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant