CN112667627B - Data processing method and device - Google Patents

Info

Publication number
CN112667627B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910983940.XA
Other languages
Chinese (zh)
Other versions
CN112667627A (en)
Inventor
张舜
张彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN201910983940.XA
Publication of CN112667627A
Application granted
Publication of CN112667627B
Active
Anticipated expiration

Abstract

The invention discloses a data processing method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: determining a summary mode, wherein the summary mode indicates a plurality of first data dimensions of data and a calculation relation among the plurality of first data dimensions; summarizing the data with a plurality of first data dimensions according to a summarization mode to form summarized data with a second data dimension and indexes corresponding to the summarization mode, wherein the second data dimension is generated based on the plurality of first data dimensions according to a calculation relation indicated by the summarization mode; receiving a data use request, wherein the data use request indicates the data dimension of data to be used; when the data dimension indicated by the data use request exists in the second data dimension, summary data corresponding to the data dimension indicated by the data use request is extracted according to the index. The embodiment improves the data processing efficiency and reduces the feedback delay of the data.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
With the development of computer technology, the data processing platform needs to process larger and larger amounts of data.
When a user uses data on a data processing platform based on data dimensions, data of several data dimensions usually has to be extracted at the same time. For example, when the user counts the total amount of products in a certain region, the platform has to aggregate data across three data dimensions: region, product type, and quantity. Because the platform handles a large amount of data, aggregating data of multiple data dimensions in real time in response to each user request lowers data processing efficiency and delays the feedback of data.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a data processing method and apparatus, which can improve data processing efficiency and reduce feedback delay of data.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a data processing method.
The data processing method of the embodiment of the invention comprises the following steps:
determining a summary manner, wherein the summary manner indicates a plurality of first data dimensions of data and a calculation relation among the plurality of first data dimensions;
summarizing the data with the plurality of first data dimensions according to the summarization mode to form summarized data with second data dimensions and indexes corresponding to the summarization mode, wherein the second data dimensions are generated based on the plurality of first data dimensions according to the calculation relation indicated by the summarization mode;
receiving a data use request, wherein the data use request indicates a data dimension of data to be used;
and extracting summarized data corresponding to the data dimension indicated by the data use request according to the index when the data dimension indicated by the data use request exists in the second data dimension.
Optionally,
summarizing the data having the plurality of first data dimensions according to the summarization mode comprises:
summarizing the data that has the plurality of first data dimensions and whose data volume is less than a summarization threshold.
Optionally,
summarizing the data that has the plurality of first data dimensions and whose data volume is less than the summarization threshold comprises:
sorting the plurality of first data dimensions according to the number of attribute values corresponding to each of the first data dimensions;
determining a third data dimension according to the sorting result and the following formula, wherein the number of attribute values of the third data dimension is not greater than the number of attribute values of the n-th first data dimension in the sorting result, and summarizing the data having the third data dimension:
$$\prod_{i=1}^{n} D_i \le K$$
wherein $D_i$ denotes the number of attribute values of the i-th first data dimension, and K denotes the summarization threshold.
Optionally,
sorting the plurality of first data dimensions according to the number of attribute values corresponding to each of the first data dimensions comprises:
performing uniqueness processing on each of the plurality of first data dimensions, and sorting the plurality of first data dimensions according to the number of attribute values of each first data dimension after the uniqueness processing.
Optionally, the method further comprises:
when the data dimension corresponding to the data dimension indicated by the data use request does not exist in the second data dimension, extracting the data corresponding to the data dimension indicated by the data use request.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a data processing apparatus.
The data processing device of the embodiment of the invention comprises a rule determining module, a data summarizing module, a request receiving module and a data extracting module, wherein:
the rule determining module is used for determining a summarizing mode, wherein the summarizing mode indicates a plurality of first data dimensions of data and a calculation relation among the plurality of first data dimensions;
the data summarizing module is configured to summarize the data with the plurality of first data dimensions according to the summarizing manner, so as to form summarized data with a second data dimension and an index corresponding to the summarizing manner, where the second data dimension is generated based on the plurality of first data dimensions according to a calculation relationship indicated by the summarizing manner;
the request receiving module is used for receiving a data use request, wherein the data use request indicates the data dimension of data to be used;
the data extraction module is used for extracting summarized data corresponding to the data dimension indicated by the data use request according to the index when the data dimension indicated by the data use request exists in the second data dimension.
Optionally,
the data summarizing module is used for summarizing the data that has the plurality of first data dimensions and whose data volume is less than a summarization threshold.
Optionally,
the data summarizing module is used for sorting the plurality of first data dimensions according to the number of attribute values corresponding to each of the first data dimensions; and determining a third data dimension according to the sorting result and the following formula, wherein the number of attribute values of the third data dimension is not greater than the number of attribute values of the n-th first data dimension in the sorting result, and summarizing the data having the third data dimension:
$$\prod_{i=1}^{n} D_i \le K$$
wherein $D_i$ denotes the number of attribute values of the i-th first data dimension, and K denotes the summarization threshold.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic device for data processing.
An electronic device for data processing according to an embodiment of the present invention includes: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the data processing method according to the embodiment of the invention.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program which, when executed by a processor, implements a method for data processing of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: before a data use request is received, data having a plurality of first data dimensions is summarized in advance according to the summarization mode, forming summarized data with a second data dimension and a corresponding index. When the data use request is received, the summarized data corresponding to the data dimension indicated by the request can be extracted according to the index, so the corresponding data does not have to be computed in real time, according to the data dimension indicated by the request, after the request arrives. The data summarizing process can be performed while the data processing device is idle, which improves data processing efficiency. After a data use request input by a user is received, the corresponding summarized data can be located quickly via the index and fed back to the user directly, which avoids feedback delay.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main modules of a data processing apparatus according to an embodiment of the present invention;
FIG. 3 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 4 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features in the embodiments may be combined with each other provided they do not conflict.
Fig. 1 is a schematic diagram of main steps of a data processing method according to an embodiment of the present invention.
As shown in fig. 1, the data processing method in the embodiment of the present invention mainly includes the following steps:
Step S101: determine a summary manner, which indicates a plurality of first data dimensions of the data and a calculation relationship among the plurality of first data dimensions.
The summary manner relates to an actual business scenario, for example, in a business scenario involving product logistics or retail, the first data dimension indicated by the summary manner may be a region, a product, a quantity, a unit price, a material, and the like. In addition to indicating the first data dimensions, the summary manner indicates a calculation relationship among a plurality of first data dimensions, for example, the summary manner may indicate that a calculation relationship between the first data dimensions corresponding to the number and the unit price respectively is a product relationship, and after the first data dimensions corresponding to the number and the unit price respectively are summarized according to the summary manner, the obtained second data dimension corresponds to sales.
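The quantity and unit price example can be sketched as a small data structure. This is only an illustrative sketch: the names `SummaryManner`, `apply_manner`, and the field names are hypothetical and do not appear in the source, which does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SummaryManner:
    """Hypothetical container for a summary manner (name not from the source)."""
    first_dimensions: Tuple[str, ...]              # first data dimensions involved
    second_dimension: str                          # dimension produced by summarizing
    relation: Callable[[Dict[str, float]], float]  # calculation relationship


# Product relationship between quantity and unit price, yielding sales.
sales_manner = SummaryManner(
    first_dimensions=("quantity", "unit_price"),
    second_dimension="sales",
    relation=lambda row: row["quantity"] * row["unit_price"],
)


def apply_manner(manner: SummaryManner, row: Dict[str, float]) -> float:
    """Compute the second-dimension value for one record."""
    missing = [d for d in manner.first_dimensions if d not in row]
    if missing:
        raise KeyError(f"record lacks first data dimensions: {missing}")
    return manner.relation(row)


# Assumed sample record for illustration only.
record = {"region": "North China", "product": "A", "quantity": 3, "unit_price": 2.5}
sales = apply_manner(sales_manner, record)
print(sales_manner.second_dimension, sales)
```

Keeping the calculation relationship as a callable is one possible design choice; a declarative expression string would serve equally well.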
Step S102: summarize the data having the plurality of first data dimensions according to the summarization mode to form summarized data with a second data dimension and an index corresponding to the summarization mode, wherein the second data dimension is generated based on the plurality of first data dimensions according to the calculation relationship indicated by the summarization mode.
Before data summarization, the raw, unprocessed data is first obtained. A to-be-processed model layer of the data model can be formed from this data, and preprocessing operations such as data cleaning can then be performed on the data in that layer.
When the data is preprocessed, uniqueness processing can be performed on each of the plurality of first data dimensions, and the first data dimensions can then be sorted according to the number of attribute values of each first data dimension after the uniqueness processing. An attribute value of a first data dimension is a specific value within that dimension; for example, when the first data dimension is a region, the attribute values may be East China, North China, and the like. When the same first data dimension contains several identical attribute values, these identical values can be made unique by de-duplication or by summation, depending on the specific business scenario, so that the attribute values within the same first data dimension differ from one another, which facilitates later data summarization. The preprocessed data may form a detail model layer. Data summarization mainly operates on the data of the detail model layer.
After the data is preprocessed, the plurality of first data dimensions are sorted according to the number of attribute values corresponding to each of them, and a third data dimension is determined according to the sorting result and the following formula, wherein the number of attribute values of the third data dimension is not greater than the number of attribute values of the n-th first data dimension in the sorting result; the data having the third data dimension is then summarized:
$$\prod_{i=1}^{n} D_i \le K$$
wherein $D_i$ denotes the number of attribute values of the i-th first data dimension, and K denotes the summarization threshold.
For example, the first data dimension dim1 represents a region, and its attribute values are Central China and North China; the first data dimension dim2 represents a product, and its attribute values are A and B. The corresponding data structure may be as shown in Table 1:
TABLE 1
dim1            dim2
Central China   A
Central China   B
North China     A
According to the data structure in Table 1, the number of attribute values of dim1 and of dim2 can be determined. Specifically, because two identical attribute values (Central China) exist under dim1, these two identical attribute values need uniqueness processing, after which the number of attribute values of dim1 is 2; similarly, the number of attribute values of dim2 is also 2.
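The counting in the Table 1 example can be reproduced with a short sketch. It uses de-duplication as the uniqueness processing; the function name `attribute_value_counts` and the row layout are assumptions made for illustration, with the region labels rendered as Central China and North China.

```python
from typing import Dict, List

# Rows of Table 1, one dict per record.
rows: List[Dict[str, str]] = [
    {"dim1": "Central China", "dim2": "A"},
    {"dim1": "Central China", "dim2": "B"},
    {"dim1": "North China", "dim2": "A"},
]


def attribute_value_counts(rows: List[Dict[str, str]],
                           dimensions: List[str]) -> Dict[str, int]:
    """Count distinct attribute values per dimension (de-duplication variant)."""
    counts: Dict[str, int] = {}
    for dim in dimensions:
        unique_values = {row[dim] for row in rows}  # a set drops repeated values
        counts[dim] = len(unique_values)
    return counts


counts = attribute_value_counts(rows, ["dim1", "dim2"])
print(counts)  # {'dim1': 2, 'dim2': 2}
```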
When the data volume is too large, the data volume after summarization is still large, and summarization then does little to speed up data processing. Therefore, to reduce the total amount of summarized data and improve summarization efficiency, the detail data is simply retained for the part of the data that is not summarized. In other words, when data summarization is performed, only data that has the plurality of first data dimensions and whose data volume is less than the summarization threshold is summarized.
Based on the above, during data summarization the plurality of first data dimensions may be sorted according to the determined number of attribute values of each first data dimension. Let the sorting result be dim1, dim2, dim3, ..., dimy, where the number of attribute values of dim1 is $D_1$, that of dim2 is $D_2$, and so on, up to $D_y$ for dimy. To make the data combination amount easier to determine, a histogram of the plurality of first data dimensions may be built, with the abscissa representing the identity of each first data dimension and the ordinate representing the number of attribute values of that first data dimension. The histogram of the y first data dimensions is D: {'dim1': $D_1$, 'dim2': $D_2$, ..., 'dimy': $D_y$}. Based on this histogram and the data combination principle, the maximum data combination amount T of the y first data dimensions is
$$T = \prod_{i=1}^{y} D_i$$
Assuming the summarization threshold is K, when $T \le K$ all y first data dimensions are used directly as third data dimensions, that is, the data corresponding to the y first data dimensions can all be summarized.
If the maximum data combination amount $T > K$, the first data dimension with the largest number of attribute values is removed, and it is then determined whether $\prod_{i=1}^{y-1} D_i$ is not greater than K. If it is still greater than K, the first data dimension with the largest number of attribute values among the remaining y-1 first data dimensions is removed, and it is determined whether $\prod_{i=1}^{y-2} D_i$ is not greater than K. This cycle repeats until $\prod_{i=1}^{n} D_i \le K$. The first data dimensions whose number of attribute values is not greater than that of the n-th first data dimension in the sorting result are then taken as the third data dimensions; in this example, these are dim1, dim2, dim3, ..., dimn. Removing the data dimensions with larger data volume allows the data to be summarized more effectively. The summarization threshold can be set according to actual requirements, for example to 100W (that is, 1,000,000).
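The removal loop described above can be sketched as follows. This is a minimal sketch that assumes the dimensions are sorted in ascending order of attribute value count, so that the largest dimension is always the last one; the function name `select_third_dimensions` and the sample histogram values are hypothetical.

```python
from math import prod
from typing import Dict, List


def select_third_dimensions(histogram: Dict[str, int], threshold: int) -> List[str]:
    """Keep the largest prefix of the sorted dimensions whose product is <= threshold."""
    # Assumption: ascending order by number of attribute values.
    ordered = sorted(histogram, key=lambda dim: histogram[dim])
    # Drop the dimension with the most attribute values until the maximum
    # data combination amount no longer exceeds the threshold K.
    while ordered and prod(histogram[dim] for dim in ordered) > threshold:
        ordered.pop()
    return ordered


# Assumed example counts; K = 1,000,000 as in the example threshold.
histogram = {"region": 7, "product": 500, "customer": 200_000, "material": 40}
selected = select_third_dimensions(histogram, threshold=1_000_000)
print(selected)  # ['region', 'material', 'product']
```

Here the full product 7 × 40 × 500 × 200,000 exceeds the threshold, so `customer` is removed; the remaining product 140,000 is within the threshold and the three remaining dimensions become the third data dimensions.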
After the third data dimensions are determined, the data having the plurality of third data dimensions can be summarized to form summarized data with a second data dimension and an index corresponding to the summarization mode, wherein the second data dimension is a summarized data dimension generated based on the third data dimensions. The summarization rule description of the second data dimension may be as shown in Table 2, and the rule description of the corresponding index may be as shown in Table 3. According to the rule description of the second data dimension and the rule description of the index, the summarized data may form a summary layer of the data model, whose data storage format may be as shown in Table 4. In practical applications, summarizing by designated first data dimensions is widely applicable, whereas summarizing over all dimensions is generally used on platforms such as Hadoop.
TABLE 2
TABLE 3
TABLE 4
It will be appreciated that when the summary data (summary layer) is stored, the data in the data dimension may not be stored any more, but only the index and the specific summary data may be stored, that is, the data storage format of the summary layer may also be as shown in table 5, and when the summary data is extracted, the related summary data may be extracted directly according to the index.
TABLE 5
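As a rough illustration of storing only an index together with the summarized values, the sketch below builds a small summary layer from detail rows. The contents of Tables 2 to 5 are not available here, so the index layout (a mapping from a tuple of dimension values to an integer position), the function name `build_summary_layer`, and the sample rows are all assumptions rather than the format used in the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed detail-layer rows for illustration only.
detail_rows = [
    {"region": "North China", "product": "A", "quantity": 2, "unit_price": 10.0},
    {"region": "North China", "product": "A", "quantity": 1, "unit_price": 10.0},
    {"region": "Central China", "product": "B", "quantity": 4, "unit_price": 5.0},
]


def build_summary_layer(
    rows: List[dict], dims: List[str]
) -> Tuple[Dict[tuple, int], Dict[int, float]]:
    """Return (index, summary): index maps dimension values to a slot number,
    summary maps the slot number to the summarized sales value."""
    totals: Dict[tuple, float] = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["quantity"] * row["unit_price"]  # product relationship
    index: Dict[tuple, int] = {}
    summary: Dict[int, float] = {}
    for position, key in enumerate(sorted(totals)):
        index[key] = position           # only the index is needed to locate data
        summary[position] = totals[key]
    return index, summary


index, summary = build_summary_layer(detail_rows, ["region", "product"])
print(summary[index[("North China", "A")]])  # 30.0
```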
Step S103: a data usage request is received, the data usage request indicating a data dimension of data to be used.
Step S104: when the data dimension indicated by the data use request exists in the second data dimension, extract the summarized data corresponding to the data dimension indicated by the data use request according to the index.
When a data usage request is received, a second data dimension corresponding to the data dimension indicated by the data usage request may be determined according to the index in table 4 or table 5, and then the corresponding summary data is extracted.
When the data dimension indicated by the data use request has not been summarized, that is, when no data dimension corresponding to the data dimension indicated by the data use request exists in the second data dimension, the data corresponding to the data dimension indicated by the data use request is extracted directly. In other words, when the data dimensions indicated by the data use request are not summarized, the corresponding data may be extracted directly from the detail model layer of the data model.
That is, when a data use request is received, it may be determined whether summary data of a data dimension indicated by the data use request exists according to the index, if so, data is extracted from the summary layer of the data model, and if not, data is extracted from the detail layer of the data model, thereby reducing the amount of data extraction and reducing the amount of data calculation in the data extraction process, thereby improving the data processing efficiency.
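The lookup-then-fallback behaviour can be sketched as below. The names `summary_index`, `detail_layer`, and `extract`, the keying of the index by a set of dimension names, and the sample data are all hypothetical; the source does not specify how the index is keyed.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed detail model layer.
detail_layer: List[dict] = [
    {"region": "North China", "product": "A", "material": "steel", "sales": 30.0},
    {"region": "Central China", "product": "B", "material": "wood", "sales": 20.0},
]

# Assumed index: set of summarized dimensions -> precomputed summarized rows.
summary_index: Dict[FrozenSet[str], List[dict]] = {
    frozenset({"region"}): [
        {"region": "North China", "sales": 30.0},
        {"region": "Central China", "sales": 20.0},
    ],
}


def extract(requested_dims: List[str]) -> Tuple[str, List[dict]]:
    """Serve from the summary layer when possible, else from the detail layer."""
    key = frozenset(requested_dims)
    if key in summary_index:
        return "summary", summary_index[key]
    # Not summarized: read the requested columns straight from the detail layer.
    columns = list(requested_dims) + ["sales"]
    return "detail", [{c: row[c] for c in columns} for row in detail_layer]


source, data = extract(["region"])       # served from the summary layer
source2, data2 = extract(["material"])   # falls back to the detail layer
print(source, source2)
```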
According to the data processing method provided by the embodiment of the invention, before a data use request is received, the data having a plurality of first data dimensions is summarized in advance according to the summarization mode to form summarized data with a second data dimension and a corresponding index. When the data use request is received, the summarized data corresponding to the data dimension indicated by the request can be extracted according to the index, so the corresponding data does not have to be computed in real time, according to the data dimension indicated by the request, after the request arrives. The data summarizing process can be performed while the data processing device is idle, which improves data processing efficiency. After a data use request input by a user is received, the corresponding summarized data can be located quickly via the index and fed back to the user directly, which avoids feedback delay.
Fig. 2 is a schematic diagram of main modules of a data processing apparatus according to an embodiment of the present invention.
As shown in FIG. 2, a data processing apparatus 200 of an embodiment of the present invention includes: a rule determining module 201, a data summarizing module 202, a request receiving module 203 and a data extracting module 204, wherein:
the rule determining module 201 is configured to determine a summary manner, where the summary manner indicates a plurality of first data dimensions of data and a calculation relationship between the plurality of first data dimensions;
the data summarizing module 202 is configured to summarize the data with the plurality of first data dimensions according to the summarizing manner, so as to form summarized data with a second data dimension and an index corresponding to the summarizing manner, where the second data dimension is generated based on the plurality of first data dimensions according to a calculation relationship indicated by the summarizing manner;
the request receiving module 203 is configured to receive a data usage request, where the data usage request indicates a data dimension of data to be used;
the data extraction module 204 is configured to extract summary data corresponding to a data dimension indicated by the data usage request according to the index when there is a data dimension indicated by the data usage request in the second data dimension.
In one embodiment of the present invention, the data summarization module 202 is configured to summarize data having the plurality of first data dimensions and a data amount less than a summarization threshold.
In one embodiment of the present invention, the data summarizing module 202 is configured to sort the plurality of first data dimensions according to the number of attribute values corresponding to each of the first data dimensions; and to determine a third data dimension according to the sorting result and the following formula, wherein the number of attribute values of the third data dimension is not greater than the number of attribute values of the n-th first data dimension in the sorting result, and summarize the data having the third data dimension:
$$\prod_{i=1}^{n} D_i \le K$$
wherein $D_i$ denotes the number of attribute values of the i-th first data dimension, and K denotes the summarization threshold.
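One way to picture how the four modules cooperate is the compact class sketch below. It is an assumption-laden illustration, not the apparatus itself: the class name `DataProcessingApparatus`, the method names, and the choice to return `None` for a request that was not summarized are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class DataProcessingApparatus:
    """Hypothetical wiring of modules 201 to 204 into a single class."""

    def __init__(self, detail_rows: List[dict]) -> None:
        self.detail_rows = detail_rows
        self.group_dims: List[str] = []
        self.relation: Optional[Callable[[dict], float]] = None
        self.summary: Dict[tuple, float] = {}

    def determine_rule(self, group_dims: List[str],
                       relation: Callable[[dict], float]) -> None:
        """Rule determining module 201: record dimensions and their relationship."""
        self.group_dims = list(group_dims)
        self.relation = relation

    def summarize(self) -> None:
        """Data summarizing module 202: precompute summarized data keyed by index."""
        if self.relation is None:
            raise RuntimeError("determine_rule must be called first")
        totals: Dict[tuple, float] = defaultdict(float)
        for row in self.detail_rows:
            key = tuple(row[d] for d in self.group_dims)
            totals[key] += self.relation(row)
        self.summary = dict(totals)

    def handle_request(self, requested_dims: List[str],
                       values: tuple) -> Optional[float]:
        """Modules 203 and 204: receive a request and extract summarized data."""
        if list(requested_dims) == self.group_dims and values in self.summary:
            return self.summary[values]
        return None  # not summarized; a caller could read the detail layer instead


# Assumed sample rows for illustration only.
apparatus = DataProcessingApparatus([
    {"region": "North China", "quantity": 2, "unit_price": 10.0},
    {"region": "North China", "quantity": 1, "unit_price": 10.0},
])
apparatus.determine_rule(["region"], lambda r: r["quantity"] * r["unit_price"])
apparatus.summarize()
print(apparatus.handle_request(["region"], ("North China",)))  # 30.0
```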
The embodiment of the invention also provides an electronic device for data processing, which comprises: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the data processing method according to the embodiment of the invention.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor implements a method for data processing according to the embodiment of the invention.
According to the data processing device provided by the embodiment of the invention, before a data use request is received, the data having a plurality of first data dimensions is summarized in advance according to the summarization mode, so that summarized data with a second data dimension and a corresponding index are formed. When the data use request is received, the summarized data corresponding to the data dimension indicated by the request can be extracted according to the index, so the corresponding data does not have to be computed in real time, according to the data dimension indicated by the request, after the request arrives. The data summarizing process can be performed while the data processing device is idle, which improves data processing efficiency. After a data use request input by a user is received, the corresponding summarized data can be located quickly via the index and fed back to the user directly, which avoids feedback delay.
FIG. 3 illustrates an exemplary system architecture 300 in which a data processing method or data processing apparatus of an embodiment of the present invention may be applied.
As shown in fig. 3, the system architecture 300 may include terminal devices 301, 302, 303, a network 304, and a server 305. The network 304 is used as a medium to provide communication links between the terminal devices 301, 302, 303 and the server 305. The network 304 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 305 via the network 304 using the terminal devices 301, 302, 303 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 301, 302, 303.
The terminal devices 301, 302, 303 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 305 may be a server providing various services, such as a background management server providing support for shopping-type websites browsed by the user using the terminal devices 301, 302, 303. The background management server can analyze and otherwise process received data, such as a product information query request, and feed the processing result (for example, target push information or product information) back to the terminal device.
It should be noted that, the data processing method provided in the embodiment of the present invention is generally executed by the server 305, and accordingly, the data processing apparatus is generally disposed in the server 305.
It should be understood that the number of terminal devices, networks and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 4, there is illustrated a schematic diagram of a computer system 400 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 4 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output portion 407 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. The drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 410 as needed, so that a computer program read therefrom is installed into the storage section 408 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 409 and/or installed from the removable medium 411. The above-described functions defined in the system of the present invention are performed when the computer program is executed by the Central Processing Unit (CPU) 401.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a rule determination module, a data summarization module, a request receiving module, and a data extraction module. The names of these modules do not, in some cases, limit the modules themselves; for example, the rule determination module may also be described as a "module that determines a summarization mode".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform operations including: determining a summarization mode, wherein the summarization mode indicates a plurality of first data dimensions of data and a calculation relation among the plurality of first data dimensions; summarizing the data with the plurality of first data dimensions according to the summarization mode to form summarized data with second data dimensions and indexes corresponding to the summarization mode, wherein the second data dimensions are generated based on the plurality of first data dimensions according to the calculation relation indicated by the summarization mode; receiving a data use request, wherein the data use request indicates a data dimension of data to be used; and extracting summarized data corresponding to the data dimension indicated by the data use request according to the index when the data dimension indicated by the data use request exists in the second data dimension.
According to the technical scheme of the embodiment of the invention, data with a plurality of first data dimensions is summarized in advance, according to the summarization mode, before a data use request is received, so that summarized data with a second data dimension and a corresponding index are formed. When the data use request is received, the summarized data corresponding to the data dimension indicated by the request can be extracted according to the index, without performing real-time calculation on the corresponding data after the request is received. The data summarization can be performed when the data processing device is idle, which improves data processing efficiency; after a data use request input by a user is received, the corresponding summarized data can be quickly located through the index and fed back to the user directly, which reduces the feedback delay of the data.
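The pre-summarization and index lookup described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names (`summarize`, `build_index`, `handle_request`), the use of a sum as the calculation, and the choice to key the index by the set of dimension names are all assumptions made for illustration. Here the "second data dimension" is modeled simply as the combination of the chosen first data dimensions.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, Iterable, List, Mapping, Optional, Sequence, Tuple

Row = Mapping[str, Any]
GroupKey = Tuple[Any, ...]
Summary = Dict[GroupKey, float]


def summarize(rows: Iterable[Row], dimensions: Sequence[str], measure: str) -> Summary:
    """Group rows by the given dimensions and sum the measure (assumed calculation)."""
    totals: Dict[GroupKey, float] = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += float(row[measure])
    return dict(totals)


def build_index(
    rows: List[Row], modes: Iterable[Sequence[str]], measure: str
) -> Dict[FrozenSet[str], Summary]:
    """Pre-compute one summary per summarization mode, ahead of any request.

    The index maps a set of dimension names to its summary; group keys
    follow the alphabetical order of the dimension names.
    """
    index: Dict[FrozenSet[str], Summary] = {}
    for dims in modes:
        ordered = tuple(sorted(dims))
        index[frozenset(ordered)] = summarize(rows, ordered, measure)
    return index


def handle_request(
    index: Dict[FrozenSet[str], Summary], requested_dims: Iterable[str]
) -> Optional[Summary]:
    """Return the pre-computed summary if the requested dimensions were summarized.

    Returns None otherwise, in which case a caller would fall back to the
    unsummarized data.
    """
    return index.get(frozenset(requested_dims))
```

With rows carrying hypothetical `region`, `category`, and `amount` fields, `build_index(rows, [("region", "category"), ("region",)], "amount")` would be run while the system is idle, and `handle_request(index, ["region"])` then becomes a dictionary lookup rather than a scan over the raw rows.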
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of data processing, comprising:
determining a summarization mode, wherein the summarization mode indicates a plurality of first data dimensions of data and a calculation relation among the plurality of first data dimensions;
summarizing the data with the plurality of first data dimensions according to the summarization mode to form summarized data with second data dimensions and indexes corresponding to the summarization mode, wherein the second data dimensions are generated based on the plurality of first data dimensions according to the calculation relation indicated by the summarization mode;
receiving a data use request, wherein the data use request indicates a data dimension of data to be used;
and extracting summarized data corresponding to the data dimension indicated by the data use request according to the index when the data dimension indicated by the data use request exists in the second data dimension.
2. The method of claim 1, wherein summarizing the data with the plurality of first data dimensions according to the summarization mode comprises:
summarizing the data that has the plurality of first data dimensions and a data volume less than a summarization threshold.
3. The method of claim 2, wherein summarizing the data that has the plurality of first data dimensions and a data volume less than the summarization threshold comprises:
sorting the plurality of first data dimensions according to the attribute value numbers respectively corresponding to the plurality of first data dimensions;
determining a third data dimension according to the sorting result and the following formula, wherein the number of attribute values of the third data dimension is not more than the number of attribute values of the nth first data dimension in the sorting result, and summarizing the data with the third data dimension;
wherein $D_i$ represents the number of attribute values of the i-th first data dimension, and $K$ represents the summarization threshold.
4. The method of claim 3, wherein sorting the plurality of first data dimensions according to the number of attribute values respectively corresponding to the plurality of first data dimensions comprises:
performing uniqueness processing on each of the plurality of first data dimensions, and sorting the plurality of first data dimensions according to the number of attribute values of each first data dimension after the uniqueness processing.
5. The method as recited in claim 1, further comprising:
when the data dimension corresponding to the data dimension indicated by the data use request does not exist in the second data dimension, extracting the data corresponding to the data dimension indicated by the data use request.
6. A data processing apparatus, comprising: a rule determining module, a data summarizing module, a request receiving module and a data extracting module; wherein:
the rule determining module is used for determining a summarizing mode, wherein the summarizing mode indicates a plurality of first data dimensions of data and a calculation relation among the plurality of first data dimensions;
the data summarizing module is configured to summarize the data with the plurality of first data dimensions according to the summarizing mode, so as to form summarized data with a second data dimension and an index corresponding to the summarizing mode, where the second data dimension is generated based on the plurality of first data dimensions according to a calculation relation indicated by the summarizing mode;
the request receiving module is used for receiving a data use request, wherein the data use request indicates the data dimension of data to be used;
the data extraction module is used for extracting summarized data corresponding to the data dimension indicated by the data use request according to the index when the data dimension indicated by the data use request exists in the second data dimension.
7. The apparatus of claim 6, wherein
the data summarizing module is used for summarizing the data that has the plurality of first data dimensions and a data volume less than a summarization threshold.
8. The apparatus of claim 7, wherein
the data summarizing module is used for sorting the plurality of first data dimensions according to the number of attribute values respectively corresponding to the plurality of first data dimensions; determining a third data dimension according to the sorting result and the following formula, wherein the number of attribute values of the third data dimension is not more than the number of attribute values of the nth first data dimension in the sorting result; and summarizing the data with the third data dimension;
wherein $D_i$ represents the number of attribute values of the i-th first data dimension, and $K$ represents the summarization threshold.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.
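The uniqueness processing and sorting recited in claims 3 and 4 can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `rank_dimensions` is hypothetical, ascending order is assumed because the claims do not state a direction, and the selection step against the threshold K is left out because the formula is not reproduced in this text.

```python
from typing import Any, Dict, Iterable, List, Mapping, Sequence, Set, Tuple


def rank_dimensions(
    rows: Iterable[Mapping[str, Any]], dimensions: Sequence[str]
) -> List[Tuple[str, int]]:
    """Count distinct attribute values per dimension and sort by that count.

    Collecting values into a set plays the role of the uniqueness processing;
    the ascending sort order is an assumption of this sketch.
    """
    distinct: Dict[str, Set[Any]] = {d: set() for d in dimensions}
    for row in rows:
        for d in dimensions:
            distinct[d].add(row[d])
    counts = [(d, len(values)) for d, values in distinct.items()]
    # sorted() is stable, so dimensions with equal counts keep their input order.
    return sorted(counts, key=lambda item: item[1])
```

Each returned pair holds a dimension name and its number of distinct attribute values, which corresponds to the quantity the claims denote $D_i$.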
CN201910983940.XA 2019-10-16 2019-10-16 Data processing method and device Active CN112667627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910983940.XA CN112667627B (en) 2019-10-16 2019-10-16 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910983940.XA CN112667627B (en) 2019-10-16 2019-10-16 Data processing method and device

Publications (2)

Publication Number Publication Date
CN112667627A CN112667627A (en) 2021-04-16
CN112667627B true CN112667627B (en) 2023-11-03

Family

ID=75400391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910983940.XA Active CN112667627B (en) 2019-10-16 2019-10-16 Data processing method and device

Country Status (1)

Country Link
CN (1) CN112667627B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5257365A (en) * 1990-03-16 1993-10-26 Powers Frederick A Database system with multi-dimensional summary search tree nodes for reducing the necessity to access records
CN109872015A (en) * 2017-12-01 2019-06-11 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for behavioral data assessment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
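Research on the OLAP Data Expansion Problem Based on MDD (基于MDD的OLAP数据膨胀问题研究); Xu Jian; Luo Yongqiang; Journal of Qingyuan Polytechnic (清远职业技术学院学报) (03); full text *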

Also Published As

Publication number Publication date
CN112667627A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN109614402B (en) Multidimensional data query method and device
CN107480205B (en) Method and device for partitioning data
CN108595448B (en) Information pushing method and device
CN112527649A (en) Test case generation method and device
CN111695840A (en) Method and device for realizing flow control
CN107908662B (en) Method and device for realizing search system
CN113761565B (en) Data desensitization method and device
CN113190558A (en) Data processing method and system
CN111858706A (en) Data processing method and device
CN112667627B (en) Data processing method and device
CN107920100B (en) Information pushing method and device
CN115423030A (en) Equipment identification method and device
CN113590322A (en) Data processing method and device
CN113722593A (en) Event data processing method and device, electronic equipment and medium
CN110378714B (en) Method and device for processing access data
CN112862554A (en) Order data processing method and device
CN112184370A (en) Method and device for pushing product
CN113434754A (en) Method and device for determining recommended API (application program interface) service, electronic equipment and storage medium
CN112131287A (en) Method and device for reading data
CN112395510A (en) Method and device for determining target user based on activity
CN113590447B (en) Buried point processing method and device
CN113312521B (en) Content retrieval method, device, electronic equipment and medium
CN112783956B (en) Information processing method and device
CN110717826A (en) Asset filtering method and device
CN111782801B (en) Method and device for grouping keywords

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant