CN107832347B - Data dimension reduction method and system and electronic equipment - Google Patents


Info

Publication number
CN107832347B
Authority
CN
China
Prior art keywords
data
subdata
processing
input data
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710963184.5A
Other languages
Chinese (zh)
Other versions
CN107832347A (en)
Inventor
李树前
朱德伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710963184.5A
Publication of CN107832347A
Application granted
Publication of CN107832347B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data dimension reduction method, including: acquiring N-dimensional data, wherein the N-dimensional data comprises N dimensions and N is greater than or equal to 1; performing dimensionality reduction on the data by processing the input data to obtain subdata whose dimensionality is reduced by 1 and using that subdata as the input data for the next round of processing, repeating these steps until zero-dimensional subdata is obtained; and recording, in each piece of subdata obtained through the dimensionality reduction, the time at which that subdata was generated.

Description

Data dimension reduction method and system and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data dimension reduction method, system and electronic device.
Background
With the development and application of database technology, the amount of data stored in databases has grown from megabytes (M) and gigabytes (G) in the 1980s to terabytes (T) and petabytes (P) today. At the same time, user queries have become increasingly complex: they no longer involve only querying or manipulating one or a few records in a relational table, but also analyzing and synthesizing information across tens of millions of records in multiple tables. However, in the course of implementing the disclosed concept, the inventors found the following technical problem in the prior art: as big data applications demand multidimensional queries, existing database systems cannot fully meet this demand because they are inefficient at one-dimensional and multidimensional queries.
Disclosure of Invention
In view of this, the present disclosure provides a data dimension reduction method, system and electronic device.
One aspect of the present disclosure provides a data dimension reduction method, including: acquiring N-dimensional data, wherein the N-dimensional data comprises N dimensions and N is greater than or equal to 1; performing dimensionality reduction on the data by processing the input data to obtain subdata whose dimensionality is reduced by 1 and using that subdata as the input data for the next round of processing, repeating these steps until zero-dimensional subdata is obtained; and recording, in each piece of subdata obtained through the dimensionality reduction, the time at which that subdata was generated.
According to an embodiment of the present disclosure, the processing is performed N + 1 times.
According to an embodiment of the present disclosure, the processing of the input data comprises sorting and/or aggregating the input data.
According to an embodiment of the present disclosure, the input data is sorted and/or aggregated by MapReduce.
According to an embodiment of the present disclosure, the method further comprises: when data that is the same as or similar to the N-dimensional data is acquired again, performing dimensionality reduction on that data and updating the corresponding subdata; and recording, in the updated subdata, the time at which the updated subdata was generated.
Another aspect of the present disclosure provides a data dimension reduction system, including: an obtaining module configured to obtain N-dimensional data, wherein the N-dimensional data comprises N dimensions and N is greater than or equal to 1; a first processing module configured to perform dimensionality reduction on the data by processing the input data to obtain subdata whose dimensionality is reduced by 1 and using that subdata as the input data for the next round of processing, repeating these steps until zero-dimensional subdata is obtained, wherein the first round of processing uses the N-dimensional data as the input data; and a first recording module configured to record, in the subdata obtained through the dimensionality reduction, the time at which the corresponding subdata was generated.
According to an embodiment of the present disclosure, the processing is performed N + 1 times.
According to an embodiment of the present disclosure, the processing of the input data comprises sorting and/or aggregating the input data.
According to an embodiment of the present disclosure, the input data is sorted and/or aggregated by MapReduce.
According to an embodiment of the present disclosure, the system further comprises: a second processing module configured to, when data that is the same as or similar to the N-dimensional data is obtained again, perform dimensionality reduction on that data and update the corresponding subdata; and a second recording module configured to record, in the updated subdata, the time at which the updated subdata was generated.
Another aspect of the present disclosure provides an electronic device including: one or more processors; and one or more memories storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform the method described above.
Another aspect of the present disclosure provides a readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, the efficiency of one-dimensional and multidimensional count queries in a database can be at least partially improved, so that the results of multidimensional count queries over data on the order of hundreds of millions of records can be returned quickly.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which the data dimension reduction methods and apparatus of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of a method of data dimension reduction according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a method of data dimension reduction according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a system for data dimension reduction according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a system for data dimension reduction according to another embodiment of the present disclosure; and
FIG. 6 schematically shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The words "a", "an", "the", and the like as used herein are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon for use by or in connection with an instruction execution system. In the context of this disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The embodiments of the present disclosure provide a data dimension reduction method, a data dimension reduction system, and an electronic device. The data dimension reduction method comprises: acquiring N-dimensional data, wherein the N-dimensional data comprises N dimensions and N is greater than or equal to 1; performing dimensionality reduction on the data by processing the input data to obtain subdata whose dimensionality is reduced by 1 and using it as the input data for the next round of processing, repeating these steps until zero-dimensional subdata is obtained; and recording, in the subdata obtained through the dimensionality reduction, the time at which the corresponding subdata was generated.
FIG. 1 schematically illustrates an exemplary system architecture to which the data dimension reduction methods and apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. Of course, it is to be understood that this architecture is merely an example, and that the components included in a particular architecture may be tailored to specific circumstances. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, for example to query or store data. The terminal devices 101, 102, 103 may have various communication client applications installed, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers, smart home devices, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for software updates browsed by users using the terminal devices 101, 102, 103. The background management server can analyze and process the received data and write the processed data to the location of the same or similar data, where users can query it. For example, the data may be stored in the cloud or in a database on the server. Such data may include, for example, a user's shopping habits, web browsing history, search history, communication history, and so forth.
It should be noted that the data dimension reduction method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the data dimension reduction system provided by the disclosed embodiments may be generally disposed in the server 105. The data dimension reduction method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the data dimension reduction system provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically shows a flow diagram of a method of data dimension reduction according to an embodiment of the present disclosure.
As shown in FIG. 2, the method includes operations S201 to S203.
In operation S201, N-dimensional data including N dimensions, where N is greater than or equal to 1, is acquired.
In operation S202, dimensionality reduction is performed on the data as follows: the input data is processed to obtain subdata whose dimensionality is reduced by 1, and this subdata is used as the input data for the next round of processing. These steps are repeated until zero-dimensional subdata is obtained. The first round of processing uses the N-dimensional data as the input data and encodes and/or combines its N dimensions.
In operation S203, the time at which each piece of subdata is generated is recorded in the subdata obtained through the above dimensionality reduction.
According to the embodiment of the present disclosure, the acquired N-dimensional data is reduced in this way until zero-dimensional subdata is obtained. The whole process yields multiple pieces of subdata, each recording the time at which it was generated, so that a user can quickly query data in one dimension or in multiple dimensions.
According to an embodiment of the present disclosure, the data acquired in operation S201 may be N-dimensional data comprising N dimensions, where N is greater than or equal to 1. For example, the acquired data may include, but is not limited to, an identifier of the user, the App version on the user's terminal device, the operating system of the user's terminal device, and the version of that operating system. In this example, the acquired data has four dimensions, namely the user identifier, the App version of the user's terminal device, the operating system of the user's terminal device, and the version of that operating system; that is, it is four-dimensional data.
According to an embodiment of the present disclosure, in operation S202 the input data is processed to obtain subdata whose dimensionality is reduced by 1, that subdata is used as the input data for the next round of processing, and these operations are repeated until zero-dimensional subdata is obtained. The first round of processing uses the N-dimensional data as the input data and encodes and/or combines its N dimensions. According to the embodiment of the present disclosure, the whole dimensionality reduction process therefore consists of N + 1 rounds of processing.
For example, suppose four-dimensional data is acquired whose four dimensions are the user identifier, the App version of the user's terminal device, the operating system of the user's terminal device, and the version of that operating system. This four-dimensional data is the input data for the first round of processing, in which the four dimensions are encoded as A, B, C and D, corresponding respectively to the user identifier, the App version, the operating system, and the operating system version.
It should be noted that, while each dimension of the four-dimensional data is encoded, the combinations of its dimensions are also computed. For the four dimensions A, B, C and D, these are ABCD, ABC, ABD, ACD, BCD, AB, AC, AD, BC, BD, CD, A, B, C and D.
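The fifteen combinations above are exactly the ways of choosing 4, 3, 2 and 1 dimensions out of A, B, C and D, so for N dimensions there are 2^N - 1 non-empty combinations. A minimal Python sketch, assuming each dimension is encoded as a single letter as in the example; the function name `dimension_combinations` is hypothetical and not part of the disclosure:

```python
from itertools import combinations

def dimension_combinations(dims):
    """Return every non-empty combination of `dims`, largest first,
    keeping the original dimension order inside each combination."""
    result = []
    for size in range(len(dims), 0, -1):
        for combo in combinations(dims, size):
            result.append("".join(combo))
    return result

combos = dimension_combinations("ABCD")
print(combos)
print(len(combos))  # 15, i.e. 2**4 - 1
```

`itertools.combinations` emits combinations in lexicographic order of the input positions, which reproduces the order listed in the text.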
According to the embodiment of the present disclosure, the four-dimensional data produced by the first round is used as the input data for the second round and processed to obtain three-dimensional data, with dimensionality reduced by 1: ABC, ABD, ACD and BCD. ABC, ABD, ACD and BCD are then used as the input data for the third round and processed to obtain two-dimensional data: AB, AC, AD, BC, BD and CD. AB, AC, AD, BC, BD and CD are next used as the input data for the fourth round and processed to obtain one-dimensional data: A, B, C and D. Finally, A, B, C and D are used as the input data for the fifth round and processed to obtain zero-dimensional data. The advantage of this approach is that each round takes the result of the previous round as its input (for example, ABC is derived from ABCD rather than from the raw data), which reduces repeated processing, and each round becomes faster as the number of dimensions decreases.
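The round-by-round scheme can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: dimensions are single letters, "processing" is taken to mean counting occurrences, and each (k-1)-dimensional combination is derived from one k-dimensional parent built in the previous round. The function name `reduce_dimensions` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def reduce_dimensions(records, dims):
    """Level-by-level dimensionality reduction (sketch).

    records: tuples holding one value per dimension in `dims` (e.g. "ABCD").
    Returns (subdata, rounds), where subdata maps a dimension combination
    such as "BCD" to a Counter of value tuples -> occurrence count.
    """
    # Round 1: aggregate the raw N-dimensional records once.
    subdata = {dims: Counter(tuple(r) for r in records)}
    rounds = 1
    # Rounds 2 .. N+1: the k-dimensional level produces the (k-1)-dimensional one.
    for k in range(len(dims), 0, -1):
        for combo in combinations(dims, k - 1):
            child = "".join(combo)
            # Parent = child plus one missing dimension; built in the previous round.
            extra = next(d for d in dims if d not in child)
            parent = "".join(sorted(child + extra, key=dims.index))
            keep = [parent.index(d) for d in child]
            counts = Counter()
            for values, n in subdata[parent].items():
                counts[tuple(values[i] for i in keep)] += n
            subdata[child] = counts
        rounds += 1
    return subdata, rounds

# Hypothetical records for dimensions A (user id), B (App version),
# C (operating system), D (operating system version).
records = [
    ("u1", "1.1.0", "IOS", "5.2.6"),
    ("u2", "1.1.0", "IOS", "5.2.6"),
    ("u3", "1.2.0", "Android", "7.0"),
]
subdata, rounds = reduce_dimensions(records, "ABCD")
print(rounds)                                     # 5, i.e. N + 1
print(subdata["BCD"][("1.1.0", "IOS", "5.2.6")])  # 2
print(subdata[""][()])                            # 3, the zero-dimensional total
```

Only the first round reads the raw records; every later round reads a smaller, already aggregated parent, which is the reuse the paragraph above describes.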
Processing the input data in operation S202 to obtain subdata whose dimensionality is reduced by 1 may include sorting and/or aggregating the input data. According to the embodiment of the present disclosure, when the four-dimensional data ABCD is aggregated, the dimension to be aggregated away is removed from its four dimensions, forming new subdata; for example, removing dimension A forms subdata BCD (dimension B, C or D could be removed instead; A is used here only as an example). The subdata BCD is then sorted and re-aggregated. Table 1 shows, as one example, the result of sorting and re-aggregating the subdata BCD:
TABLE 1
[Table 1 is published as an image: sorted and re-aggregated subdata BCD, with columns B (App version), C (operating system) and D (operating system version).]
According to the embodiment of the present disclosure, dimension B in Table 1 may be sorted by App version number, for example from low to high or from high to low. Dimension C may be sorted by the type of the terminal's operating system name: for example, when some operating system names are written in Latin letters (e.g., IOS) and others in Chinese characters (e.g., Android written in Chinese), the Latin-letter names may be placed first and the Chinese-character names after them, or all names may be sorted uniformly in alphabetical order. Dimension D may be sorted by the terminal's operating system version number, for example from low to high or from high to low.
According to the embodiment of the present disclosure, when the three dimensions in Table 1 are aggregated, they may be treated as a whole: each row of Table 1 is regarded as one element, and the occurrences of identical elements are added up. For example, the first and second rows of Table 1 contain the same element (1.1.0, IOS, 5.2.6); adding up its occurrences gives a count of two for this element over the whole column. Aggregating the data in this way eliminates repeated elements, allowing users to run fast queries and multidimensional queries.
It should be noted that, when processing the four-dimensional data, sorting and aggregation can be interleaved, which saves time.
According to the embodiment of the present disclosure, the subdata ABC, ABD, ACD, AB, AC, AD, BC, BD, CD, A, B, C and D are sorted and aggregated in the same way as the four-dimensional data ABCD is sorted and aggregated to obtain the subdata BCD in Table 1, so the details are not repeated here.
According to an embodiment of the present disclosure, the input data may be sorted and/or aggregated by MapReduce.
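Sorting first makes identical rows adjacent, so the aggregation becomes a single linear pass. A minimal Python sketch of that sort-then-count step for (B, C, D) rows, assuming version strings are dot-separated integers; the helper names `version_key` and `sort_and_aggregate` and the sample rows are hypothetical:

```python
from itertools import groupby

def version_key(version):
    """Compare dotted version strings numerically, so 1.10.0 sorts after 1.9.0."""
    return tuple(int(part) for part in version.split("."))

def sort_and_aggregate(rows):
    """Sort (app_version, os_name, os_version) rows from low to high,
    then collapse each run of identical rows into (row, count)."""
    ordered = sorted(rows, key=lambda r: (version_key(r[0]), r[1], version_key(r[2])))
    # groupby only merges *adjacent* equal items, which sorting guarantees.
    return [(row, sum(1 for _ in group)) for row, group in groupby(ordered)]

rows = [
    ("1.1.0", "IOS", "5.2.6"),
    ("1.10.0", "Android", "7.0"),
    ("1.1.0", "IOS", "5.2.6"),
    ("1.9.0", "IOS", "6.0"),
]
result = sort_and_aggregate(rows)
print(result)
```

Python compares strings by code point, so Latin-letter operating system names such as IOS sort before names written in Chinese characters, which matches the first ordering option described above.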
MapReduce is a computing model, framework and platform for parallel processing of big data, and the term has the following three meanings:
First, MapReduce is a cluster-based high-performance parallel computing platform (Cluster Infrastructure). It allows a distributed, parallel computing cluster of tens, hundreds or thousands of nodes to be built from commercially available servers.
Second, MapReduce is a software framework for running parallel computations (Software Framework). This framework is large but carefully designed: it automatically parallelizes computing tasks, partitions the computing data and tasks, distributes and executes the tasks on cluster nodes, and collects the results. Many complex low-level details of parallel computing, such as distributed data storage, data communication and fault tolerance, are handled by the system, which greatly reduces the burden on software developers.
Third, MapReduce is a parallel programming model and method (Programming Model). Drawing on the design ideas of the functional programming language Lisp, it offers a simple parallel programming approach in which basic parallel computing tasks are expressed with two functions, Map and Reduce, together with abstract operations and a parallel programming interface, so that large-scale data can be programmed and processed easily.
It can be understood that using the parallel computing capability of MapReduce to sort and/or aggregate the input data speeds up processing of the input data. Aggregating the duplicate values in each dimension column of the input data also means that a query does not need to scan every element in each column, which speeds up querying.
According to an embodiment of the present disclosure, in operation S203 the time at which each piece of subdata is generated is recorded in the subdata obtained through the above dimensionality reduction. The recorded time lets a user query the information in a piece of subdata at a specific moment or within a specific time period, which makes the query results more accurate.
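A minimal Python sketch of how the recorded generation time could support a time-window query. The `SubData` record layout, the function `query_count`, the "latest subdata within the window wins" rule, and the sample history are all assumptions for illustration, not taken from the disclosure:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SubData:
    """One piece of subdata: aggregated counts for a dimension combination,
    plus the time at which it was generated (operation S203)."""
    dims: str
    counts: dict
    generated_at: datetime

def query_count(history, dims, values, start, end):
    """Count for `values` in combination `dims`, taken from the most recent
    subdata generated within [start, end]; None if no subdata qualifies."""
    candidates = [s for s in history
                  if s.dims == dims and start <= s.generated_at <= end]
    if not candidates:
        return None
    latest = max(candidates, key=lambda s: s.generated_at)
    return latest.counts.get(values, 0)

# Hypothetical history: subdata C (operating system) generated on two days.
history = [
    SubData("C", {("IOS",): 2}, datetime(2017, 10, 1, 8, 0)),
    SubData("C", {("IOS",): 5}, datetime(2017, 10, 2, 8, 0)),
]
day_one = query_count(history, "C", ("IOS",),
                      datetime(2017, 10, 1), datetime(2017, 10, 1, 23, 59))
both_days = query_count(history, "C", ("IOS",),
                        datetime(2017, 10, 1), datetime(2017, 10, 2, 23, 59))
```

Here `day_one` evaluates to 2 and `both_days` to 5: narrowing the window changes which generation of the subdata answers the query.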
FIG. 3 schematically illustrates a flow diagram of a method of data dimension reduction according to another embodiment of the present disclosure.
As shown in FIG. 3, in addition to operations S201 to S203, the method further includes operations S301 and S302.
In operation S201, N-dimensional data including N dimensions, where N is greater than or equal to 1, is acquired.
In operation S202, dimensionality reduction is performed on the data as follows: the input data is processed to obtain subdata whose dimensionality is reduced by 1, and this subdata is used as the input data for the next round of processing. These steps are repeated until zero-dimensional subdata is obtained, where the first round of processing uses the N-dimensional data as the input data.
In operation S203, the time at which each piece of subdata is generated is recorded in the subdata obtained through the above dimensionality reduction.
In operation S301, when data identical or similar to the N-dimensional data is acquired again, the data is subjected to dimensionality reduction processing, and corresponding sub-data is updated.
In operation S302, a time at which the updated sub data is generated is recorded in the updated sub data.
According to an embodiment of the present disclosure, data acquired again in operation S301 that is the same as or similar to the N-dimensional data may be data whose number of dimensions is the same as or similar to that of the N-dimensional data and whose dimensions each have attributes that are the same as or similar to those of the corresponding dimensions of the N-dimensional data. It may also be data whose number of dimensions differs from that of the N-dimensional data, but some of whose dimensions have attributes that are the same as or similar to those of some dimensions of the N-dimensional data.
For example, the re-acquired data includes M dimensions, where M is equal to N, and the attribute of each of the M dimensions is the same as that of the corresponding dimension of the previously acquired N-dimensional data; that is, the acquired M-dimensional data is the same data as the previously acquired N-dimensional data.
As another example, the re-acquired data includes M dimensions, where M is greater than zero; for instance, M equals 4 and the four dimensions are A, B, Z and X. The previously acquired N-dimensional data includes the four dimensions A, B, H and J, i.e., N equals 4. The newly acquired four-dimensional data shares only the two dimensions A and B with the previously acquired data, so the acquired M-dimensional data is similar to the previously acquired N-dimensional data. According to the embodiment of the present disclosure, when data with the four dimensions A, B, Z and X is acquired again, dimensionality reduction is performed on these four dimensions, the results are used to update the subdata (e.g., A, B and AB) corresponding to the previously acquired N-dimensional data, and the time at which the updated subdata is generated is recorded in it, so that the user obtains the latest information when querying.
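The A, B, H, J versus A, B, Z, X example can be sketched in Python as follows. The disclosure says the shared subdata is "updated" without specifying how; this sketch assumes the update adds the new counts to the existing ones. The dictionary-based store, the functions `subdata_for` and `update_store`, and the sample values are hypothetical. For brevity, every combination is aggregated directly from the records rather than round by round.

```python
from collections import Counter
from datetime import datetime
from itertools import combinations

def subdata_for(records, dims):
    """Aggregate records over every dimension combination, including the
    empty (zero-dimensional) one. Keys list dimension names in sorted order so
    the same combination gets the same key regardless of column order."""
    order = sorted(range(len(dims)), key=lambda i: dims[i])
    result = {}
    for k in range(len(dims) + 1):
        for idx in combinations(order, k):
            key = "".join(dims[i] for i in idx)
            result[key] = Counter(tuple(r[i] for i in idx) for r in records)
    return result

def update_store(store, records, dims, now):
    """Merge newly acquired data into the store (S301/S302, sketched).

    Existing combinations are updated by adding counts (an assumption);
    new combinations are inserted. Every touched entry is stamped with `now`.
    Returns the keys that already existed, i.e. the shared subdata.
    """
    updated = []
    for key, counts in subdata_for(records, dims).items():
        if key in store:
            store[key]["counts"].update(counts)  # Counter.update adds counts
            updated.append(key)
        else:
            store[key] = {"counts": Counter(counts)}
        store[key]["generated_at"] = now
    return updated

store = {}
# Previously acquired data with dimensions A, B, H, J (hypothetical values).
update_store(store, [("a1", "b1", "h1", "j1")], "ABHJ", datetime(2017, 10, 1))
# Re-acquired data with dimensions A, B, Z, X shares only A and B.
shared = update_store(store, [("a1", "b2", "z1", "x1")], "ABZX", datetime(2017, 10, 2))
print(shared)  # ['', 'A', 'B', 'AB']
```

Besides A, B and AB, the zero-dimensional total (key `''`) is also shared, because both data sets reduce to it.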
According to embodiments of the present disclosure, the above methods may be applied to a variety of databases, including but not limited to relational databases (e.g., MySQL, Oracle) and distributed databases (e.g., HBase, a distributed, column-oriented open-source database).
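In a relational database, one plausible layout stores each aggregated cell as a row, so that a count query becomes a point lookup instead of a scan over raw records. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for MySQL or Oracle; the table name `subdata`, its columns, the `|`-joined cell encoding, the function `latest_count`, and the sample rows are all hypothetical choices, not the disclosure's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per aggregated cell: its dimension combination, the '|'-joined
# values of those dimensions, the count, and the generation time (ISO 8601).
conn.execute(
    "CREATE TABLE subdata ("
    " dims TEXT NOT NULL,"
    " cell TEXT NOT NULL,"
    " cnt INTEGER NOT NULL,"
    " generated_at TEXT NOT NULL,"
    " PRIMARY KEY (dims, cell, generated_at))"
)
# Hypothetical rows produced by earlier dimensionality reduction runs.
conn.executemany(
    "INSERT INTO subdata VALUES (?, ?, ?, ?)",
    [
        ("BCD", "1.1.0|IOS|5.2.6", 2, "2017-10-01T08:00:00"),
        ("C", "IOS", 2, "2017-10-01T08:00:00"),
        ("C", "IOS", 5, "2017-10-02T08:00:00"),
        ("", "", 3, "2017-10-01T08:00:00"),
    ],
)

def latest_count(conn, dims, cell):
    """Point lookup on precomputed subdata; returns the most recently
    generated count for the cell, or None if it is absent."""
    row = conn.execute(
        "SELECT cnt FROM subdata WHERE dims = ? AND cell = ? "
        "ORDER BY generated_at DESC LIMIT 1",
        (dims, cell),
    ).fetchone()
    return row[0] if row else None
```

Because ISO 8601 timestamps sort correctly as text, `ORDER BY generated_at DESC` picks the newest subdata without any date parsing.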
FIG. 4 schematically illustrates a block diagram of a data dimension reduction system according to an embodiment of the present disclosure.
As shown in FIG. 4, the data dimension reduction system 400 includes an acquisition module 410, a first processing module 420, and a first recording module 430.
An obtaining module 410 is configured to obtain N-dimensional data, where the N-dimensional data includes N dimensions, where N is greater than or equal to 1.
A first processing module 420, configured to perform dimension reduction processing on data by: and processing the input data to obtain subdata with dimensionality reduced by 1, wherein the subdata is used as input data for next processing. And executing the steps until the zero-dimensional subdata is obtained, wherein the first processing is to use the N-dimensional data as input data and encode and/or combine N dimensions of the N dimensions.
The first recording module 430 is configured to record, in the sub data obtained through the above dimension reduction, a time for generating corresponding sub data.
Data acquisition, data processing, and recording of subdata generation times have been described in detail in the method according to an embodiment of the present disclosure above. For details, refer to the description of FIGS. 1-3, which is not repeated here.
FIG. 5 schematically illustrates a block diagram of a system for data dimension reduction, according to another embodiment of the present disclosure.
As shown in FIG. 5, the system 400 further includes a second processing module 440 and a second recording module 450.
The second processing module 440 is configured to, when data that is the same as or similar to the N-dimensional data is acquired again, perform dimension reduction on that data and update the corresponding subdata.
The second recording module 450 is configured to record, in the updated subdata, the time at which the updated subdata was generated.
It is understood that the obtaining module 410, the first processing module 420, the first recording module 430, the second processing module 440, and the second recording module 450 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of these modules may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system in a package, or an application specific integrated circuit (ASIC); it may be implemented in hardware or firmware by any other reasonable way of integrating or packaging a circuit; or it may be implemented in a suitable combination of software, hardware, and firmware. Alternatively, at least one of these modules may be implemented at least partially as a computer program module that, when executed by a computer, performs the function of the corresponding module.
Fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
As shown in FIG. 6, the electronic device 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the electronic device 600 of the present disclosure are executed when the computer program is executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including a sending unit, an obtaining unit, a determining unit, and a first processing unit. In some cases, the names of these units do not limit the units themselves; for example, the sending unit may also be described as a "unit that sends a picture acquisition request to a connected server".
As another aspect, a computer-readable medium is also provided according to an embodiment of the present disclosure. The computer-readable medium carries one or more programs which, when executed, implement the data dimension reduction method according to an embodiment of the present disclosure, including: acquiring N-dimensional data, wherein the N-dimensional data includes N dimensions and N is greater than or equal to 1; performing dimension reduction on the data as follows: processing the input data to obtain subdata whose dimensionality is reduced by 1 and using that subdata as the input data for the next processing, wherein the first processing takes the N-dimensional data as the input data and encodes and/or combines its N dimensions; repeating these steps until zero-dimensional subdata is obtained; and recording, in the subdata obtained through the dimension reduction, the time at which each piece of subdata was generated.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (8)

1. A method of data dimension reduction, comprising:
acquiring N-dimensional data, wherein the N-dimensional data comprises N dimensions, and N is greater than or equal to 1;
performing dimensionality reduction on the data by:
processing input data to obtain subdata whose dimensionality is reduced by 1, the subdata being used as the input data for the next processing, wherein processing the input data to obtain the subdata comprises sorting and/or aggregating the input data; and
repeating the step of processing the input data to obtain subdata whose dimensionality is reduced by 1 for use as the input data for the next processing, until zero-dimensional subdata is obtained,
wherein the first processing takes the N-dimensional data as the input data and encodes and/or combines the N dimensions of the N-dimensional data; and
recording, in the subdata obtained by the dimensionality reduction, the time at which the corresponding subdata was generated;
wherein the input data is sorted and/or aggregated using MapReduce.
2. The method of claim 1, wherein the number of processing rounds is N + 1.
3. The method of claim 1, further comprising:
when data that is the same as or similar to the N-dimensional data is obtained again, performing dimensionality reduction on that data and updating the corresponding subdata; and
recording, in the updated subdata, the time at which the updated subdata was generated.
4. A data dimension reduction system, comprising:
an obtaining module, configured to obtain N-dimensional data, wherein the N-dimensional data comprises N dimensions and N is greater than or equal to 1;
a first processing module, configured to perform dimensionality reduction on the data by:
processing input data to obtain subdata whose dimensionality is reduced by 1, the subdata being used as the input data for the next processing, wherein processing the input data to obtain the subdata comprises sorting and/or aggregating the input data; and
repeating the step of processing the input data to obtain subdata whose dimensionality is reduced by 1 for use as the input data for the next processing, until zero-dimensional subdata is obtained,
wherein the first processing takes the N-dimensional data as the input data and encodes and/or combines the N dimensions of the N-dimensional data; and
a first recording module, configured to record, in the subdata obtained by the dimensionality reduction, the time at which the corresponding subdata was generated;
wherein the input data is sorted and/or aggregated using MapReduce.
5. The system of claim 4, wherein the number of processing rounds is N + 1.
6. The system of claim 4, further comprising:
a second processing module, configured to, when data that is the same as or similar to the N-dimensional data is obtained again, perform dimensionality reduction on that data and update the corresponding subdata; and
a second recording module, configured to record, in the updated subdata, the time at which the updated subdata was generated.
7. An electronic device, comprising:
one or more processors; and
one or more memories storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1-3.
8. A readable storage medium having stored thereon instructions for performing the method of any of claims 1-3.
CN201710963184.5A 2017-10-16 2017-10-16 Data dimension reduction method and system and electronic equipment Active CN107832347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710963184.5A CN107832347B (en) 2017-10-16 2017-10-16 Data dimension reduction method and system and electronic equipment

Publications (2)

Publication Number Publication Date
CN107832347A CN107832347A (en) 2018-03-23
CN107832347B true CN107832347B (en) 2021-12-31

Family

ID=61648030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710963184.5A Active CN107832347B (en) 2017-10-16 2017-10-16 Data dimension reduction method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN107832347B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020019318A1 (en) * 2018-07-27 2020-01-30 西门子(中国)有限公司 Method, device and system for performing dimensionality reduction processing on multidimensional variables of data to be processed
CN111274243B (en) * 2020-01-07 2023-05-23 北京唐颐惠康生物医学技术有限公司 Information processing method and system based on multidimensional model form

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103064991A (en) * 2013-02-05 2013-04-24 杭州易和网络有限公司 Mass data clustering method
CN103425524A (en) * 2013-07-17 2013-12-04 北京邮电大学 Method and system for balancing multi-service terminal aggregation
CN104699772A (en) * 2015-03-05 2015-06-10 孟海东 Big data text classifying method based on cloud computing
CN106372114A (en) * 2016-08-23 2017-02-01 电子科技大学 Big data-based online analytical processing system and method
CN106844713A (en) * 2017-02-07 2017-06-13 北京微影时代科技有限公司 A kind of method and device of data cube generation

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US5245562A (en) * 1991-09-17 1993-09-14 The Johns Hopkins University Accumulating arithmetic memory integrated circuit
US8150723B2 (en) * 2009-01-09 2012-04-03 Yahoo! Inc. Large-scale behavioral targeting for advertising over a network
CN101763417A (en) * 2009-12-30 2010-06-30 北京世纪高通科技有限公司 Data query method and device
CN103678550B (en) * 2013-09-09 2017-02-08 南京邮电大学 Mass data real-time query method based on dynamic index structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant