CN113051302B

CN113051302B - Overall design-oriented multi-dimensional data matching method and device and computer storage medium

Info

Publication number: CN113051302B
Application number: CN202110419464.6A
Authority: CN
Inventors: 叶东; 孙兆伟; 张洪珠; 李晖; 高祥博; 赵翰墨
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2021-04-19
Filing date: 2021-04-19
Publication date: 2022-04-29
Anticipated expiration: 2041-04-19
Also published as: CN113051302A

Abstract

The embodiments of the invention disclose an overall-design-oriented multidimensional data matching method and device and a computer storage medium. The method may comprise: establishing a corresponding hash value index for each multidimensional data item in a multidimensional data table according to a set hash function; when the matching strategy is exact matching, determining the hash value of the multidimensional data item to be matched according to the hash function, and searching the multidimensional data table for a set number of first target multidimensional data items according to that hash value; and when the matching strategy is similarity matching, obtaining, item by item, the similarity between the multidimensional data item to be matched and each multidimensional data item in the multidimensional data table based on a set weighted Euclidean distance strategy, and selecting from the multidimensional data table a set number of second target multidimensional data items with the highest similarity.

Description

Overall design-oriented multi-dimensional data matching method and device and computer storage medium

Technical Field

The embodiment of the invention relates to the technical field of information, in particular to a multi-dimensional data matching method and device for overall design and a computer storage medium.

Background

With the explosive growth of data scale, the value implicit in data keeps increasing, and mining valuable information and knowledge from big data is currently a popular research direction. Among the many big-data mining and machine-learning problems, efficiently performing exact matching and similarity matching across large-scale data is a fundamental one. In data cleaning, for example, redundant data must first be identified through exact matching and similarity computation and then deleted, so as to reduce wasted storage space; likewise, when a retrieval query is executed, the query input must be matched quickly against the massive number of data items in the database to obtain the data that best answers the query.

For a large-scale parameter library, the available data is not limited to simple single-dimensional data but consists of multidimensional data objects with multiple attribute dimensions and numerical values; for example, an article of a certain type has several attributes at once, such as mass and power. Current similarity matching algorithms for multidimensional data generally compute similarity from the distance between objects, for example methods based on the Euclidean distance or the minimum bounding rectangle. Because similarity is computed from distance alone, the matching result is not necessarily the one the user most expects.

Disclosure of Invention

In view of this, embodiments of the present invention aim to provide an overall-design-oriented multidimensional data matching method, apparatus and computer storage medium that can reduce the time complexity of the matching process.

The technical scheme of the embodiment of the invention is realized as follows:

In a first aspect, an embodiment of the present invention provides an overall-design-oriented multidimensional data matching method, the method including:

establishing a corresponding hash value index for each multidimensional data item in the multidimensional data table according to a set hash function;

when the matching strategy is exact matching, determining the hash value corresponding to the multidimensional data item to be matched according to the hash function, and searching the multidimensional data table for a set number of first target multidimensional data items according to the hash value corresponding to the multidimensional data item to be matched, wherein the first target multidimensional data items exactly match the multidimensional data item to be matched;

when the matching strategy is similarity matching, obtaining, item by item, the similarity between the multidimensional data item to be matched and each multidimensional data item in the multidimensional data table based on a set weighted Euclidean distance strategy, and selecting from the multidimensional data table a set number of second target multidimensional data items with the highest similarity, wherein the second target multidimensional data items are similar matches of the multidimensional data item to be matched.

In a second aspect, an embodiment of the present invention provides an overall-design-oriented multidimensional data matching apparatus, including an establishing part, an exact matching part and a similarity matching part, wherein:

the establishing part is configured to establish a corresponding hash value index for each multidimensional data item in the multidimensional data table according to a set hash function;

the exact matching part is configured to, when the matching strategy is exact matching, determine the hash value corresponding to the multidimensional data item to be matched according to the hash function, and search the multidimensional data table for a set number of first target multidimensional data items according to the hash value corresponding to the multidimensional data item to be matched, wherein the first target multidimensional data items exactly match the multidimensional data item to be matched;

the similarity matching part is configured to, when the matching strategy is similarity matching, obtain, item by item, the similarity between the multidimensional data item to be matched and each multidimensional data item in the multidimensional data table based on a set weighted Euclidean distance strategy, and select from the multidimensional data table a set number of second target multidimensional data items with the highest similarity, wherein the second target multidimensional data items are similar matches of the multidimensional data item to be matched.

In a third aspect, an embodiment of the present invention provides a computer storage medium, where the computer storage medium stores an overall design-oriented multidimensional data matching program, and the overall design-oriented multidimensional data matching program, when executed by at least one processor, implements the overall design-oriented multidimensional data matching method steps of the first aspect.

The embodiments of the invention provide an overall-design-oriented multidimensional data matching method and device and a computer storage medium. Hash values are used for exact matching of multidimensional data items, and the weighted Euclidean distance is used for similarity matching, so that the time complexity of matching can be reduced while matching accuracy is preserved.

Drawings

FIG. 1 is a schematic flow chart of a multidimensional data matching method for overall design according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an implementation of exact matching provided by an embodiment of the present invention;

FIG. 3 is a schematic representation of a multidimensional data table provided by an embodiment of the present invention;

FIG. 4 is a diagram of a multidimensional data item to be matched according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an implementation of similarity matching according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a multidimensional data matching apparatus for overall design according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a hardware structure of a computing device according to an embodiment of the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

First, some terms related to the embodiments of the present invention are explained to facilitate understanding by those skilled in the art.

In the embodiments of the present invention, the multidimensional data table may be a data table that has m dimensions and contains n data items. In a specific implementation each dimension may correspond to one attribute, so the terms "multidimensional" and "multi-attribute" are used interchangeably in the following description. Each data item may be regarded as a multidimensional data item. It can be understood that, in current large-scale data scenarios, the number of multidimensional data items in the table may be on the order of ten thousand. In some examples, each multidimensional data item corresponds to one row of the multidimensional data table and each dimension corresponds to one column. On this basis, the overall-design-oriented multidimensional data matching scheme set forth in the embodiments of the present invention aims to search the multidimensional data table described above for the multidimensional data items that match the multidimensional data item to be matched.

Exact matching means comparing the multidimensional data item to be matched with each multidimensional data item in the multidimensional data table one by one and dimension by dimension, so as to obtain the multidimensional data items whose values are identical to those of the item to be matched in every dimension.

Similarity matching likewise compares the multidimensional data item to be matched with each multidimensional data item in the multidimensional data table one by one, but allows matches in which the values of the dimensions are not all identical.

The weighted Euclidean distance between multidimensional data items x and y is denoted ad(x, y). Taking x and y as m-dimensional data items as an example, it can be expressed as follows:

$$ad(x, y) = \sqrt{\sum_{i=1}^{m} a_i (x_i - y_i)^2}$$

where x_i and y_i respectively denote the values of the multidimensional data items x and y in the i-th dimension. Unlike the conventional Euclidean distance, the coefficient in front of each squared-difference term is a_i (1 ≤ i ≤ m), the weight corresponding to the i-th dimension, rather than the constant 1.
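As a hedged numerical illustration, assume the weighted distance is the square root of the weighted sum of squared differences, as the description of the coefficients suggests. The values of x, y and the weights a below are made up for illustration and do not come from the source; m = 3:

```latex
% Illustrative values only; x, y and the weights a are assumptions.
x = (4, 6, 1), \quad y = (1, 2, 1), \quad a = (1, 0.25, 1)

ad(x, y) = \sqrt{1 \cdot (4-1)^2 + 0.25 \cdot (6-2)^2 + 1 \cdot (1-1)^2}
         = \sqrt{9 + 4 + 0} = \sqrt{13} \approx 3.61

% With all weights equal to 1 the conventional Euclidean distance is recovered:
\sqrt{(4-1)^2 + (6-2)^2 + (1-1)^2} = \sqrt{25} = 5
```

Lowering the weight of the second dimension shrinks its contribution, so a large difference in a dimension the user considers unimportant penalizes the match less.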

Based on the above definition and explanation of related concepts, design parameters typically have multiple attributes during the overall design process of a satellite, and thus can be considered as multidimensional data. In some examples, selecting appropriate design parameters from an existing parameter library may be considered a multi-dimensional data matching task, from which the same or similar design parameters need to be obtained for matching. In the face of the task requirement, the embodiment of the invention is expected to provide a multidimensional data matching scheme oriented to overall design, and the time complexity of the matching process can be reduced under the condition of ensuring that the matching accuracy is not changed.

Based on this, referring to FIG. 1, an overall-design-oriented multidimensional data matching method is shown, which may include:

S101: establishing a corresponding hash value index for each multidimensional data item in the multidimensional data table according to a set hash function;

S102: when the matching strategy is exact matching, determining the hash value corresponding to the multidimensional data item to be matched according to the hash function, and searching the multidimensional data table for a set number of first target multidimensional data items according to the hash value corresponding to the multidimensional data item to be matched, wherein the first target multidimensional data items exactly match the multidimensional data item to be matched;

S103: when the matching strategy is similarity matching, obtaining, item by item, the similarity between the multidimensional data item to be matched and each multidimensional data item in the multidimensional data table based on a set weighted Euclidean distance strategy, and selecting from the multidimensional data table a set number of second target multidimensional data items with the highest similarity, wherein the second target multidimensional data items are similar matches of the multidimensional data item to be matched.

For the technical solution shown in FIG. 1, it should be noted that hash values are used for exact matching of multidimensional data items and the weighted Euclidean distance is used for similarity matching, so that the time complexity of matching can be reduced while matching accuracy is preserved. The matching strategy refers to the manner in which the matching search is performed; in some examples, a selection instruction received from the user indicates whether the matching strategy adopted is exact matching or similarity matching.

As shown in FIG. 1, in some possible implementations, the establishing of a corresponding hash value index for each multidimensional data item in the multidimensional data table according to a set hash function includes:

determining the hash function H(key) = key % p according to the division-remainder method, where key represents the multidimensional data item to be hashed, p represents the largest prime number not greater than n, and n represents the number of multidimensional data items in the multidimensional data table;

and calculating the hash value corresponding to each multidimensional data item in the multidimensional data table item by item according to the hash function, and establishing an index for each multidimensional data item.

In a specific implementation of the above, for a multidimensional data table containing a plurality of multidimensional data items, the largest prime number p not greater than the number n of multidimensional data items in the table may first be selected. The hash function H(key) = key % p is then set for the multidimensional data table by the division-remainder method, and the hash value H[i] of the multidimensional data item represented by each row of the table is calculated with this hash function, where 0 ≤ i < n. An index is established for these hash values, with index range [1:n]. It will be appreciated that n equally represents the number of rows of the multidimensional data table.
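The index-building step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the source does not say how a multidimensional row is turned into the integer `key`, so Python's built-in `hash` of a tuple is used as a stand-in, and the names `largest_prime_at_most`, `item_key` and `build_hash_index` are hypothetical.

```python
def largest_prime_at_most(n):
    """Return the largest prime p with p <= n (requires n >= 2)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    for candidate in range(n, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("n must be at least 2")


def item_key(item):
    # Assumption: the source does not specify the row-to-integer mapping.
    # Python's hash of a tuple of numbers is used purely for illustration.
    return hash(item)


def build_hash_index(table):
    """Compute p and the per-row hash values H[i] = key % p."""
    p = largest_prime_at_most(len(table))
    # In Python, % with a positive modulus always yields a value in [0, p).
    return p, [item_key(row) % p for row in table]


# Hypothetical 5-row, 2-dimension table.
table = [(1, 2), (3, 4), (1, 2), (5, 6), (7, 8)]
p, H = build_hash_index(table)
print(p, H)
```

Rows with identical values always receive the same hash value, which is what the exact-matching step relies on.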

Based on the foregoing implementation, in some examples, referring to FIG. 2, the determining of the hash value corresponding to the multidimensional data item to be matched according to the hash function, and the searching of the multidimensional data table for a set number of first target multidimensional data items according to that hash value, include:

S201: establishing a hash bucket structure of size p for storing hash collisions;

S202: calculating the hash value of the multidimensional data item to be matched based on the hash function;

S203: searching, item by item, the hash values corresponding to the multidimensional data items in the multidimensional data table according to the hash value of the multidimensional data item to be matched, and storing a searched multidimensional data item in the hash bucket structure when its hash value is the same as that of the multidimensional data item to be matched;

S204: after the item-by-item search is finished, traversing the multidimensional data items stored in the hash bucket structure and determining them to be the first target multidimensional data items.

For the above example, in a specific implementation, when exact matching is performed a hash bucket structure of size p may be constructed on the basis of the hash value index of the multidimensional data table, and the linked lists of the hash bucket structure may be used to store elements for which hash collisions occur. After the hash bucket structure has been built, exact matching can be performed according to the matching strategy selected by the user or indicated by an instruction. Specifically, the hash value hash of the multidimensional data item to be matched is calculated with the hash function, and the hash value index of the multidimensional data table is then searched item by item for hash: for the i-th multidimensional data item in the table, if hash equals H[i], the i-th multidimensional data item is appended to the tail of the linked list of the hash bucket structure, and this continues until the item-by-item search is complete. At that point, the multidimensional data items stored in the hash bucket structure can be regarded as exact matches of the multidimensional data item to be matched. If the hash bucket holds exactly one element, the item to be matched has an exact match and the result is unique; if it holds more than one element, the item has exact matches and the result is not unique; and if it holds no element, the multidimensional data table contains no data item that exactly matches the item to be matched.
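Steps S201 to S204 can be sketched as below. Several things here are assumptions rather than the source's design: Python lists stand in for the linked lists, `hash` of a tuple stands in for the unspecified row-to-key mapping, `p` is passed in as already computed, and the final dimension-by-dimension equality check is an added safeguard (two different rows can share a hash value), which the source does not describe. The function name `exact_match` is hypothetical.

```python
def exact_match(table, query, p):
    """Return the indices of rows in `table` that exactly match `query`."""
    # Hash-value index H[i] for every row (S101); tuple hash is a stand-in key.
    index = [hash(row) % p for row in table]

    # S201: bucket structure of size p; Python lists play the role of linked lists.
    buckets = [[] for _ in range(p)]

    # S202: hash value of the item to be matched.
    h = hash(query) % p

    # S203: scan the index item by item and append rows with an equal hash value.
    for i, h_i in enumerate(index):
        if h_i == h:
            buckets[h].append(i)

    # S204: traverse the bucket. The equality check is an added assumption that
    # filters out rows which merely collide with the query's hash value.
    return [i for i in buckets[h] if table[i] == query]


# Hypothetical 3-row table; rows 0 and 2 are identical.
table = [(1, 2, 3), (4, 5, 6), (1, 2, 3)]
print(exact_match(table, (1, 2, 3), p=3))
print(exact_match(table, (9, 9, 9), p=3))
```

An empty result corresponds to the "no element in the hash bucket" case, a single index to a unique exact match, and several indices to a non-unique match.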

For example, take the multidimensional data table of size 2 × 8 shown in FIG. 3 and the multidimensional data item to be matched of size 1 × 8 shown in FIG. 4, with the 8 dimensions labeled A, B, C, D, E, F, G and H respectively.

With the exact matching scheme set forth in the above example, it is determined that the first multidimensional data item in the multidimensional data table does not exactly match the multidimensional data item to be matched, whereas the second multidimensional data item in the multidimensional data table exactly matches it.

Comparing the exact matching scheme set forth in the foregoing example with the conventional exact matching scheme: the conventional scheme matches item by item and dimension by dimension and, for a data table with m dimensions and n data items, stores the multidimensional data table sequentially during matching, which occupies a large storage space; the exact matching scheme described in the foregoing example can instead store data in a hash table built on the hash value index, which significantly saves storage space. Furthermore, the time complexity of exact matching in the conventional scheme is O(mn), whereas the time complexity of the exact matching scheme set forth in the preceding example can be reduced to [formula omitted in the source text].

Moreover, hash collisions are handled by means of hash buckets: elements with colliding hash values are stored in the linked list of the same bucket, which does not affect the insertion of other elements during the hash search. The time complexity of exact matching is thus reduced while accuracy remains consistent with that of exact matching in the conventional scheme, and the efficiency of exact matching is improved.

As shown in fig. 1, in some possible implementation manners, the obtaining, item by item, a similarity between the multidimensional data item to be matched and each multidimensional data item in the multidimensional data table based on the set weighted euclidean distance policy includes:

for each multidimensional data item within the multidimensional data table, performing the following steps item by item:

for the i-th multidimensional data item in the multidimensional data table, obtaining the weighted Euclidean distance between the multidimensional data item y to be matched and the i-th multidimensional data item x_i according to the following formula:

$$ad(x_i, y) = \sqrt{\sum_{j=1}^{m} a_j (x_{i,j} - y_j)^2}$$

where 1 ≤ i ≤ n, n represents the number of multidimensional data items in the multidimensional data table, m represents the number of dimensions of the multidimensional data table or of the multidimensional data item to be matched, a_j represents the weight corresponding to the j-th dimension, x_i,j represents the data value of the j-th dimension of the i-th multidimensional data item x_i, and y_j represents the data value of the j-th dimension of the multidimensional data item y to be matched;

and, from the weighted Euclidean distance between the multidimensional data item y to be matched and the i-th multidimensional data item x_i, obtaining the similarity value θ(x_i, y) between the i-th multidimensional data item x_i and the multidimensional data item y to be matched according to the following formula:

[formula omitted in the source text]

For the above implementation, in combination with the foregoing definitions, the weighted Euclidean distance between the multidimensional data item to be matched and each multidimensional data item is calculated item by item, starting from the first multidimensional data item in the multidimensional data table. It can be understood that, in multidimensional data matching, the dimensions are not equally important: a user may prefer to give priority to some attributes of the data and correspondingly less consideration to others. For this reason, each dimension is assigned a weight value that represents how important that dimension is to the user. As each weighted Euclidean distance is calculated, the similarity value between the multidimensional data item to be matched and the current multidimensional data item can then be calculated from it. The larger the similarity value, the smaller the difference between the two items and the more similar they are. If the similarity value θ between a multidimensional data item in the multidimensional data table and the multidimensional data item to be matched is 1, the two are completely identical, that is, they match exactly.
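The effect of the per-dimension weights can be illustrated with a short Python sketch. It assumes the weighted distance is the square root of the weighted sum of squared differences; the function name `weighted_euclidean`, the two-attribute items and the weight values are all hypothetical and chosen only for illustration.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_j a_j * (x_j - y_j)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(a * (xj - yj) ** 2 for a, xj, yj in zip(weights, x, y)))


# Hypothetical two-attribute items (values are made up for illustration).
query = (100.0, 50.0)
item_a = (110.0, 50.0)   # differs from the query only in the first attribute
item_b = (100.0, 58.0)   # differs from the query only in the second attribute

equal_weights = (1.0, 1.0)
second_attr_matters = (0.1, 1.0)   # the user cares mostly about the second attribute

for name, w in (("equal", equal_weights), ("weighted", second_attr_matters)):
    d_a = weighted_euclidean(item_a, query, w)
    d_b = weighted_euclidean(item_b, query, w)
    nearest = "item_a" if d_a < d_b else "item_b"
    print(f"{name}: d(item_a)={d_a:.2f}, d(item_b)={d_b:.2f}, nearest={nearest}")
```

With equal weights item_b is the nearer candidate; once the first attribute is down-weighted, item_a becomes the nearer one, which is the sense in which the weighted result follows the user's priorities.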

For the above implementation, while similarity values are obtained item by item, the multidimensional data items in the multidimensional data table that are most similar to the multidimensional data item to be matched also need to be stored as they are found. In some examples, referring to FIG. 5, the selecting of a set number of second target multidimensional data items with the highest similarity from the multidimensional data table includes:

S501: constructing a minimum heap structure of size k for storing the second target multidimensional data items, and initializing the heap top element value of the minimum heap structure to the index value of the first multidimensional data item in the multidimensional data table;

S502: in the multidimensional data table, starting from the second multidimensional data item, comparing item by item the similarity value of the current multidimensional data item with the similarity value of the multidimensional data item corresponding to the heap top element value; if the similarity value of the compared multidimensional data item is greater, inserting the index of the compared multidimensional data item at the heap top of the minimum heap structure and re-sorting the minimum heap structure after the insertion;

S503: after the item-by-item comparison in the multidimensional data table is complete, determining the multidimensional data items corresponding to the element values in the minimum heap structure to be the second target multidimensional data items.

For the above example, specifically, a minimum heap of size k may first be constructed, with its heap top element defaulting to the index value of the first multidimensional data item in the multidimensional data table. Then, starting from the second multidimensional data item, as each similarity value is obtained it is compared with the similarity value of the multidimensional data item corresponding to the heap top element of the minimum heap: if the new similarity value is larger, the index value of the compared multidimensional data item is inserted at the heap top and the minimum heap structure is re-sorted. When the item-by-item traversal of the multidimensional data table is complete, the elements recorded in the minimum heap structure are the k multidimensional data items in the table that are most similar to the multidimensional data item to be matched, namely the second target multidimensional data items.
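A top-k selection with a minimum heap can be sketched with Python's standard `heapq` module. Two points are assumptions and not taken from the source: the similarity mapping 1 / (1 + d) is a stand-in chosen only because it equals 1 when the distance d is 0 and decreases as d grows (the source's own similarity formula is not reproduced in this text), and the heap is filled up to k entries before replacements start, which is a common variant of the initialization described above. The names `similarity` and `top_k_similar` are hypothetical.

```python
import heapq
import math


def similarity(x, y, weights):
    """Assumed similarity: 1 / (1 + weighted Euclidean distance)."""
    d = math.sqrt(sum(a * (xj - yj) ** 2 for a, xj, yj in zip(weights, x, y)))
    return 1.0 / (1.0 + d)   # stand-in mapping, not the source's formula


def top_k_similar(table, query, weights, k):
    """Return the indices of the k rows most similar to `query`, best first."""
    if k <= 0:
        return []
    heap = []   # min-heap of (similarity, row index); heap[0] is the weakest kept row
    for i, row in enumerate(table):
        s = similarity(row, query, weights)
        if len(heap) < k:
            heapq.heappush(heap, (s, i))
        elif s > heap[0][0]:
            # The new row beats the weakest kept row: replace the heap top.
            heapq.heapreplace(heap, (s, i))
    return [i for _, i in sorted(heap, reverse=True)]


# Hypothetical 4-row, 2-dimension table.
table = [(1, 1), (5, 5), (1, 2), (9, 9)]
print(top_k_similar(table, query=(1, 1), weights=(1, 1), k=2))
```

Because the heap top always holds the least similar of the rows kept so far, each new row needs only one comparison against it, and a replacement costs O(log k).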

For example, still taking the multidimensional data table shown in fig. 3 and the multidimensional data item to be matched shown in fig. 4 as an example, it can be known through the similarity matching scheme set forth in the foregoing implementation manner and examples thereof that the similarity value between the first multidimensional data item in the multidimensional data table and the multidimensional data item to be matched is smaller than 1, which indicates that a certain difference exists between the two items; and the similarity value between the second multidimensional data item in the multidimensional data table and the multidimensional data item to be matched is 1, which indicates that the data contents of the two multidimensional data items are completely the same.

Compared with the conventional similarity matching scheme, which performs similarity matching item by item and dimension by dimension using an unweighted Euclidean distance, the similarity matching scheme described in the above implementation and its examples obtains similarity from a weighted Euclidean distance, so the second target multidimensional data items finally obtained are closer to the user's expectation and matching accuracy is improved. Furthermore, for a data table with m dimensions and n data items, the time complexity of the conventional scheme is O(mn), whereas the similarity matching scheme described above reduces the computational complexity to O(n), thereby improving the efficiency of similarity matching.

Based on the same inventive concept as the foregoing technical solution, referring to FIG. 6, an overall-design-oriented multidimensional data matching apparatus 60 according to an embodiment of the present invention is shown. The apparatus 60 includes an establishing part 601, an exact matching part 602 and a similarity matching part 603, wherein:

the establishing part 601 is configured to establish a corresponding hash value index for each multidimensional data item in the multidimensional data table according to a set hash function;

the exact matching part 602 is configured to, when the matching strategy is exact matching, determine the hash value corresponding to the multidimensional data item to be matched according to the hash function, and search the multidimensional data table for a set number of first target multidimensional data items according to the hash value corresponding to the multidimensional data item to be matched, wherein the first target multidimensional data items exactly match the multidimensional data item to be matched;

the similarity matching part 603 is configured to, when the matching strategy is similarity matching, obtain, item by item, the similarity between the multidimensional data item to be matched and each multidimensional data item in the multidimensional data table based on a set weighted Euclidean distance strategy, and select from the multidimensional data table a set number of second target multidimensional data items with the highest similarity, wherein the second target multidimensional data items are similar matches of the multidimensional data item to be matched.

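In the above scheme, the establishing part 601 is configured to:

determine the hash function H(key) = key % p according to the division-remainder method, where key represents the multidimensional data item to be hashed, p represents the largest prime number not greater than n, and n represents the number of multidimensional data items in the multidimensional data table;

and calculate the hash value corresponding to each multidimensional data item in the multidimensional data table item by item according to the hash function, and establish an index for each multidimensional data item.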

In the above scheme, the exact match portion 602 is configured to:

establishing a hash bucket structure which is used for storing hash conflicts and has the size of p;

calculating the hash value of the multidimensional data item to be matched based on the hash function;

searching hash values corresponding to all multidimensional data items in the multidimensional data table item by item according to the hash values of the multidimensional data items to be matched, and storing the searched multidimensional data items in the hash bucket structure when the hash values of the multidimensional data items to be matched are the same as the hash values corresponding to the searched multidimensional data items;

and after the item-by-item search is finished, traversing the multidimensional data items stored in the hash bucket structure, and determining the multidimensional data items stored in the hash bucket structure as the first target multidimensional data items.

In the above scheme, the similarity matching section 603 is configured to:

where 1 ≤ i ≤ n, n represents the number of multidimensional data items in the multidimensional data table, m represents the number of dimensions of the multidimensional data table or of the multidimensional data item to be matched, a_j represents the weight corresponding to the j-th dimension, x_i,j represents the data value of the j-th dimension of the i-th multidimensional data item x_i, and y_j represents the data value of the j-th dimension of the multidimensional data item y to be matched;

In the above scheme, the similarity matching section 603 is configured to:

constructing a minimum heap structure which is used for storing a second target multi-dimensional data item and has the size of k, and initializing a heap top element value of the minimum heap structure to an index value of a first multi-dimensional data item in the multi-dimensional data table;

in the multidimensional data table, starting from the second multidimensional data item, comparing item by item the similarity value of the current multidimensional data item with the similarity value of the multidimensional data item corresponding to the heap top element value; if the similarity value of the compared multidimensional data item is greater, inserting the index of the compared multidimensional data item at the heap top of the minimum heap structure and re-sorting the minimum heap structure after the insertion;

and, after the item-by-item comparison over the multidimensional data table is finished, determining the multidimensional data items corresponding to the element values in the minimum heap structure as the second target multidimensional data items.
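The selection of the k most similar items with a minimum heap can be sketched as follows. This is a standard top-k variant built on Python's `heapq` module rather than a literal transcription of the steps above: the heap is filled with the first k items instead of being initialized with only the first one, and the function name `top_k_similar` is hypothetical. The similarity values are assumed to have been computed beforehand, one per table item.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k_similar(similarities: Sequence[float], k: int) -> List[int]:
    """Return the indexes of the k largest similarity values, best first."""
    if k <= 0:
        return []

    # Min-heap of (similarity, index); heap[0] is the weakest retained item.
    heap: List[Tuple[float, int]] = []
    for index, similarity in enumerate(similarities):
        if len(heap) < k:
            heapq.heappush(heap, (similarity, index))
        elif similarity > heap[0][0]:
            # The new item beats the heap top: replace it and restore heap order.
            heapq.heapreplace(heap, (similarity, index))

    # The surviving entries are the targets; order them by descending similarity.
    return [index for _, index in sorted(heap, reverse=True)]


best_two = top_k_similar([0.2, 0.9, 0.5, 0.7, 0.1], k=2)
```

Keeping the smallest retained similarity at the heap top means each table item needs only one comparison to be rejected, and the whole scan costs O(n log k) for n items.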

It is understood that in this embodiment a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and it may be modular or non-modular.

In addition, each component in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.

Based on this understanding, the technical solution of the present embodiment, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

Therefore, the present embodiment provides a computer storage medium, where the computer storage medium stores a multi-dimensional data matching program for overall design, and the multi-dimensional data matching program for overall design, when executed by at least one processor, implements the steps of the multi-dimensional data matching method for overall design in the foregoing technical solution.

Referring to fig. 7, a specific hardware structure of a computing device 70 capable of implementing the above-mentioned overall design-oriented multidimensional data matching apparatus 60 according to the embodiment of the present invention is shown. The computing device 70 can be a wireless device, a mobile or cellular phone (including a so-called smart phone), a Personal Digital Assistant (PDA), a video game console (including a video display, a mobile video game apparatus, a mobile video conference unit), a laptop computer, a desktop computer, a television set-top box, a tablet computing apparatus, an e-book reader, a fixed or mobile media player, etc. The computing device 70 includes a communication interface 701, a memory 702, and a processor 703; the various components are coupled together by a bus system 704. It is understood that the bus system 704 is used to enable communication among the components. In addition to a data bus, the bus system 704 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled in fig. 7 as the bus system 704. Wherein:

the communication interface 701 is configured to receive and transmit signals in a process of receiving and transmitting information with other external network elements;

the memory 702 is used for storing a computer program capable of running on the processor 703;

the processor 703 is configured to, when running the computer program, perform the following steps:

It is to be understood that the memory 702 in embodiments of the present invention may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 702 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

The processor 703 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by hardware integrated logic circuits in the processor 703 or by instructions in the form of software. The processor 703 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 702, and the processor 703 reads the information in the memory 702 and performs the steps of the above method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Specifically, the processor 703 is further configured to execute, when running the computer program, the steps of the overall design-oriented multidimensional data matching method in the foregoing technical solution, which are not described herein again.

It should be understood that the above-mentioned exemplary technical solutions of the overall design-oriented multidimensional data matching apparatus 60 and the computing device 70 belong to the same concept as the technical solution of the overall design-oriented multidimensional data matching method. Therefore, for details of the technical solutions of the apparatus 60 and the computing device 70 that are not described above, reference may be made to the description of the technical solution of the overall design-oriented multidimensional data matching method. They are not described again in the embodiments of the present invention.

It should be noted that: the technical schemes described in the embodiments of the present invention can be combined arbitrarily without conflict.

The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. An overall design-oriented multidimensional data matching method, the method comprising:

2. The method according to claim 1, wherein the establishing a corresponding hash value index for each multidimensional data item in the multidimensional data table according to the set hash function comprises:

3. The method according to claim 2, wherein the determining a hash value corresponding to the multidimensional data item to be matched according to the hash function, and searching a set number of first target multidimensional data items from the multidimensional data table according to the hash value corresponding to the multidimensional data item to be matched comprises:

4. The method according to claim 1, wherein the obtaining the similarity between the multidimensional data item to be matched and each multidimensional data item by item in the multidimensional data table based on the set weighted euclidean distance strategy comprises:

for the i-th multidimensional data item in the multidimensional data table, obtaining the weighted Euclidean distance between the multidimensional data item y to be matched and the i-th multidimensional data item x_i according to the following formula:

5. The method of claim 4, wherein selecting a set number of second target multidimensional data items from the multidimensional data table with the highest similarity comprises:

6. An overall design-oriented multidimensional data matching apparatus, the apparatus comprising: an establishing part, an accurate matching part, and a similarity matching part; wherein:

7. The apparatus of claim 6, wherein the exact match portion is configured to:

8. The apparatus of claim 6, wherein the similarity matching section is configured to:

9. The apparatus of claim 8, wherein the similarity matching section is configured to:

10. A computer storage medium, characterized in that the computer storage medium stores a multi-dimensional data matching program for overall design, and the multi-dimensional data matching program for overall design is executed by at least one processor to realize the steps of the multi-dimensional data matching method for overall design according to any one of claims 1 to 5.