CN113672619A - Method for segmenting data more uniformly according to hash rule - Google Patents

Publication number
CN113672619A
Authority
CN
China
Prior art keywords
data
hash
sampling
buckets
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110942746.4A
Other languages
Chinese (zh)
Other versions
CN113672619B (en)
Inventor
赵伟
李南锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Nankai University General Data Technologies Co ltd
Original Assignee
Tianjin Nankai University General Data Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-08-17
Filing date
2021-08-17
Publication date
2021-11-19
Application filed by Tianjin Nankai University General Data Technologies Co ltd
Priority to CN202110942746.4A
Publication of CN113672619A
Application granted
Publication of CN113672619B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a method for segmenting data more uniformly according to a hash rule. The number of hash buckets is first calculated from the configured memory size. The data set to be segmented is then sampled, and the number of occurrences of each identical data value is recorded during sampling. The recorded values are sorted by occurrence count, and the most frequent values are recorded as topN data information; each of these values is then split off on its own to form an independent data block. Because the data blocks are segmented more uniformly, multiple threads can finish their work at about the same time, which solves the problem of a single thread taking too long because its share of the data is too large.

Description

Method for segmenting data more uniformly according to hash rule
Technical Field
The invention belongs to the field of databases, and particularly relates to a method for segmenting data more uniformly according to a hash rule.
Background
A join operation in a database associates two tables during a query. Conceptually it forms the Cartesian product of the rows of the two tables, and a where condition is usually added to filter out unwanted rows, leaving the row combinations of the two tables that are actually needed.
When two tables are joined in a query, a join condition is usually specified, and in many cases it is an equality condition on related columns of the two tables, for example select * from t1, t2 where t1.a = t2.a. The database kernel processes such a query with multi-threaded parallel computation. Before the threads start, the data of both tables must be split so that rows with the same value fall to the same thread. This split is usually done with a hash algorithm, so that data with the same hash value is placed in the same data block.
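The splitting described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `partition_rows` and `partitioned_join`, the use of `zlib.crc32` as the hash, and the tuple layout of the rows are all assumptions made for illustration. It shows why both tables must be split with the same hash rule: equal join keys then land in the same partition pair, and each pair can be joined independently (for example, by its own thread).

```python
import zlib


def partition_rows(rows, key_index, num_partitions):
    """Split rows into partitions by hashing the join key (hypothetical helper)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key_bytes = str(row[key_index]).encode("utf-8")
        partitions[zlib.crc32(key_bytes) % num_partitions].append(row)
    return partitions


def partitioned_join(t1, t2, num_partitions=4):
    """Equi-join t1 and t2 on their first column, one partition pair at a time."""
    parts1 = partition_rows(t1, 0, num_partitions)
    parts2 = partition_rows(t2, 0, num_partitions)
    result = []
    # Each (part1, part2) pair is independent and could be given to its own thread.
    for part1, part2 in zip(parts1, parts2):
        lookup = {}
        for row in part1:
            lookup.setdefault(row[0], []).append(row)
        for row in part2:
            for match in lookup.get(row[0], []):
                result.append((match[0], match[1], row[1]))
    return result


t1 = [(1, "x"), (2, "y"), (2, "z")]   # rows of t1(a, b)
t2 = [(2, "p"), (3, "q")]             # rows of t2(a, c)
joined = sorted(partitioned_join(t1, t2))
```

Here the pairs are processed in a plain loop; a real kernel would dispatch them to worker threads, which is where uneven partition sizes start to matter.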
The problem is that some hash values may cover a very large amount of data, so the resulting data blocks can be uneven. The thread that processes an oversized block then takes longer than the other threads, and the overall efficiency of the system is limited by that slowest thread (the barrel effect, in which the shortest stave determines capacity). Dividing the data more uniformly therefore improves system efficiency.
The number of hash buckets also affects efficiency. If there are too few buckets, the data of a bucket cannot be loaded into memory in full and must be read from disk repeatedly. The number of hash buckets therefore needs to be estimated from the memory size and information about the data.
Disclosure of Invention
In view of this, the present invention aims to provide a method for segmenting data more uniformly according to a hash rule, so as to solve the problem that a single thread takes too long because the amount of data it is given is too large.
To achieve this purpose, the technical solution of the invention is as follows:
A method for segmenting data more uniformly according to a hash rule comprises the following steps:
S1, sampling the data to be segmented, and recording the number of occurrences of each identical data value during sampling;
S2, sorting the sampled data by number of occurrences to form topN data information;
S3, estimating the number of hash buckets from the configured memory size and the amount of data;
S4, segmenting the data into data block files with a hash algorithm according to the number of hash buckets and the topN data information, and computing the average number of data items per data block file;
S5, judging, according to a set condition, whether the number of data items in each data block file meets the condition relative to the average; if it does, repeating steps S2 to S4, and otherwise finishing the segmentation.
Further, in step S1 sampling is performed proportionally, and the sampling process comprises the following steps:
first, determining the number of items to sample: 10% of the total number of data items is taken as the total number to be sampled;
second, calculating the sampling points: the data is divided into 100 equal parts, and the starting position of each part is used as a sampling start point;
third, calculating the number of items to sample at each sampling point: the total number to be sampled is divided by 100 to give the number of items sampled at each sampling point.
Further, the number of hash buckets in step S3 is estimated with the following formula:
number of hash buckets = (total number of data items × (1 − data repetition rate)) / number of data items that can be stored in memory.
Further, the process of segmenting into data block files with the hash algorithm in step S4 is as follows:
taking a piece of data from the topN data and computing its hash value with a hash algorithm, here using the crc32 algorithm to obtain an integer value; dividing the integer value by the number of hash buckets to obtain the bucket number, and placing the data in the bucket with that number.
Further, the condition set in step S5 is: the number of data items exceeds a set multiple of the average number of data items.
Compared with the prior art, the method for segmenting data more uniformly according to a hash rule has the following beneficial effects:
(1) The data blocks are segmented more uniformly, so that multiple threads can finish their work at about the same time, which solves the problem of a single thread taking too long because its share of the data is too large.
(2) The number of hash buckets is determined from the memory size so that the data of each bucket can be loaded into memory in full, which avoids the multi-pass processing that insufficient memory would otherwise cause and saves considerable time.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
Fig. 1 is a schematic flow chart of a method for segmenting data more uniformly according to a hash rule according to an embodiment of the present invention;
Fig. 2 is an operation diagram of the method for segmenting data more uniformly according to a hash rule according to the embodiment of the present invention;
Fig. 3 is a schematic diagram of the process of segmenting data according to the embodiment of the present invention;
Fig. 4 is a data processing flow chart of the method for segmenting data more uniformly according to a hash rule according to the embodiment of the present invention;
Fig. 5 is a schematic flow chart of segmenting into data block files with the hash algorithm according to the embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in Figs. 1 to 5, in the method for segmenting data more uniformly according to a hash rule, the data to be segmented is sampled to obtain topN data information, that is, the values that occur most often. During hash segmentation each piece of data is compared with the topN information; if the data appears in the topN, it is placed directly into its own separate data block without hash segmentation;
after segmentation, the average number of rows per data block is computed. Data blocks that exceed the average by a certain multiple are analyzed further: the values that occur most often in those blocks are found and added to the existing topN data information. This makes the topN statistics more accurate, and the topN information can be used directly the next time the same data is segmented.
As shown in Figs. 1 to 4, the specific method comprises the following steps:
S1, sampling the data to be segmented, and recording the number of occurrences of each identical data value during sampling;
S2, sorting the sampled data by number of occurrences to form topN data information;
S3, estimating the number of hash buckets from the configured memory size and the amount of data;
S4, segmenting the data into data block files with a hash algorithm according to the number of hash buckets and the topN data information, and computing the average number of data items per data block file;
S5, judging, according to a set condition, whether the number of data items in each data block file meets the condition relative to the average; if it does, repeating steps S2 to S4, and otherwise finishing the segmentation.
In step S1 sampling is performed proportionally, and the sampling process comprises the following steps:
first, determining the number of items to sample: 10% of the total number of data items is taken as the total number to be sampled;
second, calculating the sampling points: the data is divided into 100 equal parts, and the starting position of each part is used as a sampling start point;
third, calculating the number of items to sample at each sampling point: the total number to be sampled is divided by 100 to give the number of items sampled at each sampling point.
In step S1 the number of occurrences of identical data is recorded during sampling using a map container. The key of the map is the data record and the value is its occurrence count. For each piece of data the container is searched first: if the key is found, its count is incremented; if it is not found, the key is inserted into the map as a new element with its count set to 1.
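The sampling and counting of step S1 can be sketched as follows. This is an illustrative reading of the text, not the patent's implementation: the function name `sample_counts`, its parameters, and the handling of very small inputs (at least one item per sampling point) are assumptions. A Python `dict` stands in for the map container.

```python
def sample_counts(values, sample_percent=10, num_points=100):
    """Sample about sample_percent % of values from num_points evenly spaced
    start positions and count occurrences of each sampled value."""
    total = len(values)
    sample_total = total * sample_percent // 100      # step 1: total items to sample
    part_size = max(1, total // num_points)           # step 2: length of each part
    per_point = max(1, sample_total // num_points)    # step 3: items per sampling point
    counts = {}                                       # map: value -> occurrence count
    for point in range(num_points):
        start = point * part_size                     # start position of this part
        for value in values[start:start + per_point]:
            if value in counts:
                counts[value] += 1                    # key found: accumulate
            else:
                counts[value] = 1                     # key not found: insert with 1
    return counts


# 1000 items: the first half is the repeated value 7, the second half is distinct.
values = [7] * 500 + list(range(500))
counts = sample_counts(values)
```

With 1000 items, 100 are sampled, one at each of the 100 start positions; the repeated value dominates the counts, which is what the later sorting step relies on.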
The number of hash buckets in step S3 is estimated with the following formula:
number of hash buckets = (total number of data items × (1 − data repetition rate)) / number of data items that can be stored in memory.
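As a small worked sketch of this formula in Python: the function name `estimate_bucket_count` is hypothetical, and rounding the result up to a whole number of buckets (with a minimum of one) is an assumption, since the text does not say how a fractional result is handled.

```python
import math


def estimate_bucket_count(total_rows, repetition_rate, rows_fit_in_memory):
    """Estimate the number of hash buckets from the formula in step S3.

    Rounding up is an assumption made here so that each bucket is no larger
    than what fits in memory; the source does not specify the rounding.
    """
    if rows_fit_in_memory <= 0:
        raise ValueError("rows_fit_in_memory must be positive")
    raw = total_rows * (1 - repetition_rate) / rows_fit_in_memory
    return max(1, math.ceil(raw))


# 100000 rows, 5% repetition rate, 10000 rows fit in memory: raw value 9.5.
buckets = estimate_bucket_count(100000, 0.05, 10000)
```

With these inputs the raw value of 9.5 becomes 10 buckets under the round-up assumption.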
As shown in Fig. 5, the process of segmenting into data block files with the hash algorithm in step S4 is as follows:
taking a piece of data from the topN data and computing its hash value with a hash algorithm, here using the crc32 algorithm to obtain an integer value; dividing the integer value by the number of hash buckets to obtain the bucket number, and placing the data in the bucket with that number.
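The bucket-number calculation can be sketched with Python's standard `zlib.crc32`. Two points are assumptions rather than statements from the source: that "dividing by the number of hash buckets" means taking the remainder of the division, and that a value is hashed through its UTF-8 string form. The function name `bucket_index` is hypothetical.

```python
import zlib


def bucket_index(value, num_buckets):
    """Map a value to a bucket number in the range 0 .. num_buckets - 1."""
    data = str(value).encode("utf-8")      # assumed encoding of the value
    checksum = zlib.crc32(data)            # unsigned 32-bit integer in Python 3
    return checksum % num_buckets          # remainder used as the bucket number


index = bucket_index(9, 10)
```

Equal values always produce the same checksum, so they always go to the same bucket, which is the property the join processing depends on.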
The condition set in step S5 is: the number of data items exceeds a set multiple of the average number of data items;
the oversized data blocks are analyzed in a way similar to steps S2, S3, and S4. The analysis yields topN information for each of these larger blocks, which is merged into the overall topN data information, so that at the next segmentation more of the data in the large blocks can be split off on its own and the blocks become smaller.
If the same data is hash-segmented again later, the historical topN data information is used; reusing it saves time and overhead.
In the method of this patent, the number of hash buckets is first calculated from the configured memory size. The purpose of this step is that, after segmentation by the hash algorithm, the data that falls into each hash bucket can as far as possible be loaded into memory in full. The data set to be segmented is then sampled, and the number of occurrences of each identical value is recorded during sampling. The recorded values are sorted by occurrence count so that the frequent values can be found, and the values at the top are recorded as topN data information. During segmentation these values are split off directly into independent data blocks without computing their hash values, which avoids placing them in the same hash data block as different data that happens to have the same hash value. Because each frequently occurring value gets its own data block, the hash segmentation becomes more uniform.
In addition, the average number of data items per block is computed over the segmented data blocks, and any block that exceeds the average by a certain proportion goes through statistics and segmentation once more. Because this analysis runs on the large blocks that were actually produced, the statistics become more accurate. The results are merged into the topN data record, so that the next time the same data is segmented the information can be used directly, without sampling and computing the statistics again.
The specific embodiment is as follows:
This patent takes as an example the tables t1(a int, b varchar(20)) and t2(a int, c varchar(20)) and the query select t1.a, b, c from t1, t2 where t1.a = t2.a;
Here varchar(20) is a variable-length character type with a maximum length of 20, and int is an integer type. select starts a query and is followed by the columns to be projected; from is followed by the tables to be queried; where introduces the filter conditions. The statement joins tables t1 and t2 and returns the values of columns a, b and c for the rows that satisfy t1.a = t2.a.
1. Estimate the number of hash buckets from table t1. With 100000 data items, room for 10000 items in memory, and a data repetition rate of 5%:
number of hash buckets = 100000 × (1 − 0.05) / 10000 = 9.5
2. Data sampling
Sample the data of t1 and t2 on the hash column, that is, column a of each table, and count the values that occur many times.
Data value    Number of occurrences
5             10000
2             30000
8             20000
3. Sort the statistical data by number of occurrences and take the top N.
Data value    Number of occurrences
2             30000
8             20000
5             10000
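The sorting in step 3 can be sketched in Python using the counts from the table above. The function name `top_n` is hypothetical, and the counts are held in a plain `dict` of value to occurrence count.

```python
def top_n(counts, n):
    """Return the n most frequent (value, count) pairs, most frequent first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Occurrence counts from the sampling table in step 2.
counts = {5: 10000, 2: 30000, 8: 20000}
ranked = top_n(counts, 3)
```

The result lists value 2 first, then 8, then 5, matching the sorted table.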
4. When the data is segmented, check whether each value appears in the statistical data.
Values that appear in the statistical data are split off on their own, without a hash operation, to form single-value data blocks;
that is, when a row has one of the values 2, 8 or 5, it is placed in the independent data block for that value;
values that do not appear in the statistical data are hashed and placed in the corresponding hash bucket to form hash data blocks;
for example, if the remaining values are 9, 7 and 6, each is hashed to obtain a hash value and then placed in the corresponding hash bucket.
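Step 4 can be sketched as a single pass over the data. This is an illustration under assumptions, not the patent's code: the function name `split_rows`, the in-memory lists used in place of data block files, and the use of `zlib.crc32` with a remainder for the bucket number are all choices made for the example.

```python
import zlib


def split_rows(values, hot_values, num_buckets):
    """Route topN ("hot") values to single-value blocks and all other values
    to hash buckets."""
    single_value_blocks = {value: [] for value in hot_values}
    hash_buckets = [[] for _ in range(num_buckets)]
    for value in values:
        if value in single_value_blocks:
            # Value is in the topN statistics: no hash operation is needed.
            single_value_blocks[value].append(value)
        else:
            index = zlib.crc32(str(value).encode("utf-8")) % num_buckets
            hash_buckets[index].append(value)
    return single_value_blocks, hash_buckets


values = [2, 2, 2, 8, 8, 5, 9, 7, 6]
blocks, buckets = split_rows(values, {2, 8, 5}, num_buckets=4)
```

In this example the values 2, 8 and 5 each get their own block, while 9, 7 and 6 are distributed over the four hash buckets.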
5. After segmentation, compute the average number of rows per hash data block. For example, if the total number of data items is 100000 and the number of hash buckets is 20, the average is 5000 rows.
6. Run the statistics again on data blocks that exceed the average number of rows by a certain multiple. For example, if one hash bucket holds 50000 data items, which is 10 times the average, that block is selected and its statistics are recomputed as in steps 2 and 3 to obtain the topN of the large block.
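The check in steps 5 and 6 can be sketched as follows. The function name `oversized_buckets` is hypothetical, and the threshold multiple of 5 used in the example is an assumption: the text only says "a certain multiple". The bucket sizes are invented so that they match the figures in the example (100000 items, 20 buckets, one bucket of 50000).

```python
def oversized_buckets(bucket_sizes, multiple):
    """Return the indexes of buckets whose size exceeds multiple x the average."""
    if not bucket_sizes:
        return []
    average = sum(bucket_sizes) / len(bucket_sizes)
    return [i for i, size in enumerate(bucket_sizes) if size > average * multiple]


# 20 buckets holding 100000 items in total, so the average is 5000.
bucket_sizes = [50000] + [2500] * 18 + [5000]
large = oversized_buckets(bucket_sizes, multiple=5)
```

Only the first bucket is flagged here; it is the one whose statistics would be recomputed.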
7. Complete the statistical data: merge the topN of the large data block with the overall topN to form a new overall statistical record that can be reused.
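One possible way to merge the two records is sketched below. The source does not say how counts from the sample and counts from a re-analyzed block are combined, so keeping the larger count for a value seen in both is purely an assumption, as are the function name `merge_top_n` and the block counts in the example.

```python
def merge_top_n(overall, block_top, n):
    """Merge a large block's topN into the overall topN and keep the n most
    frequent values. For a value present in both, the larger count is kept
    (an assumption; the source does not define the merge rule)."""
    merged = dict(overall)
    for value, count in block_top.items():
        merged[value] = max(merged.get(value, 0), count)
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


overall = {2: 30000, 8: 20000, 5: 10000}    # overall topN from the example
block_top = {9: 40000, 2: 31000}            # hypothetical topN of a large block
merged = merge_top_n(overall, block_top, n=4)
```

The merged record could then be stored and reused the next time the same data is segmented, as the description suggests.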
The above describes only preferred embodiments of the present invention and is not to be construed as limiting it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention is intended to fall within its scope of protection.

Claims (5)

1. A method for segmenting data more uniformly according to a hash rule, characterized by comprising the following steps:
S1, sampling the data to be segmented, and recording the number of occurrences of each identical data value during sampling;
S2, sorting the sampled data by number of occurrences to form topN data information;
S3, estimating the number of hash buckets from the configured memory size and the amount of data;
S4, segmenting the data into data block files with a hash algorithm according to the number of hash buckets and the topN data information, and computing the average number of data items per data block file;
S5, judging, according to a set condition, whether the number of data items in each data block file meets the condition relative to the average; if it does, repeating steps S2 to S4, and otherwise finishing the segmentation.
2. The method for segmenting data more uniformly according to a hash rule as claimed in claim 1, wherein the sampling in step S1 is performed proportionally, and the sampling process comprises the following steps:
first, determining the number of items to sample: 10% of the total number of data items is taken as the total number to be sampled;
second, calculating the sampling points: the data is divided into 100 equal parts, and the starting position of each part is used as a sampling start point;
third, calculating the number of items to sample at each sampling point: the total number to be sampled is divided by 100 to give the number of items sampled at each sampling point.
3. The method for segmenting data more uniformly according to a hash rule as claimed in claim 1, wherein the number of hash buckets in step S3 is estimated with the following formula:
number of hash buckets = (total number of data items × (1 − data repetition rate)) / number of data items that can be stored in memory.
4. The method for segmenting data more uniformly according to a hash rule as claimed in claim 3, wherein the process of segmenting into data block files with the hash algorithm in step S4 is as follows:
taking a piece of data from the topN data and computing its hash value with a hash algorithm, here using the crc32 algorithm to obtain an integer value; dividing the integer value by the number of hash buckets to obtain the bucket number, and placing the data in the bucket with that number.
5. The method for segmenting data more uniformly according to a hash rule as claimed in claim 1, wherein the condition set in step S5 is: the number of data items exceeds a set multiple of the average number of data items.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110942746.4A CN113672619B (en) 2021-08-17 2021-08-17 Method for segmenting data according to hash rule to make data more uniform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110942746.4A CN113672619B (en) 2021-08-17 2021-08-17 Method for segmenting data according to hash rule to make data more uniform

Publications (2)

Publication Number    Publication Date
CN113672619A (en)     2021-11-19
CN113672619B (en)     2024-02-06

Family

ID=78543252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110942746.4A Active CN113672619B (en) 2021-08-17 2021-08-17 Method for segmenting data according to hash rule to make data more uniform

Country Status (1)

Country Link
CN (1) CN113672619B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292373A (en) * 2022-10-09 2022-11-04 天津南大通用数据技术股份有限公司 Method and device for segmenting data block

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609487A (en) * 2012-01-20 2012-07-25 东华大学 Column-storage-oriented Hash joint method for indexes in barrels
CN102722583A (en) * 2012-06-07 2012-10-10 无锡众志和达存储技术有限公司 Hardware accelerating device for data de-duplication and method
CN106411654A (en) * 2016-10-27 2017-02-15 任子行网络技术股份有限公司 Method and device for processing network traffic analysis
CN106709001A (en) * 2016-12-22 2017-05-24 西安电子科技大学 Cardinality estimation method aiming at streaming big data
CN107766258A (en) * 2017-09-27 2018-03-06 精硕科技(北京)股份有限公司 Memory storage method and apparatus, memory lookup method and apparatus
CN108256003A (en) * 2017-12-29 2018-07-06 天津南大通用数据技术股份有限公司 A kind of method that union operation efficiencies are improved according to analysis Data duplication rate
CN110580307A (en) * 2019-08-09 2019-12-17 北京大学 Processing method and device for fast statistics
CN111538730A (en) * 2020-04-30 2020-08-14 福建天晴数码有限公司 Data statistics method and system based on Hash bucket algorithm
CN112463795A (en) * 2020-11-26 2021-03-09 杭州安恒信息技术股份有限公司 Dynamic hash method, device, equipment and storage medium
CN112612614A (en) * 2020-12-28 2021-04-06 江苏苏宁云计算有限公司 Data sorting method, device and system

Also Published As

Publication number Publication date
CN113672619B (en) 2024-02-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant