CN113568894A - Data redundancy processing method and device for database, electronic equipment and storage medium
- Publication number
- CN113568894A (application CN202010348004.4A)
- Authority
- CN
- China
- Prior art keywords
- summary table
- matrix
- database
- similarity
- structure data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/22—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention provides a data redundancy processing method and device for a database, electronic equipment and a storage medium. The data redundancy processing method of the database comprises the following steps: acquiring a first summary table and a second summary table prestored in a database; respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information; determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix; when the first similarity is determined to reach the similarity threshold, based on a preset mapping relation, executing a first data redundancy processing operation corresponding to the first similarity aiming at the database; the preset mapping relation is a mapping relation between the similarity and the data redundancy processing operation. According to the data redundancy processing method of the database, the data redundancy processing effect of the database can be improved.
Description
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a data redundancy processing method and device for a database, electronic equipment and a computer storage medium.
Background
As markets change rapidly, companies use big data analysis to respond to market changes, improve market operation capability and efficiency, and provide an effective basis for analysis and decision making across all departments of a company. Data redundancy detection and data redundancy processing therefore need to be performed on the database, so that large amounts of database storage space and computing resources are not wasted.
At present, common data redundancy processing methods for databases include the following: 1. A big data redundancy detection method based on file similarity, which mainly detects redundant data by comparing hash values of stored files; it analyzes and identifies redundant data blocks only at the file level and is effective only for reducing data storage volume. 2. A method for eliminating redundancy modes in the database design process, which, based on polynomial time complexity, can only eliminate redundancy in the table design of a relational database. 3. A method for detecting redundant data in three-dimensional data patches, which targets redundant image data of three-dimensional objects and only addresses the storage space occupied by three-dimensional data and its subsequent effect on three-dimensional imaging processing. However, these methods have difficulty accurately identifying redundant data, which results in a poor data redundancy processing effect for the database.
Therefore, how to improve the data redundancy processing effect of the database is a technical problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
The embodiment of the invention provides a data redundancy processing method and device for a database, electronic equipment and a computer storage medium, which can improve the data redundancy processing effect of the database.
In a first aspect, an embodiment of the present invention provides a data redundancy processing method for a database, including:
acquiring a first summary table and a second summary table prestored in a database;
respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information;
determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
when the first similarity is determined to reach the similarity threshold, based on a preset mapping relation, executing a first data redundancy processing operation corresponding to the first similarity aiming at the database; the preset mapping relation is a mapping relation between the similarity and the data redundancy processing operation.
Optionally, based on preset matrix conversion policy information, respectively converting the first summary table into the first matrix and converting the second summary table into the second matrix, including:
respectively acquiring composition structure data of a first summary table and composition structure data of a second summary table; wherein the composition structure data comprises a base table and fields;
and respectively determining a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
Optionally, the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table respectively includes:
acquiring generation logic information and processing Structured Query Language (SQL) information of a summary table; the summary table comprises a first summary table and a second summary table;
and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generated logic information and the processed SQL information.
Optionally, the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table respectively based on the generated logic information and the processed SQL information includes:
respectively acquiring a basic table of the first summary table and a basic table of the second summary table by utilizing the first regular expression based on the generated logic information and the processed SQL information;
and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generated logic information and the processed SQL information.
Optionally, determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix comprises:
performing difference operation on the first matrix and the second matrix to obtain a third matrix;
determining Euclidean norm values corresponding to the third matrix based on the third matrix;
and determining a first similarity between the first summary table and the second summary table according to the Euclidean norm value.
In a second aspect, an embodiment of the present invention provides a data redundancy processing apparatus for a database, including:
the acquisition module is used for acquiring a first summary table and a second summary table which are prestored in a database;
the conversion module is used for respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information;
the determining module is used for determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
the executing module is used for executing first data redundancy processing operation corresponding to the first similarity aiming at the database based on a preset mapping relation when the first similarity is determined to reach the similarity threshold; the preset mapping relation is a mapping relation between the similarity and the data redundancy processing operation.
Optionally, the conversion module is configured to obtain component structure data of the first summary table and component structure data of the second summary table, respectively; wherein the composition structure data comprises a base table and fields; and respectively determining a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
Optionally, the conversion module is configured to obtain generation logic information of the summary table and processing Structured Query Language (SQL) information; the summary table comprises a first summary table and a second summary table; and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generated logic information and the processed SQL information.
Optionally, the conversion module is configured to obtain a basic table of the first summary table and a basic table of the second summary table by using the first regular expression based on the generated logic information and the processed SQL information; and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generated logic information and the processed SQL information.
Optionally, the determining module is configured to perform a difference operation on the first matrix and the second matrix to obtain a third matrix; determining Euclidean norm values corresponding to the third matrix based on the third matrix; and determining a first similarity between the first summary table and the second summary table according to the Euclidean norm value.
In a third aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes: a processor, and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the data redundancy processing method of the database in the first aspect or any optional implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium, where computer program instructions are stored on the computer storage medium, and when the computer program instructions are executed by a processor, the method for processing data redundancy of a database in the first aspect or any optional implementation manner of the first aspect is implemented.
The data redundancy processing method and device for the database, the electronic equipment and the computer storage medium can improve the data redundancy processing effect of the database. The data redundancy processing method of the database comprises the steps of converting a first summary table prestored in the database into a first matrix and converting a second summary table into a second matrix based on preset matrix conversion strategy information; based on the first matrix and the second matrix, the first similarity between the first summary table and the second summary table can be more accurately determined; therefore, when the first similarity is determined to reach the similarity threshold, the first data redundancy processing operation corresponding to the first similarity is executed for the database based on the preset mapping relation, and the data redundancy processing effect of the database can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a data redundancy processing method for a database according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data redundancy processing method for a database according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data redundancy processing apparatus for a database according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In order to solve the problem of the prior art, embodiments of the present invention provide a data redundancy processing method and apparatus for a database, an electronic device, and a computer storage medium. First, a data redundancy processing method for a database according to an embodiment of the present invention is described below.
Fig. 1 is a schematic flowchart illustrating a data redundancy processing method for a database according to an embodiment of the present invention. As shown in fig. 1, the data redundancy processing method of the database may include the following steps:
S101: Acquire a first summary table and a second summary table prestored in a database.
Tables in the database come mainly from two sources: base tables, formed by importing files or data from other databases, and summary tables, formed by processing and summarizing the base tables. For example, in a large centralized data platform in China, the data volume is huge, and many of the summary tables produced by processing and summarizing the base tables have similar contents, which wastes a great deal of storage space and computing resources.
The first summary table and the second summary table in step S101 may be two summary tables selected arbitrarily, but the similarity of the contents between the two summary tables cannot be determined directly and accurately, so step S102 needs to be executed.
S102: Convert the first summary table into a first matrix and the second summary table into a second matrix, based on preset matrix conversion strategy information.
In order to determine the first matrix and the second matrix more accurately, in an embodiment, converting the first summary table into the first matrix and converting the second summary table into the second matrix respectively based on preset matrix conversion policy information may include: respectively acquiring composition structure data of a first summary table and composition structure data of a second summary table; wherein the composition structure data comprises a base table and fields; and respectively determining a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
In order to obtain the composition structure data more accurately, in an embodiment, the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table respectively may include: acquiring generation logic information and processing SQL information of a summary table; the summary table comprises a first summary table and a second summary table; and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generated logic information and the processed SQL information.
In order to obtain the basic table and the field more accurately, in an embodiment, based on the generated logic information and the processed SQL information, the component structure data of the first summary table and the component structure data of the second summary table are respectively obtained, including: respectively acquiring a basic table of the first summary table and a basic table of the second summary table by utilizing the first regular expression based on the generated logic information and the processed SQL information; and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generated logic information and the processed SQL information.
After the first matrix and the second matrix are obtained, step S103 is performed to determine redundant data.
S103: Determine a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix.
To more accurately determine the first similarity between the first summary table and the second summary table, in one embodiment, determining the first similarity between the first summary table and the second summary table based on the first matrix and the second matrix may include: performing difference operation on the first matrix and the second matrix to obtain a third matrix; determining Euclidean norm values corresponding to the third matrix based on the third matrix; and determining a first similarity between the first summary table and the second summary table according to the Euclidean norm value.
S104: When the first similarity reaches a similarity threshold, execute, for the database, a first data redundancy processing operation corresponding to the first similarity based on a preset mapping relation; the preset mapping relation is a mapping relation between similarity and data redundancy processing operations.
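Step S104 can be illustrated with a minimal sketch. The source does not specify how the Euclidean norm value is turned into a similarity, what the threshold is, or which operations the preset mapping contains, so everything named below (`norm_to_similarity`, `SIMILARITY_THRESHOLD`, `OPERATION_MAP`, and the operation names) is a hypothetical assumption chosen only to show the shape of the lookup.

```python
from typing import Optional

# Hypothetical threshold; the source only says a similarity threshold exists.
SIMILARITY_THRESHOLD = 0.5

# Hypothetical preset mapping: (minimum similarity, operation), highest first.
OPERATION_MAP = [
    (1.0, "drop_duplicate_table"),
    (0.8, "merge_tables"),
    (0.5, "flag_for_review"),
]


def norm_to_similarity(norm: float) -> float:
    """Assumed conversion: a norm of 0 gives similarity 1, larger norms give less."""
    return 1.0 / (1.0 + norm)


def select_operation(similarity: float,
                     threshold: float = SIMILARITY_THRESHOLD,
                     mapping=OPERATION_MAP) -> Optional[str]:
    """Return the operation mapped to this similarity, or None below the threshold."""
    if similarity < threshold:
        return None
    for minimum, operation in mapping:
        if similarity >= minimum:
            return operation
    return None
```

For instance, `select_operation(norm_to_similarity(0.0))` selects the operation mapped to identical tables, while a similarity below the threshold selects no operation at all.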
The data redundancy processing method of the database provided by the embodiment of the invention is characterized in that a first summary table prestored in the database is converted into a first matrix and a second summary table is converted into a second matrix based on preset matrix conversion strategy information; based on the first matrix and the second matrix, the first similarity between the first summary table and the second summary table can be more accurately determined; therefore, when the first similarity is determined to reach the similarity threshold, the first data redundancy processing operation corresponding to the first similarity is executed for the database based on the preset mapping relation, and the data redundancy processing effect of the database can be improved.
The following describes the above with an example, which specifically includes the following:
Step 1: Find the data lineage of the summary table, i.e., the base tables and fields used to produce the summary table.
From the SQL statement that produces the summary table, a regular expression filters the strings following the from keyword to obtain the table names of the base tables, and a regular expression captures the strings between select and from to obtain the field names of the base tables. The table names and the field names are stored separately.
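The extraction in Step 1 can be sketched as follows. The source does not give the actual regular expressions, so the two patterns and the function name `extract_lineage` are assumptions; treating names after `join` as base tables and stripping `AS` aliases are also assumptions beyond what the text states.

```python
import re

# Assumed pattern: an identifier following FROM (or JOIN) is a base table name.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

# Assumed pattern: the text between SELECT and the first FROM is the field list.
FIELD_LIST_PATTERN = re.compile(r"\bselect\b(.*?)\bfrom\b",
                                re.IGNORECASE | re.DOTALL)


def extract_lineage(sql: str) -> tuple[list[str], list[str]]:
    """Return (base table names, field names) found in one SQL statement."""
    tables = TABLE_PATTERN.findall(sql)
    fields = []
    match = FIELD_LIST_PATTERN.search(sql)
    if match:
        # Naive comma split: expressions such as f(a, b) are not handled.
        for item in match.group(1).split(","):
            name = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)[0]
            if name:
                fields.append(name)
    return tables, fields


tables, fields = extract_lineage(
    "SELECT t1.user_id, t1.amount AS amt "
    "FROM orders t1 JOIN users u ON t1.user_id = u.id"
)
```

A production version would more likely rely on a SQL parser, since regular expressions cannot reliably handle subqueries, nested function calls, or comments.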
Step 2: Express the base tables and fields used to produce the summary table in matrix form.
Let m be the number of base tables in the database and n be the maximum number of fields in any base table of the database. If a field of summary table A is derived from the j-th field of the i-th base table, this is recorded in the element $a_{ij}$, whose value indicates how many fields of the summary table are derived from the j-th field of the i-th base table. The matrix is expressed as follows:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$
For example, the structure of the AA summary table is shown in Table 1:
TABLE 1
The corresponding matrix of the AA summary table can be expressed as:
As another example, the structure of the BB summary table is shown in Table 2:
TABLE 2
Field | Source | Matrix elements |
Field 1 | This field is from field 3 of Table 3 | aa33=1 |
Field 2 | This field is from field 4 of Table 3 | aa43=1 |
Field 3 | This field is from fields 3 and 4 of Table 2 | aa32=1; aa42=1 |
Field 4 | This field is from field 1 of Table 2 | aa12=1 |
Field 5 | This field is from field 3 of Table 2 | aa32=1 |
Field 6 | This field is from field 4 of Table 2 | aa42=1 |
Field 7 | This field is from field 2 of Table 4 | aa24=1 |
Field 8 | This field is from field 3 of Table 4 | aa34=1 |
The corresponding matrix of the BB summary table can be expressed as:
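As a minimal sketch of Step 2, the lineage listed in Table 2 can be turned into such a matrix in code. The sketch assumes that rows index base tables and columns index fields, following the definition of the element for the j-th field of the i-th base table, and that each element counts how many summary fields use that source. The helper name `lineage_to_matrix` and the sizes m = 4 and n = 4 are assumptions for illustration only.

```python
from collections import Counter


def lineage_to_matrix(sources, num_tables, max_fields):
    """Build an m x n count matrix from 1-based (table, field) source pairs."""
    for table, field in sources:
        if not (1 <= table <= num_tables and 1 <= field <= max_fields):
            raise ValueError(f"source out of range: table {table}, field {field}")
    counts = Counter(sources)
    return [
        [counts[(table, field)] for field in range(1, max_fields + 1)]
        for table in range(1, num_tables + 1)
    ]


# (base table, field) pairs read from the "This field is from ..." column of Table 2.
BB_SOURCES = [
    (3, 3), (3, 4),          # fields 1 and 2
    (2, 3), (2, 4),          # field 3 uses two source fields
    (2, 1), (2, 3), (2, 4),  # fields 4, 5 and 6
    (4, 2), (4, 3),          # fields 7 and 8
]

bb_matrix = lineage_to_matrix(BB_SOURCES, num_tables=4, max_fields=4)
```

Because fields 3 and 4 of base table 2 are each used twice, the counting interpretation gives those elements the value 2; a binary (used or not used) encoding would be an equally plausible reading of the source.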
Step 3: Calculate the similarity of the two summary tables.
The matrix representations of the two tables are obtained from Step 1 and Step 2. A difference operation is performed on the two matrices to obtain a new matrix A, and the L2 norm (i.e., the Euclidean norm) of the new matrix is computed by the formula:

$$\|A\|_2 = \sqrt{\lambda_1}$$

where $\lambda_1$ is the maximum eigenvalue of $A^T A$; that is, the norm is the square root of the maximum eigenvalue of the $A^T A$ matrix. The smaller the Euclidean norm value, the more similar the two matrices are, and hence the more similar the two tables are. If the Euclidean norm value is 0, the two tables are identical.
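The calculation in Step 3 can be sketched in plain Python. The sketch estimates the maximum eigenvalue of the product of the transposed difference matrix with itself by power iteration, which is a technique chosen here for illustration and is not named in the source; the function names, the starting vector, and the iteration count are all assumptions.

```python
import math


def matrix_difference(first, second):
    """Element-wise difference of two matrices with identical shapes."""
    return [[x - y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


def l2_norm(matrix, iterations=500):
    """Approximate sqrt(largest eigenvalue of M^T M) by power iteration."""
    n = len(matrix[0])
    # Gram matrix G = M^T M, an n x n symmetric positive semidefinite matrix.
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    # Non-uniform start vector; in rare cases it could be orthogonal to the
    # dominant eigenvector, which would make the estimate too small.
    vector = [1.0 / (k + 1) for k in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(gram[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        length = math.sqrt(sum(x * x for x in product))
        if length == 0.0:
            return 0.0  # e.g. the difference matrix is all zeros
        vector = [x / length for x in product]
        eigenvalue = length  # ||G v|| for unit v converges to the top eigenvalue
    return math.sqrt(eigenvalue)
```

Calling `l2_norm(matrix_difference(first, second))` on two lineage matrices of the same shape gives the value compared in this step, and two identical matrices give 0. Where a numerical library is available, a singular value decomposition routine would be the more robust choice.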
Fig. 2 is a schematic flowchart of another data redundancy processing method for a database according to an embodiment of the present invention. As shown in Fig. 2, a data lineage table is first extracted from summary table A; for example, the 1st field of the aa table, the 2nd field of the bb table, and the 3rd field of the cc table, and so on, are extracted from summary table A. The lineage is then converted into matrix form, giving the matrix corresponding to summary table A. The same operation is performed on summary table B to obtain the matrix corresponding to summary table B. A difference operation is performed on the matrix corresponding to summary table A and the matrix corresponding to summary table B to obtain a new matrix, the L2 norm of the new matrix is calculated, and the similarity between summary table A and summary table B is judged according to the L2 norm: the smaller the L2 norm value, the more similar summary table A and summary table B are.
The data redundancy processing method of the database provided by the embodiment of the invention has the following beneficial effects:
First, the method is highly general: it is applicable to large databases of all types and, in the face of numerous analysis requirements, can effectively avoid siloed ("chimney-style") duplicate development and save development and maintenance costs. Second, the matrix L2 norm, which is convenient to calculate, is used to reduce redundancy in processed and summarized data, which can effectively reduce the storage and computing resources consumed by the database and thus save system resources. Finally, the calculation is simple: the matrix L2 norm value between source tables is calculated using only the data lineage, and the bulky data contents do not need to be compared.
Fig. 3 is a schematic structural diagram of a data redundancy processing apparatus of a database according to an embodiment of the present invention, and as shown in fig. 3, the data redundancy processing apparatus of the database may include:
an obtaining module 301, configured to obtain a first summary table and a second summary table pre-stored in a database;
a conversion module 302, configured to convert the first summary table into a first matrix and convert the second summary table into a second matrix, respectively, based on preset matrix conversion policy information;
a determining module 303, configured to determine a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
the executing module 304 is configured to, when it is determined that the first similarity reaches the similarity threshold, execute, on the basis of a preset mapping relationship, a first data redundancy processing operation corresponding to the first similarity with respect to the database; the preset mapping relation is a mapping relation between the similarity and the data redundancy processing operation.
Optionally, in an embodiment, the converting module 302 is configured to obtain component structure data of the first summary table and component structure data of the second summary table respectively; wherein the composition structure data comprises a base table and fields; and respectively determining a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
Optionally, in an embodiment, the conversion module 302 is configured to obtain generation logic information and processing SQL information of the summary table; the summary table comprises a first summary table and a second summary table; and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generated logic information and the processed SQL information.
Optionally, in an embodiment, the converting module 302 is configured to obtain, based on the generated logic information and the processed SQL information, a base table of the first summary table and a base table of the second summary table by using the first regular expression, respectively; and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generated logic information and the processed SQL information.
Optionally, in an embodiment, the determining module 303 is configured to perform a difference operation on the first matrix and the second matrix to obtain a third matrix; to determine the Euclidean norm value of the third matrix; and to determine the first similarity between the first summary table and the second summary table according to the Euclidean norm value.
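The difference-and-norm calculation can be sketched as follows, under two stated assumptions. First, the Euclidean norm of the third matrix is read here as the square root of the sum of its squared entries (the Frobenius norm); the spectral 2-norm is another possible reading. Second, the conversion from norm to similarity, `1 / (1 + norm)`, is a hypothetical choice, since the passage only says the similarity is determined according to the norm value.

```python
import math


def difference(first, second):
    """Element-wise difference of two equally shaped matrices (lists of rows)."""
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


def euclidean_norm(matrix):
    """Square root of the sum of squared entries (Frobenius norm, assumed reading)."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def similarity(first, second):
    """Hypothetical mapping: identical matrices give 1.0, larger norms approach 0."""
    return 1.0 / (1.0 + euclidean_norm(difference(first, second)))


# Hypothetical first and second matrices of equal shape.
first_matrix = [[1, 1], [0, 0]]
second_matrix = [[0, 1], [0, 1]]
third_matrix = difference(first_matrix, second_matrix)
```

With this mapping, a smaller norm of the third matrix means the two summary tables share more base tables and fields, and the similarity is closer to 1.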
Each module of the apparatus shown in fig. 3 implements the corresponding step in fig. 1 and achieves the corresponding technical effect; for brevity, the details are not repeated here.
Fig. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
The electronic device may include a processor 401 and a memory 402 storing computer program instructions.
Specifically, the processor 401 may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
In one example, the memory 402 may be a Read Only Memory (ROM). In one example, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement the data redundancy processing method of the database in the embodiment shown in fig. 1 and to achieve the corresponding technical effect of that embodiment, which, for brevity, is not described again here.
In one example, the electronic device may also include a communication interface 403 and a bus 410. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via a bus 410 to complete communication therebetween.
The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present invention.
In addition, embodiments of the present invention may provide a computer storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the data redundancy processing method of the database shown in fig. 1.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic Circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
The above are only specific embodiments of the present invention. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here. It should be understood that the scope of protection of the present invention is not limited thereto; any equivalent modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A data redundancy processing method of a database is characterized by comprising the following steps:
acquiring a first summary table and a second summary table prestored in a database;
respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information;
determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
when the first similarity is determined to reach a similarity threshold, based on a preset mapping relation, executing a first data redundancy processing operation corresponding to the first similarity on the database; wherein the preset mapping relationship is a mapping relationship between the similarity and the data redundancy processing operation.
2. The method according to claim 1, wherein the converting the first summary table into the first matrix and the converting the second summary table into the second matrix based on the preset matrix conversion policy information respectively comprises:
respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table; wherein the composition structure data comprises base tables and fields;
and respectively determining the first matrix and the second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
3. The method according to claim 2, wherein the respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table comprises:
acquiring generation logic information and processing Structured Query Language (SQL) information of summary tables; wherein the summary tables comprise the first summary table and the second summary table;
and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
4. The method according to claim 3, wherein the respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information comprises:
respectively acquiring the base table of the first summary table and the base table of the second summary table by using a first regular expression based on the generation logic information and the processing SQL information;
and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generation logic information and the processing SQL information.
5. The method of claim 1, wherein determining the first similarity between the first summary table and the second summary table based on the first matrix and the second matrix comprises:
performing difference operation on the first matrix and the second matrix to obtain a third matrix;
determining a Euclidean norm value corresponding to the third matrix based on the third matrix;
and determining the first similarity between the first summary table and the second summary table according to the Euclidean norm value.
6. A data redundancy processing apparatus for a database, comprising:
an acquisition module, configured to acquire a first summary table and a second summary table prestored in a database;
a conversion module, configured to respectively convert the first summary table into a first matrix and the second summary table into a second matrix based on preset matrix conversion strategy information;
a determining module, configured to determine a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
an execution module, configured to execute, when the first similarity is determined to reach a similarity threshold, a first data redundancy processing operation corresponding to the first similarity on the database based on a preset mapping relationship; wherein the preset mapping relationship is a mapping relationship between the similarity and the data redundancy processing operation.
7. The data redundancy processing apparatus of the database according to claim 6, wherein the conversion module is configured to respectively acquire the composition structure data of the first summary table and the composition structure data of the second summary table; wherein the composition structure data comprises base tables and fields; and to respectively determine the first matrix and the second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
8. The data redundancy processing apparatus of the database according to claim 7, wherein the conversion module is configured to acquire generation logic information and processing Structured Query Language (SQL) information of summary tables; wherein the summary tables comprise the first summary table and the second summary table; and to respectively acquire the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
9. An electronic device, characterized in that the electronic device comprises: a processor, and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the data redundancy processing method of the database according to any one of claims 1 to 5.
10. A computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement the data redundancy processing method of the database according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010348004.4A CN113568894A (en) | 2020-04-28 | 2020-04-28 | Data redundancy processing method and device for database, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113568894A (en) | 2021-10-29 |
Family
ID=78157821
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010348004.4A Pending CN113568894A (en) | 2020-04-28 | 2020-04-28 | Data redundancy processing method and device for database, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113568894A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114564691A (en) * | 2022-03-03 | 2022-05-31 | 昆明学院 | Redundant data discrimination method based on inverse matrix |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110055167A1 (en) * | 2009-08-28 | 2011-03-03 | Ingenix, Inc. | Apparatus, System, and Method for Identifying Redundancy and Consolidation Opportunities in Databases and Application Systems |
CN103902582A (en) * | 2012-12-27 | 2014-07-02 | 中国移动通信集团湖北有限公司 | Data warehouse redundancy reduction method and device |
CN104394345A (en) * | 2014-12-10 | 2015-03-04 | 马人欢 | Video storage and playback method for security and protection monitoring |
CN106228143A (en) * | 2016-08-02 | 2016-12-14 | 王国兴 | A kind of method that instructional video is marked with camera video motion contrast |
JPWO2017175375A1 (en) * | 2016-04-08 | 2019-01-17 | 株式会社日立製作所 | Data cleansing system, method, and program |
CN110826834A (en) * | 2018-08-14 | 2020-02-21 | 中国石油天然气股份有限公司 | Comparison method and device between different responsibility separation rule sets |
Non-Patent Citations (2)
Title |
---|
(Sweden) EVA PART-ENANDER, et al.: "MATLAB 5 Handbook", 31 May 2000, China Machine Press, pages: 108 - 109 * |
(USA) AMINZADEH, FRED: "Pattern Recognition and Image Processing", 31 July 1991, Petroleum Industry Press, pages: 214 * |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11250137B2 (en) | Vulnerability assessment based on machine inference | |
US10878087B2 (en) | System and method for detecting malicious files using two-stage file classification | |
CN115600194A (en) | Intrusion detection method, storage medium and device based on XGboost and LGBM | |
US10146740B1 (en) | Sparse data set processing | |
CN109241163B (en) | Electronic certificate generation method and terminal equipment | |
CN113568894A (en) | Data redundancy processing method and device for database, electronic equipment and storage medium | |
CN108920601B (en) | Data matching method and device | |
CN113962324A (en) | Picture detection method and device, storage medium and electronic equipment | |
CN114139161A (en) | Method, device, electronic equipment and medium for batch vulnerability detection | |
CN114003731A (en) | Heterogeneous data processing method, device, server and storage medium | |
CN113722600A (en) | Data query method, device, equipment and product applied to big data | |
CN116489251A (en) | Universal code stream analysis method, device, computer readable medium and terminal equipment | |
CN110046180B (en) | Method and device for locating similar examples and electronic equipment | |
CN114491042A (en) | Classification method, computer equipment and computer-readable storage medium | |
CN112711584A (en) | Data checking method, checking device, terminal equipment and readable storage medium | |
EP3588349B1 (en) | System and method for detecting malicious files using two-stage file classification | |
CN114139597A (en) | Case similarity calculation method and system, readable storage medium and computer equipment | |
US20140195540A1 (en) | Expeditious citation indexing | |
CN112948415A (en) | SQL statement detection method and device, terminal equipment and storage medium | |
CN116775889B (en) | Threat information automatic extraction method, system, equipment and storage medium based on natural language processing | |
CN117708350B (en) | Enterprise policy information association method and device and electronic equipment | |
CN114817929B (en) | Method and device for dynamically tracking and processing vulnerability of Internet of things, electronic equipment and medium | |
CN115423947B (en) | Three-dimensional model retrieval method, device, equipment and medium | |
CN116932345A (en) | User operation behavior detection method and device | |
CN116108052A (en) | Data query method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||