CN113568894A - Data redundancy processing method and device for database, electronic equipment and storage medium - Google Patents

Data redundancy processing method and device for database, electronic equipment and storage medium

Info

Publication number
CN113568894A
Authority
CN
China
Prior art keywords
summary table
matrix
database
similarity
structure data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010348004.4A
Other languages
Chinese (zh)
Inventor
杨怡
尚晶
冯凯
熊伟
徐海勇
陶涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Information Technology Co Ltd
Priority to CN202010348004.4A
Publication of CN113568894A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data redundancy processing method and device for a database, electronic equipment, and a storage medium. The data redundancy processing method for the database comprises the following steps: acquiring a first summary table and a second summary table prestored in a database; converting the first summary table into a first matrix and the second summary table into a second matrix based on preset matrix conversion strategy information; determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix; and, when the first similarity is determined to reach a similarity threshold, executing on the database, based on a preset mapping relation, a first data redundancy processing operation corresponding to the first similarity, where the preset mapping relation is a mapping relation between similarities and data redundancy processing operations. The method can improve the data redundancy processing effect of the database.

Description

Data redundancy processing method and device for database, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a data redundancy processing method and device for a database, electronic equipment and a computer storage medium.
Background
As markets change rapidly, big data analysis is used to respond to market changes, improve market operation capability and efficiency, and provide an effective basis for analysis and decision making in all departments of a company. Data redundancy detection and data redundancy processing therefore need to be performed on the database to avoid wasting large amounts of its storage space and computing resources.
At present, common data redundancy processing methods for databases are as follows: 1. A big data redundancy detection method based on file similarity, which detects redundant data mainly by comparing hash values of stored files; it analyzes and identifies redundant data blocks only at the file level and can only reduce the amount of data stored. 2. A method for eliminating redundant schemas during database design, which, based on polynomial time complexity, can only eliminate redundancy in the table design of a relational database. 3. A method for detecting redundant data in three-dimensional data patches, which targets redundant image data of three-dimensional objects and only addresses the storage space occupied by three-dimensional data and its subsequent impact on three-dimensional imaging processing. However, these methods have difficulty accurately identifying redundant data, resulting in a poor data redundancy processing effect for the database.
Therefore, how to improve the data redundancy processing effect of the database is a technical problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
The embodiment of the invention provides a data redundancy processing method and device for a database, electronic equipment and a computer storage medium, which can improve the data redundancy processing effect of the database.
In a first aspect, an embodiment of the present invention provides a data redundancy processing method for a database, including:
acquiring a first summary table and a second summary table prestored in a database;
respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information;
determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
when the first similarity is determined to reach the similarity threshold, executing on the database, based on a preset mapping relation, a first data redundancy processing operation corresponding to the first similarity; the preset mapping relation is a mapping relation between similarities and data redundancy processing operations.
Optionally, based on preset matrix conversion policy information, respectively converting the first summary table into the first matrix and converting the second summary table into the second matrix, including:
respectively acquiring composition structure data of a first summary table and composition structure data of a second summary table; wherein the composition structure data comprises a base table and fields;
and respectively determining a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
Optionally, the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table respectively includes:
acquiring generation logic information and processing Structured Query Language (SQL) information of the summary tables; the summary tables comprise the first summary table and the second summary table;
and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
Optionally, the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table respectively based on the generation logic information and the processing SQL information includes:
respectively acquiring the base table of the first summary table and the base table of the second summary table by using a first regular expression based on the generation logic information and the processing SQL information;
and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generation logic information and the processing SQL information.
Optionally, determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix comprises:
performing difference operation on the first matrix and the second matrix to obtain a third matrix;
determining Euclidean norm values corresponding to the third matrix based on the third matrix;
and determining a first similarity between the first summary table and the second summary table according to the Euclidean norm value.
In a second aspect, an embodiment of the present invention provides a data redundancy processing apparatus for a database, including:
the acquisition module is used for acquiring a first summary table and a second summary table which are prestored in a database;
the conversion module is used for respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information;
the determining module is used for determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
the executing module is used for executing on the database, based on a preset mapping relation, a first data redundancy processing operation corresponding to the first similarity when the first similarity is determined to reach the similarity threshold; the preset mapping relation is a mapping relation between similarities and data redundancy processing operations.
Optionally, the conversion module is configured to obtain composition structure data of the first summary table and composition structure data of the second summary table, respectively; wherein the composition structure data comprises a base table and fields; and respectively determine a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
Optionally, the conversion module is configured to obtain generation logic information and processing Structured Query Language (SQL) information of the summary tables; the summary tables comprise the first summary table and the second summary table; and respectively acquire the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
Optionally, the conversion module is configured to obtain the base table of the first summary table and the base table of the second summary table by using a first regular expression based on the generation logic information and the processing SQL information; and respectively acquire the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generation logic information and the processing SQL information.
Optionally, the determining module is configured to perform a difference operation on the first matrix and the second matrix to obtain a third matrix; determining Euclidean norm values corresponding to the third matrix based on the third matrix; and determining a first similarity between the first summary table and the second summary table according to the Euclidean norm value.
In a third aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes: a processor, and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the data redundancy processing method of the database in the first aspect or any optional implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium, where computer program instructions are stored on the computer storage medium, and when the computer program instructions are executed by a processor, the method for processing data redundancy of a database in the first aspect or any optional implementation manner of the first aspect is implemented.
The data redundancy processing method and device for the database, the electronic equipment and the computer storage medium can improve the data redundancy processing effect of the database. The data redundancy processing method of the database comprises the steps of converting a first summary table prestored in the database into a first matrix and converting a second summary table into a second matrix based on preset matrix conversion strategy information; based on the first matrix and the second matrix, the first similarity between the first summary table and the second summary table can be more accurately determined; therefore, when the first similarity is determined to reach the similarity threshold, the first data redundancy processing operation corresponding to the first similarity is executed for the database based on the preset mapping relation, and the data redundancy processing effect of the database can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a data redundancy processing method for a database according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data redundancy processing method for a database according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data redundancy processing apparatus for a database according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
At present, common data redundancy processing methods for databases are as follows: 1. A big data redundancy detection method based on file similarity, which detects redundant data mainly by comparing hash values of stored files; it analyzes and identifies redundant data blocks only at the file level and can only reduce the amount of data stored. 2. A method for eliminating redundant schemas during database design, which, based on polynomial time complexity, can only eliminate redundancy in the table design of a relational database. 3. A method for detecting redundant data in three-dimensional data patches, which targets redundant image data of three-dimensional objects and only addresses the storage space occupied by three-dimensional data and its subsequent impact on three-dimensional imaging processing. However, these methods have difficulty accurately identifying redundant data, resulting in a poor data redundancy processing effect for the database.
In order to solve the problem of the prior art, embodiments of the present invention provide a data redundancy processing method and apparatus for a database, an electronic device, and a computer storage medium. First, a data redundancy processing method for a database according to an embodiment of the present invention is described below.
Fig. 1 is a schematic flowchart illustrating a data redundancy processing method for a database according to an embodiment of the present invention. As shown in fig. 1, the data redundancy processing method of the database may include the following steps:
S101, acquiring a first summary table and a second summary table which are prestored in a database.
Tables in the database come mainly from two sources: base tables, formed by importing files or data from other databases, and summary tables, formed by processing and summarizing the base tables. For example, in a large centralized data platform in China, the data volume is huge, and many of the summary tables produced by processing and summarizing the base tables have similar contents, which wastes a great deal of storage space and computing resources.
The first summary table and the second summary table in step S101 may be two summary tables selected arbitrarily, but the similarity of the contents between the two summary tables cannot be determined directly and accurately, so step S102 needs to be executed.
S102, converting the first summary table into a first matrix and converting the second summary table into a second matrix respectively based on preset matrix conversion strategy information.
In order to determine the first matrix and the second matrix more accurately, in an embodiment, converting the first summary table into the first matrix and converting the second summary table into the second matrix respectively based on preset matrix conversion policy information may include: respectively acquiring composition structure data of a first summary table and composition structure data of a second summary table; wherein the composition structure data comprises a base table and fields; and respectively determining a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
In order to obtain the composition structure data more accurately, in an embodiment, the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table respectively may include: acquiring generation logic information and processing SQL information of the summary tables; the summary tables comprise the first summary table and the second summary table; and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
In order to obtain the base tables and the fields more accurately, in an embodiment, respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information may include: respectively acquiring the base table of the first summary table and the base table of the second summary table by using a first regular expression based on the generation logic information and the processing SQL information; and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generation logic information and the processing SQL information.
After the first matrix and the second matrix are obtained, step S103 is performed to determine redundant data.
S103, determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix.
To more accurately determine the first similarity between the first summary table and the second summary table, in one embodiment, determining the first similarity between the first summary table and the second summary table based on the first matrix and the second matrix may include: performing difference operation on the first matrix and the second matrix to obtain a third matrix; determining Euclidean norm values corresponding to the third matrix based on the third matrix; and determining a first similarity between the first summary table and the second summary table according to the Euclidean norm value.
S104, when the first similarity reaches a similarity threshold, executing on the database, based on a preset mapping relation, a first data redundancy processing operation corresponding to the first similarity; the preset mapping relation is a mapping relation between similarities and data redundancy processing operations.
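The threshold check and mapping lookup in step S104 can be illustrated with a minimal Python sketch. The description does not specify how a norm value is turned into a similarity, which operations exist, or what the threshold is, so the conversion `1 / (1 + norm)`, the operation names, and the numeric tiers below are all hypothetical assumptions chosen only for illustration.

```python
# Hypothetical conversion: a norm of 0 gives similarity 1.0, larger norms give lower similarity.
def similarity_from_norm(norm_value):
    return 1.0 / (1.0 + norm_value)

# Hypothetical preset mapping: (minimum similarity, operation name), highest tier first.
PRESET_MAPPING = [
    (1.0, "drop_duplicate_table"),
    (0.8, "merge_tables"),
    (0.5, "flag_for_review"),
]

# Hypothetical similarity threshold.
SIMILARITY_THRESHOLD = 0.5

def select_operation(similarity, threshold=SIMILARITY_THRESHOLD, mapping=PRESET_MAPPING):
    """Return the operation mapped to this similarity, or None below the threshold."""
    if similarity < threshold:
        return None
    for minimum, operation in mapping:
        if similarity >= minimum:
            return operation
    return None
```

Keeping the mapping as data rather than as branching logic means the tiers can be adjusted without changing the selection function.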
The data redundancy processing method of the database provided by the embodiment of the invention is characterized in that a first summary table prestored in the database is converted into a first matrix and a second summary table is converted into a second matrix based on preset matrix conversion strategy information; based on the first matrix and the second matrix, the first similarity between the first summary table and the second summary table can be more accurately determined; therefore, when the first similarity is determined to reach the similarity threshold, the first data redundancy processing operation corresponding to the first similarity is executed for the database based on the preset mapping relation, and the data redundancy processing effect of the database can be improved.
The following describes the above with an example, which specifically includes the following:
Step 1: find the lineage (blood relationship) of the summary table, i.e., the base tables and fields used to produce the summary table.
According to the SQL statement of the summary table, a regular expression filters the character string following the from keyword to obtain the table names of the base tables, and a regular expression captures the character string between select and from to obtain the field names of the base tables; the table names and the field names are stored separately.
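Step 1 can be sketched in Python with the standard `re` module. The two patterns below are assumptions, since the description does not give the actual regular expressions, and the SQL statement and table names in the usage are hypothetical. The sketch handles only a flat `select ... from ...` statement; nested queries, joins, and function calls containing commas would need more careful parsing.

```python
import re

# Assumed "first regular expression": the identifier that follows the from keyword.
FROM_PATTERN = re.compile(r"\bfrom\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
# Assumed "second regular expression": the text between select and from.
SELECT_PATTERN = re.compile(r"\bselect\b(.*?)\bfrom\b", re.IGNORECASE | re.DOTALL)

def extract_base_tables(sql):
    """Return the distinct table names that follow a from keyword, sorted."""
    return sorted(set(FROM_PATTERN.findall(sql)))

def extract_fields(sql):
    """Return the field expressions between select and from, without aliases."""
    match = SELECT_PATTERN.search(sql)
    if match is None:
        return []
    fields = []
    for expression in match.group(1).split(","):
        # Drop an optional "AS alias" suffix and surrounding whitespace.
        name = re.split(r"\s+as\s+", expression.strip(), flags=re.IGNORECASE)[0]
        if name:
            fields.append(name.strip())
    return fields

# Hypothetical processing SQL of a summary table.
example_sql = "SELECT t1.user_id, t1.fee AS total_fee FROM base_orders t1"
example_tables = extract_base_tables(example_sql)
example_fields = extract_fields(example_sql)
```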
Step 2: express the base tables and fields used to produce the summary table in matrix form.
Assume that m is the number of base tables in the database and n is the maximum number of fields in any base table in the database. If a field of summary table A is derived from the jth field of the ith base table, it can be denoted $a_{ij}$, where the value of $a_{ij}$ indicates how many fields of the summary table are derived from the jth field of the ith base table. The matrix is expressed as follows:
Figure BDA0002470907560000071
For example, the structure of summary table AA is shown in Table 1:
TABLE 1
Figure BDA0002470907560000072
Figure BDA0002470907560000081
The corresponding matrix of the AA summary table can be expressed as:
Figure BDA0002470907560000082
For example, the structure of summary table BB is shown in Table 2:
TABLE 2
Field      Source                         Matrix elements
Field 1    Field 3 of Table 3             aa33=1
Field 2    Field 4 of Table 3             aa43=1
Field 3    Fields 3 and 4 of Table 2      aa32=1; aa42=1
Field 4    Field 1 of Table 2             aa12=1
Field 5    Field 3 of Table 2             aa32=1
Field 6    Field 4 of Table 2             aa42=1
Field 7    Field 2 of Table 4             aa24=1
Field 8    Field 3 of Table 4             aa34=1
The corresponding matrix of the BB summary table can be expressed as:
Figure BDA0002470907560000083
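The matrix construction of Step 2 can be sketched in Python as follows. The sketch follows the definition given in Step 2 (row i is the ith base table, column j is the jth field) and assumes that repeated references to the same base field accumulate into a count. The lineage pairs and matrix dimensions in the usage are hypothetical and are not taken from Table 1 or Table 2.

```python
def lineage_to_matrix(lineage, table_count, max_fields):
    """Build an m x n count matrix from (table index, field index) pairs."""
    matrix = [[0] * max_fields for _ in range(table_count)]
    for table_index, field_index in lineage:
        # Indices are 1-based in the description; Python lists are 0-based.
        matrix[table_index - 1][field_index - 1] += 1
    return matrix

# Hypothetical lineage: two summary fields come from field 2 of base table 1,
# and one summary field comes from field 1 of base table 3.
example_lineage = [(1, 2), (1, 2), (3, 1)]
example_matrix = lineage_to_matrix(example_lineage, table_count=3, max_fields=2)
```

Because every summary table is mapped onto the same m x n grid, any two resulting matrices can be subtracted element by element.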
Step 3: calculate the similarity of the two summary tables.
The matrix representations of the two tables are obtained from Step 1 and Step 2. A difference operation is performed on the two matrices to obtain a new matrix A, and the L2 norm (i.e., the Euclidean norm) of the new matrix is solved, namely, by the formula:
$$\|A\|_2 = \sqrt{\lambda_1}$$
where $\lambda_1$ is the maximum eigenvalue of $A^{T}A$; that is, the norm is the square root of the maximum eigenvalue of the $A^{T}A$ matrix. The smaller the Euclidean norm value, the more similar the two matrices are, that is, the more similar the two tables are. If the Euclidean norm value is 0, the two tables are identical.
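The norm computation in Step 3 can be sketched in plain Python. The description does not say how the maximum eigenvalue is obtained; power iteration is used here as one possible choice, and the function names, iteration count, and start vector are assumptions. The function returns the square root of the largest eigenvalue of the transposed difference matrix multiplied by the difference matrix.

```python
import math

def spectral_norm(matrix, iterations=500):
    """L2 norm of a matrix: sqrt of the largest eigenvalue of A^T A (power iteration)."""
    rows, cols = len(matrix), len(matrix[0])
    # gram = A^T A, a symmetric positive semidefinite cols x cols matrix.
    gram = [[sum(matrix[k][i] * matrix[k][j] for k in range(rows))
             for j in range(cols)] for i in range(cols)]
    # Non-uniform start vector (an assumption) to reduce the chance of a degenerate start.
    vector = [1.0 + 0.1 * i for i in range(cols)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(gram[i][j] * vector[j] for j in range(cols)) for i in range(cols)]
        length = math.sqrt(sum(x * x for x in product))
        if length == 0.0:
            return 0.0  # the start vector was mapped to zero, e.g. for an all-zero matrix
        vector = [x / length for x in product]
        # Rayleigh quotient of the unit vector estimates the largest eigenvalue.
        eigenvalue = sum(vector[i] * sum(gram[i][j] * vector[j] for j in range(cols))
                         for i in range(cols))
    return math.sqrt(max(eigenvalue, 0.0))

def table_distance(first_matrix, second_matrix):
    """L2 norm of the element-wise difference of two equally sized matrices."""
    difference = [[a - b for a, b in zip(row_a, row_b)]
                  for row_a, row_b in zip(first_matrix, second_matrix)]
    return spectral_norm(difference)
```

A numerical library routine for the matrix 2-norm would normally replace the hand-written iteration; the pure-Python version is used only to keep the sketch self-contained.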
Fig. 2 is a schematic flowchart of another data redundancy processing method for a database according to an embodiment of the present invention. As shown in Fig. 2, a lineage (blood relationship) table is first extracted from summary table A; for example, the 1st field of table aa, the 2nd field of table bb, the 3rd field of table cc, … are extracted from summary table A and then converted into matrix form, yielding the matrix corresponding to summary table A. Correspondingly, the same operation is performed on summary table B to obtain the matrix corresponding to summary table B. A difference operation is performed on the matrix corresponding to summary table A and the matrix corresponding to summary table B to obtain a new matrix, the L2 norm of the new matrix is calculated, and the similarity between summary table A and summary table B is judged according to the L2 norm: the smaller the L2 norm value, the more similar summary table A and summary table B are.
The data redundancy processing method of the database provided by the embodiment of the invention has the following beneficial effects:
First, the method is highly general: it is applicable to large databases of all types and, in the face of numerous analysis requirements, can effectively avoid siloed ("chimney-style") duplicate development, saving development and maintenance costs. Second, using the easily computed matrix L2 norm reduces the redundancy of processed summary data and can effectively reduce the storage and computing resources consumed by the database, saving system resources. Finally, the calculation is simple and convenient: only the data lineage is needed to calculate the matrix L2 norm values between source tables, and the bulky data contents do not need to be compared and calculated.
Fig. 3 is a schematic structural diagram of a data redundancy processing apparatus of a database according to an embodiment of the present invention, and as shown in fig. 3, the data redundancy processing apparatus of the database may include:
an obtaining module 301, configured to obtain a first summary table and a second summary table pre-stored in a database;
a conversion module 302, configured to convert the first summary table into a first matrix and convert the second summary table into a second matrix, respectively, based on preset matrix conversion policy information;
a determining module 303, configured to determine a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
the executing module 304 is configured to, when it is determined that the first similarity reaches the similarity threshold, execute, on the basis of a preset mapping relationship, a first data redundancy processing operation corresponding to the first similarity with respect to the database; the preset mapping relation is a mapping relation between the similarity and the data redundancy processing operation.
Optionally, in an embodiment, the conversion module 302 is configured to obtain composition structure data of the first summary table and composition structure data of the second summary table respectively; wherein the composition structure data comprises a base table and fields; and respectively determine a first matrix and a second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
Optionally, in an embodiment, the conversion module 302 is configured to obtain generation logic information and processing SQL information of the summary tables; the summary tables comprise the first summary table and the second summary table; and respectively acquire the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
Optionally, in an embodiment, the conversion module 302 is configured to obtain, based on the generation logic information and the processing SQL information, the base table of the first summary table and the base table of the second summary table by using a first regular expression, respectively; and respectively acquire the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generation logic information and the processing SQL information.
Optionally, in an embodiment, the determining module 303 is configured to perform a difference operation on the first matrix and the second matrix to obtain a third matrix; determining Euclidean norm values corresponding to the third matrix based on the third matrix; and determining a first similarity between the first summary table and the second summary table according to the Euclidean norm value.
Each module in the apparatus shown in fig. 3 has a function of implementing each step in fig. 1, and can achieve the corresponding technical effect, and for brevity, is not described again here.
Fig. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
The electronic device may include a processor 401 and a memory 402 storing computer program instructions.
Specifically, the processor 401 may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. In one example, memory 402 may include removable or non-removable (or fixed) media, or memory 402 may be non-volatile solid-state memory. The memory 402 may be internal or external to the electronic device.
In one example, the memory 402 may be a Read-Only Memory (ROM). In one example, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement the data redundancy processing method of the database in the embodiment shown in fig. 1, and achieves the corresponding technical effects of that embodiment, which are not described again here for brevity.
In one example, the electronic device may also include a communication interface 403 and a bus 410. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via a bus 410 to complete communication therebetween.
The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present invention.
Bus 410 includes hardware, software, or both, coupling the components of the electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 410 may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable bus or interconnect is contemplated by the invention.
In addition, embodiments of the present invention may be implemented by providing a computer storage medium. The computer storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement the data redundancy processing method of the database shown in fig. 1.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, Application Specific Integrated Circuits (ASICs), suitable firmware, plug-ins, function cards, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
The above describes only specific embodiments of the present invention. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described again here. It should be understood that the scope of protection of the present invention is not limited thereto; any equivalent modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A data redundancy processing method of a database is characterized by comprising the following steps:
acquiring a first summary table and a second summary table prestored in a database;
respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information;
determining a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
when the first similarity is determined to reach a similarity threshold, based on a preset mapping relation, executing a first data redundancy processing operation corresponding to the first similarity on the database; wherein the preset mapping relationship is a mapping relationship between the similarity and the data redundancy processing operation.
2. The method according to claim 1, wherein the converting the first summary table into the first matrix and the converting the second summary table into the second matrix based on the preset matrix conversion policy information respectively comprises:
respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table; wherein the composition structure data comprises base tables and fields;
and respectively determining the first matrix and the second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
3. The method according to claim 2, wherein the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table respectively comprises:
acquiring generation logic information and processing Structured Query Language (SQL) information of a summary table; wherein the summary table comprises the first summary table and the second summary table;
and respectively acquiring the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
4. The method according to claim 3, wherein the obtaining the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information respectively comprises:
respectively acquiring the base table of the first summary table and the base table of the second summary table by using a first regular expression based on the generation logic information and the processing SQL information;
and respectively acquiring the fields of the first summary table and the fields of the second summary table by using a second regular expression based on the generation logic information and the processing SQL information.
5. The method of claim 1, wherein determining the first similarity between the first summary table and the second summary table based on the first matrix and the second matrix comprises:
performing difference operation on the first matrix and the second matrix to obtain a third matrix;
determining a Euclidean norm value corresponding to the third matrix based on the third matrix;
and determining the first similarity between the first summary table and the second summary table according to the Euclidean norm value.
6. A data redundancy processing apparatus for a database, comprising:
the acquisition module is used for acquiring a first summary table and a second summary table which are prestored in a database;
the conversion module is used for respectively converting the first summary table into a first matrix and converting the second summary table into a second matrix based on preset matrix conversion strategy information;
a determining module, configured to determine a first similarity between the first summary table and the second summary table based on the first matrix and the second matrix;
the execution module is used for executing a first data redundancy processing operation corresponding to the first similarity aiming at the database based on a preset mapping relation when the first similarity is determined to reach a similarity threshold value; wherein the preset mapping relationship is a mapping relationship between the similarity and the data redundancy processing operation.
7. The data redundancy processing apparatus of the database according to claim 6, wherein the converting module is configured to obtain the composition structure data of the first summary table and the composition structure data of the second summary table respectively; wherein the composition structure data comprises base tables and fields; and respectively determining the first matrix and the second matrix by using the composition structure data of the first summary table and the composition structure data of the second summary table based on the matrix conversion strategy information.
8. The data redundancy processing apparatus of the database according to claim 7, wherein the conversion module is configured to obtain generation logic information and processing Structured Query Language (SQL) information of a summary table; wherein the summary table comprises the first summary table and the second summary table; and respectively acquire the composition structure data of the first summary table and the composition structure data of the second summary table based on the generation logic information and the processing SQL information.
9. An electronic device, characterized in that the electronic device comprises: a processor, and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the data redundancy processing method of the database according to any one of claims 1 to 5.
10. A computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method of data redundancy handling for a database according to any of claims 1-5.
CN202010348004.4A 2020-04-28 2020-04-28 Data redundancy processing method and device for database, electronic equipment and storage medium Pending CN113568894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010348004.4A CN113568894A (en) 2020-04-28 2020-04-28 Data redundancy processing method and device for database, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113568894A true CN113568894A (en) 2021-10-29

Family

ID=78157821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010348004.4A Pending CN113568894A (en) 2020-04-28 2020-04-28 Data redundancy processing method and device for database, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113568894A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564691A (en) * 2022-03-03 2022-05-31 昆明学院 Redundant data discrimination method based on inverse matrix

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110055167A1 (en) * 2009-08-28 2011-03-03 Ingenix, Inc. Apparatus, System, and Method for Identifying Redundancy and Consolidation Opportunities in Databases and Application Systems
CN103902582A (en) * 2012-12-27 2014-07-02 中国移动通信集团湖北有限公司 Data warehouse redundancy reduction method and device
CN104394345A (en) * 2014-12-10 2015-03-04 马人欢 Video storage and playback method for security and protection monitoring
CN106228143A (en) * 2016-08-02 2016-12-14 王国兴 A kind of method that instructional video is marked with camera video motion contrast
JPWO2017175375A1 (en) * 2016-04-08 2019-01-17 株式会社日立製作所 Data cleansing system, method, and program
CN110826834A (en) * 2018-08-14 2020-02-21 中国石油天然气股份有限公司 Comparison method and device between different responsibility separation rule sets

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
(瑞典)(E.帕特-埃南德)EVA PART-ENANDER,等: "MATLAB 5手册", 31 May 2000, 机械工业出版社, pages: 108 - 109 *
(美)阿迈拉登(AMINZADEH,FRED): "模式识别与图象处理", 31 July 1991, 石油工业出版社, pages: 214 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination