CN110442585B - Data updating method, data updating device, computer equipment and storage medium
- Publication number
- CN110442585B CN201910541926.4A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Abstract
The invention discloses a data updating method, a data updating device, computer equipment and a storage medium. First data synchronized from an application system before a preset time point are acquired and imported into a first data table; second data synchronized from the application system after the preset time point are acquired and imported into a second data table, wherein the format of the second data table is consistent with that of the first data table; the first data table and the second data table are combined to generate a total data table; and the repeated data in the total data table are removed to obtain an updated total data table. By combining the first data table and the second data table and then removing duplicates, the data updating method, device, computer equipment and storage medium provided by the invention use simple logic, avoid manual processing, save a large amount of manpower and material resources, and ensure the accuracy of data updating.
Description
Technical Field
The present invention relates to the field of computer information technologies, and in particular, to a data updating method, a data updating device, a computer device, and a storage medium.
Background
With the popularization of the Internet and big data technology, more and more data need to be stored and processed, and data warehouses based on Hadoop and Hive distributed clusters have gradually become mainstream. For example, massive data from a service system need to be stored in Hive tables to facilitate data management and querying. However, as service requirements change, the table structure of some tables in the service system inevitably changes, so that the archived source data differ from period to period.
In a typical star-schema data warehouse, the dimension tables change slowly over time. For example, a retailer opens a new store and the new store's data must be added to the store table, or the business area or another tracked characteristic of an existing store changes. These changes may result in the insertion or modification of individual records. Hive, however, supports row-level updates only from version 0.14 onward. In addition, a data set is sometimes found to be erroneous and needs to be corrected. Or the current data may be only an approximation (e.g., only 90% of the total data, with the rest arriving late). Or business rules may require restating a particular transaction based on a subsequent transaction (e.g., a customer purchases a membership after purchasing some items and then becomes entitled to a discounted price that also covers the previously purchased items). Or a client may request deletion of its data after the partnership is terminated.
Hive supports updates from version 0.14 onward and requires that the data table support the ACID attributes, namely atomicity (A), consistency (C), isolation (I) and durability (D). The Hive versions used by enterprises rarely support updates, let alone timely updates, yet more and more data need to be updated. Users can only pull the data out and process them manually, which consumes a large amount of manpower and material resources; moreover, manual operation introduces errors, so that the updated data are inaccurate.
Disclosure of Invention
In view of the above, the present invention provides a data updating method, a data updating device, a computer device and a storage medium, which update data by combining data tables and removing repeated data, so that manual processing is avoided and the accuracy of data updating is ensured.
First, to achieve the above object, the present invention provides a data updating method, including the steps of:
acquiring first data synchronized from an application system before a preset time point, and importing the first data into a first data table;
acquiring second data synchronized from the application system after the preset time point, and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
combining the first data table and the second data table to generate a total data table; and
removing the repeated data in the total data table to obtain an updated total data table.
Further, before the steps of obtaining the first data table and the second data table, whose formats are consistent with each other, the data updating method further includes:
creating a Hive database; and
acquiring an original data source, and creating, in the Hive database, a first data table conforming to the format of the original data source.
Further, the step of combining the first data table and the second data table to generate a total data table includes:
performing a union operation on the first data table and the second data table by using the UNION ALL command to generate the total data table.
Further, the step of removing the repeated data in the total data table to obtain the updated total data table includes:
classifying the data in the total data table according to primary key groups to obtain a classification result;
arranging, according to the classification result, the data under the same primary key group in time order to obtain an arrangement result; and
deleting, according to the arrangement result, the data ranked outside a preset ranking under the same primary key group to obtain the updated total data table.
Further, after the step of removing the repeated data in the total data table to obtain the updated total data table, the data updating method includes:
adding a change field to the updated total data table to obtain a first transition total data table, the change field recording the change time of other fields in the updated total data table; and
acquiring a synchronized total data table according to the first transition total data table and the change time.
Further, after the step of acquiring the synchronized total data table according to the first transition total data table and the change time, the data updating method includes:
adding a deletion field to the synchronized total data table to obtain a second transition total data table, the deletion field recording the deletion status of other fields in the synchronized total data table; and
performing deletion on the synchronized total data table according to the deletion status, and acquiring the synchronized total data table again.
Further, after the step of acquiring the second data synchronized from the application system after the preset time point and importing the second data into the second data table, the data updating method further includes:
left-associating the first data table with the second data table to obtain a left associated data table;
acquiring a common primary key group of the first data table and the second data table according to the left associated data table;
overwriting the data of the row of the common primary key group in the first data table with the data of the row of the common primary key group in the second data table to obtain a covered data table; and
filtering the data in the second data table to obtain the data that exist only in the second data table, and inserting those data into the covered data table to obtain the updated data table.
In addition, to achieve the above object, the present invention also provides a data updating device, including:
a first acquisition module, configured to acquire first data synchronized from the application system before a preset time point and import the first data into a first data table;
a second acquisition module, configured to acquire second data synchronized from the application system after the preset time point and import the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
a combination module, configured to combine the first data table and the second data table to generate a total data table; and
a de-duplication module, configured to remove repeated data in the total data table and acquire an updated total data table.
To achieve the above object, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the data updating method provided by the invention first acquires the first data synchronized from the application system and imports them into the first data table, which serves as a base table. When the second data synchronized from the application system are acquired later, they are imported into the second data table; the first data table and the second data table are then combined to generate the total data table, and the repeated data in the total data table are removed to obtain the updated data table. The updating method has simple logic, avoids manual processing, saves a large amount of manpower and material resources, and ensures the accuracy of data updating.
Drawings
FIG. 1 is a flowchart of a data updating method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a data updating method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a data updating method according to a third embodiment of the present invention;
FIG. 4 is a flowchart of a data updating method according to a fourth embodiment of the present invention;
FIG. 5 is a flowchart of a data updating method according to a fifth embodiment of the present invention;
FIG. 6 is a flowchart of a data updating method according to a sixth embodiment of the present invention;
FIG. 7 is a block diagram of a data updating apparatus according to a seventh embodiment of the present invention;
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Component | Reference numeral |
Data updating device | 700 |
First acquisition module | 710 |
Second acquisition module | 720 |
Combination module | 730 |
De-duplication module | 740 |
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first", "second", etc. in this disclosure are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and not to fall within the scope of protection claimed by the present invention.
Referring to fig. 1, a first embodiment provides a data updating method. The data updating method comprises the following steps:
Step S110, first data synchronized from the application system before a preset time point is acquired, and the first data is imported into a first data table.
The preset time point may be defined according to practical situations, which is not limited in this embodiment. The first data table is consistent with the acquired data format to ensure that the acquired data can be written to the first data table. The first data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The first data table stores data synchronized from the application system before a preset point in time.
For example, before 00:00, a batch of data is obtained from the application system, and the data are then imported into the preset first data table for subsequent use.
Step S120, obtaining second data synchronized from the application system after the preset time point, and importing the second data into a second data table, where the format of the second data table is consistent with the format of the first data table.
The preset time point may be defined according to the actual situation, which is not limited in this embodiment. The second data table is consistent with the format of the acquired data to ensure that the acquired data can be written to the second data table. The second data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The second data table stores data synchronized from the application system after the preset time point, and its format is consistent with that of the first data table so that it can be used to update the data in the first data table.
For example, after 00:00, a batch of data is synchronized again from the application system, and the data are imported into the second data table for subsequent updating and use.
Step S130, combining the first data table and the second data table to generate a total data table.
Specifically, a union operation is performed on the first data table and the second data table by using the UNION ALL command to obtain the total data table. For example, the first data table is Employees_China:
E_ID | E_Name |
01 | Zhang,Hua |
02 | Wang,Wei |
03 | Carter,Thomas |
04 | Yang,Ming |
The second data table is Employees_USA:
E_ID | E_Name |
01 | Adams,John |
02 | Bush,George |
03 | Carter,Thomas |
04 | Gates,Bill |
Using the UNION ALL command, i.e.
SELECT E_Name FROM Employees_China
UNION ALL
SELECT E_Name FROM Employees_USA
a total data table is then generated, i.e.
E_Name |
Zhang,Hua |
Wang,Wei |
Carter,Thomas |
Yang,Ming |
Adams,John |
Bush,George |
Carter,Thomas |
Gates,Bill |
That is, UNION ALL pulls all the data together for aggregation, regardless of whether the data in the first data table and the second data table are duplicated.
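The behaviour of UNION ALL described above can be reproduced in a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive, which is an assumption made only so the example can run anywhere; the table and column names follow the example tables, and the plain UNION query is added purely for contrast.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Hive warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees_China (E_ID TEXT, E_Name TEXT);
    CREATE TABLE Employees_USA   (E_ID TEXT, E_Name TEXT);
    INSERT INTO Employees_China VALUES
        ('01', 'Zhang,Hua'), ('02', 'Wang,Wei'),
        ('03', 'Carter,Thomas'), ('04', 'Yang,Ming');
    INSERT INTO Employees_USA VALUES
        ('01', 'Adams,John'), ('02', 'Bush,George'),
        ('03', 'Carter,Thomas'), ('04', 'Gates,Bill');
""")

# UNION ALL keeps every row from both tables, duplicates included.
union_all_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_China "
    "UNION ALL "
    "SELECT E_Name FROM Employees_USA")]

# Plain UNION, shown only for contrast, collapses identical rows.
union_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_China "
    "UNION "
    "SELECT E_Name FROM Employees_USA")]

print(len(union_all_names), union_all_names.count("Carter,Thomas"))
print(len(union_names))
```

The combined list holds all eight names, with Carter,Thomas appearing twice, which is why a separate de-duplication step follows in the method.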
Step S140, removing the repeated data in the total data table to obtain an updated total data table.
Specifically, the duplicate data in the total data table are deleted by means of the row_number() function (for example, one of the two Carter,Thomas records in the total data table is deleted), thereby obtaining the updated total data table.
With the data updating method of this embodiment, the first data synchronized from the application system are first acquired and imported into the first data table, which serves as a base table. When the second data synchronized from the application system are acquired later, they are imported into the second data table; the first data table and the second data table are then combined to generate the total data table, and the repeated data in the total data table are removed to obtain the updated data table. The updating method has simple logic, avoids manual processing, saves a large amount of manpower and material resources, and ensures the accuracy of data updating.
In a second embodiment, referring to fig. 2, before step S110, the data updating method further includes:
Step S210, a Hive database is created.
Specifically, the statement for creating a Hive database is: hive> create database student; A database (e.g., the student library) is then created on the Hadoop Distributed File System (HDFS) under hdfs:///user/hive/warehouse/. This directory is populated as libraries are created: each library is a folder whose name is the library name plus the suffix .db (e.g., student.db), indicating that it is a database.
Hive is a data warehouse tool based on Hadoop. It can map a structured data file onto a database table, provides a simple SQL query function, and can convert SQL statements into MapReduce tasks for execution. Hive's data are divided into table data and metadata. Table data are the data held by a table in Hive; metadata store the name of the table, the columns and partitions of the table and their attributes, the attributes of the table (whether it is an external table, etc.), the directory in which the data of the table are located, and so on. Hive mainly comprises the following data models: Table, External Table, Partition and Bucket.
Step S220, obtaining an original data source, and creating a first data table conforming to the format of the original data source in the Hive database according to the original data source.
For example, the original data source is a txt file of student scores with seven fields (ID, name, Chinese, English, math, school, class). For example: 0001, Zhang San, 99, 98, 100, school 1, class 1; 0002, Li Si, 59, 89, 79, school 2, class 1; 0003, Wang Wu, 89, 99, 100, school 3, class 1; 0004, Zhang Saner, 99, 98, 100, school 1, class 1; 0005, Litetra, 59, 89, 79, school 2, class 1; 0006, Wang Wubi, 89, 99, 100, school 3, class 1. The method of obtaining the original data source is not limited in this embodiment.
From the original data source in the above example, a first data table of seven columns and six rows is created. Specifically, the first data table is created in code with a name (such as score1); an ID field, a name field, a Chinese field, an English field, a math field, a school field and a class field are defined in it, and the fields are declared to be separated from each other by commas. The code for creating the first data table is as follows:
create table score1
(id string comment 'ID',
name string comment 'name',
Chinese double comment 'Chinese',
English double comment 'English',
math double comment 'math',
school string comment 'school',
class string comment 'class')
comment 'score1'
row format delimited fields terminated by ','
stored as textfile;
The first data table is thereby created.
Likewise, a second data table may be created in the Hive database in the same manner. In addition, the first data table and the second data table need to maintain the consistency of the formats so as to be used later.
In a third embodiment, referring to fig. 3, step S140 includes:
Step S310, classifying the data in the total data table according to primary key groups to obtain a classification result.
For example, the total data table is
The primary key groups are 111, 222 and 333 respectively, so that the data in the total data table are classified, and corresponding classification results are obtained. The primary key group is set according to the condition of the total data table so as to avoid conflict among other field data.
Step S320, according to the classification result, arranging the data under the same primary key group in time order to obtain an arrangement result.
Specifically, the row_number function is used to execute the relevant statement, so that the data under the same primary key group are arranged and a corresponding arrangement result is obtained. The statement may be: row_number() over (partition by key_value_column order by updated_date desc)
For example, one ranking result obtained using the method described above is:
From the import time, it can be determined how many import records the data of the same primary key group have and which record of that group is the latest, which lays the groundwork for the subsequent data update.
Step S330, deleting, according to the arrangement result, the data ranked outside the preset ranking under the same primary key group to obtain an updated total data table.
Specifically, based on the arrangement result, the data ranked after first place under the same primary key group are deleted. For example, based on the arrangement result of the previous example, the second-ranked record in group 111 and the second-ranked record in group 222 are deleted, thereby obtaining the updated total data table, i.e.
ID | Name | Age |
111 | Zhang San | 26 |
222 | Li Si | 32 |
333 | Wang Wu | 27 |
This data table is the updated data table: the repeated data in the first data table and the second data table are removed and the latest data are retained, so that the data in the data table are more accurate and timely. The records of the same primary key group may be arranged in ascending or descending order of time, which is not limited in this embodiment. For example, when they are arranged in ascending order of time, all records of the same primary key group except the last-ranked one may be deleted.
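Steps S310 to S330 can be sketched as a single ranked query. The sketch below is illustrative only: SQLite (via Python's sqlite3, version 3.25 or later for window functions) stands in for Hive, the table name `total`, the column names, the older rows and all dates are hypothetical, and the rows to keep are selected rather than deleted in place.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; input rows and dates are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE total (id TEXT, name TEXT, age INTEGER, updated_date TEXT);
    INSERT INTO total VALUES
        ('111', 'Zhang San', 25, '2019-01-01'),
        ('111', 'Zhang San', 26, '2019-06-01'),
        ('222', 'Li Si',     31, '2019-01-01'),
        ('222', 'Li Si',     32, '2019-06-01'),
        ('333', 'Wang Wu',   27, '2019-01-01');
""")

# Rank the rows of each primary key group from newest to oldest,
# then keep only the first-ranked (latest) row of every group.
latest_rows = conn.execute("""
    SELECT id, name, age FROM (
        SELECT id, name, age,
               ROW_NUMBER() OVER (PARTITION BY id
                                  ORDER BY updated_date DESC) AS rn
        FROM total
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in latest_rows:
    print(row)
```

Changing `DESC` to `ASC` ranks the oldest row first, in which case the row to keep would be the last-ranked one in each group, as the text notes.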
In a fourth embodiment, referring to fig. 4, after step S140, the data updating method further includes:
Step S410, adding a change field to the updated total data table, and obtaining a first transitional total data table to record change time of other fields in the updated total data table.
For example, a change-field column is added to the updated total data table; the change field records the change time each time another field is changed. For example, when there are multiple changes, the change time of each change is recorded correspondingly. Assume that the updated total data table is:
ID | Name | Age |
111 | Zhang San | 23 |
222 | Li Si | 44 |
333 | Wang Wu | 10 |
333 | Zhao Liu | 30 |
That is, the updated total data table includes an ID field, a name field and an age field. Some changes are then made to certain fields in the data table: at 2017/07/01:02:00, Zhang San's age is changed to 24, and Li Si's name is changed to Li Sai and the age to 45; at 2018/05/25 09:00:10, Zhang San's age is changed to 25. A column is then appended after the last column of the updated total data table and set as the change field; after the changes are recorded in the change field, the updated total data table becomes the following changed data table:
The table above is the acquired first transition total data table, where Null indicates that the row is original data in the updated total data table.
Step S420, acquiring a synchronized total data table according to the first transition total data table and the change time.
For example, based on the obtained first transition total data table and the change time recorded in the change field, for data that have been changed, the row with the latest change time is kept and the other rows are deleted; data that have not been changed are retained. The synchronized total data table is thus obtained:
ID | Name | Age |
111 | Zhang San | 25 |
222 | Li Sai | 45 |
333 | Wang Wu | 10 |
333 | Zhao Liu | 30 |
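A minimal sketch of the change-field logic follows, again with SQLite standing in for Hive. Several things here are assumptions rather than part of the source: the table name `transition_total`, the column name `change_time`, the exact timestamp strings, and the simplification that ID uniquely identifies a record (so the Zhao Liu row, which shares ID 333 in the example, is left out).

```python
import sqlite3

# Assumption: SQLite stands in for Hive; timestamps are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transition_total
        (id TEXT, name TEXT, age INTEGER, change_time TEXT);
    INSERT INTO transition_total VALUES
        ('111', 'Zhang San', 23, NULL),
        ('111', 'Zhang San', 24, '2017-07-01 00:02:00'),
        ('111', 'Zhang San', 25, '2018-05-25 09:00:10'),
        ('222', 'Li Si',     44, NULL),
        ('222', 'Li Sai',    45, '2017-07-01 00:02:00'),
        ('333', 'Wang Wu',   10, NULL);
""")

# Within each ID, changed rows (non-NULL change_time) rank ahead of the
# original row, newest change first; unchanged records keep their NULL row.
synchronized = conn.execute("""
    SELECT id, name, age FROM (
        SELECT id, name, age,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY (change_time IS NULL), change_time DESC
               ) AS rn
        FROM transition_total
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in synchronized:
    print(row)
```

Sorting on `change_time IS NULL` first makes the handling of original rows explicit instead of relying on an engine's default NULL ordering.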
In a fifth embodiment, referring to fig. 5, after step S420, the data updating method further includes:
Step S510, adding a deletion field to the synchronized total data table, and obtaining a second transitional total data table to record the deletion condition of other fields in the synchronized total data table.
For example, a deletion-field column is added to the synchronized total data table; the deletion field records whether a deletion operation has been performed: 1 is recorded when a deletion operation is performed, and 0 when it is not. The synchronized data table is assumed to be:
That is, the synchronized data table includes an ID field, a name field, an age field, a change field and a deletion field. Some deletion operations are then performed on the data table, such as deleting the records of Zhang San and Zhao Liu at 2017/07/01 00:02:00. The original synchronized total data table thus becomes a table with the deletion field recorded:
The table above is the second transition total data table.
Step S520, performing deletion on the synchronized total data table according to the second transition total data table, and acquiring the synchronized data table again.
For example, based on the second transition total data table obtained above and the deletion status recorded in the deletion field, a row recorded as 1 indicates that a deletion operation was performed, and the row is deleted; a row recorded as 0 indicates that no deletion operation was performed, and the row is retained. The synchronized data table is thus obtained again. The synchronized data table obtained again is:
ID | Name | Age |
222 | Li Sai | 45 |
333 | Wang Wu | 10 |
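The deletion-field step amounts to a soft delete followed by a filter, which can be sketched as below. SQLite stands in for Hive, and the table name `second_transition_total` and column name `is_deleted` are hypothetical; the rows mirror the example, in which Zhang San and Zhao Liu are marked as deleted.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; is_deleted is a made-up column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE second_transition_total
        (id TEXT, name TEXT, age INTEGER, is_deleted INTEGER);
    INSERT INTO second_transition_total VALUES
        ('111', 'Zhang San', 25, 1),
        ('222', 'Li Sai',    45, 0),
        ('333', 'Wang Wu',   10, 0),
        ('333', 'Zhao Liu',  30, 1);
""")

# Rows flagged 1 were deleted at the source and are dropped;
# rows flagged 0 are retained in the re-synchronized table.
resynchronized = conn.execute("""
    SELECT id, name, age
    FROM second_transition_total
    WHERE is_deleted = 0
    ORDER BY id
""").fetchall()

for row in resynchronized:
    print(row)
```

Recording deletions as a flag keeps the step compatible with tables that cannot delete rows in place, since the retained rows can simply be written out as the new table.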
In a sixth embodiment, referring to fig. 6, after step S120, the data updating method further includes:
Step S610, left-associating the first data table with the second data table to obtain a left associated data table.
Specifically, using a left association (left join) statement based on the primary key field, with the first data table as the master table and the second data table as the slave table, the first data table and the second data table are associated to obtain the left associated data table. In one embodiment, the left association statement may be:
Insert overwrite table A
Select NVL(B.KEY_COLUMN_1,A.KEY_COLUMN_1) AS KEY_COLUMN_1,
NVL(B.COLUMN_2,A.COLUMN_2) AS COLUMN_2,
NVL(B.COLUMN_3,A.COLUMN_3) AS COLUMN_3,
……
From A
left join B on A.KEY_COLUMN_1=B.KEY_COLUMN_1
(Taking table A as the base, the data in table B with the same ID are spliced onto the row with that ID in table A; for IDs in table A that have no match in table B, the table B columns are set to Null.)
For example, the first data table is:
ID | COLUMN_1 | COLUMN_2 |
111 | 1 | A |
222 | 3 | S |
333 | 4 | D |
The second data table is:
ID | COLUMN_1 | COLUMN_2 |
111 | 1 | a |
444 | 8 | R |
555 | 6 | F |
After the left association statement is executed, the acquired left associated data table is:
A_ID | A_COLUMN_1 | A_COLUMN_2 | B_ID | B_COLUMN_1 | B_COLUMN_2 |
111 | 1 | A | 111 | 1 | a |
222 | 3 | S | Null | Null | Null |
333 | 4 | D | Null | Null | Null |
Where Null indicates that the two groups of primary keys, 222 and 333, have no associated data in the second data table.
Step S620, obtaining a common primary key group of the first data table and the second data table according to the left associated data table.
Specifically, the primary key group common to the first data table and the second data table can be read directly from the left associated data table: it consists of the rows whose second-table columns are not Null. For example, from the left associated data table in the above embodiment, the common primary key group is 111.
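Steps S610 and S620 can be sketched with Python's standard-library sqlite3 module. This is a minimal illustration under assumptions: an in-memory SQLite database stands in for Hive, the key column is named ID as in the example tables, and the variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);  -- first data table
    CREATE TABLE B (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);  -- second data table
    INSERT INTO A VALUES (111, 1, 'A'), (222, 3, 'S'), (333, 4, 'D');
    INSERT INTO B VALUES (111, 1, 'a'), (444, 8, 'R'), (555, 6, 'F');
""")

# Step S610: left join with A as the master table and B as the slave table.
left_joined = conn.execute("""
    SELECT A.ID, A.COLUMN_1, A.COLUMN_2, B.ID, B.COLUMN_1, B.COLUMN_2
    FROM A LEFT JOIN B ON A.ID = B.ID
    ORDER BY A.ID
""").fetchall()

# Step S620: a key is common to both tables when the B-side ID is not NULL.
common_keys = [row[0] for row in left_joined if row[3] is not None]

for row in left_joined:
    print(row)
print("common keys:", common_keys)
```

Rows 222 and 333 come back with None (Null) in the B-side columns, so only 111 is reported as a common primary key.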
Step S630, covering the data of the row of the common primary key group in the first data table with the data of the corresponding row in the second data table, to obtain a covered data table.
For example, for the common primary key group 111, the row in the first data table is covered by the row in the second data table, so that the data in the first data table is updated and the covered data table is obtained, that is:
ID | COLUMN_1 | COLUMN_2 |
111 | 1 | a |
222 | 3 | S |
333 | 4 | D |
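Step S630 can be sketched in the same way. The following is an illustration under assumptions rather than the implementation of the invention: SQLite stands in for Hive, COALESCE plays the role of NVL, and the result is written to a separate table named `covered` (a hypothetical name) instead of overwriting table A in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);  -- first data table
    CREATE TABLE B (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);  -- second data table
    INSERT INTO A VALUES (111, 1, 'A'), (222, 3, 'S'), (333, 4, 'D');
    INSERT INTO B VALUES (111, 1, 'a'), (444, 8, 'R'), (555, 6, 'F');

    -- For every row of A, prefer the value from B when a matching row exists.
    CREATE TABLE covered AS
    SELECT A.ID AS ID,
           COALESCE(B.COLUMN_1, A.COLUMN_1) AS COLUMN_1,
           COALESCE(B.COLUMN_2, A.COLUMN_2) AS COLUMN_2
    FROM A LEFT JOIN B ON A.ID = B.ID;
""")

covered_rows = conn.execute("SELECT * FROM covered ORDER BY ID").fetchall()
for row in covered_rows:
    print(row)
```

Note that COALESCE falls back to the first table's value whenever the second table's value is Null, so a genuine Null in the second data table would not overwrite existing data in this sketch.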
Step S640, filtering the data in the second data table to obtain the data that exists only in the second data table, and inserting that data into the covered data table to obtain the updated data table.
Specifically, as the above steps show, rows of the second data table whose primary key has no match in the first data table are dropped when the left association is performed; these rows are exactly the newly added data and need to be updated into the first data table. The data in the second data table is therefore filtered with a NOT EXISTS condition on the primary key field to screen out the data that exists only in the second data table, and this data is inserted directly into the covered data table to obtain the final updated data table. The final updated data table is:
ID | COLUMN_1 | COLUMN_2 |
111 | 1 | a |
222 | 3 | S |
333 | 4 | D |
444 | 8 | R |
555 | 6 | F |
In addition, the statement using the NOT EXISTS condition may be:
Insert into table A
Select B.*
From B
Where not exists(select 1 from A where A.KEY_COLUMN_1=B.KEY_COLUMN_1)
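Step S640 can be tried out with a short Python sketch. The assumptions are the same as before and should be read as such: SQLite stands in for Hive, table A is taken to already hold the covered data table, and the key column is named ID as in the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A already holds the covered data table; B is the second data table.
    CREATE TABLE A (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
    CREATE TABLE B (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
    INSERT INTO A VALUES (111, 1, 'a'), (222, 3, 'S'), (333, 4, 'D');
    INSERT INTO B VALUES (111, 1, 'a'), (444, 8, 'R'), (555, 6, 'F');

    -- Insert only the rows of B whose primary key is absent from A.
    INSERT INTO A
    SELECT B.* FROM B
    WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.ID = B.ID);
""")

updated_rows = conn.execute("SELECT * FROM A ORDER BY ID").fetchall()
for row in updated_rows:
    print(row)
```

The rows with IDs 444 and 555 are appended, while row 111 is not inserted a second time because its key already exists in table A.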
In a seventh embodiment, referring to FIG. 7, a data updating apparatus 700 is provided. The data updating apparatus 700 includes:
A first obtaining module 710, configured to obtain first data synchronized from the application system before a preset time point, and import the first data into the first data table.
The preset time point may be defined according to the actual situation, which is not limited in this embodiment. The format of the first data table is consistent with that of the acquired data, to ensure that the acquired data can be written to the first data table. The first data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The first data table stores data synchronized from the application system before the preset time point.
For example, before 00:00, a batch of data is obtained from the application system and then imported into the preset first data table for subsequent use.
A second obtaining module 720, configured to obtain second data synchronized from the application system after the preset time point, and import the second data into a second data table, where a format of the second data table is consistent with a format of the first data table.
The preset time point may be defined according to the actual situation, which is not limited in this embodiment. The format of the second data table is consistent with that of the acquired data, to ensure that the acquired data can be written to the second data table. The second data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The second data table stores data synchronized from the application system after the preset time point, and its format is consistent with that of the first data table so that it can be used to update the data in the first data table.
For example, after 00:00, a batch of data is synchronized again from the application system and imported into the second data table for subsequent updating.
A combining module 730, configured to combine the first data table and the second data table to generate a total data table.
Specifically, a UNION operation is performed on the first data table and the second data table by using a UNION ALL command, so as to obtain the total data table. For example, the first data table is Employees_China:
E_ID | E_Name |
01 | Zhang,Hua |
02 | Wang,Wei |
03 | Carter,Thomas |
04 | Yang,Ming |
The second data table is Employees_USA:
E_ID | E_Name |
01 | Adams,John |
02 | Bush,George |
03 | Carter,Thomas |
04 | Gates,Bill |
Using the UNION ALL command, i.e.,
SELECT E_Name FROM Employees_China
UNION ALL
SELECT E_Name FROM Employees_USA
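Then a total data table is generated, i.e.,
E_Name |
Zhang,Hua |
Wang,Wei |
Carter,Thomas |
Yang,Ming |
Adams,John |
Bush,George |
Carter,Thomas |
Gates,Bill |
That is, UNION ALL pulls all the data together for aggregation, regardless of whether the data in the first and second data tables is duplicated.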
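The UNION ALL example can be reproduced with a short Python sketch. An in-memory SQLite database stands in for Hive here, which is an assumption made only so that the statement can be run; the E_ID values are stored as text to preserve the leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees_China (E_ID TEXT, E_Name TEXT);
    CREATE TABLE Employees_USA (E_ID TEXT, E_Name TEXT);
    INSERT INTO Employees_China VALUES
        ('01', 'Zhang,Hua'), ('02', 'Wang,Wei'),
        ('03', 'Carter,Thomas'), ('04', 'Yang,Ming');
    INSERT INTO Employees_USA VALUES
        ('01', 'Adams,John'), ('02', 'Bush,George'),
        ('03', 'Carter,Thomas'), ('04', 'Gates,Bill');
""")

# UNION ALL keeps every row from both tables, including duplicates.
total_names = [row[0] for row in conn.execute("""
    SELECT E_Name FROM Employees_China
    UNION ALL
    SELECT E_Name FROM Employees_USA
""")]

print(len(total_names), "rows;",
      total_names.count("Carter,Thomas"), "copies of Carter,Thomas")
```

All eight rows are returned and Carter,Thomas appears twice, which is why a separate deduplication step is needed afterwards.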
A deduplication module 740, configured to remove duplicate data in the total data table and obtain an updated total data table.
Specifically, the duplicate data in the above-described total data table is deleted (for example, one of the two Carter,Thomas rows in the total data table is deleted), so that an updated total data table is obtained.
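The deduplication step can be illustrated in a few lines of Python. This sketch only removes exact duplicates of the E_Name values from the example; the row order of `total_names` and the variable names are assumptions.

```python
# Total data table produced by UNION ALL in the example (E_Name column only).
total_names = [
    "Zhang,Hua", "Wang,Wei", "Carter,Thomas", "Yang,Ming",
    "Adams,John", "Bush,George", "Carter,Thomas", "Gates,Bill",
]

# dict.fromkeys keeps the first occurrence of each value and preserves order.
updated_names = list(dict.fromkeys(total_names))

print(updated_names)
```

The second Carter,Thomas entry is dropped, leaving seven distinct rows in the updated total data table.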
With the data updating apparatus of this embodiment, first data synchronized from the application system is acquired and imported into the first data table, which serves as the base table. When second data synchronized from the application system is acquired, it is imported into the second data table; the first data table and the second data table are combined to generate a total data table, and the duplicate data in the total data table is then removed to obtain the updated data table. This updating method has simple logic, avoids manual processing, saves a great amount of manpower and material resources, and ensures the accuracy of data updating.
The invention also provides a computer device, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server, a blade server, a tower server or a cabinet server (comprising independent servers or a server cluster formed by a plurality of servers) and the like which can execute programs. The computer device of the present embodiment includes at least, but is not limited to: memory, processors, etc. that may be communicatively coupled to each other via a system bus.
The present embodiment also provides a computer-readable storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by a processor, performs the corresponding functions. The computer-readable storage medium of the present embodiment is used for storing a computer program that, when executed by a processor, implements the data updating method of the present invention.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
Claims (5)
1. A data updating method, characterized in that the data updating method comprises the steps of:
Acquiring first data synchronized from an application system before a preset time point, and importing the first data into a first data table;
acquiring second data synchronized from the application system after the preset time point, and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
Combining the first data table and the second data table to generate a total data table; and
Removing repeated data in the total data table to obtain an updated total data table;
the step of generating a total data table by combining the first data table and the second data table comprises:
Performing UNION operation on the first data table and the second data table by utilizing a UNION ALL command to generate the total data table;
The step of acquiring second data synchronized from the application system after the preset time point and importing the second data into a second data table, the data updating method further includes:
creating a Hive database; and
Acquiring an original data source, and creating a first data table conforming to the format of the original data source in the Hive database according to the original data source;
after the step of removing the repeated data in the total data table and obtaining the updated total data table, the data updating method comprises the following steps:
Adding a change field in the updated total data table, and acquiring a first transition total data table to record change time of other fields in the updated total data table; and
Acquiring a synchronous total data table according to the first transition total data table and the change time;
After the step of acquiring the synchronized data table according to the first transition total data table and the change time, the data updating method includes:
adding a deletion field in the synchronous total data table, and acquiring a second transition total data table to record the deletion condition of other fields in the synchronous total data table; and
Deleting the synchronous total data table according to the deleting condition, and acquiring the synchronous data table again;
After the step of acquiring the second data synchronized from the application system after the preset time point and importing the second data into a second data table, the data updating method further includes:
left associating the first data table with the second data table to obtain a left associated data table;
acquiring a common main key group of the first data table and the second data table according to the left associated data table;
Covering the data of the row of the common main key group in the first data table by the data of the row of the common main key group in the second data table, and acquiring a covered data table; and
And filtering the data in the second data table to obtain the data which exists in the second data table independently, and inserting the data in the second data table into the covered data table to obtain the updated data table.
2. The data updating method as claimed in claim 1, wherein the step of removing the duplicate data in the total data table and obtaining the updated total data table comprises:
Classifying the data in the total data table according to the primary key group to obtain a classification result;
According to the classification result, arranging the data under the same main key group according to a time sequence to obtain an arrangement result; and
And deleting the data which are ranked outside the preset ranking under the same primary key group according to the arrangement result, and obtaining an updated total data table.
3. A data updating apparatus for performing the data updating method of claim 1 or 2, characterized in that the data updating apparatus comprises:
the first acquisition module is used for acquiring first data synchronized from the application system before a preset time point and importing the first data into a first data table;
the second acquisition module is used for acquiring second data synchronized from the application system after the preset time point and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
the combination module is used for combining the first data table and the second data table to generate a total data table; and
The de-duplication module is used for removing repeated data in the total data table and acquiring an updated total data table;
The combination module is further used for performing a UNION operation on the first data table and the second data table by utilizing a UNION ALL command to generate the total data table.
4. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the data updating method of any of claims 1 to 2 when the computer program is executed by the processor.
5. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements the steps of the data updating method of any of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910541926.4A CN110442585B (en) | 2019-06-21 | 2019-06-21 | Data updating method, data updating device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110442585A (en) | 2019-11-12 |
CN110442585B (en) | 2024-04-30 |
Family
ID=68428719
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910541926.4A Active CN110442585B (en) | 2019-06-21 | 2019-06-21 | Data updating method, data updating device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442585B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111259082B (en) * | 2020-02-11 | 2023-07-21 | 深圳市六因科技有限公司 | Method for realizing full data synchronization in big data environment |
CN113495894A (en) * | 2020-04-01 | 2021-10-12 | 北京京东振世信息技术有限公司 | Data synchronization method, device, equipment and storage medium |
CN111581448B (en) * | 2020-05-14 | 2023-09-19 | 中国银行股份有限公司 | Method and device for warehousing card bin information |
CN112612839A (en) * | 2020-12-28 | 2021-04-06 | 中国农业银行股份有限公司 | Data processing method and device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001356949A (en) * | 2000-01-26 | 2001-12-26 | Fusionone Inc | Data transfer and synchronization system |
CN107329998A (en) * | 2017-06-09 | 2017-11-07 | 广州虎牙信息科技有限公司 | User's increment class data capture method, device and equipment |
WO2018051096A1 (en) * | 2016-09-15 | 2018-03-22 | Gb Gas Holdings Limited | System for importing data into a data repository |
CN108897863A (en) * | 2018-06-29 | 2018-11-27 | 联想(北京)有限公司 | Method of data synchronization and its system and server cluster |
CN108897794A (en) * | 2018-06-12 | 2018-11-27 | 东软集团股份有限公司 | Synchronous method, device, storage medium and the electronic equipment of dereliction key data table |
CN108958959A (en) * | 2017-05-18 | 2018-12-07 | 北京京东尚科信息技术有限公司 | The method and apparatus for detecting hive tables of data |
CN109559808A (en) * | 2018-11-07 | 2019-04-02 | 平安医疗健康管理股份有限公司 | A kind of data processing method, device, equipment and storage medium |
CN109739936A (en) * | 2019-01-23 | 2019-05-10 | 杭州数梦工场科技有限公司 | Method of data synchronization, system, server and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110442585A (en) | 2019-11-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||