CN110442585B - Data updating method, data updating device, computer equipment and storage medium

Info

Publication number: CN110442585B
Application number: CN201910541926.4A
Authority: CN
Inventors: 李京京
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2019-06-21
Filing date: 2019-06-21
Publication date: 2024-04-30
Anticipated expiration: 2039-06-21
Also published as: CN110442585A

Abstract

The invention discloses a data updating method, a data updating device, computer equipment and a storage medium. First data synchronized from an application system before a preset time point is acquired and imported into a first data table; second data synchronized from the application system after the preset time point is acquired and imported into a second data table, wherein the format of the second data table is consistent with that of the first data table; the first data table and the second data table are combined to generate a total data table; and the repeated data in the total data table is removed to obtain an updated total data table. The data updating method, the data updating device, the computer equipment and the storage medium provided by the invention combine the first data table and the second data table and then remove the duplicates; the logic is simple, manual processing is avoided, a large amount of manpower and material resources is saved, and the accuracy of data updating is ensured.

Description

Data updating method, data updating device, computer equipment and storage medium

Technical Field

The present invention relates to the field of computer information technologies, and in particular, to a data updating method, a data updating device, a computer device, and a storage medium.

Background

With the popularization of the Internet and big data technology, more and more data needs to be stored and processed, and data warehouses based on Hadoop and Hive distributed clusters have gradually become mainstream. For example, massive data from a service system needs to be stored in Hive tables to facilitate the management and query of the data. However, as service requirements change, the table structures of some tables of the service system inevitably change, so that the archived source data differs from period to period.

In a typical star-schema data warehouse, dimension tables change slowly over time. For example, a retailer opens a new store and needs the new store's data added to the store table, or the business area or another tracked characteristic of an existing store changes. These changes may result in the insertion or modification of individual records. Hive, however, supports row-level updates only from version 0.14 onward. In addition, data sets are sometimes found to be erroneous and need to be corrected. Or the current data is only an approximation (e.g., only 90% of the total data, with the rest lagging behind). Or business rules may require restatement of particular transactions based on a subsequent transaction (e.g., a customer purchases a membership after purchasing some items, at which point a discounted price may apply, including to the previously purchased items). Or a customer may request deletion of their data after termination of the partnership.

Hive supports updates from version 0.14 onward and requires that the data table support the ACID properties, namely atomicity (A), consistency (C), isolation (I) and durability (D). The Hive versions used by enterprises can rarely support updates, let alone timely updates, yet a large amount of data needs to be updated; users can only extract the data by running jobs and process it manually, which consumes a large amount of manpower and material resources, and manual operation introduces errors, so that the updated data is inaccurate.

Disclosure of Invention

In view of the above, the present invention provides a data updating method, a data updating device, a computer device and a storage medium, which update data by combining data tables and removing the repeated data, so that manual processing is avoided and the accuracy of data updating is ensured.

First, to achieve the above object, the present invention provides a data updating method, including the steps of:

Acquiring first data synchronized from an application system before a preset time point, and importing the first data into a first data table;

Acquiring second data synchronized from the application system after the preset time point, and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;

Combining the first data table and the second data table to generate a total data table; and

removing the repeated data in the total data table to obtain an updated total data table.

Further, before the step of acquiring the first data synchronized from the application system before the preset time point and importing the first data into the first data table, the data updating method further includes:

creating a Hive database; and

acquiring an original data source, and creating, in the Hive database and according to the original data source, a first data table conforming to the format of the original data source.

Further, the step of generating a total data table by combining the first data table and the second data table includes:

performing a UNION operation on the first data table and the second data table by using the UNION ALL command to generate the total data table.

Further, the step of removing the repeated data in the total data table and obtaining the updated total data table includes:

Classifying the data in the total data table according to the primary key group to obtain a classification result;

According to the classification result, arranging the data under the same primary key group in time order to obtain an arrangement result; and

deleting, according to the arrangement result, the data ranked outside the preset ranking under the same primary key group to obtain an updated total data table.

Further, after the step of removing the repeated data in the total data table and obtaining the updated total data table, the data updating method includes:

Adding a change field to the updated total data table to obtain a first transition total data table, the change field recording the change time of other fields in the updated total data table; and

acquiring a synchronous total data table according to the first transition total data table and the change time.

Further, after the step of acquiring the synchronous total data table according to the first transition total data table and the change time, the data updating method includes:

adding a deletion field to the synchronous total data table to obtain a second transition total data table, the deletion field recording the deletion condition of other fields in the synchronous total data table; and

deleting rows from the synchronous total data table according to the deletion condition, and acquiring the synchronous total data table again.

Further, after the step of obtaining the data source after the preset time point and writing the data into the second data table, the data updating method further includes:

left-associating (left-joining) the first data table with the second data table to obtain a left associated data table;

acquiring a common primary key group of the first data table and the second data table according to the left associated data table;

covering the data of the rows of the common primary key group in the first data table with the data of the rows of the common primary key group in the second data table to obtain a covered data table; and

filtering the data in the second data table to obtain the data that exists only in the second data table, and inserting that data into the covered data table to obtain the updated data table.

In addition, to achieve the above object, the present invention also provides a data updating apparatus including:

the first acquisition module is used for acquiring first data synchronized from the application system before a preset time point and importing the first data into a first data table;

The second acquisition module is used for acquiring second data synchronized from the application system after the preset time point and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;

the combination module is used for combining the first data table and the second data table to generate a total data table; and

the de-duplication module is used for removing repeated data in the total data table and acquiring an updated total data table.

To achieve the above object, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.

Compared with the prior art, the data updating method provided by the invention has the advantages that the first data synchronized from the application system is firstly obtained and is imported into the first data table, so that the first data table is used as a basic table, when the second data synchronized from the application system is obtained later, the second data is imported into the second data table, the first data table and the second data table are combined to generate the total data table, and then some repeated data in the total data table are removed, so that the updated data table is obtained. The updating method has simple logic, saves the trouble of manual processing, saves a great amount of manpower and material resources, and ensures the accuracy of data updating.

Drawings

FIG. 1 is a flowchart of a data updating method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a data updating method according to a second embodiment of the present invention;

FIG. 3 is a flowchart of a data updating method according to a third embodiment of the present invention;

FIG. 4 is a flowchart of a data updating method according to a fourth embodiment of the present invention;

FIG. 5 is a flowchart of a data updating method according to a fifth embodiment of the present invention;

FIG. 6 is a flowchart of a data updating method according to a sixth embodiment of the present invention;

FIG. 7 is a block diagram of a data updating apparatus according to a seventh embodiment of the present invention;

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Data updating device	700
First acquisition module	710
Second acquisition module	720
Combining module	730
Deduplication module	740

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that the description of "first", "second", etc. in this disclosure is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implying an indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.

Referring to fig. 1, a first embodiment provides a data updating method. The data updating method comprises the following steps:

Step S110, first data synchronized from the application system before a preset time point is acquired, and the first data is imported into a first data table.

The preset time point may be defined according to practical situations, which is not limited in this embodiment. The first data table is consistent with the acquired data format to ensure that the acquired data can be written to the first data table. The first data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The first data table stores data synchronized from the application system before a preset point in time.

For example, before 00:00, a batch of data is obtained from the application system, and the data is then imported into a preset first data table for subsequent use.

Step S120, obtaining second data synchronized from the application system after the preset time point, and importing the second data into a second data table, where the format of the second data table is consistent with the format of the first data table.

The preset time point may be defined according to practical situations, which is not limited in this embodiment. The second data table is consistent with the acquired data format to ensure that the acquired data can be written to the second data table. The second data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The second data table stores data synchronized from the application system after the preset time point, and the format of the second data table is consistent with that of the first data table so that it can be used to update the data in the first data table.

For example, after 00:00, a batch of data is synchronized again from the application system, and the data is imported into the second data table for subsequent updating and use.

And step S130, combining the first data table and the second data table to generate a total data table.

Specifically, a UNION operation is performed on the first data table and the second data table by using the UNION ALL command, so as to obtain the total data table. For example, the first data table is Employees_China:

E_ID	E_Name
01	Zhang,Hua
02	Wang,Wei
03	Carter,Thomas
04	Yang,Ming

The second data table is Employees_USA:

E_ID	E_Name
01	Adams,John
02	Bush,George
03	Carter,Thomas
04	Gates,Bill

The UNION ALL command is used, i.e.:

SELECT E_Name FROM Employees_China
UNION ALL
SELECT E_Name FROM Employees_USA

A total data table is then generated, i.e.:

E_Name
Zhang,Hua
Wang,Wei
Carter,Thomas
Yang,Ming
Adams,John
Bush,George
Carter,Thomas
Gates,Bill

That is, UNION ALL pulls all the data together for aggregation, regardless of whether the data in the first and second data tables are duplicated.

And step S140, removing the repeated data in the total data table to obtain an updated total data table.

Specifically, the duplicate data in the above total data table is deleted by using the row_number() function for deduplication (for example, one of the two Carter,Thomas rows in the total data table is deleted), thereby obtaining an updated total data table.
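Steps S130 and S140 can be tried end to end with a small sketch. The following is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Hive (the window-function syntax is similar but the engines differ), the function name union_and_dedupe is hypothetical, and whole-row duplicates are identified by E_Name alone.

```python
import sqlite3

def union_and_dedupe():
    """Combine two tables with UNION ALL, then drop duplicates with row_number()."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here (assumption)
    conn.executescript("""
        CREATE TABLE Employees_China (E_ID TEXT, E_Name TEXT);
        CREATE TABLE Employees_USA   (E_ID TEXT, E_Name TEXT);
        INSERT INTO Employees_China VALUES
            ('01', 'Zhang,Hua'), ('02', 'Wang,Wei'),
            ('03', 'Carter,Thomas'), ('04', 'Yang,Ming');
        INSERT INTO Employees_USA VALUES
            ('01', 'Adams,John'), ('02', 'Bush,George'),
            ('03', 'Carter,Thomas'), ('04', 'Gates,Bill');
    """)
    # Step S130: UNION ALL keeps every row, including duplicates.
    total = conn.execute("""
        SELECT E_Name FROM Employees_China
        UNION ALL
        SELECT E_Name FROM Employees_USA
    """).fetchall()
    # Step S140: number the rows within each group of equal names, keep row 1.
    deduped = conn.execute("""
        SELECT E_Name FROM (
            SELECT E_Name,
                   row_number() OVER (PARTITION BY E_Name ORDER BY E_Name) AS rn
            FROM (
                SELECT E_Name FROM Employees_China
                UNION ALL
                SELECT E_Name FROM Employees_USA
            )
        )
        WHERE rn = 1
    """).fetchall()
    conn.close()
    return [r[0] for r in total], [r[0] for r in deduped]

total_names, deduped_names = union_and_dedupe()
```

With the example tables, the combined result holds eight names and the deduplicated result holds seven, because Carter,Thomas appears in both source tables.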

By adopting the data updating method in this embodiment, first data synchronized from the application system is acquired, and the first data is imported into the first data table, so that the first data table is used as a basic table, then when second data synchronized from the application system is acquired, the second data is imported into the second data table, the first data table and the second data table are combined to generate a total data table, and then some repeated data in the total data table are removed, so that the updated data table is acquired. The updating method has simple logic, saves the trouble of manual processing, saves a great amount of manpower and material resources, and ensures the accuracy of data updating.

In a second embodiment, referring to fig. 2, before step S110, the data updating method further includes:

Step S210, a Hive database is created.

Specifically, the statement for creating a Hive database is: hive> create database student; A database (for example, the student database) is then created on the Hadoop Distributed File System (HDFS) under hdfs:///user/hive/warehouse/. This directory is generated when databases are created; each database is a folder whose name is the database name plus the suffix .db (for example, student.db), which indicates that it is a database.

Hive is a data warehouse tool based on Hadoop. It can map a structured data file to a database table, provides a simple SQL query function, and can convert SQL statements into MapReduce tasks for execution. Hive's data is divided into table data and metadata: table data is the data held by a table in Hive; metadata stores the name of the table, the columns and partitions of the table and their attributes, the attributes of the table (whether it is an external table, etc.), the directory in which the data of the table is located, and so on. Hive mainly comprises the following data models: Table, External Table, Partition and Bucket.

Step S220, obtaining an original data source, and creating a first data table conforming to the format of the original data source in the Hive database according to the original data source.

For example, the original data source is a student score txt file with seven fields (ID, name, Chinese, English, math, school, class). For example: 0001, Zhang San, 99, 98, 100, school 1, class 1; 0002, Li Si, 59, 89, 79, school 2, class 1; 0003, Wang Wu, 89, 99, 100, school 3, class 1; 0004, Zhang Saner, 99, 98, 100, school 1, class 1; 0005, Litetra, 59, 89, 79, school 2, class 1; 0006, Wang Wubi, 89, 99, 100, school 3, class 1. The method of obtaining the original data source is not limited in this embodiment.

From the original data source in the above embodiment, a first data table of seven columns and six rows is created. Specifically, the name of the first data table (such as score1) is created through code; an ID field, a name field, a Chinese field, an English field, a math field, a school field and a class field are defined in the first data table, and the fields are defined to be separated from each other by commas, so that the first data table is created. The code for creating the first data table is as follows:

create table score1
(id string comment 'ID',
name string comment 'name',
Chinese double comment 'Chinese',
English double comment 'English',
math double comment 'math',
school string comment 'school',
class string comment 'class')
comment 'score1'
row format delimited fields terminated by ','
stored as textfile;

The first data table is thereby created.

Likewise, a second data table may be created in the Hive database in the same manner. In addition, the first data table and the second data table need to maintain the consistency of the formats so as to be used later.
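The effect of a comma-delimited table definition can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Hive (SQLite has no "row format delimited" clause, so the comma splitting is done in Python), and the names RAW_LINES and load_scores are hypothetical helpers introduced only for this illustration.

```python
import sqlite3

# Two sample records in the comma-delimited layout of the original data source.
RAW_LINES = [
    "0001,Zhang San,99,98,100,school 1,class 1",
    "0002,Li Si,59,89,79,school 2,class 1",
]

def load_scores(lines):
    """Create a seven-column table matching the source format and load the lines."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here (assumption)
    conn.execute("""
        CREATE TABLE score1 (
            id TEXT, name TEXT,
            chinese REAL, english REAL, math REAL,
            school TEXT, class TEXT
        )
    """)
    # Splitting on ',' plays the role of "fields terminated by ','" in Hive.
    rows = [line.split(",") for line in lines]
    conn.executemany("INSERT INTO score1 VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn

conn = load_scores(RAW_LINES)
first_row = conn.execute("SELECT * FROM score1 WHERE id = '0001'").fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM score1").fetchone()[0]
```

A second table created from the same definition would have an identical format, which is what the later combination step relies on.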

In a third embodiment, referring to fig. 3, step S140 includes:

And step S310, classifying the data in the total data table according to the primary key group to obtain a classification result.

For example, the total data table is

The primary key groups are 111, 222 and 333 respectively, so that the data in the total data table are classified, and corresponding classification results are obtained. The primary key group is set according to the condition of the total data table so as to avoid conflict among other field data.

Step S320, according to the classification result, the data under the same main key group are arranged according to the time sequence, and the arrangement result is obtained.

Specifically, the row_number function is used to execute the relevant statement, so that the data under the same primary key group is arranged and a corresponding arrangement result is obtained. The statement may be: row_number() over (partition by key_value_column order by updated_date desc)

For example, one ranking result obtained using the method described above is:

From the import time, it can be determined how many import records exist for the data of the same primary key group and which of those records is the latest, which lays the groundwork for the subsequent data update.

And step S330, deleting the data which are ranked outside the preset ranking and are in the same primary key group according to the sequencing result, and obtaining an updated total data table.

Specifically, based on the arrangement result, the data ranked after first place under the same primary key group is deleted. For example, based on the arrangement result of the previous embodiment, the second-ranked row in group 111 and the second-ranked row in group 222 are deleted, thereby obtaining an updated total data table, i.e.:

ID	Name	Age
111	Zhang San	26
222	Li Si	32
333	Wang Wu	27

This data table is the updated data table: the repeated data in the first data table and the second data table is removed and the latest data is retained, so that the data in the data table is more accurate and more current. The data under the same primary key group may be arranged in ascending or descending order of time, which is not limited in this embodiment. For example, when the data under the same primary key group is arranged in ascending order of time, all data except the last-ranked row of each primary key group may be deleted.
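Steps S310 to S330 can be sketched as follows. This is a minimal illustration, not the patented implementation: Python's built-in sqlite3 module stands in for Hive, and the table name total, the column updated_date and the older rows for IDs 111 and 222 are hypothetical values chosen so that the result matches the updated total data table above.

```python
import sqlite3

def keep_latest_per_key():
    """Keep only the most recently imported row for each primary key."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here (assumption)
    conn.executescript("""
        CREATE TABLE total (id INTEGER, name TEXT, age INTEGER, updated_date TEXT);
        -- Hypothetical import history: 111 and 222 were each imported twice.
        INSERT INTO total VALUES
            (111, 'Zhang San', 25, '2019-01-01'),
            (111, 'Zhang San', 26, '2019-06-01'),
            (222, 'Li Si',     31, '2019-01-01'),
            (222, 'Li Si',     32, '2019-06-01'),
            (333, 'Wang Wu',   27, '2019-06-01');
    """)
    rows = conn.execute("""
        SELECT id, name, age FROM (
            SELECT id, name, age,
                   -- rank 1 is the newest row within each primary key group
                   row_number() OVER (PARTITION BY id
                                      ORDER BY updated_date DESC) AS rn
            FROM total
        )
        WHERE rn = 1
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

latest_rows = keep_latest_per_key()
```

Changing DESC to ASC and keeping the highest rank instead of rank 1 would give the ascending-order variant mentioned in the text.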

In a fourth embodiment, referring to fig. 4, after step S140, the data updating method further includes:

Step S410, adding a change field to the updated total data table, and obtaining a first transitional total data table to record change time of other fields in the updated total data table.

For example, a change-field column is added to the original updated total data table, and the change field is used for recording the time of each change to the other fields. For example, when a record is changed several times, the change time of each change is recorded. Assume that the updated total data table is:

ID	Name	Age
111	Zhang San	23
222	Li Si	44
333	Wang Wu	10
333	Zhao Liu	30

That is, the updated total data table includes an ID field, a name field and an age field. Then, some changes are made to certain fields in the data table, such as changing Zhang San's age to 24 and changing Li Si's name to Li Sai and age to 45 at 2017/07/01 00:02:00, and changing Zhang San's age to 25 at 2018/05/25 09:00:10. A column is then appended after the last column of the updated total data table and set as the change field; after these changes are recorded in the change field, the original updated total data table becomes the changed data table:

The table above is the acquired first transition total data table, where Null indicates that the row is original data in the updated total data table.

Step S420, acquiring a synchronous total data table according to the first transition total data table and the change time.

For example, based on the change times recorded in the change field of the first transition total data table obtained above, for each changed record the row with the latest change time is kept and the other rows are deleted; if a record is unchanged, it is retained, and the synchronous total data table is thereby acquired. The synchronous total data table is:

ID	Name	Age
111	Zhang San	25
222	Li Sai	45
333	Wang Wu	10
333	Zhao Liu	30
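The derivation of the synchronous total data table from the change field can be sketched as below. The sketch makes several assumptions that are not in the source: Python's built-in sqlite3 module stands in for Hive, the table name transition and column name change_time are hypothetical, and Zhao Liu is given the hypothetical ID 444 because the sketch keys on ID and therefore needs distinct IDs for distinct records.

```python
import sqlite3

def build_synchronized_table():
    """For each ID keep the row with the latest change time; untouched rows survive."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here (assumption)
    conn.executescript("""
        CREATE TABLE transition (id INTEGER, name TEXT, age INTEGER, change_time TEXT);
        -- NULL change_time marks original data that has not been changed.
        INSERT INTO transition VALUES
            (111, 'Zhang San', 23, NULL),
            (111, 'Zhang San', 24, '2017/07/01 00:02:00'),
            (111, 'Zhang San', 25, '2018/05/25 09:00:10'),
            (222, 'Li Si',     44, NULL),
            (222, 'Li Sai',    45, '2017/07/01 00:02:00'),
            (333, 'Wang Wu',   10, NULL),
            (444, 'Zhao Liu',  30, NULL);  -- ID 444 is an assumption of this sketch
    """)
    rows = conn.execute("""
        SELECT id, name, age FROM (
            SELECT id, name, age,
                   -- NULL is mapped to '' so original rows sort after any change
                   row_number() OVER (PARTITION BY id
                                      ORDER BY COALESCE(change_time, '') DESC) AS rn
            FROM transition
        )
        WHERE rn = 1
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

synchronized_rows = build_synchronized_table()
```

The timestamps compare correctly as plain strings only because they share the fixed-width year/month/day hour:minute:second layout.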

In a fifth embodiment, referring to fig. 5, after step S420, the data updating method further includes:

Step S510, adding a deletion field to the synchronized total data table, and obtaining a second transitional total data table to record the deletion condition of other fields in the synchronized total data table.

For example, a deletion-field column is added to the original synchronous total data table, and the deletion field is used for recording whether a deletion operation has been performed: 1 is recorded when a deletion operation is performed, and 0 when it is not. The synchronous total data table is assumed to be:

That is, the synchronous total data table includes an ID field, a name field, an age field, a change field and a deletion field. Some deletion operations are then performed on certain records in the data table, such as deleting the records of Zhang San and Zhao Liu at 2017/07/01 00:02:00. After the deletions are recorded in the deletion field, the original synchronous total data table becomes:

The table above is the second transition total data table.

Step S520, deleting rows from the synchronous total data table according to the second transition total data table, and acquiring the synchronous total data table again.

For example, using the second transition total data table obtained above and the deletion condition recorded in the deletion field, a row recorded as 1 indicates that a deletion operation was performed, so the row is deleted; a row recorded as 0 indicates that no deletion operation was performed, so the row is retained. The synchronous total data table is thereby acquired again:

ID	Name	Age
222	Li Sai	45
333	Wang Wu	10
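The deletion-field filtering of steps S510 and S520 amounts to keeping only the rows flagged 0. A minimal sketch follows, assuming Python's built-in sqlite3 module as a stand-in for Hive; the table name second_transition, the column name deleted and the ID 444 for Zhao Liu are hypothetical.

```python
import sqlite3

def apply_delete_flags():
    """Drop rows whose deletion field is 1 and keep rows whose deletion field is 0."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here (assumption)
    conn.executescript("""
        CREATE TABLE second_transition (id INTEGER, name TEXT, age INTEGER, deleted INTEGER);
        INSERT INTO second_transition VALUES
            (111, 'Zhang San', 25, 1),  -- flagged as deleted
            (222, 'Li Sai',    45, 0),
            (333, 'Wang Wu',   10, 0),
            (444, 'Zhao Liu',  30, 1);  -- flagged as deleted; ID 444 is an assumption
        -- Rebuild the synchronized table from the rows that were not deleted.
        CREATE TABLE synchronized AS
        SELECT id, name, age FROM second_transition WHERE deleted = 0;
    """)
    rows = conn.execute("SELECT id, name, age FROM synchronized ORDER BY id").fetchall()
    conn.close()
    return rows

remaining_rows = apply_delete_flags()
```

Rebuilding the table from a filtered SELECT, rather than issuing row-level deletes, mirrors the constraint that motivates the method: the engine may not support in-place row updates.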

In a sixth embodiment, referring to fig. 6, after step S120, the data updating method further includes:

Step S610, left-associating the first data table with the second data table to obtain a left associated data table.

Specifically, a left association (left join) statement is used, based on the primary key field, with the first data table as the master table and the second data table as the slave table, so that the first data table and the second data table are associated and the left associated data table is obtained. In one embodiment, the left association statement may be:

Insert overwrite table A
Select NVL(B.KEY_COLUMN_1, A.KEY_COLUMN_1) AS KEY_COLUMN_1,
NVL(B.COLUMN_2, A.COLUMN_2) AS COLUMN_2,
NVL(B.COLUMN_3, A.COLUMN_3) AS COLUMN_3,
……
From A
left join B on A.KEY_COLUMN_1 = B.KEY_COLUMN_1

(Taking table A as the base, rows of table B whose ID matches are spliced onto the row with that ID in table A; where table B has no row with a matching ID, the B-side columns are set to Null.)

For example, the first data table is:

ID	COLUMN_1	COLUMN_2
111	1	A
222	3	S
333	4	D

The second data table is:

ID	COLUMN_1	COLUMN_2
111	1	a
444	8	R
555	6	F

After the left association statement is executed, the acquired left associated data table is:

A_ID	COLUMN_1	COLUMN_2	B_ID	COLUMN_1	COLUMN_2
111	1	A	111	1	a
222	3	S	Null	Null	Null
333	4	D	Null	Null	Null

Where Null indicates that the two groups of primary keys, 222 and 333, have no associated data in the second data table.

Step S620, obtaining a common primary key group of the first data table and the second data table according to the left associated data table.

Specifically, according to the left associated data table, the primary key group common to the first data table and the second data table can be directly acquired. For example, the primary key field common to the first data table and the second data table may be acquired from the left associated data table in the above embodiment as 111.

Step S630, overlaying the data of the row of the common primary key group in the first data table with the data of the row of the common primary key group in the second data table, and obtaining an overlaid data table.

For example, the data of the row in the first data table is covered by the data of the row in the second data table in the common primary key group 111, so that the data in the first data table is updated, and the covered data table is obtained, that is:

ID	COLUMN_1	COLUMN_2
111	1	a
222	3	S
333	4	D

Step S640, filtering the data in the second data table, obtaining the data that exists in the second data table alone, and inserting the data into the covered data table to obtain the updated data table.

Specifically, it can be seen from the above steps that some data in the second data table is dropped when the left association is performed; the dropped data is exactly the newly added data and needs to be updated into the first data table. The data in the second data table is therefore filtered by using the exists function, the part of the data that exists only in the second data table is screened out according to the primary key field, and that part of the data is inserted directly into the covered data table, so as to obtain the finally updated data table. The finally updated data table is:

ID	COLUMN_1	COLUMN_2
111	1	a
222	3	S
333	4	D
444	8	R
555	6	F

In addition, the statement using the exists function may be:

Insert into table A
Select B.*
From B
Where not exists (select 1 from A where A.KEY_COLUMN_1 = B.KEY_COLUMN_1)
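Steps S610 to S640 can be exercised together on the example tables. The sketch below is an illustration under stated assumptions: Python's built-in sqlite3 module stands in for Hive, COALESCE is used in place of NVL, and because SQLite has no "insert overwrite", the covered data is written to a separate table with the hypothetical name updated instead of overwriting table A.

```python
import sqlite3

def merge_tables():
    """Overwrite shared keys with B's values, then append rows that exist only in B."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here (assumption)
    conn.executescript("""
        CREATE TABLE A (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
        CREATE TABLE B (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
        INSERT INTO A VALUES (111, 1, 'A'), (222, 3, 'S'), (333, 4, 'D');
        INSERT INTO B VALUES (111, 1, 'a'), (444, 8, 'R'), (555, 6, 'F');

        -- Steps S610 to S630: left join on the key; B's value wins when present.
        CREATE TABLE updated AS
        SELECT COALESCE(B.ID, A.ID)             AS ID,
               COALESCE(B.COLUMN_1, A.COLUMN_1) AS COLUMN_1,
               COALESCE(B.COLUMN_2, A.COLUMN_2) AS COLUMN_2
        FROM A LEFT JOIN B ON A.ID = B.ID;

        -- Step S640: rows whose key exists only in B are newly added data.
        INSERT INTO updated
        SELECT B.* FROM B
        WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.ID = B.ID);
    """)
    rows = conn.execute(
        "SELECT ID, COLUMN_1, COLUMN_2 FROM updated ORDER BY ID"
    ).fetchall()
    conn.close()
    return rows

merged_rows = merge_tables()
```

Note that COALESCE (like NVL) cannot distinguish a missing row in B from a genuinely Null value in a matching row of B; in the latter case the old value from A would be kept.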

In a seventh embodiment, referring to fig. 7, a data updating apparatus 700 is provided. The data updating apparatus 700 includes:

A first obtaining module 710, configured to obtain first data synchronized from the application system before a preset time point, and import the first data into the first data table.

A second obtaining module 720, configured to obtain second data synchronized from the application system after the preset time point, and import the second data into a second data table, where a format of the second data table is consistent with a format of the first data table.

A combining module 730, configured to combine the first data table and the second data table to generate a total data table.

For example, the first data table is Employees_China:

E_ID	E_Name
01	Zhang,Hua
02	Wang,Wei
03	Carter,Thomas
04	Yang,Ming

The second data table is Employees_USA:

E_ID	E_Name
01	Adams,John
02	Bush,George
03	Carter,Thomas
04	Gates,Bill

The UNION ALL command is used, i.e.:

SELECT E_Name FROM Employees_China
UNION ALL
SELECT E_Name FROM Employees_USA

A total data table containing all the rows of both tables is then generated, as in the first embodiment.

A deduplication module 740, configured to remove duplicate data in the total data table and obtain an updated total data table.

Specifically, the duplicate data in the above total data table is deleted (for example, one of the two Carter,Thomas rows in the total data table is deleted), so that an updated total data table is obtained.

By adopting the data updating device in this embodiment, first data synchronized from the application system is acquired, and the first data is imported into the first data table, so that the first data table is used as a basic table, then when second data synchronized from the application system is acquired, the second data is imported into the second data table, the first data table and the second data table are combined to generate a total data table, and then some repeated data in the total data table are removed, so that the updated data table is acquired. The updating method has simple logic, saves the trouble of manual processing, saves a great amount of manpower and material resources, and ensures the accuracy of data updating.

The invention also provides a computer device capable of executing programs, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server, a blade server, a tower server or a cabinet server (including an independent server or a server cluster formed by a plurality of servers). The computer device of the present embodiment includes at least, but is not limited to, a memory and a processor that may be communicatively coupled to each other via a system bus.

The present embodiment also provides a computer-readable storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by a processor, performs the corresponding functions. The computer-readable storage medium of the present embodiment is used for storing a computer program which, when executed by a processor, implements the data updating method of the present invention.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process derived from the disclosure herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the invention.

Claims

1. A data updating method, characterized in that the data updating method comprises the steps of:

Removing repeated data in the total data table to obtain an updated total data table;

the step of generating a total data table by combining the first data table and the second data table comprises:

Performing UNION operation on the first data table and the second data table by utilizing a UNION ALL command to generate the total data table;

The step of acquiring second data synchronized from the application system after the preset time point and importing the second data into a second data table, the data updating method further includes:

creating a Hive database; and

Acquiring an original data source, and creating a first data table conforming to the format of the original data source in the Hive database according to the original data source;

after the step of removing the repeated data in the total data table and obtaining the updated total data table, the data updating method comprises the following steps:

Acquiring a synchronous total data table according to the first transition total data table and the change time;

After the step of acquiring the synchronized data table according to the first transition total data table and the change time, the data updating method includes:

Deleting the synchronous total data table according to the deleting condition, and acquiring the synchronous data table again;

After the step of acquiring the second data synchronized from the application system after the preset time point and importing the second data into a second data table, the data updating method further includes:

And filtering the data in the second data table to obtain the data which exists in the second data table independently, and inserting the data in the second data table into the covered data table to obtain the updated data table.

2. The data updating method as claimed in claim 1, wherein the step of removing the duplicate data in the total data table and obtaining the updated total data table comprises:

And deleting the data which are ranked outside the preset ranking under the same primary key group according to the arrangement result, and obtaining an updated total data table.

3. A data updating apparatus for performing the data updating method of claim 1 or 2, characterized in that the data updating apparatus comprises:

The de-duplication module is used for removing repeated data in the total data table and acquiring an updated total data table;

And the joint module is also used for performing UNION operation on the first data table and the second data table by utilizing a UNION ALL command to generate the total data table.

4. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the data updating method of any of claims 1 to 2 when the computer program is executed by the processor.

5. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements the steps of the data updating method of any of claims 1 to 2.