CN110442585B - Data updating method, data updating device, computer equipment and storage medium - Google Patents

Info

Publication number
CN110442585B
Authority
CN
China
Prior art keywords
data table
data
total
acquiring
updating method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910541926.4A
Other languages
Chinese (zh)
Other versions
CN110442585A (en
Inventor
李京京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN201910541926.4A priority Critical patent/CN110442585B/en
Publication of CN110442585A publication Critical patent/CN110442585A/en
Application granted granted Critical
Publication of CN110442585B publication Critical patent/CN110442585B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data updating method, a data updating device, computer equipment and a storage medium, wherein first data synchronized from an application system before a preset time point are acquired, and the first data are imported into a first data table; acquiring second data synchronized from the application system after a preset time point, and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table; combining the first data table and the second data table to generate a total data table; and removing the repeated data in the total data table to obtain an updated total data table. The data updating method, the data updating device, the computer and the storage medium provided by the invention are used for removing the duplication by combining the first data table and the second data table, the logic is simple, the trouble of manual processing is saved, a large amount of manpower and material resources are saved, and meanwhile, the accuracy of data updating is ensured.

Description

Data updating method, data updating device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer information technologies, and in particular, to a data updating method, a data updating device, a computer device, and a storage medium.
Background
With the spread and popularization of the Internet and big data technology, more and more data needs to be stored and processed, and data warehouses based on Hadoop and Hive distributed clusters have gradually become mainstream. For example, massive data from a service system needs to be stored in Hive tables to facilitate data management and querying. However, as service requirements change, the structure of some tables in the service system inevitably changes, so that the archived source data differs from period to period.
In a typical star-schema data warehouse, the dimension tables change slowly over time. For example, a retailer opens a new store and needs the new store's data added to the store table, or the business area or another tracked characteristic of an existing store changes. These changes may result in the insertion or modification of individual records. Hive, however, has supported row-level updates only since version 0.14. In addition, data sets are sometimes found to be erroneous and need to be corrected. Or the current data is only an approximation (for example, only 90% of the total data, with the rest lagging behind). Or business rules may require restating a particular transaction based on a subsequent transaction (for example, a customer purchases a membership after purchasing some items, at which point a discounted price applies, including to the previously purchased items). Or a client may request deletion of their data after the partnership is terminated.
Hive has supported updates since version 0.14, and requires that the data table support the ACID properties: atomicity (A), consistency (C), isolation (I), and durability (D). The Hive versions used by enterprises rarely support updates, let alone timely ones, yet a growing amount of data needs to be updated. Users can only run a job to pull the data out and process it manually, which consumes a large amount of manpower and material resources; manual operation also introduces errors, so the updated data is inaccurate.
Disclosure of Invention
In view of the above, the present invention provides a data updating method, a data updating device, a computer device and a storage medium, which update data by combining and de-duplicating data tables, thereby saving manual processing while ensuring the accuracy of the data update.
First, to achieve the above object, the present invention provides a data updating method, including the steps of:
Acquiring first data synchronized from an application system before a preset time point, and importing the first data into a first data table;
Acquiring second data synchronized from the application system after the preset time point, and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
Combining the first data table and the second data table to generate a total data table; and
Removing the repeated data in the total data table to obtain an updated total data table.
Further, before the step of acquiring the first data table and the second data table whose formats are consistent with each other, the data updating method further includes:
Creating a Hive database; and
Acquiring an original data source, and creating, in the Hive database, a first data table conforming to the format of the original data source.
Further, the step of generating a total data table by combining the first data table and the second data table includes:
Performing a union operation on the first data table and the second data table by using the UNION ALL command to generate the total data table.
Further, the step of removing the repeated data in the total data table and obtaining the updated total data table includes:
Classifying the data in the total data table by primary key group to obtain a classification result;
According to the classification result, arranging the data under the same primary key group in time order to obtain an arrangement result; and
According to the arrangement result, deleting the data ranked outside a preset rank under the same primary key group to obtain the updated total data table.
Further, after the step of removing the repeated data in the total data table and obtaining the updated total data table, the data updating method includes:
Adding a change field to the updated total data table to record the change time of the other fields in the updated total data table, thereby acquiring a first transition total data table; and
Acquiring a synchronous total data table according to the first transition total data table and the change time.
Further, after the step of acquiring the synchronous total data table according to the first transition total data table and the change time, the data updating method includes:
Adding a deletion field to the synchronous total data table to record the deletion status of the other fields in the synchronous total data table, thereby acquiring a second transition total data table; and
Deleting data from the synchronous total data table according to the deletion status, and acquiring the synchronous total data table again.
Further, after the step of obtaining the data source after the preset time point and writing the data into the second data table, the data updating method further includes:
Left-associating the first data table with the second data table to obtain a left associated data table;
Acquiring a common primary key group of the first data table and the second data table according to the left associated data table;
Overwriting the rows of the common primary key group in the first data table with the rows of the common primary key group in the second data table to obtain a covered data table; and
Filtering the data in the second data table to obtain the data that exists only in the second data table, and inserting that data into the covered data table to obtain the updated data table.
In addition, to achieve the above object, the present invention also provides a data updating apparatus including:
A first acquisition module, configured to acquire first data synchronized from the application system before a preset time point and import the first data into a first data table;
A second acquisition module, configured to acquire second data synchronized from the application system after the preset time point and import the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
A combination module, configured to combine the first data table and the second data table to generate a total data table; and
A de-duplication module, configured to remove repeated data in the total data table and acquire an updated total data table.
To achieve the above object, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the data updating method provided by the invention has the advantages that the first data synchronized from the application system is firstly obtained and is imported into the first data table, so that the first data table is used as a basic table, when the second data synchronized from the application system is obtained later, the second data is imported into the second data table, the first data table and the second data table are combined to generate the total data table, and then some repeated data in the total data table are removed, so that the updated data table is obtained. The updating method has simple logic, saves the trouble of manual processing, saves a great amount of manpower and material resources, and ensures the accuracy of data updating.
Drawings
FIG. 1 is a flowchart of a data updating method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a data updating method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a data updating method according to a third embodiment of the present invention;
FIG. 4 is a flowchart of a data updating method according to a fourth embodiment of the present invention;
FIG. 5 is a flowchart of a data updating method according to a fifth embodiment of the present invention;
FIG. 6 is a flowchart of a data updating method according to a sixth embodiment of the present invention;
FIG. 7 is a block diagram of a data updating apparatus according to a seventh embodiment of the present invention;
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Data updating device 700
First acquisition module 710
Second acquisition module 720
Combination module 730
De-duplication module 740
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first", "second", and so on in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and falls outside the scope of protection claimed by the present invention.
Referring to fig. 1, a first embodiment provides a data updating method. The data updating method comprises the following steps:
Step S110, first data synchronized from the application system before a preset time point is acquired, and the first data is imported into a first data table.
The preset time point may be defined according to practical situations, which is not limited in this embodiment. The first data table is consistent with the acquired data format to ensure that the acquired data can be written to the first data table. The first data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The first data table stores data synchronized from the application system before a preset point in time.
For example, before 00:00, a batch of data is obtained from the application system and imported into the preset first data table for subsequent use.
Step S120, obtaining second data synchronized from the application system after the preset time point, and importing the second data into a second data table, where the format of the second data table is consistent with the format of the first data table.
The preset time point may be defined according to practical situations, which is not limited in this embodiment. The second data table is consistent with the format of the acquired data, which ensures that the acquired data can be written to it. The second data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The second data table stores data synchronized from the application system after the preset time point, and its format is consistent with that of the first data table so that it can be used to update the data in the first data table.
For example, after 00:00, a batch of data is synchronized again from the application system and imported into the second data table for subsequent updating and use.
Step S130, combining the first data table and the second data table to generate a total data table.
Specifically, a union operation is performed on the first data table and the second data table by using the UNION ALL command to obtain the total data table. For example, the first data table is Employees_China:
E_ID E_Name
01 Zhang,Hua
02 Wang,Wei
03 Carter,Thomas
04 Yang,Ming
The second data table is Employees_USA:
E_ID E_Name
01 Adams,John
02 Bush,George
03 Carter,Thomas
04 Gates,Bill
Using the UNION ALL command, i.e.
SELECT E_Name FROM Employees_China
UNION ALL
SELECT E_Name FROM Employees_USA
a total data table is then generated, i.e.
E_Name
Zhang,Hua
Wang,Wei
Carter,Thomas
Yang,Ming
Adams,John
Bush,George
Carter,Thomas
Gates,Bill
That is, UNION ALL pulls all the data together for aggregation, regardless of whether rows in the first and second data tables are duplicated.
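The behaviour of UNION ALL on the two example tables can be reproduced with a minimal sketch. It assumes an in-memory SQLite database (Python's standard `sqlite3` module) as a stand-in for Hive; the variable names are illustrative and not part of the patent. Plain UNION is included only for contrast.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees_China (E_ID TEXT, E_Name TEXT);
    CREATE TABLE Employees_USA   (E_ID TEXT, E_Name TEXT);
    INSERT INTO Employees_China VALUES
        ('01', 'Zhang,Hua'), ('02', 'Wang,Wei'),
        ('03', 'Carter,Thomas'), ('04', 'Yang,Ming');
    INSERT INTO Employees_USA VALUES
        ('01', 'Adams,John'), ('02', 'Bush,George'),
        ('03', 'Carter,Thomas'), ('04', 'Gates,Bill');
""")

# UNION ALL keeps every row from both tables, duplicates included.
union_all_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_China "
    "UNION ALL "
    "SELECT E_Name FROM Employees_USA"
)]

# Plain UNION, shown only for contrast, removes duplicate rows.
union_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_China "
    "UNION "
    "SELECT E_Name FROM Employees_USA"
)]
```

Here `union_all_names` holds eight names with Carter,Thomas appearing twice, which is the duplicate that step S140 later removes.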
Step S140, removing the repeated data in the total data table to obtain an updated total data table.
Specifically, the duplicate data in the total data table is deleted with the help of the row_number() function (for example, one of the two Carter,Thomas rows in the total data table is deleted), thereby obtaining the updated total data table.
By adopting the data updating method in this embodiment, first data synchronized from the application system is acquired, and the first data is imported into the first data table, so that the first data table is used as a basic table, then when second data synchronized from the application system is acquired, the second data is imported into the second data table, the first data table and the second data table are combined to generate a total data table, and then some repeated data in the total data table are removed, so that the updated data table is acquired. The updating method has simple logic, saves the trouble of manual processing, saves a great amount of manpower and material resources, and ensures the accuracy of data updating.
In a second embodiment, referring to fig. 2, before step S110, the data updating method further includes:
Step S210, a Hive database is created.
Specifically, the Hive database is created with a statement such as: hive> create database student. A database (for example, the student database) is then created on the Hadoop Distributed File System (HDFS) under hdfs:///user/hive/warehouse/, a directory that is generated when databases are created. Each database is a folder whose name is the database name plus the suffix .db (for example, student.db), which indicates that it is a database.
Hive is a Hadoop-based data warehouse tool that can map structured data files to database tables, provides a simple SQL query capability, and can convert SQL statements into MapReduce tasks for execution. Hive's data is divided into table data and metadata: table data is the data held by the tables in Hive; metadata stores the name of each table, its columns, partitions and their attributes, the attributes of the table (whether it is an external table, and so on), the directory in which the table's data is located, and so on. Hive mainly comprises the following data models: table, external table, partition, and bucket.
Step S220, obtaining an original data source, and creating a first data table conforming to the format of the original data source in the Hive database according to the original data source.
For example, the original data source is a student score txt file with seven fields (ID, name, Chinese, English, math, school, class). For example: 0001, Zhang San, 99, 98, 100, school 1, class 1; 0002, Li Si, 59, 89, 79, school 2, class 1; 0003, Wang Wu, 89, 99, 100, school 3, class 1; 0004, Zhang Saner, 99, 98, 100, school 1, class 1; 0005, Litetra, 59, 89, 79, school 2, class 1; 0006, Wang Wubi, 89, 99, 100, school 3, class 1. The method of obtaining the original data source is not limited in this embodiment.
From the original data source in the above embodiment, a first data table of seven columns and six rows is created. Specifically, the first data table is created in code: the table is named (for example, score1); an ID field, a name field, a Chinese field, an English field, a math field, a school field and a class field are defined in it; and the fields are declared to be separated by commas. The code for creating the first data table is as follows:
create table score1
(id string comment 'ID',
name string comment 'name',
Chinese double comment 'Chinese',
English double comment 'English',
math double comment 'math',
school string comment 'school',
class string comment 'class')
comment 'score1'
row format delimited fields terminated by ','
stored as textfile;
The first data table is thereby created.
Likewise, a second data table may be created in the Hive database in the same manner. In addition, the first data table and the second data table need to maintain the consistency of the formats so as to be used later.
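As a rough illustration of what the comma-delimited table definition implies, the following sketch loads the first three sample records into a table with the same seven fields. It assumes SQLite (Python's standard `sqlite3` module) in place of Hive and parses the text with the `csv` module; the `raw_text` sample and the variable names are illustrative only.

```python
import csv
import io
import sqlite3

# First three sample records from the student score file, comma-delimited.
raw_text = (
    "0001,Zhang San,99,98,100,school 1,class 1\n"
    "0002,Li Si,59,89,79,school 2,class 1\n"
    "0003,Wang Wu,89,99,100,school 3,class 1\n"
)

# SQLite stands in for Hive here (assumption); the columns mirror score1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score1 ("
    "id TEXT, name TEXT, chinese REAL, english REAL, math REAL, "
    "school TEXT, class TEXT)"
)

# Split each line on commas, as 'fields terminated by ,' does in Hive.
rows = list(csv.reader(io.StringIO(raw_text)))
conn.executemany("INSERT INTO score1 VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

loaded = conn.execute(
    "SELECT id, name, math FROM score1 ORDER BY id"
).fetchall()
```

A second table with the identical column list could be created the same way, which is what keeps the two formats consistent for the later merge.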
In a third embodiment, referring to fig. 3, step S140 includes:
Step S310, classifying the data in the total data table according to the primary key group to obtain a classification result.
For example, the total data table is
The primary key groups are 111, 222 and 333, respectively, so the data in the total data table is classified accordingly and the corresponding classification result is obtained. The primary key group is set according to the contents of the total data table so as to avoid conflicts among the data of the other fields.
Step S320, according to the classification result, the data under the same primary key group is arranged in time order to obtain an arrangement result.
Specifically, a statement using the row_number function is executed to arrange the data under the same primary key group and obtain the corresponding arrangement result. The statement may be: row_number() over (partition by key_value_column order by updated_date desc)
For example, one ranking result obtained using the method described above is:
From the import time, it can be determined how many import records exist for the same primary key group and which of them is the latest, which lays the groundwork for the subsequent data update.
Step S330, according to the arrangement result, deleting the data ranked outside the preset rank under the same primary key group to obtain the updated total data table.
Specifically, based on the arrangement result, the rows ranked after first place under the same primary key group are deleted. For example, based on the arrangement result of the previous example, the second-ranked row in group 111 and the second-ranked row in group 222 are deleted, thereby obtaining the updated total data table, i.e.
ID Name Age
111 Zhang San 26
222 Li Si 32
333 Wang Wu 27
This data table is the updated data table: repeated data from the first data table and the second data table has been removed and the latest data retained, so the data in the table is more accurate and timely. Within the same primary key group, the rows may be arranged in ascending or descending order of time, which is not limited in this embodiment. For example, when the rows of the same primary key group are arranged in ascending order of time, all rows except the last-ranked one in the group may be deleted.
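Steps S310 to S330 can be sketched in plain Python. The sketch mirrors the effect of row_number() over (partition by ... order by ... desc) followed by keeping only the top-ranked rows; it is not Hive code. The input rows, their dates, and the function name `keep_latest` are hypothetical, chosen so that the result matches the updated table above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical total data table: (id, name, age, updated_date).
total_table = [
    (111, "Zhang San", 25, "2018-01-01"),
    (111, "Zhang San", 26, "2019-01-01"),
    (222, "Li Si", 31, "2018-01-01"),
    (222, "Li Si", 32, "2019-01-01"),
    (333, "Wang Wu", 27, "2019-01-01"),
]


def keep_latest(rows, keep=1):
    """Keep the `keep` most recent rows of each primary key group."""
    # Order by date descending, then stable-sort by key so that each
    # group stays in newest-first order (the "order by ... desc" part).
    by_date_desc = sorted(rows, key=itemgetter(3), reverse=True)
    by_key = sorted(by_date_desc, key=itemgetter(0))
    result = []
    # groupby plays the role of "partition by" on the primary key.
    for _, group in groupby(by_key, key=itemgetter(0)):
        for row_number, row in enumerate(group, start=1):
            if row_number <= keep:  # rows ranked beyond `keep` are dropped
                result.append(row)
    return result


updated_table = keep_latest(total_table)
```

The `keep` parameter corresponds to the preset rank; with the default of 1, only the newest row per primary key survives.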
In a fourth embodiment, referring to fig. 4, after step S140, the data updating method further includes:
Step S410, adding a change field to the updated total data table, and obtaining a first transitional total data table to record change time of other fields in the updated total data table.
For example, a change field column is added to the updated total data table, and the change field records the time of each change to the other fields. For multiple changes, the time of each change is recorded correspondingly. Assume that the updated total data table is:
ID Name Age
111 Zhang San 23
222 Li Si 44
333 Wang Wu 10
333 Zhao Liu 30
That is, the updated total data table includes an ID field, a name field and an age field. Some changes are then made to certain fields in the data table, such as changing Zhang San's age to 24 and changing Li Si's name to Li Sai and age to 45 at 2017/07/01 00:02:00, and changing Zhang San's age to 25 at 2018/05/25 09:00:10. A column is then appended after the last column of the updated total data table and set as the change field; once the changes are recorded in the change field, the original updated total data table becomes the following changed data table:
The table above is the acquired first transition total data table, where Null indicates that the row is original data from the updated total data table.
Step S420, acquiring a synchronous total data table according to the first transition total data table and the change time.
For example, based on the change time recorded in the change field of the transition total data table obtained above, for each changed record the row with the latest change time is kept and the other rows are deleted; unchanged data is retained. The synchronous total data table is thus acquired. The synchronous total data table is:
ID Name Age
111 Zhang San 25
222 Li Sai 45
333 Wang Wu 10
333 Zhao Liu 30
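The selection of the latest change per record can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the patented implementation: the transition rows and timestamps are illustrative, the function name `synchronize` is hypothetical, and Zhao Liu is given the hypothetical ID 444 so that every record has a unique key for grouping.

```python
# Hypothetical first transition total data table:
# (id, name, age, change_time). change_time is None for original rows.
transition_table = [
    (111, "Zhang San", 23, None),
    (222, "Li Si", 44, None),
    (333, "Wang Wu", 10, None),
    (444, "Zhao Liu", 30, None),  # ID 444 is an assumption for uniqueness
    (111, "Zhang San", 24, "2017-07-01 00:02:00"),
    (222, "Li Sai", 45, "2017-07-01 00:02:00"),
    (111, "Zhang San", 25, "2018-05-25 09:00:10"),
]


def synchronize(rows):
    """Keep, for each ID, the row with the latest change time."""
    latest = {}
    for row in rows:
        key, change_time = row[0], row[3]
        current = latest.get(key)
        # An empty string sorts before any timestamp, so an original row
        # (change_time None) is replaced by any changed row for that ID.
        if current is None or (change_time or "") > (current[3] or ""):
            latest[key] = row
    # Drop the change field again and return rows ordered by ID.
    return [row[:3] for row in sorted(latest.values(), key=lambda r: r[0])]


synchronized_table = synchronize(transition_table)
```

Records that were never changed (Wang Wu and Zhao Liu here) pass through untouched, matching the rule that unchanged data is retained.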
In a fifth embodiment, referring to fig. 5, after step S420, the data updating method further includes:
Step S510, adding a deletion field to the synchronized total data table, and obtaining a second transitional total data table to record the deletion condition of other fields in the synchronized total data table.
For example, a deletion field column is added to the synchronous total data table. The deletion field records whether a deletion operation has been performed: 1 if it has, and 0 if it has not. The synchronous total data table is assumed to be:
That is, the synchronous total data table includes an ID field, a name field, an age field, a change field and a deletion field. Some deletion operations are then performed on the data table, such as deleting the records of Zhang San and Zhao Liu at 2017/07/01 00:02:00. With the deletions recorded in the deletion field, the original synchronous total data table becomes:
The table above is the second transition total data table.
Step S520, deleting data from the synchronous total data table according to the second transition total data table, and acquiring the synchronous total data table again.
For example, based on the deletion status recorded in the deletion field of the second transition total data table obtained above, a row recorded as 1 indicates that a deletion operation was performed, and the row is deleted; a row recorded as 0 indicates that no deletion operation was performed, and the row is retained. The synchronous total data table is thus acquired again. The synchronous total data table acquired again is:
ID Name Age
222 Li Sai 45
333 Wang Wu 10
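The deletion-flag filter amounts to a soft delete followed by a purge. A minimal plain-Python sketch, assuming the flag values described above (1 for deleted, 0 for retained) and omitting the change field for brevity, might look like this; the variable names are illustrative.

```python
# Hypothetical second transition total data table:
# (id, name, age, deleted), where deleted is 1 or 0.
second_transition = [
    (111, "Zhang San", 25, 1),
    (222, "Li Sai", 45, 0),
    (333, "Wang Wu", 10, 0),
    (333, "Zhao Liu", 30, 1),
]

# Rows flagged 1 are removed; rows flagged 0 are kept, and the
# deletion field itself is dropped from the result.
resynchronized = [row[:3] for row in second_transition if row[3] == 0]
```

Recording the flag first and purging afterwards keeps the deletion auditable until the table is synchronized again.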
In a sixth embodiment, referring to fig. 6, after step S120, the data updating method further includes:
Step S610, left-associating the first data table with the second data table to obtain a left associated data table.
Specifically, using a left association (left join) statement on the primary key field, with the first data table as the master table and the second data table as the slave table, the two tables are associated to obtain the left associated data table. In one embodiment, the left association statement may be:
Insert overwrite table A
Select NVL(B.KEY_COLUMN_1, A.KEY_COLUMN_1) AS KEY_COLUMN_1,
NVL(B.COLUMN_2, A.COLUMN_2) AS COLUMN_2,
NVL(B.COLUMN_3, A.COLUMN_3) AS COLUMN_3,
……
From A
left join B on A.KEY_COLUMN_1 = B.KEY_COLUMN_1
(Taking table A as the base, each row of table B whose ID matches is spliced onto the row with that ID in table A; for IDs in table A with no match in table B, the table B columns are set to Null.)
For example, the first data table is:
ID COLUMN_1 COLUMN_2
111 1 A
222 3 S
333 4 D
The second data table is:
ID COLUMN_1 COLUMN_2
111 1 a
444 8 R
555 6 F
After the left association statement is executed, the acquired left associated data table is:
A_ID COLUMN_1 COLUMN_2 B_ID COLUMN_1 COLUMN_2
111 1 A 111 1 a
222 3 S Null Null Null
333 4 D Null Null Null
Where Null indicates that the two groups of primary keys, 222 and 333, have no associated data in the second data table.
Step S620, obtaining a common primary key group of the first data table and the second data table according to the left associated data table.
Specifically, according to the left associated data table, the primary key group common to the first data table and the second data table can be directly acquired. For example, the primary key field common to the first data table and the second data table may be acquired from the left associated data table in the above embodiment as 111.
Step S630, overlaying the data of the row of the common primary key group in the first data table with the data of the row of the common primary key group in the second data table, and obtaining an overlaid data table.
For example, the data of the row in the first data table is covered by the data of the row in the second data table in the common primary key group 111, so that the data in the first data table is updated, and the covered data table is obtained, that is:
ID COLUMN_1 COLUMN_2
111 1 a
222 3 S
333 4 D
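The covering step can be sketched in the same way as the NVL-based statement given earlier. The example below is a minimal illustration that assumes an in-memory SQLite database in place of the actual warehouse and uses COALESCE, which behaves like NVL for two arguments; the result is only read back here, not written over table A.

```python
import sqlite3

# In-memory SQLite is only a stand-in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
    CREATE TABLE B (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
    INSERT INTO A VALUES (111, 1, 'A'), (222, 3, 'S'), (333, 4, 'D');
    INSERT INTO B VALUES (111, 1, 'a'), (444, 8, 'R'), (555, 6, 'F');
""")

# COALESCE plays the role of NVL: take B's value when the join matched,
# otherwise keep the value already stored in A.
covered = conn.execute("""
    SELECT A.ID,
           COALESCE(B.COLUMN_1, A.COLUMN_1) AS COLUMN_1,
           COALESCE(B.COLUMN_2, A.COLUMN_2) AS COLUMN_2
    FROM A LEFT JOIN B ON A.ID = B.ID
    ORDER BY A.ID
""").fetchall()

print(covered)
```

One consequence of this formulation is that a Null stored in a matched row of the second data table does not overwrite the first data table's value, since COALESCE falls back to the first non-Null argument.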
Step S640, filtering the data in the second data table to obtain the data that exists only in the second data table, and inserting that data into the covered data table to obtain the updated data table.
Specifically, as the above steps show, the rows of the second data table that have no match in the first data table are dropped when the left join is performed; these dropped rows are precisely the newly added data and need to be added to the first data table. The data in the second data table is therefore filtered with a NOT EXISTS condition: according to the primary key field, the rows that exist only in the second data table are selected and inserted directly into the covered data table, so as to obtain the finally updated data table. The final updated data table is:
ID COLUMN_1 COLUMN_2
111 1 a
222 3 S
333 4 D
444 8 R
555 6 F
In addition, the NOT EXISTS statement may be:
Insert into table A
Select B.*
From B
Where not exists (select 1 from A where A.KEY_COLUMN_1=B.KEY_COLUMN_1)
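The insertion of the newly added rows can be checked on the example data with a short sketch. It assumes, for illustration only, an in-memory SQLite database in place of the actual warehouse, and it starts from a table A that already holds the covered data table.

```python
import sqlite3

# In-memory SQLite is only a stand-in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
    CREATE TABLE B (ID INTEGER, COLUMN_1 INTEGER, COLUMN_2 TEXT);
    -- A already holds the covered data table from the previous step.
    INSERT INTO A VALUES (111, 1, 'a'), (222, 3, 'S'), (333, 4, 'D');
    INSERT INTO B VALUES (111, 1, 'a'), (444, 8, 'R'), (555, 6, 'F');
""")

# Insert only those rows of B whose primary key does not yet exist in A.
conn.execute("""
    INSERT INTO A
    SELECT B.ID, B.COLUMN_1, B.COLUMN_2
    FROM B
    WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.ID = B.ID)
""")

updated = conn.execute(
    "SELECT ID, COLUMN_1, COLUMN_2 FROM A ORDER BY ID"
).fetchall()
print(updated)
```

Row 111 is skipped because its key is already present, while rows 444 and 555 are appended, matching the final updated data table shown above.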
In a seventh embodiment, referring to fig. 7, a data updating apparatus 700 is provided. The data updating apparatus 700 includes:
A first obtaining module 710, configured to obtain first data synchronized from the application system before a preset time point, and import the first data into the first data table.
The preset time point may be defined according to the actual situation, which is not limited in this embodiment. The format of the first data table is consistent with the format of the acquired data, to ensure that the acquired data can be written to the first data table. The first data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The first data table stores the data synchronized from the application system before the preset time point.
For example, before 00:00, a batch of data is obtained from the application system and then imported into a preset first data table for subsequent use.
A second obtaining module 720, configured to obtain second data synchronized from the application system after the preset time point, and import the second data into a second data table, where a format of the second data table is consistent with a format of the first data table.
The preset time point may be defined according to the actual situation, which is not limited in this embodiment. The format of the second data table is consistent with the format of the acquired data, to ensure that the acquired data can be written to the second data table. The second data table may be an external table, an internal table, or the like, which is not limited in this embodiment. The second data table stores the data synchronized from the application system after the preset time point, and the format of the second data table is consistent with that of the first data table so that it can be used to update the data in the first data table.
For example, after 00:00, a batch of data is synchronized again from the application system, and the data is imported into the second data table for subsequent updating and use.
A combining module 730, configured to combine the first data table and the second data table to generate a total data table.
Specifically, a UNION operation is performed on the first data table and the second data table by using the UNION ALL command, so as to obtain the total data table. For example, the first data table is Employees_China:
E_ID E_Name
01 Zhang,Hua
02 Wang,Wei
03 Carter,Thomas
04 Yang,Ming
The second data table is Employees_USA:
E_ID E_Name
01 Adams,John
02 Bush,George
03 Carter,Thomas
04 Gates,Bill
Using the UNION ALL command, i.e.
SELECT E_Name FROM Employees_China
UNION ALL
SELECT E_Name FROM Employees_USA
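A total data table is then generated, i.e.
E_Name
Zhang,Hua
Wang,Wei
Carter,Thomas
Yang,Ming
Adams,John
Bush,George
Carter,Thomas
Gates,Bill
That is, UNION ALL gathers all of the data together, regardless of whether the data in the first data table and the second data table is duplicated.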
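The behaviour of UNION ALL on the two employee tables can be verified with a small sketch. An in-memory SQLite database is assumed here purely as a runnable stand-in for the actual warehouse; the table and column names follow the example above.

```python
import sqlite3
from collections import Counter

# In-memory SQLite is only a stand-in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees_China (E_ID TEXT, E_Name TEXT);
    CREATE TABLE Employees_USA (E_ID TEXT, E_Name TEXT);
    INSERT INTO Employees_China VALUES
        ('01', 'Zhang,Hua'), ('02', 'Wang,Wei'),
        ('03', 'Carter,Thomas'), ('04', 'Yang,Ming');
    INSERT INTO Employees_USA VALUES
        ('01', 'Adams,John'), ('02', 'Bush,George'),
        ('03', 'Carter,Thomas'), ('04', 'Gates,Bill');
""")

# UNION ALL concatenates both result sets and keeps duplicates.
total = [name for (name,) in conn.execute("""
    SELECT E_Name FROM Employees_China
    UNION ALL
    SELECT E_Name FROM Employees_USA
""")]

counts = Counter(total)
print(len(total), counts["Carter,Thomas"])
```

All eight names are returned and Carter,Thomas appears twice, which is why a separate de-duplication step is needed afterwards.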
A deduplication module 740, configured to remove duplicate data in the total data table and obtain an updated total data table.
Specifically, the duplicate data in the above total data table is deleted (for example, one of the two Carter,Thomas rows in the total data table is deleted), so that an updated total data table is obtained.
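Two ways of removing repeated data can be sketched in plain Python. The first drops exact duplicates, as in the Carter,Thomas example. The second follows the variant in which rows are grouped by primary key group, ordered by time, and only the rows within a preset rank are kept. The function names, the (key, timestamp, value) row layout and the sample timestamps are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def remove_exact_duplicates(rows):
    """Drop repeated rows, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(rows))


def keep_latest_per_key(rows, keep=1):
    """Group (key, timestamp, value) rows by key and keep the `keep` newest per key."""
    ordered = sorted(rows, key=itemgetter(0))  # groupby needs rows sorted by key
    result = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        newest_first = sorted(group, key=itemgetter(1), reverse=True)
        result.extend(newest_first[:keep])
    return result


total = [
    "Zhang,Hua", "Wang,Wei", "Carter,Thomas", "Yang,Ming",
    "Adams,John", "Bush,George", "Carter,Thomas", "Gates,Bill",
]
deduplicated = remove_exact_duplicates(total)

# Hypothetical timestamps, written as ISO dates so that string order is time order.
versions = [
    (111, "2019-06-20", "A"),
    (111, "2019-06-21", "a"),
    (222, "2019-06-20", "S"),
]
latest = keep_latest_per_key(versions, keep=1)
print(deduplicated)
print(latest)
```

With `keep=1` only the most recent row per primary key survives; a larger preset rank would retain that many of the newest rows for each key.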
With the data updating apparatus of this embodiment, first data synchronized from the application system is acquired and imported into the first data table, which serves as the base table. When second data synchronized from the application system is subsequently acquired, it is imported into the second data table; the first data table and the second data table are combined to generate a total data table, and the repeated data in the total data table is then removed to obtain the updated data table. The logic of this updating method is simple; it avoids manual processing, saves considerable manpower and material resources, and ensures the accuracy of data updating.
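The cooperation of the four modules can be outlined as a small class. This is a sketch under stated assumptions rather than the apparatus itself: the class name DataUpdater, its method names, the use of tuples as rows and the column-count check standing in for "consistent format" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataUpdater:
    """Hypothetical in-memory counterpart of modules 710 to 740."""
    first_table: list = field(default_factory=list)
    second_table: list = field(default_factory=list)

    def acquire_first(self, rows):
        """Module 710: import data synchronized before the preset time point."""
        self.first_table = list(rows)

    def acquire_second(self, rows):
        """Module 720: import later data; its row format must match the first table."""
        rows = list(rows)
        if rows and self.first_table and len(rows[0]) != len(self.first_table[0]):
            raise ValueError("second data table format differs from first data table")
        self.second_table = rows

    def combine(self):
        """Module 730: UNION ALL semantics, duplicates are retained."""
        return self.first_table + self.second_table

    @staticmethod
    def deduplicate(total):
        """Module 740: remove repeated rows, keeping first occurrences in order."""
        return list(dict.fromkeys(total))

    def update(self):
        """Run the combine and de-duplication steps in sequence."""
        return self.deduplicate(self.combine())


updater = DataUpdater()
updater.acquire_first([("03", "Carter,Thomas"), ("04", "Yang,Ming")])
updater.acquire_second([("03", "Carter,Thomas"), ("04", "Gates,Bill")])
result = updater.update()
print(result)
```

Rows are modelled as tuples so that they are hashable, which is what allows `dict.fromkeys` to detect exact duplicates.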
The invention also provides a computer device, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server, a blade server, a tower server or a cabinet server (comprising independent servers or a server cluster formed by a plurality of servers) and the like which can execute programs. The computer device of the present embodiment includes at least, but is not limited to: memory, processors, etc. that may be communicatively coupled to each other via a system bus.
The present embodiment also provides a computer-readable storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by a processor, performs the corresponding functions. The computer-readable storage medium of the present embodiment is used for storing a computer program which, when executed by a processor, implements the data updating method of the present invention.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process derived from the content disclosed herein, whether applied directly or indirectly in other related technical fields, is likewise covered.

Claims (5)

1. A data updating method, characterized in that the data updating method comprises the steps of:
acquiring first data synchronized from an application system before a preset time point, and importing the first data into a first data table;
acquiring second data synchronized from the application system after the preset time point, and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
combining the first data table and the second data table to generate a total data table; and
removing repeated data in the total data table to obtain an updated total data table;
the step of combining the first data table and the second data table to generate a total data table comprises:
performing a UNION operation on the first data table and the second data table by using a UNION ALL command to generate the total data table;
before the step of acquiring second data synchronized from the application system after the preset time point and importing the second data into a second data table, the data updating method further comprises:
creating a Hive database; and
acquiring an original data source, and creating, in the Hive database and according to the original data source, a first data table conforming to the format of the original data source;
after the step of removing the repeated data in the total data table to obtain the updated total data table, the data updating method further comprises:
adding a change field to the updated total data table to obtain a first transition total data table, the change field recording the change time of the other fields in the updated total data table; and
acquiring a synchronous total data table according to the first transition total data table and the change time;
after the step of acquiring the synchronous total data table according to the first transition total data table and the change time, the data updating method further comprises:
adding a deletion field to the synchronous total data table to obtain a second transition total data table, the deletion field recording the deletion condition of the other fields in the synchronous total data table; and
performing deletion on the synchronous total data table according to the deletion condition, and acquiring the synchronous total data table again;
after the step of acquiring the second data synchronized from the application system after the preset time point and importing the second data into a second data table, the data updating method further comprises:
left associating the first data table with the second data table to obtain a left associated data table;
acquiring a common primary key group of the first data table and the second data table according to the left associated data table;
covering the data of the row of the common primary key group in the first data table with the data of the row of the common primary key group in the second data table to obtain a covered data table; and
filtering the data in the second data table to obtain the data that exists only in the second data table, and inserting that data into the covered data table to obtain the updated data table.
2. The data updating method as claimed in claim 1, wherein the step of removing the duplicate data in the total data table and obtaining the updated total data table comprises:
classifying the data in the total data table according to the primary key group to obtain a classification result;
arranging, according to the classification result, the data under the same primary key group in time order to obtain an arrangement result; and
deleting, according to the arrangement result, the data ranked outside a preset rank under the same primary key group, to obtain an updated total data table.
3. A data updating apparatus for performing the data updating method of claim 1 or 2, characterized in that the data updating apparatus comprises:
the first acquisition module is used for acquiring first data synchronized from the application system before a preset time point and importing the first data into a first data table;
the second acquisition module is used for acquiring second data synchronized from the application system after the preset time point and importing the second data into a second data table, wherein the format of the second data table is consistent with that of the first data table;
the combination module is used for combining the first data table and the second data table to generate a total data table; and
the de-duplication module is used for removing repeated data in the total data table and acquiring an updated total data table;
wherein the combination module is further used for performing a UNION operation on the first data table and the second data table by using a UNION ALL command to generate the total data table.
4. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the data updating method of any of claims 1 to 2 when executing the computer program.
5. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements the steps of the data updating method of any of claims 1 to 2.
CN201910541926.4A 2019-06-21 2019-06-21 Data updating method, data updating device, computer equipment and storage medium Active CN110442585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910541926.4A CN110442585B (en) 2019-06-21 2019-06-21 Data updating method, data updating device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110442585A CN110442585A (en) 2019-11-12
CN110442585B true CN110442585B (en) 2024-04-30

Family

ID=68428719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910541926.4A Active CN110442585B (en) 2019-06-21 2019-06-21 Data updating method, data updating device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110442585B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant