CN113157716B - Data processing method, device, equipment and medium - Google Patents

Data processing method, device, equipment and medium

Info

Publication number
CN113157716B
Authority
CN
China
Prior art keywords
time
target
identification information
data
data information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110520749.9A
Other languages
Chinese (zh)
Other versions
CN113157716A (en)
Inventor
张帅帅
蔡辉
李元洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd filed Critical Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202110520749.9A priority Critical patent/CN113157716B/en
Publication of CN113157716A publication Critical patent/CN113157716A/en
Application granted granted Critical
Publication of CN113157716B publication Critical patent/CN113157716B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data processing method, apparatus, device, and medium. The target data information corresponds to a first version time, which identifies the time when the target data information was updated, and the index document of the target identification information corresponds to a second version time, which identifies the time when that index document was updated. When the index document of the target identification information is to be updated according to the target data information, the result of comparing the second version time with the first version time is taken into account. This avoids the problem of newer data in the index document being replaced by target data information whose update time is earlier than the update time of the index document, and improves the accuracy and stability of data synchronization.

Description

Data processing method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of big data, and in particular relates to a data processing method, a device, equipment and a medium.
Background
Many enterprises build their own search services on a search platform based on a search engine, such as the full-text search engine Lucene, the distributed full-text search engine Elasticsearch, or the enterprise-level search application server Solr, so that users inside and outside the enterprise can query the content they need. For users to be able to query information on the search platform, the search engine needs to acquire and store the queryable content in the database; that is, the queryable content in the database needs to be synchronized. When data stored in the database changes, the search engine also needs to acquire the changed data promptly and accurately and update its stored data accordingly, to ensure the accuracy of the information that users subsequently query. How to synchronize changed data to a search engine accurately and quickly has therefore attracted increasing attention in recent years.
Disclosure of Invention
The disclosure provides a data processing method, a device, equipment and a medium, to solve the problem that, in existing approaches, changed data to be synchronized cannot be accurately synchronized to a search engine.
The present disclosure provides a data processing method, the method comprising:
acquiring first version time of changed target data information and target identification information contained in the target data information; wherein the first version time is used to identify a time when the target data information is updated;
determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information; the second version time is used to identify a time when the index document of the target identification information is updated.
In one possible implementation manner, the obtaining the changed target data information includes:
and determining the changed target data information according to the data information carried in the received change notification message.
In one possible implementation manner, the obtaining the changed target data information includes:
if it is determined that a first time meets a preset determination condition, determining the data information to be synchronized that changed within a target time period; and determining any piece of the changed data information to be synchronized as the target data information; the target time period is the time interval between the first time and a second time; the second time is the most recent earlier time that was determined to meet the determination condition.
In one possible implementation manner, the determining that the first time meets the preset determining condition includes:
if the first time is determined to be the time when a full synchronization instruction is received, determining that the first time meets the preset determination condition; the full synchronization instruction is an instruction for synchronizing each piece of data to be synchronized at the first time into its corresponding index document; or
if the time interval between the first time and the second time reaches a preset duration, determining that the first time meets the preset determination condition.
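The two alternatives above can be sketched as a single predicate. This is only an illustrative sketch: the function name, the parameter names, and the choice to treat a missing second time as not satisfying the interval case are assumptions, not details given in the disclosure.

```python
from datetime import datetime, timedelta
from typing import Optional


def meets_determination_condition(
    first_time: datetime,
    second_time: Optional[datetime],
    full_sync_requested: bool,
    preset_duration: timedelta,
) -> bool:
    """Return True if a scan for changed data should run at first_time."""
    # Alternative 1: first_time is the moment a full synchronization
    # instruction was received.
    if full_sync_requested:
        return True
    # Alternative 2: the interval since the last time the condition was met
    # has reached the preset duration. Returning False when there is no
    # previous time is an assumption made for this sketch.
    if second_time is None:
        return False
    return first_time - second_time >= preset_duration
```

With a preset duration of one hour, for example, a check made 30 minutes after the second time returns False unless a full synchronization instruction has been received.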
In one possible implementation manner, the determining the data information to be synchronized, which is changed within the target time period, includes:
acquiring each piece of data to be synchronized in the first time offline, and summarizing each piece of data to be synchronized according to the identification information corresponding to each piece of data to be synchronized;
for each piece of identification information, determining first data information according to the identification information and each piece of data to be synchronized corresponding to the identification information;
acquiring each piece of second data information corresponding to the second time;
and determining the data information to be synchronized, which is changed in the target time period, according to each piece of first data information and each piece of second data information, and updating each piece of second data information according to each piece of first data information.
In one possible implementation manner, the determining the data information to be synchronized, which is changed in the target time period, according to the first data information and the second data information includes:
for the identification information contained in each piece of first data information: if it is determined that no second data information containing the identification information exists, determining the first data information containing the identification information as changed data information to be synchronized; and if it is determined that second data information containing the identification information exists and is inconsistent with the first data information containing the identification information, determining the first data information containing the identification information as changed data information to be synchronized.
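The comparison of first data information (the current snapshot) against second data information (the snapshot kept from the second time) can be sketched as a dictionary diff. In this minimal sketch, both snapshots are assumed to be dictionaries keyed by identification information; the function name and the `song-…` keys are hypothetical.

```python
from typing import Dict


def changed_data_information(
    first: Dict[str, dict], second: Dict[str, dict]
) -> Dict[str, dict]:
    """Return the first data information that is new or differs from second."""
    changed = {}
    for ident, info in first.items():
        # New identification information: no second data information exists.
        # Existing identification information: the contents are inconsistent.
        if ident not in second or second[ident] != info:
            changed[ident] = info
    # Update the stored second data information from the first, so the next
    # run compares against the current snapshot.
    second.clear()
    second.update(first)
    return changed
```

Identification information that appears only in the older snapshot is not reported here, because the passage above describes only the "missing" and "inconsistent" cases.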
In one possible embodiment, the method further comprises:
acquiring a target change type of the target data information;
judging whether the target change type is a deletion type or not; the deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted;
if the target change type is a deletion type, deleting each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information directly, and updating the second version time of the index document of the target identification information according to the third time corresponding to the target data information;
and if the target change type is not the deletion type, executing the step of determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information.
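The branch on the change type can be sketched as follows. The `IndexDocument` class, the `"delete"` string, and the `compare_and_update` callback are hypothetical names chosen for this illustration. Creating an empty document when none exists yet is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class IndexDocument:
    data: dict = field(default_factory=dict)
    version_time: Optional[float] = None  # the "second version time"


def handle_change(
    index: Dict[str, IndexDocument],
    target_id: str,
    change_type: str,
    third_time: float,
    compare_and_update: Callable[[], None],
) -> str:
    """Dispatch on the target change type and report which branch ran."""
    if change_type == "delete":
        # Deletion type: clear the data directly, without comparing version
        # times, and record the third time as the new second version time.
        doc = index.setdefault(target_id, IndexDocument())
        doc.data = {}
        doc.version_time = third_time
        return "deleted"
    # Any other type: fall through to the version-time comparison step.
    compare_and_update()
    return "compared"
```

Recording a version time on deletion means a later-arriving change with an older first version time can still be rejected by the comparison step.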
In one possible implementation manner, the determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information includes:
if it is determined that the second version time of the index document of the target identification information does not exist, or that the second version time is not later than the first version time, updating the index document of the target identification information according to the target data information, and updating the second version time of the updated index document of the target identification information according to the first version time;
and if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, determining not to update the index document of the target identification information.
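The comparison rule can be sketched with an in-memory dictionary standing in for the search engine. This is a minimal sketch rather than the disclosed implementation: version times are assumed to be numeric timestamps, and the function name and index layout are hypothetical.

```python
from typing import Dict, Tuple

# identification information -> (document data, second version time)
Index = Dict[str, Tuple[dict, float]]


def maybe_update(
    index: Index, target_id: str, target_data: dict, first_version_time: float
) -> bool:
    """Apply target_data unless the stored document is strictly newer."""
    existing = index.get(target_id)
    if existing is not None and existing[1] > first_version_time:
        # The second version time exists and is later: do not update.
        return False
    # No second version time, or it is not later than the first version time:
    # update the document and set its version time to the first version time.
    index[target_id] = (target_data, first_version_time)
    return True
```

Because equal version times count as "not later", a change carrying the same version time as the stored document is applied.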
In one possible implementation, the first version time is obtained by:
if the target data information is determined by the data information to be synchronized, which is changed in the target time period, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; determining the first version time according to the updated time and the first time;
if the target data information is determined through the received notification message, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; and determining the first version time according to the updated time and the fourth time corresponding to the target data information.
The present disclosure provides a data processing apparatus, the apparatus comprising:
the processing unit is used for acquiring the first version time of the changed target data information and the target identification information contained in the target data information; wherein the first version time is used to identify a time when the target data information is updated;
an updating unit, configured to determine whether to update the index document of the target identification information according to a comparison result of the second version time and the first version time of the index document of the target identification information; the second version time is used to identify a time when the index document of the target identification information is updated.
In a possible implementation manner, the processing unit is specifically configured to determine the changed target data information according to the data information carried in the received change notification message.
In one possible implementation manner, the processing unit is specifically configured to determine to-be-synchronized data information that is changed in the target time period if it is determined that the first time meets a preset determination condition; determining any changed data information to be synchronized as the target data information; the target time period is a time interval between the first time and a second time; the second time is the last time determined to meet the determined condition.
In a possible implementation manner, the processing unit is specifically configured to determine that the first time meets a preset determination condition if the first time is determined to be the time when the full synchronization instruction is received; the full synchronization instruction is an instruction for respectively synchronizing each data to be synchronized at the first time into a corresponding index document; or if the time interval between the first time and the second time reaches the preset duration, determining that the first time meets the preset determination condition.
In a possible implementation manner, the processing unit is specifically configured to acquire each piece of data to be synchronized in the first time offline, and aggregate each piece of data to be synchronized according to the identification information corresponding to each piece of data to be synchronized; for each piece of identification information, determining first data information according to the identification information and each piece of data to be synchronized corresponding to the identification information; acquiring each piece of second data information corresponding to the second time; and determining the data information to be synchronized, which is changed in the target time period, according to each piece of first data information and each piece of second data information, and updating each piece of second data information according to each piece of first data information.
In a possible implementation manner, the processing unit is specifically configured to, for the identification information contained in each piece of first data information: if it is determined that no second data information containing the identification information exists, determine the first data information containing the identification information as changed data information to be synchronized; and if it is determined that second data information containing the identification information exists and is inconsistent with the first data information containing the identification information, determine the first data information containing the identification information as changed data information to be synchronized.
In a possible implementation manner, the obtaining unit is further configured to obtain a target change type of the target data information;
the updating unit is further used for judging whether the target change type is a deletion type; the deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted; if the target change type is a deletion type, deleting each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information directly, and updating the second version time of the index document of the target identification information according to the third time corresponding to the target data information; and if the target change type is not the deletion type, executing the step of determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information.
In a possible implementation manner, the updating unit is specifically configured to update the index document of the target identification information according to the target data information if it is determined that the second version time of the index document of the target identification information does not exist, or the second version time is not later than the first version time; updating the second version time of the index document of the updated target identification information according to the first version time; and if the second version time of the index document with the target identification information is determined to exist and is later than the first version time, determining that the index document with the target identification information is not updated.
In a possible implementation manner, the updating unit is specifically configured to obtain the first version time by:
if the target data information is determined by the changed data information to be synchronized, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; determining the first version time according to the updated time and the first time;
if the target data information is determined through the received notification message, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; and determining the first version time according to the updated time and the fourth time corresponding to the target data information.
The present disclosure provides an electronic device comprising at least a processor and a memory, the processor being adapted to implement the steps of any of the data processing methods described above when executing a computer program stored in the memory.
The present disclosure provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of a data processing method as described in any one of the above.
In the present disclosure, the changed target data information corresponds to a first version time, which identifies the time when the target data information was updated, and the index document of the target identification information corresponds to a second version time, which identifies the time when that index document was updated. When the index document of the target identification information is to be updated according to the target data information, the result of comparing the second version time with the first version time is taken into account. This avoids the problem of newer data in the index document being replaced by target data information whose update time is earlier than the update time of the index document, and improves the accuracy and stability of data synchronization.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a real-time task flow provided in an embodiment of the disclosure;
FIG. 2 is a schematic diagram of a full-scale task flow provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data processing process according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram of a specific data processing flow provided in the present disclosure;
FIG. 5 is a schematic diagram of a specific data processing flow provided in the present disclosure;
FIG. 6 is a schematic diagram of a specific data processing procedure according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram of a specific data processing procedure according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram of a data processing process in a specific music database provided in the present disclosure;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The present disclosure will be described in further detail below with reference to the attached drawings, wherein it is apparent that the described embodiments are only some, but not all embodiments of the present disclosure. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, it should be understood that any number of elements in the drawings is for illustration and not limitation, and that any naming is used only for distinction and not for any limitation.
For ease of understanding, some of the concepts involved in the embodiments of the present disclosure are explained below:
Document: a content carrier with a defined data structure in the search engine, corresponding to one row record of a table in a relational database.
Index: a collection of documents with the same structure in a search engine.
Source table: in contrast to the index, a table structure that stores the data from which document content is derived.
Real-time task: a task that monitors changes to the source table and synchronizes documents in real time.
Full-scale task: a task that builds documents by traversing all source table data and synchronizes them to a search engine.
As internet technology evolves, more and more users use search platforms to find the content they want to query. Two key technologies are generally used to implement content querying on a search platform: the technique for synchronizing documents in a database into a search engine, and the technique for relevance matching between the search keywords entered by a user and the content of the documents saved in the search engine. Of these, synchronizing documents in a database into a search engine is a precondition for enabling a user to query data on a search platform.
When the data stored in the database is synchronized into the search engine for the first time, the data in the database is summarized and cleaned to obtain a data structure (for example, a document) suitable for being stored in the search engine, and then the obtained document is synchronized into the search engine. However, since the data stored in the database may change at any time, it is necessary to monitor in real time whether the data stored in the database is changed, for example, add/delete data, change data, etc., and synchronize the changed data to be synchronized to the search engine in time, so that the user can query the updated data, and ensure the accuracy of the content that can be queried in the search engine. Therefore, how to accurately and quickly synchronize changed data to be synchronized into a search engine is a problem of increasing attention in recent years.
In the related art, a real-time task and a full-scale task are generally used together to synchronize changed data to be synchronized into a search engine accurately and quickly. The real-time task ensures that changed data is synchronized into the search engine promptly, and the full-scale task ensures that all changed data is eventually synchronized into the search engine. The real-time task and the full-scale task are described in detail below:
1. Real-time tasks. For convenience of description of real-time tasks, the description is presented with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of a real-time task flow provided in an embodiment of the disclosure. As shown in FIG. 1, when data in a source table stored in the database changes, the database sends an index synchronization request for the changed target data information to the electronic device that performs index synchronization. After receiving the index synchronization request, the electronic device acquires the related data from the source table in the database where the target data information is located. It then generates a document from the acquired related data and sends the document to the search engine, so that the search engine updates its stored document according to the received document.
2. Full-scale tasks. For convenience in describing the full volume task, the following description is made with reference to the accompanying drawings:
FIG. 2 is a schematic diagram of a full-scale task flow provided in an embodiment of the disclosure. As shown in FIG. 2, the electronic device that performs index synchronization acquires the data in the source tables stored in the database according to a preset period, that is, it traverses the data in the source tables at regular intervals. It summarizes and cleans the acquired data to obtain each document, and then sends each generated document to the search engine, so that the search engine updates each saved document based on each document received.
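The summarizing and cleaning step of the full-scale task can be pictured as grouping source rows by identification information. The row layout (identification, field name, value) and the cleaning rules used here (dropping missing values, trimming whitespace) are assumptions made for illustration; the disclosure does not specify them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical source-table row: (identification, field name, value)
Row = Tuple[str, str, Optional[str]]


def build_documents(rows: Iterable[Row]) -> Dict[str, Dict[str, str]]:
    """Group source rows by identification into one document per id."""
    documents: Dict[str, Dict[str, str]] = defaultdict(dict)
    for ident, field_name, value in rows:
        if value is None:
            continue  # cleaning: skip rows with no value
        documents[ident][field_name] = value.strip()  # cleaning: trim spaces
    return dict(documents)
```

Each resulting dictionary plays the role of one document that would be sent to the search engine.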
Whether a real-time task or a full-scale task updates the data in the search engine, the data stored in the database can end up inconsistent with the updated data in the search engine.
For example, in a real-time task, the order in which multiple index synchronization requests are received may be inconsistent with the order in which the generated documents are sent to the search engine. For each request, the electronic device re-acquires the related data of the changed data to be synchronized and sends the corresponding document to the search engine, and the search engine updates the stored index document according to each document it receives. A document generated from older data may therefore arrive after, and overwrite, a document generated from newer data.
As another example, in a full-scale task, if the data contained in the database changes again after the electronic device has acquired each piece of data contained in the database but before the search engine updates the saved index documents based on the received documents, the data synchronized by the real-time task will be replaced by the older data synchronized by the full-scale task, so that the data saved in the database is inconsistent with the data in the index document in the search engine.
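One possible interleaving that leaves the index out of step with the database can be replayed in a few lines. The dictionaries below merely stand in for the source table and the search engine, and the `song-1` key and title strings are hypothetical.

```python
def simulate_unversioned_race():
    """Replay one interleaving of a full-scale task and a real-time task."""
    source = {"song-1": "title v1"}      # hypothetical source table row
    index = {}                           # search-engine side, no version times

    snapshot = dict(source)              # 1. full-scale task reads the table
    source["song-1"] = "title v2"        # 2. the row changes in the database
    index["song-1"] = source["song-1"]   # 3. real-time task syncs the change
    index.update(snapshot)               # 4. full-scale task writes its snapshot

    # Return (database value, index value) so the mismatch is visible.
    return source["song-1"], index["song-1"]
```

Without any version information attached to the writes, the last write simply wins, regardless of which write carries the newer data.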
Therefore, to solve the problem that documents in the search engine cannot currently be updated promptly and accurately, the disclosure provides a data processing method, a device, equipment and a medium. In the present disclosure, the changed target data information corresponds to a first version time, which identifies the time when the target data information was updated, and the index document of the target identification information corresponds to a second version time, which identifies the time when that index document was updated. When the index document of the target identification information is to be updated according to the target data information, the result of comparing the second version time with the first version time is taken into account. This avoids the problem of newer data in the index document being replaced by target data information whose update time is earlier than the update time of the index document, and improves the accuracy and stability of data synchronization.
Fig. 3 is a schematic diagram of a data processing procedure according to an embodiment of the disclosure, where the procedure includes:
S301: acquiring first version time of changed target data information and target identification information contained in the target data information; wherein the first version time is used to identify a time when the target data information is updated.
The data processing method provided by the disclosure can be applied to electronic equipment, and the electronic equipment can be a server, intelligent equipment and the like. In the implementation process, the method can be flexibly set according to actual requirements, and is not particularly limited.
In order to conveniently and accurately determine an index document to be updated, in the present disclosure, when data to be synchronized is changed, an electronic device may acquire changed target data information, where the target data information includes target identification information. Subsequently, the electronic device can quickly determine the index document to be updated according to the target identification information. The target identification information is used for indicating an index document which needs to be updated, and the index document can be a document stored in a search engine, a document stored in a backup database, and the like. In the implementation process, the method can be flexibly set according to actual requirements, and is not particularly limited.
The data to be synchronized may be stored in a database or in a storage area. The device storing the data to be synchronized may be the same as or different from the electronic device.
In one example, the target data information may include only the target identification information and the changed data to be synchronized.
In another example, the target data information may include target identification information and each piece of data to be synchronized corresponding to the target identification information. Each data to be synchronized includes the changed data to be synchronized, and may also include the data to be synchronized which is not changed.
It should be noted that the target identification information may be represented by a number, a character string, or another form; any representation that uniquely identifies a document may serve as the identification information of the index document in the present disclosure.
In order to update the document conveniently and accurately, in the present disclosure, when the data to be synchronized is changed, the electronic device also obtains the version time (denoted as the first version time) of the changed target data information. The first version time is used to identify the time when the target data information is updated. For example, the first version time identifies the time when the target data information was most recently updated. Corresponding processing is then performed according to the first version time to determine whether to update the index document of the target identification information.
S302: determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information; the second version time is used to identify a time when the index document of the target identification information is updated.
If the index document of the target identification information were updated directly according to the target data information even though the target data information was updated earlier than the index document, later-updated data in the index document would be replaced by the target data information. To solve this problem, in the present disclosure the index document of the target identification information also corresponds to a version time (denoted as the second version time). After the target identification information and the first version time of the target data information are acquired based on the above embodiment, the second version time of the index document of the target identification information is acquired and compared with the first version time, and whether to update the index document of the target identification information is determined according to the comparison result. The second version time is used to identify the time when the index document of the target identification information is updated. For example, the second version time is determined by the time at which the index document of the target identification information was most recently updated.
In one example, the index documents of the target identification information are generally updated in the order of the updated times, so when the first version time is later than the second version time, indicating that the change of the target data information occurs after the time when the index documents of the target identification information are updated, the index documents of the target identification information may be updated according to the target data information.
When the first version time is earlier than the second version time, it is indicated that the change of the target data information occurs before the time when the index document of the target identification information is updated, and the index document of the target identification information may not be updated.
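The comparison in S302 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `should_update` and the use of integer timestamps are assumptions. The two paragraphs above only cover the "later" and "earlier" cases; the sketch treats equal times as updatable, following the "not later than" wording used in a later embodiment of this disclosure.

```python
def should_update(first_version_time: int, second_version_time: int) -> bool:
    """Decide whether the index document should be updated.

    first_version_time: when the target data information was updated.
    second_version_time: when the index document was last updated.
    Both are assumed to be comparable timestamps (e.g. milliseconds).
    """
    # Update only when the change is not older than the index document,
    # so that later-updated data is never overwritten by a stale change.
    return first_version_time >= second_version_time
```

For instance, a change stamped 100 that arrives after the document was already updated at time 200 is skipped, while a change stamped 300 is applied.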
In a possible implementation manner, if the target data information only includes the target identification information and the changed data to be synchronized, in order to accurately update the index document of the target identification information, when it is determined that the index document of the target identification information is updated according to the target data information, each piece of data to be synchronized corresponding to the target identification information may be obtained from the device storing the data to be synchronized, that is, each piece of data to be synchronized corresponding to the target identification information stored in the device storing the data to be synchronized is retrieved, and the target document is constructed according to the target identification information and each piece of data to be synchronized corresponding to the target identification information. And updating the index document of the target identification information according to the target document.
In another possible implementation manner, if the target data information includes the target identification information and each piece of data to be synchronized corresponding to the target identification information, the target document may be constructed according to the target data information. And updating the index document of the target identification information according to the target document.
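The two ways of constructing the target document can be sketched together. The dictionary layout (`id`, `records`, `complete`) and the `fetch_all` callback that reads from the device storing the data to be synchronized are hypothetical names chosen for illustration; the disclosure does not prescribe a format.

```python
from typing import Callable, Dict, List


def build_target_document(
    target_data: Dict[str, object],
    fetch_all: Callable[[str], List[dict]],
) -> Dict[str, object]:
    """Build the target document for one piece of target data information.

    target_data["complete"] (assumed flag) tells whether the message already
    carries every piece of data to be synchronized for this identifier.
    """
    target_id = str(target_data["id"])
    if target_data.get("complete"):
        # The message already holds all data for this identifier.
        records = list(target_data["records"])
    else:
        # Only the changed data was carried: re-read everything for this id.
        records = list(fetch_all(target_id))
    return {"id": target_id, "records": records}
```

Re-reading on partial messages costs one extra lookup per change but keeps the index document complete, while complete messages avoid the lookup entirely.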
In this way, the changed target data information corresponds to a first version time identifying when the target data information was updated, and the index document of the target identification information corresponds to a second version time identifying when that index document was updated. Because the result of comparing the second version time with the first version time is considered when the index document is updated according to the target data information, later-updated data in the index document is not replaced by target data information that was updated earlier, which improves the accuracy and stability of data synchronization.
In order to update the document in time, in the present disclosure, the obtaining the changed target data information includes:
and determining the changed target data information according to the data information carried in the received change notification message.
In one possible implementation manner, since the data to be synchronized may be changed at any time, the data change situation in the device storing the data to be synchronized may be monitored in real time so that changed data is processed promptly, for example by monitoring change events in the database binlog of that device. When the data to be synchronized changes, the electronic device receives a change notification message, parses it, and obtains the target data information carried in it.
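Parsing a change notification message might look like the sketch below. The JSON layout and the field names `id`, `ts` and `data` are assumptions made for illustration only; real binlog listeners emit their own formats, and the disclosure does not specify one.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TargetDataInfo:
    target_id: str          # target identification information
    version_time: int       # first version time (assumed: epoch milliseconds)
    data: dict = field(default_factory=dict)  # changed data to be synchronized


def parse_change_notification(raw: str) -> TargetDataInfo:
    """Extract the target data information carried in a notification."""
    msg = json.loads(raw)
    return TargetDataInfo(
        target_id=str(msg["id"]),
        version_time=int(msg["ts"]),
        data=msg.get("data", {}),
    )
```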
In another possible embodiment, the obtaining the changed target data information includes:
if a first time meets a preset determination condition, determining the data information to be synchronized that changed within a target time period; and determining any piece of the changed data information to be synchronized as the target data information; wherein the target time period is the time interval between the first time and a second time, and the second time is the time that was last determined to meet the determination condition.
The data to be synchronized may be changed on a schedule, or part of the changed data may fail to reach the electronic device during real-time synchronization. To ensure that the electronic device can acquire and synchronize all changed data to be synchronized, a determination condition is preset. The determination condition may be that the first time is a preset time, that the time interval between the first time and the second time reaches a preset duration, or that the first time is the time when a full synchronization instruction is received. The electronic device determines whether to acquire the changed target data information according to whether the first time meets the preset determination condition. The first time may be the current time, a time that is a preset duration before the current time, or a preset time. The second time is earlier than the first time and is the time that was last determined to meet the determination condition; in other words, the second time is the previous first time.
In one example, the preset determination condition is that the first time is the time when a full synchronization instruction is received. When the user wants the electronic device to synchronize every piece of changed data among all currently stored data to be synchronized, the user can input the full synchronization instruction through an intelligent device. The user may input the instruction in many ways, for example by operating a control device such as a mouse, a helmet or a remote controller, by voice, by operating the display of the intelligent device, or by operating a hardware button on the intelligent device. In the implementation process, this can be flexibly set according to actual requirements and is not particularly limited. After receiving the full synchronization instruction input by the user, the intelligent device sends it to the electronic device. The electronic device determines the time at which it receives the full synchronization instruction as the first time, and determines that the first time meets the preset determination condition. The full synchronization instruction instructs the electronic device to synchronize each piece of data to be synchronized at the first time into its corresponding index document.
It should be noted that the electronic device may be the same as or different from the smart device. In the implementation process, the method can be flexibly set according to actual requirements, and is not particularly limited.
In another example, if the preset determination condition is that the time interval between the first time and the second time reaches the preset duration, when the electronic device determines that the duration recorded on the timer reaches the preset duration, the time when the duration recorded on the timer reaches the preset duration may be determined to be the first time that satisfies the preset determination condition. The time length recorded on the timer is the time interval between the first time and the second time, and the timer is cleared each time the time length recorded on the timer reaches the preset time length.
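The two example determination conditions can be combined in a small check. The function and parameter names are hypothetical, and times are assumed to be numeric values in the same unit.

```python
def meets_determination_condition(
    first_time: float,
    second_time: float,
    preset_duration: float,
    full_sync_requested: bool = False,
) -> bool:
    """Return True when the first time satisfies a determination condition.

    second_time is the previous time at which the condition was satisfied.
    """
    if full_sync_requested:
        # A full synchronization instruction was received at first_time.
        return True
    # Timer-style condition: the interval has reached the preset duration.
    return first_time - second_time >= preset_duration
```

When the function returns True, the caller would record `first_time` as the new second time, which corresponds to clearing the timer.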
In the related art, when the amount of data to be synchronized is relatively large, for example hundreds of millions of pieces, summarizing and cleaning all of the data, generating each document, and synchronizing each document into the search engine is very time-consuming, and users may be unable to query accurate content while the electronic device is executing the full task. Therefore, to improve the efficiency of data synchronization and reduce the time taken to synchronize changed data, only the data information that changed in the time interval between the first time and the second time (the target time period) may be determined as target data information and synchronized subsequently. If the first time meets at least one preset determination condition, the data information to be synchronized that changed within the target time period is acquired, and each piece of changed data information to be synchronized is then determined as target data information.
In one possible implementation manner, the determining the data information to be synchronized, which is changed within the target time period, includes:
acquiring offline each piece of data to be synchronized saved at the first time, and summarizing each piece of data to be synchronized according to the identification information corresponding to each piece of data to be synchronized;
for each piece of identification information, determining first data information according to the identification information and each piece of data to be synchronized corresponding to the identification information;
acquiring each piece of second data information corresponding to the second time;
and determining the data information to be synchronized, which is changed in the target time period, according to each piece of first data information and each piece of second data information, and updating each piece of second data information according to each piece of first data information.
In the related art, generating documents from the acquired data to be synchronized requires reading the device storing the data to be synchronized many times. This increases the read pressure on that device, may affect other services, and may even bring the device down. To avoid this problem, each piece of data to be synchronized saved at the first time may be acquired offline; that is, starting from the first time, each piece of data saved in the device storing the data to be synchronized is acquired offline, either sequentially or in random order. During the offline acquisition, the data saved in that device may still be updated.
After each piece of data to be synchronized stored in the first time is obtained offline, summarizing each piece of data to be synchronized according to the identification information corresponding to each piece of data to be synchronized, namely obtaining a data set corresponding to each piece of identification information. And then, for each piece of identification information, determining first data information according to the identification information and each piece of data to be synchronized corresponding to the identification information.
For example, the identification information and each piece of data to be synchronized corresponding to the identification information are spliced.
For another example, each piece of data to be synchronized corresponding to the identification information is subjected to duplication removal processing, and then the identification information and the duplicated data to be synchronized are spliced. The step of performing deduplication processing on each piece of data to be synchronized corresponding to the identification information includes: if the fact that the M pieces of data to be synchronized corresponding to the identification information have the identical n pieces of data to be synchronized is determined, deleting n-1 pieces of data to be synchronized in the n pieces of data to be synchronized. Wherein n is an integer of 2 or more and M or less, and M is an integer of 2 or more.
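Summarizing by identification information, removing duplicates, and splicing can be sketched in one pass. The representation is an assumption: each row is an `(identification, data)` pair, and the "spliced" first data information is modelled as a tuple that starts with the identification information.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def build_first_data_info(
    rows: Iterable[Tuple[Hashable, object]],
) -> Dict[Hashable, tuple]:
    """Group rows by identification information, drop exact duplicates,
    and splice each identifier with its remaining data."""
    grouped = defaultdict(list)
    for ident, data in rows:
        # Of n identical pieces, only the first is kept (n - 1 are dropped).
        if data not in grouped[ident]:
            grouped[ident].append(data)
    # Splice: identifier followed by its de-duplicated data, in arrival order.
    return {ident: (ident, *items) for ident, items in grouped.items()}
```

The membership test on a list keeps the original order and works for unhashable data such as dictionaries, at the cost of quadratic time per identifier; a production job over hundreds of millions of rows would need a different strategy.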
After each piece of first data information is acquired based on the above embodiment, each piece of second data information corresponding to the second time is acquired. The data information to be synchronized that changed within the target time period is then determined according to each piece of first data information and each piece of second data information.
In one possible implementation manner, the determining the data information to be synchronized, which is changed in the target time period, according to the first data information and the second data information includes:
for the identification information contained in each piece of first data information, if it is determined that no second data information containing the identification information exists, determining the first data information containing the identification information as changed data information to be synchronized; and if it is determined that second data information containing the identification information exists and is inconsistent with the first data information containing the identification information, determining the first data information containing the identification information as changed data information to be synchronized.
The data to be synchronized that was saved at the second time may have been modified, added to, or deleted during the target time period. Therefore, when determining which pieces of first data information are changed data information to be synchronized, it may be determined, for the identification information contained in each piece of first data information, whether it matches the identification information contained in any piece of second data information, that is, whether second data information containing the identification information exists.
If it is determined that the second data information including the identification information does not exist, that is, the first data information is the data information newly added in the target time period, the first data information can be determined as the data information to be synchronized, wherein the data information is changed.
If it is determined that second data information containing the identification information exists, the first data information was not newly added in the target time period but may have undergone another type of change. It is then further determined whether the second data information containing the identification information is inconsistent with the first data information, so as to accurately determine whether the first data information is changed data information to be synchronized.
If the second data information containing the identification information is inconsistent with the first data information, the first data information has undergone another type of change and is determined to be changed data information to be synchronized. If the second data information containing the identification information is consistent with the first data information, the first data information did not change in the target time period and is not changed data information to be synchronized.
In order to facilitate the determination of the changed data information to be synchronized at the next first time when the preset determination condition is met, after the changed data information to be synchronized is determined from the first data information based on the above embodiment, the stored second data information is updated according to each first data information.
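The comparison of first and second data information, followed by the refresh of the stored second data information, can be sketched as below. Both snapshots are assumed to be dictionaries keyed by identification information; the function name is hypothetical. Note that the rule as stated iterates only over the first data information, so identifiers that exist only in the old snapshot are not reported by this sketch.

```python
from typing import Dict, Hashable, Tuple


def detect_changes(
    first: Dict[Hashable, object],
    second: Dict[Hashable, object],
) -> Tuple[Dict[Hashable, object], Dict[Hashable, object]]:
    """Return (changed data information, refreshed second data information)."""
    changed = {}
    for ident, info in first.items():
        if ident not in second:
            # No second data information with this identifier: newly added.
            changed[ident] = info
        elif second[ident] != info:
            # Present in both snapshots but inconsistent: modified.
            changed[ident] = info
    # The current snapshot becomes the baseline for the next comparison.
    return changed, dict(first)
```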
Determining the target data information offline not only allows the electronic device to acquire the target data information accurately, but also reduces the read pressure on the device storing the data to be synchronized. Subsequent processing only needs to synchronize the determined target data information, which reduces the workload of the electronic device during data synchronization, shortens the time needed to synchronize the stored data, and improves synchronization efficiency.
The following describes a data processing method provided by the present disclosure by means of a specific embodiment, and fig. 4 is a schematic diagram of a specific data processing flow provided by the present disclosure, as shown in fig. 4, where the flow includes:
S401: if a change notification message is received, determining the changed target data information according to the data information carried in the received change notification message, and executing S408.
S402: and if the first time meets the preset determining condition, acquiring each piece of data to be synchronized in the first time offline.
Here, the present disclosure does not limit the execution order of S401 and S402, that is, S401 may be executed before S402, S401 may be executed after S402, and S401 and S402 may be executed simultaneously.
In one possible embodiment, if the first time is determined to be the time when the full synchronization command is received, it is determined that the first time meets a preset determination condition. The full synchronization instruction is an instruction for respectively synchronizing each data to be synchronized at the first time into a corresponding index document.
In another possible embodiment, if it is determined that the time interval between the first time and the second time reaches the preset duration, it is determined that the first time satisfies the preset determination condition.
The second time is the last time determined to meet the determination condition.
S403: and summarizing each piece of data to be synchronized according to the identification information corresponding to each piece of data to be synchronized acquired in the step S402.
S404: for each piece of identification information in S403, first data information is determined according to the identification information and each piece of data to be synchronized corresponding to the identification information.
S405: and acquiring each piece of second data information corresponding to the second time.
S406: and determining the data information to be synchronized, which is changed in the target time period, according to each piece of first data information and each piece of second data information.
The specific determining the data information to be synchronized, which is changed in the target time period, comprises the following steps:
for the identification information contained in each piece of first data information, if it is determined that no second data information containing the identification information exists, determining the first data information containing the identification information as changed data information to be synchronized; and if it is determined that second data information containing the identification information exists and is inconsistent with the first data information containing the identification information, determining the first data information containing the identification information as changed data information to be synchronized.
S407: and updating each piece of second data information according to each piece of first data information.
S408: and acquiring the first version time of the changed target data information and the target identification information contained in the target data information.
S409: and determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information.
In order to improve accuracy in the data synchronization process, in the present disclosure, on the basis of the foregoing embodiments, the method further includes:
acquiring a target change type of the target data information;
judging whether the target change type is a deletion type or not; the deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted;
if the target change type is a deletion type, deleting each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information directly, and updating the second version time of the index document of the target identification information according to the third time corresponding to the target data information;
and if the target change type is not the deletion type, executing the step of determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information.
In order to update a document conveniently and accurately, in the present disclosure, when data to be synchronized is changed, an electronic device may also acquire a target change type of changed target data information, so as to determine a specific update manner for an index document of target identification information according to the target change type. The target change type may be delete, add, modify, etc.
Typically, the deleted index document is not subsequently written to. Therefore, when the target change type of the target data information is deleting, each data corresponding to the target identification information in the index document of the target identification information can be directly deleted, and each data to be synchronized corresponding to the target identification information is not required to be acquired from the equipment for storing the data to be synchronized, or the first version time and the second version time are compared. Based on this, in the present disclosure, if the target change type is a deletion type, it is described that each data corresponding to the target identification information included in the index document of the target identification information is to be deleted, each data corresponding to the target identification information included in the index document of the target identification information may be deleted, and the second version time of the index document of the target identification information may be updated according to the third time corresponding to the target data information.
The third time corresponding to the target data information may be determined according to the time at which the data to be synchronized corresponding to the target identification information contained in the index document of the target identification information is deleted, may be a preset time, or may be determined according to the current time. The specific implementation can be flexibly set according to actual requirements and is not described in detail here.
In another possible implementation manner, the target change type of the target data information may not be the deletion type. For a change that is not of the deletion type, whether to update the index document of the target identification information needs to be determined according to the result of comparing the second version time of the index document of the target identification information with the first version time.
If the target change type is not the deletion type, the data to be synchronized corresponding to the target identification information contained in the index document of the target identification information does not need to be deleted, and whether to update the index document of the target identification information can be determined according to the result of comparing the second version time of the index document with the first version time of the target data information.
In one example, the determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information includes:
if it is determined that the second version time of the index document of the target identification information does not exist, or that the second version time is not later than the first version time, updating the index document of the target identification information according to the target data information, and updating the second version time of the updated index document of the target identification information according to the first version time;
and if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, determining not to update the index document of the target identification information.
The index document of the target identification information may or may not have a corresponding second version time. Therefore, when determining whether to update the index document of the target identification information according to the result of comparing the second version time of the index document with the first version time of the target data information, it may first be determined whether the second version time exists. When the second version time exists, comparing it with the first version time shows whether the index document of the target identification information was updated earlier than the target data information, which in turn determines whether to update the index document according to the target data information.
In one example, if it is determined that the second version time of the index document of the target identification information does not exist, or that the second version time is not later than the first version time, this indicates that the index document of the target identification information was not updated later than the target data information, and the index document of the target identification information may be updated directly according to the target data information.
In order to facilitate the next update of the index document of the target identification information, the second version time of the updated index document of the target identification information may be updated according to the first version time of the target data information.
In another example, if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, this indicates that the index document of the target identification information was updated later than the target data information, and it is determined that the index document of the target identification information is not updated.
In one possible implementation manner, updating the index document of the target identification information according to the target change type and the target data information may be specifically implemented by the following calculation logic:
[Figure BDA0003063854960000211: calculation logic; image not reproduced]
wherein, params.released indicates that the change type of the target data information is a deletion type, doc is an index document of the target identification information, params is the target data information, doc.dataversion is a second version time, and params.dataversion is a first version time.
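Since the figure itself is not reproduced, the following is only a reconstruction of the described logic from the surrounding text, not the original script. The names `released` and `dataversion` come from the description above; the `id` and `data` fields, the `third_time` parameter, and the choice to keep an emptied document carrying the new version time after a deletion are assumptions.

```python
from typing import Optional


def apply_change(
    doc: Optional[dict], params: dict, third_time: Optional[int] = None
) -> Optional[dict]:
    """Apply target data information (params) to the index document (doc)."""
    if params.get("released"):
        # Deletion type: clear the data without comparing version times and
        # stamp the document with the third time (assumed fallback: the
        # version carried by the deletion itself).
        version = third_time if third_time is not None else params["dataversion"]
        return {"id": params["id"], "data": None, "dataversion": version}
    stored = None if doc is None else doc.get("dataversion")
    if stored is None or stored <= params["dataversion"]:
        # No second version time, or it is not later than the first version
        # time: update the document and record the first version time.
        return {
            "id": params["id"],
            "data": params["data"],
            "dataversion": params["dataversion"],
        }
    # The index document is newer than the change: leave it untouched.
    return doc
```

Applying changes stamped 200, then 100, then 300 to the same identifier leaves the document at the data of version 300, with the stale version 100 ignored, regardless of arrival order.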
By checking whether a second version time is stored and, if so, comparing it with the first version time, it is determined whether the index document of the target identification information was updated earlier than the target data information, and thus whether to update the index document according to the target data information. This avoids the problem that, when multiple pieces of target data information are acquired in an order inconsistent with the order in which the data actually changed, the updated index document of the target identification information becomes inconsistent with the data to be synchronized, and thereby improves the accuracy and stability of data synchronization.
The data processing method provided by the present disclosure is described below through a specific embodiment. Fig. 5 is a schematic diagram of a specific data processing flow provided by the present disclosure. As shown in fig. 5, the flow includes:
S501: Acquire the changed target data information.
For the specific process of acquiring the changed target data information, refer to S401 to S407 shown in fig. 4 in the above embodiment.
S502: Acquire the first version time of the changed target data information and the target identification information contained in the target data information.
In a possible implementation manner, if the acquired target data information is determined from the to-be-synchronized data information that changed in the target time period, through S402 to S407 shown in fig. 4, the time at which the to-be-synchronized data corresponding to the target identification information contained in the target data information was updated is determined, and the first version time is determined according to the updated time and the first time.
In another possible implementation manner, if the acquired target data information is determined from the received notification message, through S401 shown in fig. 4, the time at which the to-be-synchronized data corresponding to the target identification information contained in the target data information was updated is determined, and the first version time is determined according to the updated time and the fourth time corresponding to the target data information.
S503: Acquire the target change type of the target data information.
S504: Judge whether the target change type is the deletion type; if so, execute S505; otherwise, execute S506.
The deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted.
S505: Directly delete each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information, and update the second version time of the index document of the target identification information according to the third time corresponding to the target data information.
S506: Determine whether to update the index document of the target identification information according to the comparison result of the second version time of the index document of the target identification information and the first version time.
Specifically, determining whether to update the index document of the target identification information according to the comparison result of the second version time of the index document of the target identification information and the first version time includes:
if it is determined that the second version time of the index document of the target identification information does not exist, or the second version time is not later than the first version time, updating the index document of the target identification information according to the target data information, and updating the second version time of the updated index document of the target identification information according to the first version time;
if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, determining that the index document of the target identification information is not to be updated.
In order to improve accuracy and stability of data synchronization, in the present disclosure, the first version time may be obtained by:
if the target data information is determined by the data information to be synchronized, which is changed in the target time period, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; determining the first version time according to the updated time and the first time;
if the target data information is determined through the received notification message, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; and determining the first version time according to the updated time and the fourth time corresponding to the target data information.
As a possible implementation manner, the latest updated time among the times of updating the data to be synchronized corresponding to the target identification information included in the target data information may be determined as the first version time.
As another possible implementation, all the data to be synchronized is acquired offline, and the offline acquisition of each piece of data to be synchronized starts at the first time. While all the data to be synchronized is being acquired offline, the data to be synchronized may still be changed. As a result, in the data to be synchronized that is acquired offline, the updated time corresponding to some data to be synchronized may be after the first time or before the first time, and some data to be synchronized may have no corresponding updated time at all. Therefore, when determining the first version time of the target data information, if the target data information is determined from the to-be-synchronized data information that changed in the target time period, the latest updated time among the times at which the data to be synchronized corresponding to the target identification information contained in the target data information was updated may be determined first. The latest updated time is then compared with the time at which the offline acquisition started (i.e., the first time), and the first version time of the target data information is determined based on the comparison result. If no updated time is recorded for the data to be synchronized corresponding to the target identification information contained in the target data information, the updated time may be set to any time not greater than the first time. For example, the updated time may be set to 0.
In one example, the maximum of the latest updated time and the first time may be determined as the first version time. Specifically, it can be determined by the following calculation logic:
dataVersion=max(max(related entities updateTime),dumpTaskStartTime)
where dataVersion is the first version time, max(related entities updateTime) is the latest updated time, and dumpTaskStartTime is the first time.
For example, if the latest updated time is greater than the first time, indicating that the latest updated time is later than the first time, the latest updated time is determined to be the first version time.
For another example, if the latest updated time is not greater than the first time, indicating that the latest updated time is not later than the first time, the first time is determined to be the first version time.
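The calculation above can be sketched in Python as follows. The function name first_version_time and the use of integer timestamps are assumptions for illustration; a missing updated time is represented as None and treated as 0, following the convention that an unrecorded updated time may be set to a value not greater than the first time.

```python
from typing import Iterable, Optional


def first_version_time(update_times: Iterable[Optional[int]],
                       dump_task_start_time: int) -> int:
    """Return max(max(update times), dump task start time).

    Records without a recorded update time are treated as 0, i.e. a time
    that is not greater than the first time (assumed non-negative).
    """
    latest = max((t if t is not None else 0 for t in update_times), default=0)
    return max(latest, dump_task_start_time)


# The latest updated time (1500) is later than the first time (1200).
print(first_version_time([1000, 1500, None], dump_task_start_time=1200))  # 1500
# The latest updated time (900) is not later than the first time (1200).
print(first_version_time([900, None], dump_task_start_time=1200))  # 1200
```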
As still another possible implementation, for reasons such as the network state, the order in which target data information is received may differ from the order in which it was sent, which would affect the updating of the index document. To avoid this, when the target data information is determined through the received notification message, the first version time may be determined according to the time at which the data to be synchronized corresponding to the target identification information contained in the target data information was updated and the time at which the notification message was generated (referred to as the fourth time).
When determining the first version time of the target data information, if the target data information is determined through the received notification message, the latest updated time among the times at which the data to be synchronized corresponding to the target identification information contained in the target data information was updated may be determined first. The latest updated time is then compared with the fourth time, and the first version time of the target data information is determined based on the comparison result. If no updated time is recorded for the data to be synchronized corresponding to the target identification information contained in the target data information, the updated time may be set to any time not greater than the fourth time. For example, the updated time may be set to 0.
In one example, the maximum of the latest updated time and the fourth time may be determined as the first version time. Specifically, it can be determined by the following calculation logic:
dataVersion=max(max(related entities updateTime),currTime)
where dataVersion is the first version time, max(related entities updateTime) is the latest updated time, and currTime is the fourth time.
For example, if the latest updated time is greater than the fourth time, indicating that the latest updated time is later than the fourth time, the latest updated time is determined to be the first version time.
For another example, if the latest updated time is not greater than the fourth time, indicating that the latest updated time is not later than the fourth time, the fourth time is determined to be the first version time.
By determining the first version time through the method in this embodiment, the first version time accurately represents the time at which the target data information was most recently updated. This makes it possible to determine, by comparing the second version time with the first version time, whether the index document of the target identification information was updated earlier than the target data information, and hence whether to update the index document of the target identification information according to the target data information. This avoids the problem that the updated index document of the target identification information is inconsistent with the data to be synchronized when the order in which multiple pieces of target data information are acquired is inconsistent with the order in which the corresponding data changes occurred, and improves the accuracy and stability of data synchronization.
The following describes a data processing method provided by the present disclosure through a specific embodiment, and fig. 6 is a schematic view of a scenario of a specific data processing flow provided by an embodiment of the present disclosure. As shown in fig. 6, when it is determined that the first time satisfies a preset determination condition, data to be synchronized stored in the device for storing data to be synchronized is acquired. The data to be synchronized is stored in the form of a source table in a database in the device.
For example, if it is determined that the time interval between the first time and the second time reaches the preset duration, each piece of data to be synchronized in the source table stored in the device for storing data to be synchronized is dumped. The second time is the last time determined to meet the determination condition.
Each piece of data to be synchronized is aggregated according to the identification information corresponding to each piece of data to be synchronized. For example, a join is performed on the data to be synchronized contained in the dumped source table.
For each piece of identification information, first data information is determined according to the identification information and each piece of data to be synchronized corresponding to the identification information. The data format of the first data information may be the same as the data format of the index document.
For example, the first data information may be determined by the Spark big data platform according to the identification information and each piece of data to be synchronized corresponding to the identification information.
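The aggregation step can be illustrated with a small sketch in plain Python rather than Spark. The record layout, the key "id" used as identification information, and the function name build_first_data_information are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_first_data_information(
    records: List[Dict[str, Any]], id_field: str = "id"
) -> Dict[Any, Dict[str, Any]]:
    """Group dumped records by identification information, one document per id."""
    grouped: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        grouped[record[id_field]].append(record)
    # One piece of first data information per piece of identification information.
    return {
        ident: {id_field: ident, "records": items}
        for ident, items in grouped.items()
    }


# Hypothetical dumped rows coming from two source tables.
dumped = [
    {"id": "song-1", "table": "song", "name": "A"},
    {"id": "song-1", "table": "authorization", "region": "CN"},
    {"id": "song-2", "table": "song", "name": "B"},
]
first_data = build_first_data_information(dumped)
print(sorted(first_data))
print(len(first_data["song-1"]["records"]))
```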
Each piece of second data information corresponding to the second time is acquired.
According to each piece of first data information and each piece of second data information, the to-be-synchronized data information that changed in the target time period is determined, and each piece of second data information is updated according to each piece of first data information. The target time period is the time interval between the first time and the second time, and any changed to-be-synchronized data information is determined as target data information.
For any determined target data information, the following steps are executed:
The first version time of the changed target data information, the target change type of the target data information, and the target identification information contained in the target data information are acquired.
The time at which the data to be synchronized corresponding to the target identification information contained in the target data information was updated may be determined, and the first version time is determined according to the updated time and the first time.
Whether the target change type is the deletion type is judged. The deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted.
If the target change type is the deletion type, each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information is deleted directly, and the second version time of the index document of the target identification information is updated according to the third time corresponding to the target data information.
If the target change type is not the deletion type, it is judged whether the second version time of the index document of the target identification information exists and whether the second version time is not later than the first version time.
If it is determined that the second version time of the index document of the target identification information does not exist, or the second version time is not later than the first version time, the index document of the target identification information is updated according to the target data information, and the second version time of the updated index document of the target identification information is updated according to the first version time;
if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, it is determined that the index document of the target identification information is not to be updated.
The data processing method provided by the present disclosure is described below through a specific embodiment. Fig. 7 is a schematic view of a scenario of a specific data processing flow provided by the embodiment of the present disclosure. As shown in fig. 7, changes to the data to be synchronized stored in the device for storing data to be synchronized may be monitored in real time. For example, the data to be synchronized is stored in a database in the device in the form of a source table, and the electronic device monitors the binlog of the database for changes.
When a change notification message is received, the change notification message is parsed, and the first version time of the changed target data information carried in the change notification message, the target change type of the target data information, and the target identification information contained in the target data information are acquired.
The time at which the data to be synchronized corresponding to the target identification information contained in the target data information was updated may be determined, and the first version time is determined according to the updated time and the fourth time corresponding to the target data information.
Whether the target change type is the deletion type is judged. The deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted.
If the target change type is the deletion type, each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information is deleted directly, and the second version time of the index document of the target identification information is updated according to the third time corresponding to the target data information.
If the target change type is not the deletion type, it is judged whether the second version time of the index document of the target identification information exists and whether the second version time is not later than the first version time.
If it is determined that the second version time of the index document of the target identification information does not exist, or the second version time is not later than the first version time, the index document of the target identification information is updated according to the target data information, and the second version time of the updated index document of the target identification information is updated according to the first version time;
if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, it is determined that the index document of the target identification information is not to be updated.
In the present disclosure, the changed target data information has a corresponding first version time, which identifies the time at which the target data information was updated, and the index document of the target identification information has a corresponding second version time, which identifies the time at which the index document of the target identification information was updated. When the index document of the target identification information is to be updated according to the target data information, the comparison result of the second version time of the index document of the target identification information and the first version time can be taken into account. This effectively avoids the problem that, when the target data information was updated earlier than the index document of the target identification information, the newer data in the index document of the target identification information is replaced by the target data information, and thus effectively improves the accuracy and stability of data synchronization.
Fig. 8 is a schematic diagram of a data processing procedure in a specific music database provided in the present disclosure, where the procedure includes:
A list line wide table, a song wide table, and an authorization book wide table are stored in the device for storing data to be synchronized, and each of these tables stores data to be synchronized.
If it is determined that the time interval between the first time and the second time reaches the preset duration, the source tables stored in the device for storing data to be synchronized are dumped.
Each piece of data to be synchronized is aggregated according to the identification information respectively corresponding to each piece of data to be synchronized stored in the list line wide table and the authorization document contract wide table, to obtain a list-authorization document aggregation table.
Each piece of data to be synchronized is then aggregated according to the identification information respectively corresponding to each piece of data to be synchronized stored in the list-authorization document aggregation table and the song wide table, to obtain a song list wide table.
For each piece of identification information, first data information is determined according to the identification information and each piece of data to be synchronized corresponding to the identification information, and the data format of the first data information is converted into the doc format.
Each piece of second data information corresponding to the second time is acquired.
The to-be-synchronized data information that changed in the target time period is determined according to each piece of first data information and each piece of second data information. The target time period is the time interval between the first time and the second time, and any changed to-be-synchronized data information is determined as target data information.
For any determined target data information, the following data synchronization steps are performed:
The first version time of the changed target data information, the target change type of the target data information, and the target identification information contained in the target data information are acquired.
The time at which the data to be synchronized corresponding to the target identification information contained in the target data information was updated may be determined, and the first version time is determined according to the updated time and the first time.
Whether the target change type is the deletion type is judged. The deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted.
If the target change type is the deletion type, each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information is deleted directly, and the second version time of the index document of the target identification information is updated according to the third time corresponding to the target data information.
If the target change type is not the deletion type, it is judged whether the second version time of the index document of the target identification information exists and whether the second version time is not later than the first version time.
If it is determined that the second version time of the index document of the target identification information does not exist, or the second version time is not later than the first version time, the index document of the target identification information is updated according to the target data information, and the second version time of the updated index document of the target identification information is updated according to the first version time;
if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, it is determined that the index document of the target identification information is not to be updated.
After each piece of target data information is determined to be synchronized into the corresponding index document, the synchronization result can be notified to related staff through a preset communication mode.
The disclosure further provides a data processing apparatus, and fig. 9 is a schematic structural diagram of a data processing apparatus provided in an embodiment of the disclosure, where the apparatus includes:
a processing unit 91, configured to obtain a first version time of changed target data information and target identification information included in the target data information; wherein the first version time is used to identify a time when the target data information is updated;
An updating unit 92, configured to determine whether to update the index document of the target identification information according to a comparison result of the second version time and the first version time of the index document of the target identification information; the second version time is used to identify a time when the index document of the target identification information is updated.
Since the principle by which the data processing apparatus solves the problem is similar to that of the data processing method, the implementation of the data processing apparatus can refer to the implementation of the method, and repeated details are not described again.
In a possible implementation manner, the processing unit 91 is specifically configured to determine the changed target data information according to the data information carried in the received change notification message.
In a possible implementation manner, the processing unit 91 is specifically configured to determine the data information to be synchronized that is changed in the target time period if it is determined that the first time meets a preset determination condition; determining any changed data information to be synchronized as the target data information; the target time period is a time interval between the first time and a second time; the second time is the last time determined to meet the determined condition.
In a possible implementation manner, the processing unit 91 is specifically configured to determine that the first time meets a preset determination condition if it is determined that the first time is the time when the full synchronization instruction is received; the full synchronization instruction is an instruction for respectively synchronizing each data to be synchronized at the first time into a corresponding index document; or if the time interval between the first time and the second time reaches the preset duration, determining that the first time meets the preset determination condition.
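The two determination conditions above can be expressed as a short sketch. The function name, the parameter names, and the use of numeric timestamps in seconds are assumptions for illustration only.

```python
def meets_determination_condition(first_time: float,
                                  second_time: float,
                                  preset_duration: float,
                                  full_sync_received: bool) -> bool:
    """Return True if the first time meets the preset determination condition."""
    # Condition 1: the first time is the time a full synchronization
    # instruction was received.
    if full_sync_received:
        return True
    # Condition 2: the interval since the second time reaches the preset duration.
    return first_time - second_time >= preset_duration


print(meets_determination_condition(3600.0, 0.0, 3600.0, False))
print(meets_determination_condition(1800.0, 0.0, 3600.0, False))
print(meets_determination_condition(1800.0, 0.0, 3600.0, True))
```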
In a possible implementation manner, the processing unit 91 is specifically configured to acquire each piece of data to be synchronized in the first time offline, and aggregate each piece of data to be synchronized according to the identification information corresponding to each piece of data to be synchronized; for each piece of identification information, determining first data information according to the identification information and each piece of data to be synchronized corresponding to the identification information; acquiring each piece of second data information corresponding to the second time; and determining the data information to be synchronized, which is changed in the target time period, according to each piece of first data information and each piece of second data information, and updating each piece of second data information according to each piece of first data information.
In a possible implementation manner, the processing unit 91 is specifically configured to determine, for each piece of first data information, the first data information including the identification information as the data information to be synchronized that is changed if it is determined that there is no second data information including the identification information; and if the second data information containing the identification information is determined to exist and is inconsistent with the first data information containing the identification information, determining the first data information containing the identification information as the data information to be synchronized, wherein the data information to be synchronized is changed.
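The comparison of first data information against second data information can be sketched as a dictionary diff. The function name changed_data_information and the representation of each snapshot as a dictionary keyed by identification information are assumptions; the sketch covers only the two cases stated above (no second data information for the identification information, or second data information that is inconsistent with the first).

```python
from typing import Any, Dict


def changed_data_information(
    first: Dict[str, Any], second: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the first data information that is new or differs from the second."""
    return {
        ident: info
        for ident, info in first.items()
        if ident not in second or second[ident] != info
    }


first = {"a": {"v": 1}, "b": {"v": 2}, "c": {"v": 3}}
second = {"a": {"v": 1}, "b": {"v": 1}}

targets = changed_data_information(first, second)
print(sorted(targets))  # "b" differs and "c" is new; "a" is unchanged

# The second data information is then updated according to the first.
second = dict(first)
```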
In a possible implementation manner, the processing unit 91 is further configured to obtain a target change type of the target data information;
the updating unit 92 is further configured to determine whether the target change type is a deletion type; the deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted; if the target change type is a deletion type, deleting each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information directly, and updating the second version time of the index document of the target identification information according to the third time corresponding to the target data information; and if the target change type is not the deletion type, executing the step of determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information.
In a possible implementation manner, the updating unit 92 is specifically configured to update the index document of the target identification information according to the target data information if it is determined that the second version time of the index document of the target identification information does not exist, or the second version time is not later than the first version time; updating the second version time of the index document of the updated target identification information according to the first version time; and if the second version time of the index document with the target identification information is determined to exist and is later than the first version time, determining that the index document with the target identification information is not updated.
In a possible implementation manner, the updating unit 92 is specifically configured to obtain the first version time by:
if the target data information is determined by the changed data information to be synchronized, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; determining the first version time according to the updated time and the first time;
If the target data information is determined through the received notification message, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; and determining the first version time according to the updated time and the fourth time corresponding to the target data information.
In the present disclosure, the changed target data information has a corresponding first version time, which identifies the time at which the target data information was updated, and the index document of the target identification information has a corresponding second version time, which identifies the time at which the index document of the target identification information was updated. When the index document of the target identification information is to be updated according to the target data information, the comparison result of the second version time of the index document of the target identification information and the first version time can be taken into account. This effectively avoids the problem that, when the target data information was updated earlier than the index document of the target identification information, the newer data in the index document of the target identification information is replaced by the target data information, and thus effectively improves the accuracy and stability of data synchronization.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. On the basis of the foregoing embodiments, the embodiment of the present disclosure further provides an electronic device which, as shown in fig. 10, includes a processor 1001, a communication interface 1002, a memory 1003, and a communication bus 1004, where the processor 1001, the communication interface 1002, and the memory 1003 communicate with one another through the communication bus 1004;
the memory 1003 stores a computer program which, when executed by the processor 1001, causes the processor 1001 to perform the steps of:
acquiring a first version time of changed target data information and the target identification information contained in the target data information; wherein the first version time is used to identify the time when the target data information was updated;
determining whether to update the index document of the target identification information according to the result of comparing a second version time of the index document of the target identification information with the first version time; the second version time is used to identify the time when the index document of the target identification information was updated.
Since the principle by which the electronic device solves the problem is similar to that of the data processing method, the implementation of the electronic device can refer to the implementation of the method, and repeated description is omitted.
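The two processor steps, together with the deletion branch recited in claim 5, can be sketched against an in-memory index. This is only an illustration under stated assumptions: the `Change` and `IndexDocument` classes, the `is_delete` flag standing in for the change type, and the choice to keep an empty document with a version time after a deletion are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Change:
    target_id: str               # target identification information
    version_time: int            # first version time (or the "third time" for a deletion)
    data: Optional[dict] = None  # target data information payload
    is_delete: bool = False      # hypothetical stand-in for the deletion change type


@dataclass
class IndexDocument:
    data: dict = field(default_factory=dict)
    version_time: Optional[int] = None  # second version time


def apply_change(index: Dict[str, IndexDocument], change: Change) -> bool:
    """Apply one change; return True if the index document was modified."""
    doc = index.get(change.target_id)
    if change.is_delete:
        # Deletion branch: clear the data directly and record the deletion time.
        # Keeping an empty document as a marker is an assumption of this sketch.
        if doc is None:
            doc = index[change.target_id] = IndexDocument()
        doc.data = {}
        doc.version_time = change.version_time
        return True
    if (doc is not None and doc.version_time is not None
            and doc.version_time > change.version_time):
        # The index document is newer than the incoming data: do not update.
        return False
    index[change.target_id] = IndexDocument(dict(change.data or {}),
                                            change.version_time)
    return True


index: Dict[str, IndexDocument] = {}
apply_change(index, Change("song-1", 100, {"title": "A"}))  # applied
apply_change(index, Change("song-1", 90, {"title": "old"}))  # stale, ignored
apply_change(index, Change("song-1", 150, is_delete=True))   # data cleared
apply_change(index, Change("song-1", 120, {"title": "B"}))   # older than deletion, ignored
```

Recording the deletion time on the document is what lets a delayed update that predates the deletion be rejected by the same comparison.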
The communication bus of the electronic device mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single bold line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 1002 is used for communication between the above-described electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
On the basis of the above embodiments, the present disclosure further provides a computer readable storage medium having stored therein a computer program executable by a processor, which when run on the processor, causes the processor to perform the steps of:
acquiring a first version time of changed target data information and the target identification information contained in the target data information; wherein the first version time is used to identify the time when the target data information was updated;
determining whether to update the index document of the target identification information according to the result of comparing a second version time of the index document of the target identification information with the first version time; the second version time is used to identify the time when the index document of the target identification information was updated.
Since the principle by which the computer-readable storage medium solves the problem is similar to that of the data processing method, the specific implementation can refer to the implementation of the data processing method, and repeated description is omitted.
It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present disclosure without departing from the spirit or scope of the disclosure. Thus, the present disclosure is intended to include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (16)

1. A method of data processing, the method comprising:
acquiring first version time of changed target data information and target identification information contained in the target data information; wherein the first version time is used to identify a time when the target data information is updated;
determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information; the second version time is used for identifying the time when the index document of the target identification information is updated;
wherein obtaining the changed target data information comprises:
if it is determined that a first time meets a preset determination condition, determining the data information to be synchronized that has changed within a target time period; and determining any piece of changed data information to be synchronized as the target data information; the target time period is the time interval between the first time and a second time; the second time is the previous time that was determined to meet the determination condition;
the determining the data information to be synchronized, which is changed in the target time period, comprises the following steps:
acquiring, offline, each piece of data to be synchronized at the first time, and aggregating the pieces of data to be synchronized according to the identification information corresponding to each piece;
for each piece of identification information, determining first data information according to the identification information and each piece of data to be synchronized corresponding to the identification information;
acquiring each piece of second data information corresponding to the second time;
and determining the data information to be synchronized, which is changed in the target time period, according to each piece of first data information and each piece of second data information, and updating each piece of second data information according to each piece of first data information.
2. The method of claim 1, wherein the obtaining the changed target data information comprises:
and determining the changed target data information according to the data information carried in the received change notification message.
3. The method of claim 1, wherein determining that the first time satisfies a preset determination condition comprises:
if the first time is determined to be the time when a full synchronization instruction is received, determining that the first time meets the preset determination condition; the full synchronization instruction is an instruction for synchronizing each piece of data to be synchronized at the first time into its corresponding index document; or
if the time interval between the first time and the second time reaches a preset duration, determining that the first time meets the preset determination condition.
4. The method according to claim 1, wherein the determining the data information to be synchronized, which is changed in the target period, according to the each piece of first data information and the each piece of second data information, includes:
for the identification information contained in each piece of first data information, if it is determined that no second data information containing the identification information exists, determining the first data information containing the identification information as changed data information to be synchronized; and if it is determined that second data information containing the identification information exists and is inconsistent with the first data information containing the identification information, determining the first data information containing the identification information as changed data information to be synchronized.
5. The method according to any one of claims 1-4, further comprising:
acquiring a target change type of the target data information;
judging whether the target change type is a deletion type or not; the deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted;
if the target change type is a deletion type, deleting each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information directly, and updating the second version time of the index document of the target identification information according to the third time corresponding to the target data information;
and if the target change type is not the deletion type, executing the step of determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information.
6. The method of claim 5, wherein the determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information comprises:
if it is determined that no second version time of the index document of the target identification information exists, or that the second version time is not later than the first version time, updating the index document of the target identification information according to the target data information, and updating the second version time of the updated index document of the target identification information according to the first version time;
and if it is determined that the second version time of the index document of the target identification information exists and is later than the first version time, determining not to update the index document of the target identification information.
7. The method of claim 5, wherein the first version time is obtained by:
if the target data information is determined by the data information to be synchronized, which is changed in the target time period, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; determining the first version time according to the updated time and the first time;
if the target data information is determined through the received notification message, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; and determining the first version time according to the updated time and the fourth time corresponding to the target data information.
8. A data processing apparatus, the apparatus comprising:
the processing unit is used for acquiring the first version time of the changed target data information and the target identification information contained in the target data information; wherein the first version time is used to identify a time when the target data information is updated;
an updating unit, configured to determine whether to update the index document of the target identification information according to a comparison result of the second version time and the first version time of the index document of the target identification information; the second version time is used for identifying the time when the index document of the target identification information is updated;
the processing unit is specifically configured to determine to-be-synchronized data information that is changed in a target time period if it is determined that the first time meets a preset determination condition; determining any changed data information to be synchronized as the target data information; the target time period is a time interval between the first time and a second time; the second time is the last time determined to meet the determined condition;
the processing unit is specifically configured to acquire each piece of data to be synchronized in the first time offline, and aggregate each piece of data to be synchronized according to the identification information corresponding to each piece of data to be synchronized; for each piece of identification information, determining first data information according to the identification information and each piece of data to be synchronized corresponding to the identification information; acquiring each piece of second data information corresponding to the second time; and determining the data information to be synchronized, which is changed in the target time period, according to each piece of first data information and each piece of second data information, and updating each piece of second data information according to each piece of first data information.
9. The apparatus according to claim 8, wherein the processing unit is specifically configured to determine the changed target data information according to the data information carried in the received change notification message.
10. The apparatus according to claim 8, wherein the processing unit is specifically configured to determine that the first time meets a preset determination condition if the first time is determined to be a time when a full synchronization instruction is received; the full synchronization instruction is an instruction for respectively synchronizing each data to be synchronized at the first time into a corresponding index document; or if the time interval between the first time and the second time reaches the preset duration, determining that the first time meets the preset determination condition.
11. The apparatus according to claim 8, wherein the processing unit is specifically configured to determine, for each piece of first data information, the first data information including the identification information as the data information to be synchronized that is changed if it is determined that there is no second data information including the identification information; and if the second data information containing the identification information is determined to exist and is inconsistent with the first data information containing the identification information, determining the first data information containing the identification information as the data information to be synchronized, wherein the data information to be synchronized is changed.
12. The apparatus according to any one of claims 8-11, wherein the obtaining unit is further configured to obtain a target change type of the target data information;
the updating unit is further used for judging whether the target change type is a deletion type; the deletion type indicates that each piece of data to be synchronized corresponding to the target identification information contained in the target data information is deleted; if the target change type is a deletion type, deleting each piece of data to be synchronized corresponding to the target identification information contained in the index document of the target identification information directly, and updating the second version time of the index document of the target identification information according to the third time corresponding to the target data information; and if the target change type is not the deletion type, executing the step of determining whether to update the index document of the target identification information according to the comparison result of the second version time and the first version time of the index document of the target identification information.
13. The apparatus according to claim 12, wherein the updating unit is specifically configured to update the index document of the target identification information according to the target data information if it is determined that there is no second version time of the index document of the target identification information or the second version time is not later than the first version time; updating the second version time of the index document of the updated target identification information according to the first version time; and if the second version time of the index document with the target identification information is determined to exist and is later than the first version time, determining that the index document with the target identification information is not updated.
14. The apparatus according to claim 12, wherein the updating unit is configured to obtain the first version time by:
if the target data information is determined by the changed data information to be synchronized, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; determining the first version time according to the updated time and the first time;
if the target data information is determined through the received notification message, determining the time when the data to be synchronized corresponding to the target identification information contained in the target data information is updated; and determining the first version time according to the updated time and the fourth time corresponding to the target data information.
15. An electronic device comprising at least a processor and a memory, the processor being adapted to implement the steps of the data processing method according to any of claims 1-7 when executing a computer program stored in the memory.
16. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the data processing method according to any of claims 1-7.
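As an illustration of the offline change detection recited in claims 1, 3 and 4, the following minimal sketch compares a snapshot taken at the first time with the snapshot kept from the second time. It assumes each snapshot maps identification information to one merged dictionary of fields; the function names, the merge-by-`dict.update` rule and the numeric time values are hypothetical and are not specified in the claims.

```python
from typing import Dict, List, Tuple

# identification information -> aggregated data information (assumed shape)
Snapshot = Dict[str, dict]


def condition_met(first_time: int, second_time: int,
                  preset_duration: int, full_sync_received: bool) -> bool:
    """Claim 3: a full synchronization instruction, or the interval has elapsed."""
    return full_sync_received or (first_time - second_time) >= preset_duration


def summarize(records: List[Tuple[str, dict]]) -> Snapshot:
    """Aggregate raw records by identification information (merge rule is an assumption)."""
    snapshot: Snapshot = {}
    for ident, fields in records:
        snapshot.setdefault(ident, {}).update(fields)
    return snapshot


def changed_since(first: Snapshot, second: Snapshot) -> Snapshot:
    """Claim 4: an entry is changed if it is new or differs from the earlier snapshot."""
    return {ident: info for ident, info in first.items()
            if ident not in second or second[ident] != info}


second = {"u1": {"name": "a", "age": 3}}
first = summarize([("u1", {"name": "a"}), ("u1", {"age": 3}),
                   ("u2", {"name": "b"})])
if condition_met(first_time=2000, second_time=1000,
                 preset_duration=600, full_sync_received=False):
    changed = changed_since(first, second)  # only "u2" is reported
    second = first                          # the first snapshot becomes the new baseline
```

Under this rule as written, identifiers that appear only in the earlier snapshot are not reported as changed; handling them would need the separate deletion path.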
CN202110520749.9A 2021-05-13 2021-05-13 Data processing method, device, equipment and medium Active CN113157716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110520749.9A CN113157716B (en) 2021-05-13 2021-05-13 Data processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110520749.9A CN113157716B (en) 2021-05-13 2021-05-13 Data processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113157716A CN113157716A (en) 2021-07-23
CN113157716B true CN113157716B (en) 2023-05-26

Family

ID=76874761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110520749.9A Active CN113157716B (en) 2021-05-13 2021-05-13 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113157716B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114372064B (en) * 2022-03-22 2022-07-12 飞狐信息技术(天津)有限公司 Data processing apparatus, method, computer readable medium and processor

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256715A (en) * 2020-11-12 2021-01-22 微医云(杭州)控股有限公司 Index updating method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009064120A (en) * 2007-09-05 2009-03-26 Hitachi Ltd Search system
CN106254094B (en) * 2016-07-19 2019-08-13 中国银联股份有限公司 A kind of method of data synchronization and system
CN107315825B (en) * 2017-07-05 2020-02-28 北京奇艺世纪科技有限公司 Index updating system, method and device
CN111324660B (en) * 2018-12-13 2024-05-24 杭州海康威视系统技术有限公司 Data synchronization method, device, electronic equipment and machine-readable storage medium

Also Published As

Publication number Publication date
CN113157716A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant