CN113918631A

CN113918631A - Data writing method and device, computer equipment and storage medium

Info

Publication number: CN113918631A
Application number: CN202111105250.8A
Authority: CN
Inventors: 陈晓欣; 郭小龙; 孙迁; 李成
Original assignee: Nanjing Suning Software Technology Co ltd
Current assignee: Nanjing Suning Software Technology Co ltd
Priority date: 2021-09-22
Filing date: 2021-09-22
Publication date: 2022-01-11

Abstract

The application relates to a data writing method, a data writing device, computer equipment and a storage medium. The method comprises: obtaining a current data set of a current real-time batch, wherein the current data set comprises at least one current data record; performing business logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record; writing the current data record into a target business database according to the current data mark; when a write switch of a third-party storage engine is in an open state, operating on the current data record according to the current data mark to obtain a new current data set; and submitting the new current data set to a thread pool and writing it into the third-party storage engine. With this method, a third-party storage engine is introduced and the business data of each batch is written into it in real time, so that the data delay of the ingestion process is effectively reduced and no IDE scheduling platform needs to be relied on.

Description

Data writing method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data writing method and apparatus, a computer device, and a storage medium.

Background

In the internet era of rapid information development and extremely fast data growth, the continuous expansion of enterprise business generates a large amount of business data. Extracting information useful for enterprise analysis and decision-making from this mass of data has become a primary problem for enterprise decision makers, and increasingly fierce market competition makes enterprises emphasize the accuracy and timeliness of their decisions. OLAP (online analytical processing) has therefore emerged and risen rapidly. Its main purpose is to support decision management analysis by providing analysts with efficient, rapid and accurate decision information.

Traditionally, OLAP real-time non-time-series data has no main time dimension and must be frequently inserted, updated and deleted according to business logic, so it is written into a PostgreSQL (PG) database in batches. For export, an external IDE (Integrated Development Environment) scheduling platform generates a corresponding task, and the task fully synchronizes the PG database on a fixed hourly schedule and writes the exported files. This causes a data delay of more than one hour; the task scheduling consumes the computing resources of the platform and is limited by the concurrency of the IDE; and when the data volume is very large, the pressure on the PG database can become so high that the database is unusable.

Disclosure of Invention

Therefore, in view of the above technical problems, it is necessary to provide a data writing method, device, computer device and storage medium that introduce a third-party storage engine into which the business data of each batch is written in real time. This effectively reduces the data delay of the ingestion process, removes the dependence on an IDE scheduling platform and thus the influence of the platform's concurrency limit, and separates data export from data query so that the use of the PG database by the existing business is not affected.

A method of writing data, the method comprising:

acquiring a current data set of a current real-time batch, wherein the current data set comprises at least one current data record;

performing service logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record;

writing the current data record into a target service database according to the current data mark, and operating the current data record according to the current data mark when a write-in switch of a third-party storage engine is in an open state to obtain a new current data set;

and submitting the new current data set to a thread pool and writing the new current data set into a third-party storage engine.

In one embodiment, the method further comprises: receiving a current query request through a query interface provided by the third-party storage engine, and encapsulating the current query request into a current query task; splicing a current storage engine data file writing path according to a current model name of the current query task; generating a current temporary table name according to the current model name, the third-party storage engine name and a current timestamp; replacing the current model name in the current storage engine data file writing path with the current temporary table name to generate a new current storage engine data file writing path; and obtaining a target query result corresponding to the current query request according to the new current storage engine data file writing path.

In one embodiment, the method further comprises: receiving a current export request through an export interface provided by the third-party storage engine, and obtaining a default distributed file system name according to the current export request; obtaining a current model system name and a current model name, and generating a current storage engine data export path according to the distributed file system name, the current model system name and the current model name; and, when the current storage engine data export path exists in the third-party storage engine, exporting target export data corresponding to the current export request according to the current storage engine data export path.

In one embodiment, obtaining the current data set of the current real-time batch comprises: acquiring a non-time-series data set within a preset time period, and parsing and widening the non-time-series data set to obtain the current data set of the current real-time batch.

In one embodiment, writing the current data record into the target business database according to the current data mark includes: determining, according to a current data record primary key, whether a record with the same primary key exists in the target business database; when such a record exists, determining whether a current data record operation field is a delete field; when the current data record operation field is not a delete field, acquiring a first data record version number corresponding to the record with the same primary key in the target business database; when a current data record version number is greater than the first data record version number, determining that the current data mark corresponding to the current data record is a current data update mark; and replacing the record with the same primary key in the target business database with the current data record according to the current data update mark.

In one embodiment, the method further comprises: when no record with the same primary key exists in the target business database, determining that the current data mark corresponding to the current data record is a current data addition mark, and adding the current data record to the target business database according to the current data addition mark; and when the current data record operation field is a delete field, determining that the current data mark corresponding to the current data record is a current data deletion mark, and deleting the current data record from the target business database according to the current data deletion mark.

In one embodiment, there are a plurality of current data records, and operating on the current data records according to the current data marks to obtain a new current data set includes: obtaining a first current data set from the current data records whose current data mark is a current data update mark or a current data addition mark; obtaining a second current data set from the current data records whose current data mark is a current data deletion mark; and determining the first current data set and the second current data set as the new current data set.

A data writing apparatus, the apparatus comprising:

the acquisition module is used for acquiring a current data set of a current real-time batch, wherein the current data set comprises at least one current data record;

the judging module is used for carrying out service logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record;

the first writing module is used for writing the current data record into the target business database according to the current data mark, and when a writing switch of the third-party storage engine is in an open state, operating the current data record according to the current data mark to obtain a new current data set;

and the second writing module is used for submitting the new current data set to the thread pool and writing the new current data set into the third-party storage engine.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

The data writing method, the data writing device, the computer equipment and the storage medium acquire a current data set of a current real-time batch, wherein the current data set comprises at least one current data record, business logic judgment is carried out on the current data record in the current data set to obtain a current data mark corresponding to the current data record, the current data record is written into a target business database according to the current data mark, when a write-in switch of a third-party storage engine is in an open state, the current data record is operated according to the current data mark to obtain a new current data set, and the new current data set is submitted to a thread pool and written into the third-party storage engine.

The method introduces a third-party storage engine into which the business data of each batch is written in real time. This effectively reduces the data delay of the ingestion process, removes the need for an IDE scheduling platform and thus the influence of the platform's concurrency limit, and separates data export from data query so that the use of the PG database by the existing business is not affected at all. It also solves the problems of the traditional real-time non-time-series export, in which the PG database must be fully synchronized on an hourly schedule: the data volume is huge, the operations are frequent, the pressure on the PG database is high, and PG queries are seriously affected.

Drawings

FIG. 1 is a diagram of an application environment of a data writing method in one embodiment;

FIG. 2 is a flow chart illustrating a data writing method according to an embodiment;

FIG. 3 is a flow chart illustrating a data writing method according to an embodiment;

FIG. 4 is a flow chart illustrating a data writing method according to an embodiment;

FIG. 5 is a flowchart illustrating a target business database writing step in one embodiment;

FIG. 6 is a flow diagram that illustrates the steps of the current data set operation in one embodiment;

FIG. 7 is a flow diagram illustrating writing to the PG database and the third-party storage engine in one embodiment;

FIG. 8 is a block diagram showing the structure of a data writing apparatus according to one embodiment;

FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The data writing method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.

Specifically, the terminal 102 obtains a current data set of the current real-time batch, where the current data set includes at least one current data record, and sends the current data set to the server 104 through the network. The server 104 performs service logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record, writes the current data record into the target service database according to the current data mark, operates the current data record according to the current data mark when a write-in switch of the third-party storage engine is in an on state to obtain a new current data set, submits the new current data set to the thread pool, and writes the new current data set into the third-party storage engine.

In one embodiment, as shown in fig. 2, a data writing method is provided, which is described by taking the application of the method to the server in fig. 1 as an example, and includes the following steps:

step 202, a current data set of the current real-time batch is obtained, wherein the current data set comprises at least one current data record.

The current data set of the current real-time batch is a data record set corresponding to the currently processed batch, the current data set comprises at least one current data record, and the data record is a multi-field record, such as name, age, gender, home address and the like.

In one embodiment, obtaining the current data set of the current real-time batch comprises: acquiring a non-time-series data set within a preset time period, and parsing and widening the non-time-series data set to obtain the current data set of the current real-time batch.

The current real-time batch may be a batch within a preset time period; for example, the data of every 15 seconds forms one batch. The preset time period may be set in advance according to actual business requirements, product requirements or the actual application scenario. Specifically, a non-time-series data set within the preset time period is obtained, where a non-time-series data set is a set of data records that are not recorded in chronological order under a single unified index. The non-time-series data set is then parsed and widened to generate the current data set corresponding to the current real-time batch. The non-time-series data set may be a set of non-time-series consumption data records.
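The source does not define "parsing and widening" in detail. The following is a minimal Python sketch under the assumption that each raw message of a batch is a JSON string and that widening means flattening nested fields into one wide, single-level record. The function names `widen` and `build_batch` and the underscore separator are hypothetical and not taken from the application.

```python
import json
from typing import Any, Dict, List


def widen(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single-level ("wide") record.

    Assumption: nested field names are joined with an underscore.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects and carry the parent name as a prefix.
            flat.update(widen(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def build_batch(raw_messages: List[str]) -> List[Dict[str, Any]]:
    """Parse the raw messages of one batch window and widen each of them."""
    return [widen(json.loads(message)) for message in raw_messages]
```

In this reading, the list returned by `build_batch` plays the role of the current data set of the current real-time batch.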

Step 204, performing service logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record.

The business logic judgment evaluates the relevant business rules; it may be performed separately on every current data record in the current data set to obtain the current data mark corresponding to each record. The current data mark is the operation mark of the current data record and includes, but is not limited to, a current data update mark, a current data addition mark and a current data deletion mark. Different current data marks correspond to different operations on the current data record, and the corresponding current data record can be operated on according to its current data mark.

Step 206, writing the current data record into the target service database according to the current data mark, and operating the current data record according to the current data mark when a write-in switch of the third-party storage engine is in an open state to obtain a new current data set.

Step 208, submitting the new current data set to a thread pool, and writing the new current data set into a third-party storage engine.

The target business database is a PostgreSQL (PG) database, and the third-party storage engine is Apache Hudi (Hadoop Upserts Deletes and Incrementals), which may also be called the Hudi data lake.

Specifically, after the current data mark corresponding to the current data record is obtained, the current data record may be written into the target service database, that is, into the pg database, according to the current data mark. And meanwhile, acquiring the state of a write-in switch of the third-party storage engine, and operating the current data record according to the current data mark to generate a new current data set when the state of the write-in switch is an open state.

Eventually, this new current data set needs to be committed to the thread pool and written to the third party storage engine. Wherein the thread pool includes a plurality of threads by which a new current data set may be written to the hudi data lake.

After each batch of data is written into the PG database, the same batch can be synchronously written into the third-party storage engine, namely the Hudi data lake. This ensures data consistency, makes data writing near real-time with essentially no delay, and removes the dependence on an external platform. It also solves the problems of the traditional real-time non-time-series export, in which the PG database must be fully synchronized on an hourly schedule: the data volume is huge, the operations are frequent, the pressure on the PG database is high, and PG queries are seriously affected.

The data writing method includes the steps of obtaining a current data set of a current real-time batch, wherein the current data set comprises at least one current data record, conducting business logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record, writing the current data record into a target business database according to the current data mark, operating the current data record according to the current data mark when a write-in switch of a third-party storage engine is in an open state to obtain a new current data set, submitting the new current data set to a thread pool, and writing the new current data set into the third-party storage engine.
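The overall flow of steps 202 to 208 can be sketched in Python as follows. This is only an illustration under stated assumptions: the callables `judge`, `write_business_db` and `write_engine` are hypothetical stand-ins for the business logic judgment, the PG writer and the Hudi writer, and the write switch is modeled as a plain boolean.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Marked = Tuple[str, Record]  # (mark, record); mark is "insert", "update" or "delete"


def process_batch(
    batch: List[Record],
    judge: Callable[[Record], Optional[str]],
    write_business_db: Callable[[List[Marked]], None],
    write_engine: Callable[[List[Marked]], None],
    engine_switch_on: bool,
    pool: ThreadPoolExecutor,
) -> Optional[Future]:
    """Mark every record, write to the business database, then hand off to the engine."""
    marked: List[Marked] = []
    for record in batch:
        mark = judge(record)  # business logic judgment (step 204)
        if mark is not None:  # records without a mark need no operation
            marked.append((mark, record))

    write_business_db(marked)  # write according to the marks (step 206)

    if not engine_switch_on:  # write switch closed: skip the storage engine
        return None

    new_data_set = list(marked)  # the "new current data set"
    # Submit to the thread pool so the engine write runs off the caller's thread (step 208).
    return pool.submit(write_engine, new_data_set)
```

A caller would typically keep the returned future and call `result()` on it when it needs to know that the engine write has finished.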

In one embodiment, as shown in fig. 3, further comprising:

step 302, receiving a current query request through a query interface provided by a third-party storage engine, and encapsulating the current query request into a current query task.

Step 304, splicing according to the current model name of the current query task to obtain a current storage engine data file writing path.

After the current data set is written into the third-party storage engine, in order to verify whether the current data set is correctly written according to the business logic, a query interface of the third-party storage engine can be provided, and whether the current data set is correctly written according to the business logic is verified through the query interface.

Specifically, the current query request is received through the query interface provided by the third-party storage engine and encapsulated into a current query task, for example a QueryHudiTask(modelName: String, querySql: String), and a Worker (a small application) is then selected to execute the task.

Further, the current model name of the current query task is acquired, and the current storage engine data file writing path is spliced according to the current model name. For example, the Worker splices the specific engine data file writing path according to the model name modelName in the QueryHudiTask.

Step 306, generating a current temporary table name according to the current model name, the third-party storage engine name and the current timestamp.

Step 308, replacing the current model name in the current storage engine data file writing path with the current temporary table name to generate a new current storage engine data file writing path.

Step 310, obtaining a target query result corresponding to the current query request according to the new current storage engine data file write-in path.

Specifically, the current temporary table name may be generated from the current model name, the third-party storage engine name and the current timestamp, that is, a unique temporary table name composed of the model name, the engine name and the timestamp. The data at the engine file writing path is read and registered as the temporary table, and the current model name in the current storage engine data file writing path is replaced with the current temporary table name to generate the new current storage engine data file writing path. For example, the model name modelName referenced in querySql (the query statement) is replaced with the current temporary table name, generating a new querySql.

Finally, the target query result corresponding to the current query request is obtained according to the new current storage engine data file writing path, which verifies whether the current data set has been written into the Hudi data lake according to the correct business logic. For example, the third-party storage engine executes the new querySql statement and returns the query result to the query end.
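The temporary table naming and query rewriting of steps 306 to 308 can be illustrated with a short Python sketch. It assumes that the three name parts are joined with underscores and that the timestamp is in milliseconds; the source states neither detail. The helper names `temp_table_name` and `rewrite_query` are hypothetical.

```python
import re
import time
from typing import Optional


def temp_table_name(model_name: str, engine_name: str, timestamp_ms: Optional[int] = None) -> str:
    """Build a unique temporary table name from model name, engine name and timestamp.

    Assumption: the parts are joined with underscores.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return f"{model_name}_{engine_name}_{timestamp_ms}"


def rewrite_query(query_sql: str, model_name: str, table_name: str) -> str:
    """Replace whole-word references to the model name with the temporary table name."""
    pattern = r"\b" + re.escape(model_name) + r"\b"
    # A function is used as the replacement so the table name is inserted literally.
    return re.sub(pattern, lambda _match: table_name, query_sql)
```

Matching on word boundaries keeps identifiers that merely start with the model name (for example a hypothetical `orders_old` table) untouched, which plain string replacement would not.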

In one embodiment, as shown in fig. 4, further comprising:

step 402, receiving a current export request through an export interface provided by a third-party storage engine, and obtaining a default distributed file system name according to the current export request.

Step 404, acquiring a current model system name and a current model name, and generating a current storage engine data export path according to the distributed file system name, the current model system name and the current model name.

Step 406, when the third-party storage engine has the current storage engine data export path, exporting target export data corresponding to the current export request according to the current storage engine data export path.

For data export from the third-party storage engine, only one export interface needs to be provided externally, and it returns the engine file writing path corresponding to the model. Specifically, the current export request is received through the export interface provided by the third-party storage engine, and the default distributed file system name is obtained according to the current export request. For example, the default HDFS (Hadoop Distributed File System) file system name, such as hdfs://routerprd, is obtained as follows: spark.sparkContext.hadoopConfiguration.getRaw("fs.defaultFS");

Further, the current model system name and the current model name are acquired, and the distributed file system name, the current model system name and the current model name are spliced to obtain the current storage engine data export path. For example, the model system name systemId and the model name modelName passed in through the interface are combined according to the engine file path generation rule to spell out the specific file writing path /user/bigquery/hudi/systemId_modelName. The HDFS file system name and the engine file writing path are then spliced to generate the absolute path returned to the outside (namely, the current storage engine data export path): hdfs://routerprd/user/bigquery/hudi/systemId_modelName.

Finally, because the current storage engine data export path is spelled out according to a fixed rule, the external caller needs to check whether the path actually exists after obtaining it; if it does, the caller can read the path directly to export the data.

Export and query are thus separated and do not affect the use of the PG database at all; data can be exported or queried using only the corresponding path.
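The path splicing of steps 402 to 406 can be sketched in Python as below. The base directory /user/bigquery/hudi and the example file system name hdfs://routerprd come from the description above; the function names and the `exists` callback (standing in for a real file system existence check) are hypothetical.

```python
from typing import Callable, Optional


def export_path(
    default_fs: str,
    system_id: str,
    model_name: str,
    base_dir: str = "/user/bigquery/hudi",
) -> str:
    """Splice the absolute export path: <default_fs><base_dir>/<system_id>_<model_name>."""
    relative = f"{base_dir.rstrip('/')}/{system_id}_{model_name}"
    return default_fs.rstrip("/") + relative


def resolve_export(path: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the path only if it really exists, since it was built by rule, not looked up."""
    return path if exists(path) else None
```

For example, `export_path("hdfs://routerprd", "sys1", "orders")` yields `hdfs://routerprd/user/bigquery/hudi/sys1_orders`, where `sys1` and `orders` are made-up values.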

In an embodiment, as shown in fig. 5, the writing of the current data record into the target service database according to the current data mark includes:

step 502, determining whether the same primary key record exists in the target service database according to the current data primary key.

Step 504, when the same primary key record exists in the target service database, determining whether the current data record operation field is a deletion field.

Step 506, when the current data record operation field is a non-deleted field, acquiring a first data record version number corresponding to the same primary key record in the target service database.

The current data record comprises a current data record primary key, a current data record operation field and a current data record version number. The current data record primary key uniquely identifies the current data record, and different current data records have different primary keys. The current data record operation field is the field that describes the operation associated with the current data record. The current data record version number is the version number of the current data record; every data record has one. The version number is generally derived from time: the later the time, the newer the data, and the newest data is what the user wants most.

Specifically, it is checked whether a data record with the same current data record primary key exists in the target business database (PG database). The primary key is a database concept: it uniquely identifies a record, and the database rejects the insertion of a record whose primary key already exists. If a data record with the same primary key exists in the target business database, it is checked whether the current data record operation field is a delete field.

If the current data record operation field is not a delete field, the first data record version number is acquired, that is, the data record version number of the record in the target business database whose primary key is the same as that of the current data record.

Step 508, when the version number of the current data record is greater than the version number of the first data record, determining that the current data mark corresponding to the current data record is the current data update mark.

Step 510, replacing the data record of the same main key record in the target service database with the current data record according to the current data updating mark.

Specifically, it is determined whether the current data record version number is greater than the first data record version number. If it is, the current data record is newer in time than the data record in the target business database. The current data update mark indicates that the record is to be updated, so the record with the same primary key in the target business database is replaced with the current data record according to the current data update mark, which updates the data record in the target business database.

Step 512, when the same primary key record does not exist in the target service database, determining that the current data mark corresponding to the current data record is the current data newly added mark.

Step 514, adding the current data record into the target service database according to the current data addition mark.

Specifically, if the same primary key record does not exist in the target service database, it is indicated that the current data record does not exist in the target service database, and therefore, it can be directly determined that the current data mark corresponding to the current data record is the current data newly added mark. The current data newly-added mark is used for adding the current data record into the target service database, so that the current data record can be newly added into the target service database according to the current data newly-added mark, namely, a data record, namely the current data record, is added into the target service database.

Step 516, when the operation field of the current data record is the delete field, determining that the current data mark corresponding to the current data record is the current data delete mark.

Step 518, deleting the current data record from the target service database according to the current data deletion mark.

Specifically, if the current data record operation field is a delete field, the current data record needs to be deleted from the target business database, so the current data mark corresponding to the current data record is determined to be the current data deletion mark. The current data deletion mark indicates that the current data record is to be deleted from the target business database, so the current data record can be deleted from the target business database according to the current data deletion mark.
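The marking decision of steps 502 to 518 can be condensed into a small Python function. This is a sketch under assumptions: the field names `op` and `version` and the mark strings are hypothetical, and `existing` stands for the record with the same primary key looked up in the target business database (or `None` if there is none). The source does not say what happens to a record that is flagged for deletion but has no counterpart in the database; the sketch follows the order of checks in the description and marks it for insertion.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Any]


def mark_record(record: Record, existing: Optional[Record]) -> Optional[str]:
    """Return "insert", "update", "delete", or None when no operation is needed."""
    if existing is None:
        # No record with the same primary key: addition mark (step 512).
        return "insert"
    if record.get("op") == "delete":
        # Operation field is a delete field: deletion mark (step 516).
        return "delete"
    if record["version"] > existing["version"]:
        # Incoming record is newer than the stored one: update mark (step 508).
        return "update"
    # Same or older version: the stored record is kept unchanged.
    return None
```

Returning `None` for a stale version means an out-of-order record never overwrites newer data in the business database.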

In one embodiment, as shown in fig. 6, the current data record is multiple, and the current data record is operated according to the current data flag to obtain a new current data set, including:

step 602, obtaining a first current data set from the current data records whose current data mark is the current data update mark or the current data addition mark.

Step 604, obtaining a second current data set from the current data records whose current data mark is the current data deletion mark, and determining the first current data set and the second current data set as the new current data set.

If there are a plurality of current data records, each current data record has a corresponding current data mark, and the current data set can be divided according to these marks. Specifically, the current data records whose mark is the current data update mark or the current data addition mark form the first current data set, the current data records whose mark is the current data deletion mark form the second current data set, and the first current data set and the second current data set together form the new current data set.
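Splitting the marked records into the two sets of steps 602 and 604 can be sketched in Python as follows. The mark strings and the `(mark, record)` tuple layout are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Marked = Tuple[str, Record]  # (mark, record)


def split_by_mark(marked: List[Marked]) -> Tuple[List[Record], List[Record]]:
    """Split marked records into (inserted-or-updated records, deleted records)."""
    # First current data set: records carrying an addition or update mark.
    first_set = [record for mark, record in marked if mark in ("insert", "update")]
    # Second current data set: records carrying a deletion mark.
    second_set = [record for mark, record in marked if mark == "delete"]
    return first_set, second_set
```

Keeping the two sets separate lets the storage engine writer apply upserts and deletes as two distinct operations.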

In one embodiment, FIG. 7 illustrates the flow of writing data to the PG database and the third-party storage engine. Real-time data in a database is consumed, with one batch taken every 15 seconds. The data of each batch is parsed and widened to generate the data set of the current real-time batch, and this data set is grouped by primary key, keeping the record with the largest version number in each group. The PG database is initialized, and business logic judgment is performed on each record in the data set of the current real-time batch. Specifically, it is judged whether a record with the same primary key exists in the PG database. If so, it is judged whether the record is marked as deleted by the business; if it is, the record is marked as ("delete", record). Otherwise, it is judged whether the record version number is greater than the version number of the record with the same primary key in the PG database; if so, the record is marked as ("update", record).

If no record with the same primary key exists in the pg database, the record is marked as ("insert", record). This yields a current data set in which every data record carries a mark.
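The grouping and marking flow of FIG. 7 can be sketched as follows. This is a simplified illustration, not the applicant's implementation: the pg database is stood in for by an in-memory dictionary, and the field names `id`, `version` and `op` are hypothetical. The application does not state what happens to a record whose version number is not greater than the stored one; the sketch assumes such a record is skipped.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def latest_per_key(batch: List[Record]) -> List[Record]:
    """Group by primary key and keep the record with the largest version."""
    latest: Dict[object, Record] = {}
    for rec in batch:
        key = rec["id"]
        if key not in latest or rec["version"] > latest[key]["version"]:
            latest[key] = rec
    return list(latest.values())


def mark_record(rec: Record, existing: Optional[Record]) -> Optional[Tuple[str, Record]]:
    """Apply the judgment order described for FIG. 7."""
    if existing is None:
        # No record with the same primary key in the database.
        return ("insert", rec)
    if rec["op"] == "delete":
        # The business marked the record as deleted.
        return ("delete", rec)
    if rec["version"] > existing["version"]:
        return ("update", rec)
    # Assumption: a stale record is skipped (not specified in the source).
    return None


# In-memory stand-in for the pg database, keyed by primary key.
pg_rows = {
    1: {"id": 1, "version": 3, "op": "upsert"},
    2: {"id": 2, "version": 1, "op": "upsert"},
}
batch = [
    {"id": 1, "version": 4, "op": "upsert"},
    {"id": 1, "version": 5, "op": "upsert"},
    {"id": 2, "version": 2, "op": "delete"},
    {"id": 3, "version": 1, "op": "upsert"},
]

marked = []
for rec in latest_per_key(batch):
    result = mark_record(rec, pg_rows.get(rec["id"]))
    if result is not None:
        marked.append(result)
```

In this example the two records for primary key 1 collapse to version 5, which is marked for update; key 2 is marked for deletion and key 3 for insertion.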

Further, the marked current data set is persisted through persist (system framework), and each record in the current data set is written into the pg database according to its marked operation (delete/add/update). Meanwhile, it is judged whether the export switch for the hudi data lake is on. If the switch is on, an appendRdd (the added or updated data set) and a deleteRdd (the deleted data set) are constructed according to the record marks and submitted to the thread pool to be written into the hudi data lake, which completes the writing of every record in the current data set of the current real-time batch.
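The dual write described above can be sketched with a standard thread pool. This is an illustration under stated assumptions only: both the pg database and the hudi data lake are replaced by in-memory dictionaries, and `apply_to_pg`, `write_to_lake` and `process_batch` are hypothetical helper names, not APIs of PostgreSQL, Spark or Hudi.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]
Marked = Tuple[str, Record]


def apply_to_pg(pg_rows: Dict[object, Record], marked: List[Marked]) -> None:
    """Write each record to the business database according to its mark."""
    for mark, rec in marked:
        if mark == "delete":
            pg_rows.pop(rec["id"], None)
        else:  # "insert" or "update"
            pg_rows[rec["id"]] = rec


def write_to_lake(lake: Dict[object, Record],
                  append_set: List[Record],
                  delete_set: List[Record]) -> int:
    """Stand-in for the storage engine write; returns records handled."""
    for rec in append_set:
        lake[rec["id"]] = rec
    for rec in delete_set:
        lake.pop(rec["id"], None)
    return len(append_set) + len(delete_set)


def process_batch(marked: List[Marked],
                  pg_rows: Dict[object, Record],
                  lake: Dict[object, Record],
                  lake_switch_on: bool,
                  pool: ThreadPoolExecutor) -> Optional[Future]:
    apply_to_pg(pg_rows, marked)
    if not lake_switch_on:
        return None  # switch is off: only the business database is written
    append_set = [r for m, r in marked if m in ("insert", "update")]
    delete_set = [r for m, r in marked if m == "delete"]
    # Submit the storage engine write to the thread pool.
    return pool.submit(write_to_lake, lake, append_set, delete_set)


pg_rows = {2: {"id": 2, "version": 1}}
lake = {2: {"id": 2, "version": 1}}
marked = [
    ("insert", {"id": 1, "version": 1}),
    ("delete", {"id": 2, "version": 2}),
]
with ThreadPoolExecutor(max_workers=2) as pool:
    future = process_batch(marked, pg_rows, lake, True, pool)
    written = future.result()
```

Submitting the storage engine write to a pool keeps it off the path that writes the business database, so a slow data lake write need not delay that write.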

It should be understood that, although the steps in the above-described flowcharts are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the above-described flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 8, there is provided a data writing apparatus 800 comprising: an obtaining module 802, a determining module 804, a first writing module 806, and a second writing module 808, wherein:

The obtaining module 802 is configured to obtain a current data set of a current real-time batch, where the current data set includes at least one current data record.

The determining module 804 is configured to perform service logic determination on the current data record in the current data set, so as to obtain a current data mark corresponding to the current data record.

The first writing module 806 is configured to write the current data record into the target business database according to the current data mark, and, when a write switch of the third-party storage engine is in an on state, operate the current data record according to the current data mark to obtain a new current data set.

The second writing module 808 is configured to submit the new current data set to the thread pool and write the new current data set into the third-party storage engine.

In one embodiment, the data writing apparatus 800 receives a current query request through a query interface provided by a third-party storage engine, encapsulates the current query request into a current query task, obtains a current storage engine data file writing path by splicing according to a current model name of the current query task, generates a current temporary table name according to the current model name, the third-party storage engine name and a current timestamp, replaces the current model name in the current storage engine data file writing path with the current temporary table name, generates a new current storage engine data file writing path, and obtains a target query result corresponding to the current query request according to the new current storage engine data file writing path.
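The path handling in the query embodiment can be sketched as follows. The application does not give the path layout or the naming format, so the root directory `BASE_DIR`, the underscore-joined temporary table name, and all function names here are assumptions for illustration only.

```python
# Hypothetical root directory of the storage engine data files.
BASE_DIR = "/data/lake"


def build_write_path(model_name: str) -> str:
    """Splice the data file writing path from the model name."""
    return f"{BASE_DIR}/{model_name}"


def build_temp_table_name(model_name: str, engine_name: str, timestamp_ms: int) -> str:
    """Combine model name, storage engine name and timestamp."""
    return f"{model_name}_{engine_name}_{timestamp_ms}"


def build_query_path(model_name: str, engine_name: str, timestamp_ms: int) -> str:
    """Replace the model name in the write path with the temporary table name."""
    write_path = build_write_path(model_name)
    temp_table = build_temp_table_name(model_name, engine_name, timestamp_ms)
    # Replace only the last occurrence, in case the root also contains the name.
    head, _, tail = write_path.rpartition(model_name)
    return head + temp_table + tail


new_path = build_query_path("order_model", "hudi", 1632268800000)
# new_path == "/data/lake/order_model_hudi_1632268800000"
```

Including the timestamp in the temporary table name makes a name collision between two query tasks on the same model unlikely.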

In one embodiment, the data writing apparatus 800 receives a current export request through an export interface provided by a third-party storage engine, obtains a default distributed file system name according to the current export request, obtains a current model system name and a current model name, generates a current storage engine data export path according to the distributed file system name, the current model system name, and the current model name, and exports target export data corresponding to the current export request according to the current storage engine data export path when the third-party storage engine has the current storage engine data export path.
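The export embodiment can be sketched in the same spirit. The slash-joined path layout, the example file system name, and the dictionary standing in for the third-party storage engine are all assumptions; the application only states that the path is generated from the three names and that data is exported when the path exists.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]


def build_export_path(fs_name: str, system_name: str, model_name: str) -> str:
    """Generate the export path from file system, model system and model names."""
    return "/".join([fs_name.rstrip("/"), system_name, model_name])


def export_data(engine_files: Dict[str, List[Record]],
                fs_name: str,
                system_name: str,
                model_name: str) -> Optional[List[Record]]:
    """Export data only when the storage engine has the export path."""
    path = build_export_path(fs_name, system_name, model_name)
    if path not in engine_files:
        return None  # path does not exist: nothing is exported
    return engine_files[path]


# In-memory stand-in for the files held by the storage engine.
engine_files = {"hdfs://default-fs/sales/order_model": [{"id": 1}]}

exported = export_data(engine_files, "hdfs://default-fs", "sales", "order_model")
missing = export_data(engine_files, "hdfs://default-fs", "sales", "unknown_model")
```

Checking for the path before reading avoids attempting an export for a model that has never been written to the storage engine.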

In an embodiment, the obtaining module 802 obtains a non-time-series data set within a preset time period, and obtains the current data set of the current real-time batch after parsing and widening the non-time-series data set.

In one embodiment, the current data record includes a current data record primary key, a current data record operation field, and a current data record version number. The determining module 804 determines, according to the current data record primary key, whether a record with the same primary key already exists in the target business database; when such a record exists, determines whether the current data record operation field is a deletion field; when the current data record operation field is a non-deletion field, acquires a first data record version number corresponding to the record with the same primary key in the target business database; when the current data record version number is greater than the first data record version number, determines that the current data mark corresponding to the current data record is a current data update mark; and, according to the current data update mark, replaces the record with the same primary key in the target business database with the current data record.

In an embodiment, when no record with the same primary key exists in the target business database, the determining module 804 determines that the current data mark corresponding to the current data record is a current data addition mark, and adds the current data record to the target business database according to the current data addition mark; when the current data record operation field is a deletion field, the determining module 804 determines that the current data mark corresponding to the current data record is a current data deletion mark, and deletes the current data record from the target business database according to the current data deletion mark.

In an embodiment, there are multiple current data records. The second writing module 808 obtains a first current data set from the current data records whose current data mark is a current data update mark or a current data addition mark, obtains a second current data set from the current data records whose current data mark is a current data deletion mark, and determines the first current data set and the second current data set as the new current data set.

For specific limitations of the data writing apparatus, reference may be made to the above limitations of the data writing method, which are not repeated here. The modules in the data writing apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store the current data set. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data writing method.

Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: obtaining a current data set of a current real-time batch, wherein the current data set comprises at least one current data record; performing business logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record; writing the current data record into a target business database according to the current data mark; when a write switch of a third-party storage engine is in an on state, operating the current data record according to the current data mark to obtain a new current data set; and submitting the new current data set to a thread pool and writing the new current data set into the third-party storage engine.

In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving a current query request through a query interface provided by a third-party storage engine, packaging the current query request into a current query task, splicing according to a current model name of the current query task to obtain a current storage engine data file writing path, generating a current temporary table name according to the current model name, the third-party storage engine name and a current timestamp, replacing the current model name in the current storage engine data file writing path with the current temporary table name, generating a new current storage engine data file writing path, and obtaining a target query result corresponding to the current query request according to the new current storage engine data file writing path.

In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving a current export request through an export interface provided by a third-party storage engine, obtaining a default distributed file system name according to the current export request, obtaining a current model system name and a current model name, generating a current storage engine data export path according to the distributed file system name, the current model system name and the current model name, and, when the current storage engine data export path exists in the third-party storage engine, exporting target export data corresponding to the current export request according to the current storage engine data export path.

In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a non-time-series data set within a preset time period, and parsing and widening the non-time-series data set to obtain the current data set of the current real-time batch.

In one embodiment, the current data record includes a current data record primary key, a current data record operation field, and a current data record version number, and the processor, when executing the computer program, further performs the steps of: determining, according to the current data record primary key, whether a record with the same primary key exists in the target business database; when such a record exists, determining whether the current data record operation field is a deletion field; when the current data record operation field is a non-deletion field, acquiring a first data record version number corresponding to the record with the same primary key in the target business database; when the current data record version number is greater than the first data record version number, determining that the current data mark corresponding to the current data record is a current data update mark; and, according to the current data update mark, replacing the record with the same primary key in the target business database with the current data record.

In one embodiment, the processor, when executing the computer program, further performs the steps of: when no record with the same primary key exists in the target business database, determining that the current data mark corresponding to the current data record is a current data addition mark, and adding the current data record to the target business database according to the current data addition mark; when the current data record operation field is a deletion field, determining that the current data mark corresponding to the current data record is a current data deletion mark, and deleting the current data record from the target business database according to the current data deletion mark.

In one embodiment, there are multiple current data records, and the processor, when executing the computer program, further performs the steps of: obtaining a first current data set from the current data records whose current data mark is a current data update mark or a current data addition mark, obtaining a second current data set from the current data records whose current data mark is a current data deletion mark, and determining the first current data set and the second current data set as the new current data set.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, performs the steps of: obtaining a current data set of a current real-time batch, wherein the current data set comprises at least one current data record; performing business logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record; writing the current data record into a target business database according to the current data mark; when a write switch of a third-party storage engine is in an on state, operating the current data record according to the current data mark to obtain a new current data set; and submitting the new current data set to a thread pool and writing the new current data set into the third-party storage engine.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of the present specification.

The above-mentioned embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A data writing method, the method comprising:

Obtain the current data set of the current real-time batch, the current data set includes at least one current data record;

Performing business logic judgment on the current data record in the current data set to obtain a current data mark corresponding to the current data record;

Write the current data record into the target business database according to the current data mark, and when the write switch of the third-party storage engine is on, operate the current data record according to the current data mark to obtain a new current data set;

Submit the new current data set to the thread pool and write it into the third-party storage engine.

2. The method according to claim 1, wherein the method further comprises:

Receive the current query request through the query interface provided by the third-party storage engine, and encapsulate the current query request into a current query task;

According to the current model name of the current query task, the current storage engine data file writing path is obtained by splicing;

Generate the current temporary table name according to the current model name, the third-party storage engine name and the current timestamp;

Replace the current model name in the current storage engine data file writing path with the current temporary table name, and generate a new current storage engine data file writing path;

The target query result corresponding to the current query request is obtained according to the new current storage engine data file writing path.

3. The method according to claim 1, wherein the method further comprises:

Receive the current export request through the export interface provided by the third-party storage engine, and obtain the default distributed file system name according to the current export request;

Obtain the current model system name and the current model name, and generate the current storage engine data export path according to the distributed file system name, the current model system name and the current model name;

When the third-party storage engine has the current storage engine data export path, export the target export data corresponding to the current export request according to the current storage engine data export path.

4. The method according to claim 1, wherein the obtaining the current data set of the current real-time batch comprises:

Obtain a non-time-series data set within a preset time period;

After parsing and widening the non-time-series data set, the current data set of the current real-time batch is obtained.

5. The method according to claim 1, wherein the current data record comprises a current data record primary key, a current data record operation field and a current data record version number, and the performing business logic judgment on the current data record in the current data set to obtain the current data mark corresponding to the current data record, and writing the current data record into the target business database according to the current data mark, comprises:

Determine whether a record with the same primary key already exists in the target business database according to the current data record primary key;

When the same primary key record already exists in the target business database, determine whether the current data record operation field is a delete field;

When the current data record operation field is a non-deleted field, obtain the first data record version number corresponding to the same primary key record in the target business database;

When the current data record version number is greater than the first data record version number, determining that the current data mark corresponding to the current data record is the current data update mark;

The record with the same primary key in the target business database is replaced with the current data record according to the current data update mark.

6. The method according to claim 5, wherein the method further comprises:

When the same primary key record does not exist in the target business database, determine that the current data mark corresponding to the current data record is the current data addition mark;

Add the current data record to the target business database according to the current data addition mark;

When the current data record operation field is a delete field, determine that the current data mark corresponding to the current data record is the current data deletion mark;

The current data record is deleted from the target business database according to the current data deletion mark.

7. The method according to claim 5 or 6, wherein there are multiple current data records, and the operating the current data record according to the current data mark to obtain a new current data set comprises:

Obtaining a first current data set from the current data records whose current data mark is the current data update mark or the current data addition mark;

Obtaining a second current data set from the current data records whose current data mark is the current data deletion mark, and determining the first current data set and the second current data set as the new current data set.

8. A data writing device, wherein the device comprises:

an acquisition module for acquiring the current data set of the current real-time batch, the current data set including at least one current data record;

a judgment module, configured to perform business logic judgment on the current data record in the current data set, and obtain a current data mark corresponding to the current data record;

a first writing module, configured to write the current data record into the target business database according to the current data mark, and when the write switch of the third-party storage engine is in an on state, operate the current data record according to the current data mark to obtain a new current data set;

a second writing module, configured to submit the new current data set to the thread pool and write it into the third-party storage engine.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.