CN111324783B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN111324783B
CN111324783B (application CN202010191386.4A)
Authority
CN
China
Prior art keywords
data
processing
queue
database
writing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010191386.4A
Other languages
Chinese (zh)
Other versions
CN111324783A (en)
Inventor
倪艳 (Ni Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongpu Software Co Ltd
Original Assignee
Dongpu Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongpu Software Co Ltd filed Critical Dongpu Software Co Ltd
Priority to CN202010191386.4A priority Critical patent/CN111324783B/en
Publication of CN111324783A publication Critical patent/CN111324783A/en
Application granted granted Critical
Publication of CN111324783B publication Critical patent/CN111324783B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/903: Querying
    • G06F16/90335: Query processing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method and device, computer equipment, and a computer-readable storage medium. The method comprises the following steps: importing data from an external source and writing the data into an import queue; and executing a filtering script to deduplicate the data in the import queue and then writing the data into a database. The data processing method, device, equipment, and storage medium solve the technical problem of low processing efficiency caused by data disorder in the existing data processing process.

Description

Data processing method and device
Technical Field
The present application relates to the field of data processing technology, and in particular to a data processing method, a data processing device, computer equipment, and a computer-readable storage medium.
Background
In existing data processing, in order to meet the most imminent deadline, part of the data is usually processed first and entered into the database. Because of heavy workloads, irregular manual operation, or carelessness, this easily leads to data disorder, omissions, and data processing delays.
Disclosure of Invention
The application aims to provide a data processing method and device, computer equipment and a computer readable storage medium, so as to solve the technical problem of low processing efficiency caused by data disorder in the existing data processing process.
The application adopts the following technical solution:
a first aspect of the present application provides a data processing method, the method comprising:
importing data from an external source and writing the data into an import queue;
and executing a filtering script to deduplicate the data in the import queue and then writing the data into a database.
Importing data from an external source into the import queue, then executing the filtering script to deduplicate that data before writing it into the database, avoids repeatedly importing duplicate data. Data disorder therefore does not arise during processing, processing efficiency is improved, and human effort is saved.
Optionally, the data processing method further includes:
executing a reading script to write the data in the database into a processing queue in batches;
and executing a processing script to verify the data in the processing queue, write the verified data into a sending queue, and update the processing state, in the database, of data that fails verification.
In this data processing method, each script and each queue is responsible for a different portion of the data, and they cooperate through a division of labor: taken separately, each is an independent component; taken together, they form a seamless whole. The code therefore runs quickly, is stable, and is easy to maintain, which saves human effort, avoids the defects of irregular manual operation, and saves a great deal of time.
Optionally, the data processing method further includes:
executing a sending script to push the data written into the sending queue to a designated platform, and updating the processing state, in the database, of data that was not sent successfully.
Executing the sending script to push the data in the sending queue to the designated platform, and updating the processing state of unsuccessfully sent data in the database, adapts the method to actual usage scenarios in which the data serves as a basic data source for the designated platform.
Optionally, the method further comprises:
writing the data that fails verification and the data that is not sent successfully into an exception queue, executing an exception script to process the data in the exception queue, and updating the processing state of the data.
By writing the data that fails verification or is not sent successfully into the exception queue, executing the exception script to process that data, and updating its processing state, data left unprocessed or unsent because of a system abnormality can be handled, and data processing omissions are avoided.
Optionally, executing the filtering script to deduplicate the data in the import queue and then writing the data into the database includes:
querying the processing state of the data in a preset cache;
if the processing state of the data is a predetermined processing state, not writing the data into the database;
if the processing state of the data is not found in the cache, recording the processing state of the data as to-be-processed and writing the data into the database.
Through these steps, data whose processing state is already the predetermined processing state is not written into the database again, which both avoids data redundancy and makes data processing more efficient.
Optionally, executing the filtering script to deduplicate the data in the import queue and then writing the data into the database further includes:
rewriting, into the import queue, abnormal data whose processing failed because of an exception during processing.
Rewriting such abnormal data into the import queue allows data whose filtering failed because of a system abnormality to be processed, avoiding data processing omissions.
Optionally, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state, in the database, of data that fails verification is processing-failed.
Because the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification is processing-failed, corresponding processing can be performed according to the processing state of each datum.
Optionally, executing the read script to write the data in the database into the processing queue in batches includes:
reading a start time from a first time file, executing the read script to read data whose processing state is to-be-processed and whose warehousing time falls within a first duration after the start time, and writing the data into the processing queue;
and when each read ends, writing the read end time into the first time file as the start time of the next read.
By reading data whose warehousing time falls within the first duration after the start time and whose processing state is to-be-processed, and writing it into the processing queue, the to-be-processed data in the database can be written into the processing queue in batches, so that the data is processed as a pipeline and data processing efficiency is improved.
Optionally, executing the read script to write the data in the database into the processing queue in batches further includes:
reading a start time T1 from a second time file, executing the read script to read data whose processing state is to-be-processed or processing-failed and whose warehousing time falls within a second duration t1 before the start time T1, and writing the data into the processing queue; if (T2 - T1) <= (t1 + t2), ending execution of the read script, where T2 is the current time and t2 is a third duration;
and when each read ends, writing the read end time into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is to-be-processed or processing-failed, and writing it into the processing queue, data left unprocessed through omission or system abnormality can be processed.
A second aspect of the present application provides a data processing apparatus, the apparatus comprising:
an input module, configured to import data from an external source and write the data into the import queue;
and a filtering module, configured to execute a filtering script to deduplicate the data in the import queue and then write the data into the database.
Importing data from an external source into the import queue, then executing the filtering script to deduplicate that data before writing it into the database, avoids repeatedly importing duplicate data. Data disorder therefore does not arise during processing, processing efficiency is improved, and human effort is saved.
Optionally, the data processing apparatus further includes:
a reading module, configured to execute a read script to write the data in the database into the processing queue in batches;
and a processing module, configured to execute a processing script to verify the data in the processing queue, write the verified data into the sending queue, and update the processing state, in the database, of data that fails verification.
In this data processing device, each script and each queue is responsible for a different portion of the data, and they cooperate through a division of labor: taken separately, each is an independent component; taken together, they form a seamless whole. The code therefore runs quickly, is stable, and is easy to maintain, which saves human effort, avoids the defects of irregular manual operation, and saves a great deal of time.
Optionally, the data processing apparatus further includes:
and a sending module, configured to execute a sending script to push the data written into the sending queue to a designated platform and to update the processing state, in the database, of data that was not sent successfully.
Executing the sending script to push the data in the sending queue to the designated platform, and updating the processing state of unsuccessfully sent data in the database, adapts the device to actual usage scenarios in which the data serves as a basic data source for the designated platform.
Optionally, the data processing apparatus further includes:
an exception module, configured to write the data that fails verification and the data that is not sent successfully into the exception queue, execute an exception script to process the data in the exception queue, and update the processing state of the data.
By writing the data that fails verification or is not sent successfully into the exception queue, executing the exception script to process that data, and updating its processing state, data left unprocessed or unsent because of a system abnormality can be handled, and data processing omissions are avoided.
Optionally, the filtering module includes:
a querying unit, configured to query the processing state of the data in a preset cache;
a writing unit, configured not to write the data into the database if the processing state of the data is a predetermined processing state;
and a recording unit, configured to record the processing state of the data as to-be-processed and write the data into the database if the processing state of the data is not found in the cache.
Because the filtering module comprises the querying unit, the writing unit, and the recording unit, data whose processing state is already the predetermined processing state is not written into the database again, which both avoids data redundancy and makes data processing more efficient.
Optionally, the filtering module further comprises:
and a filtering-exception unit, configured to rewrite, into the import queue, abnormal data whose processing failed because of an exception during processing.
Rewriting such abnormal data into the import queue allows data whose filtering failed because of a system abnormality to be processed, avoiding data processing omissions.
Optionally, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state, in the database, of data that fails verification is processing-failed.
Because the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification is processing-failed, corresponding processing can be performed according to the processing state of each datum.
Optionally, the reading module includes:
a first read-time unit, configured to read a start time from the first time file and execute the read script to read data whose processing state is to-be-processed and whose warehousing time falls within a first duration after the start time, writing the data into the processing queue;
and a first write-time unit, configured to write, when each read ends, the read end time into the first time file as the start time of the next read.
By reading data whose warehousing time falls within the first duration after the start time and whose processing state is to-be-processed, and writing it into the processing queue, the to-be-processed data in the database can be written into the processing queue in batches, so that the data is processed as a pipeline and data processing efficiency is improved.
Optionally, the reading module further includes:
a second read-time unit, configured to read a start time T1 from the second time file and execute the read script to read data whose processing state is to-be-processed or processing-failed and whose warehousing time falls within a second duration t1 before the start time T1, writing the data into the processing queue; if (T2 - T1) <= (t1 + t2), execution of the read script ends, where T2 is the current time and t2 is a third duration;
and a second write-time unit, configured to write, when each read ends, the read end time into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is to-be-processed or processing-failed, and writing it into the processing queue, data left unprocessed through omission or system abnormality can be processed.
A third aspect of the application provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the program.
A fourth aspect of the application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
Drawings
The application will be further described with reference to the drawings and examples.
FIG. 1 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 3 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 4 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 5 is a schematic flow chart of step S200 in FIGS. 1-4;
FIG. 6 is another schematic flow chart of step S200 in FIGS. 1-4;
FIG. 7 is a flow chart of step S300 in FIGS. 2-4;
FIG. 8 is another flow chart of step S300 in FIGS. 2-4;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 11 is a schematic diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 13 is a schematic structural diagram of the filter module 200 of FIGS. 9-12;
FIG. 14 is another schematic structural diagram of the filter module 200 of FIGS. 9-12;
FIG. 15 is a schematic diagram of the structure of the read module 300 of FIGS. 10-12;
FIG. 16 is another schematic diagram of the read module 300 of FIGS. 10-12;
FIG. 17 is a flow chart of a data processing method according to yet another embodiment of the present application;
FIG. 18 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solution of the application, the following description is given by way of specific examples.
The embodiment of the application provides a data processing method. As shown in fig. 1, the data processing method includes:
and a recording step S100, wherein data is imported from the outside and written into the import queue. Specifically, the user configures fixed information required for data and selects the type and time of data to be generated, then imports data to be processed from the operation page, and then writes the data into the importation queue.
And a filtering step S200, wherein a filtering script is executed to deduplicate the data in the import queue before writing it into the database. Specifically, the filtering script consumes the data in the import queue, deduplicates it, and then stores it in the database, which prevents duplicate data from being imported repeatedly. The processing state of each datum may be recorded and used as the basis for deduplication.
Importing data from an external source into the import queue, then executing the filtering script to deduplicate that data before writing it into the database, avoids repeatedly importing duplicate data. Data disorder therefore does not arise during processing, processing efficiency is improved, and human effort is saved.
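As a minimal illustration, the recording step S100 and the filtering step S200 above can be sketched with in-memory stand-ins; the `import_queue` deque, the `database` dict, and the record fields are hypothetical and only model the structures the method describes.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the import queue and the database.
import_queue = deque()
database = {}  # record id -> {"payload": ..., "state": ...}

def import_data(records):
    """Recording step S100: write externally imported records into the import queue."""
    for record in records:
        import_queue.append(record)

def filter_script():
    """Filtering step S200: consume the import queue, drop duplicates, store the rest."""
    while import_queue:
        record = import_queue.popleft()
        if record["id"] in database:  # duplicate: already imported, skip it
            continue
        database[record["id"]] = {"payload": record["payload"], "state": "pending"}

import_data([{"id": 1, "payload": "a"},
             {"id": 2, "payload": "b"},
             {"id": 1, "payload": "a"}])  # record 1 arrives twice
filter_script()
```

After the run, the second copy of record 1 has been filtered out and only two rows reach the database, each marked pending.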
Optionally, as shown in fig. 2, the data processing method may further include:
and a reading step S300, wherein the reading script is executed to write the data in the database into the processing queue in batches. The data can be read according to the date and written into the processing queue, so that the data to be processed is read and written.
And a processing step S400, wherein a processing script is executed to check the data in the processing queue, the checked data is written into the sending queue, and the processing state of the data which is not checked in the database is updated. Specifically, the processing queue is consumed through the data processing script, and the data to be processed is checked in further detail. The verification is successful, and the queue to be sent is written; and (4) checking failure, recording error reasons and updating the processing state of the error reasons in the database. The updating can be processed in batches, abnormal data can be processed in batches, the reasons of the abnormal data are recorded, and finally the data are written into an abnormal queue.
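Processing step S400 can be sketched as follows, assuming a pluggable verification rule; the `processing_queue`, `send_queue`, and record layout are hypothetical stand-ins, not part of the patent.

```python
from collections import deque

# Hypothetical stand-ins: a processing queue of records read from the database,
# a send queue, and the database keyed by record id.
processing_queue = deque([{"id": 1, "payload": "a"},
                          {"id": 2, "payload": ""}])
send_queue = deque()
database = {1: {"state": "pending"}, 2: {"state": "pending"}}

def process_script(validate):
    """Processing step S400: verify each queued record; route passes to the send
    queue and record the failure reason for the rest in the database."""
    while processing_queue:
        record = processing_queue.popleft()
        ok, reason = validate(record)
        if ok:
            send_queue.append(record)
        else:
            database[record["id"]]["state"] = "failed"
            database[record["id"]]["error"] = reason

# Illustrative check: a record passes only if its payload is non-empty.
process_script(lambda r: (True, "") if r["payload"] else (False, "empty payload"))
```

Record 1 ends up in the send queue; record 2 is marked failed with its error reason, ready for the exception handling described later.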
In this data processing method, each script and each queue is responsible for a different portion of the data, and they cooperate through a division of labor: taken separately, each is an independent component; taken together, they form a seamless whole. The code therefore runs quickly, is stable, and is easy to maintain, which saves human effort, avoids the defects of irregular manual operation, and saves a great deal of time.
Optionally, as shown in fig. 3, the data processing method may further include:
and a sending step S500, wherein the sending script is executed to push the data written into the sending queue to the appointed platform, and the processing state of the data which is not successfully sent in the database is updated. In particular, a data transmission script may be executed to consume a transmission queue to transmit data to a designated platform or other desired platform/system.
The execution of the transmission script transmits the data written into the transmission queue to the appointed platform, and updates the processing state of the data which is not successfully transmitted in the database, so that the method can adapt to the requirement of an actual use scene, and the data is transmitted to the appointed platform as a basic data source thereof.
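Sending step S500 might be sketched as follows; the `push` callable stands in for the transport to the designated platform and is purely illustrative.

```python
from collections import deque

# Hypothetical stand-ins for the send queue, the database, and the platform push.
send_queue = deque([{"id": 1}, {"id": 2}])
database = {1: {"state": "verified"}, 2: {"state": "verified"}}

def send_script(push):
    """Sending step S500: push each queued record to the designated platform;
    mark records whose push failed so they can be picked up again later."""
    while send_queue:
        record = send_queue.popleft()
        if push(record):
            database[record["id"]]["state"] = "sent"
        else:
            database[record["id"]]["state"] = "send_failed"

# Illustrative transport: pretend the platform rejects record 2.
send_script(lambda r: r["id"] != 2)
```

The failed record keeps a distinct state (`send_failed` here), which is what lets the exception step find and reprocess it.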
Further, as shown in fig. 4, the data processing method may further include:
and an exception step S600, namely writing the data which is not checked to pass in the database and the data which is not successfully transmitted into an exception queue, executing an exception script to process the data in the exception queue and updating the processing state of the data.
By writing the data which is not checked and successfully transmitted in the database into the abnormal queue, executing the abnormal script to process the data in the abnormal queue and updating the processing state of the data, the data which is processed or is unsuccessfully transmitted due to the abnormal system can be processed, and the omission of data processing is avoided.
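A sketch of exception step S600 under the same in-memory assumptions; the `retry` callable and the state names are hypothetical choices, not dictated by the method.

```python
from collections import deque

# Hypothetical stand-ins: the database holds records that failed verification or
# sending; the exception script sweeps them into an exception queue and retries.
database = {
    1: {"state": "sent"},
    2: {"state": "failed"},
    3: {"state": "send_failed"},
}
exception_queue = deque()

def exception_script(retry):
    """Exception step S600: collect failed records, reprocess them, update states."""
    for rec_id, row in database.items():
        if row["state"] in ("failed", "send_failed"):
            exception_queue.append(rec_id)
    while exception_queue:
        rec_id = exception_queue.popleft()
        # An illustrative outcome: a successful retry becomes "sent",
        # an unsuccessful one is parked for manual inspection.
        database[rec_id]["state"] = "sent" if retry(rec_id) else "needs_review"

# Illustrative retry: the second attempt succeeds only for record 2.
exception_script(lambda rec_id: rec_id == 2)
```

Records that already succeeded are untouched; each failed record gets exactly one more pass and an updated state.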
The filtering script, the read script, the processing script, the sending script, and the exception script are all scheduled uniformly by a designated script. In particular, a shell script may be used to invoke all of the scripts uniformly.
Because the designated script schedules all of the scripts uniformly, the system is convenient to use and highly controllable: the number of processes can be increased or decreased uniformly to control the volume of data in flight. Adjusting the number of processes controls the speed at which data is read, processed, updated, and sent, so that data does not pile up. Each link of the pipeline processes its data independently, so even if one link fails, the downstream scripts continue to run and process the data already in their queues.
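The unified shell scheduling described above might look like the following sketch; the script names, per-script process counts, and `run_workers` helper are all hypothetical.

```shell
#!/bin/sh
# Hypothetical unified scheduler in the spirit of the designated shell script:
# launch a configurable number of worker processes per pipeline script, so
# throughput is tuned by changing one number per script.
FILTER_PROCS=2
PROCESS_PROCS=3

run_workers() {
    name=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # In a real deployment this line would background the worker, e.g.:
        #   python "$name.py" &
        echo "launch $name worker $i"
        i=$((i + 1))
    done
}

run_workers filter "$FILTER_PROCS"
run_workers process "$PROCESS_PROCS"
# wait   # would block until all backgrounded workers exit
```

Raising `PROCESS_PROCS` speeds up verification relative to filtering; because each stage only consumes its own queue, stopping one stage leaves the others running on the data already enqueued.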
As shown in fig. 5, executing the filtering script to deduplicate the data in the import queue and then write the data into the database may include:
a querying step S201, wherein the processing state of the data is queried in a preset cache;
a writing step S202, wherein, if the processing state of the data is a predetermined processing state, the data is not written into the database;
and a recording step S203, wherein, if the processing state of the data is not found in the cache, the processing state of the data is recorded as to-be-processed and the data is written into the database.
Through these steps, data whose processing state is already the predetermined processing state is not written into the database again, which both avoids data redundancy and makes data processing more efficient.
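Steps S201-S203 can be sketched as follows; the cache layout, the state names, and the `PREDETERMINED_STATES` set are illustrative assumptions, not dictated by the method.

```python
# Hypothetical sketch of steps S201-S203: the cache maps a record id to its
# processing state; a record whose cached state is one of the predetermined
# states is skipped, otherwise it is marked to-be-processed and stored.
PREDETERMINED_STATES = {"pending", "processed"}  # illustrative choice

cache = {1: "processed"}
database = {}

def filter_record(record):
    state = cache.get(record["id"])       # querying step S201
    if state in PREDETERMINED_STATES:     # writing step S202: skip known records
        return False
    cache[record["id"]] = "pending"       # recording step S203: mark and store
    database[record["id"]] = record
    return True

filter_record({"id": 1})   # state already cached: not written again
filter_record({"id": 2})   # state unknown: marked pending and stored
```

The cache lookup replaces a round trip to the database for every duplicate, which is what makes the deduplication cheap.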
Further, as shown in fig. 6, executing the filtering script to deduplicate the data in the import queue and then write the data into the database may further include:
a filtering-exception step S204, wherein abnormal data whose processing failed because of an exception during processing is rewritten into the import queue.
Rewriting such abnormal data into the import queue allows data whose filtering failed because of a system abnormality to be processed, avoiding data processing omissions.
As an alternative implementation, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state, in the database, of data that fails verification is processing-failed.
Because the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification is processing-failed, corresponding processing can be performed according to the processing state of each datum.
Wherein, as shown in fig. 7, executing the read script to write the data in the database into the processing queue in batches may include:
a first read-time step S301, wherein a start time is read from a first time file, and the read script is executed to read data whose processing state is to-be-processed and whose warehousing time falls within a first duration after the start time, writing the data into the processing queue. The first duration may be, for example, 3, 5, or 10 minutes; it may be set according to actual needs and is not limited to these examples.
And a first write-time step S302, wherein, when each read ends, the read end time is written into the first time file as the start time of the next read.
By reading data whose warehousing time falls within the first duration after the start time and whose processing state is to-be-processed, and writing it into the processing queue, the to-be-processed data in the database can be written into the processing queue in batches, so that the data is processed as a pipeline and data processing efficiency is improved.
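Steps S301-S302 can be sketched as follows; the time-file representation, the row layout, and the `FIRST_DURATION` value are hypothetical stand-ins for the persisted start time and the database rows.

```python
# Hypothetical sketch of steps S301-S302: the first time file persists the start
# time between runs; each run reads the pending rows whose warehousing time falls
# within FIRST_DURATION after the start time, then advances the stored time.
FIRST_DURATION = 300  # seconds; e.g. a 5-minute batch window

def read_batch(time_file, rows, queue, now):
    start = float(time_file["start"])        # stand-in for reading the time file
    end = min(start + FIRST_DURATION, now)   # never read past the present
    for row in rows:
        if row["state"] == "pending" and start <= row["stored_at"] < end:
            queue.append(row)
    time_file["start"] = end                 # S302: next run starts where this ended
    return end

queue = []
time_file = {"start": 0}
rows = [{"stored_at": 100, "state": "pending"},
        {"stored_at": 400, "state": "pending"}]
read_batch(time_file, rows, queue, now=600)
```

Persisting the end time as the next start time is what makes the batches contiguous: no row is read twice and no window is skipped, even across restarts.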
Further, as shown in fig. 8, executing the read script to write the data in the database into the processing queue in batches may further include:
a second read-time step S303, wherein a start time T1 is read from a second time file, and the read script is executed to read data whose processing state is to-be-processed or processing-failed and whose warehousing time falls within a second duration t1 before the start time T1, writing the data into the processing queue; if (T2 - T1) <= (t1 + t2), execution of the read script ends, where T2 is the current time and t2 is a third duration. For example only, the second duration t1 may be half an hour or an hour, and the third duration t2 may be 1, 2, or 3 minutes; both may be set according to actual needs and are not limited to these examples.
And a second write-time step S304, wherein, when each read ends, the read end time is written into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is to-be-processed or processing-failed, and writing it into the processing queue, data left unprocessed through omission or system abnormality can be processed.
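Steps S303-S304, including the stop condition (T2 - T1) <= (t1 + t2), can be sketched as follows; the row layout and the state names are hypothetical, and the interpretation of the condition (stop once the retry window catches up with the present) is an assumption drawn from the surrounding text.

```python
# Hypothetical sketch of steps S303-S304: each run re-reads rows stored within
# the second duration t1 before the start time T1 whose state is still pending
# or failed; once the start time is within (t1 + t2) of the current time T2,
# the retry reader stops, since the regular reader already covers that window.
def retry_read(T1, T2, t1, t2, rows, queue):
    if (T2 - T1) <= (t1 + t2):
        return False                       # stop condition from the method
    for row in rows:
        if row["state"] in ("pending", "failed") and T1 - t1 <= row["stored_at"] < T1:
            queue.append(row)
    return True

queue = []
rows = [{"stored_at": 50, "state": "failed"},
        {"stored_at": 90, "state": "sent"}]
# Far enough in the past: the failed row is swept up for reprocessing.
ran = retry_read(T1=100, T2=10_000, t1=60, t2=120, rows=rows, queue=queue)
```

The retry reader thus trails behind the regular reader by at least t1 + t2, sweeping up rows that were missed or left in a failed state, without ever racing the live batch window.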
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application.
The embodiment of the application also provides a data processing device. As shown in fig. 9, the data processing apparatus may include:
an input module 100, configured to import data from an external source and write the data into the import queue;
and a filtering module 200, configured to execute a filtering script to deduplicate the data in the import queue and then write the data into the database.
Optionally, as shown in fig. 10, the data processing apparatus may further include:
the reading module 300 is configured to execute a reading script to write data in the database into the processing queue in batches.
And a processing module 400, configured to execute a processing script to verify the data in the processing queue, write the data that passes verification into the sending queue, and update, in the database, the processing state of the data that fails verification.
In this data processing apparatus, each script and each queue is responsible for a different part of the data, cooperating with a clear division of labor: each is independent, yet together they form a single pipeline, so the code runs fast, is stable, and is easy to maintain. Human resources are saved, defects caused by irregular manual operation are avoided, and a great deal of time is saved.
Optionally, as shown in fig. 11, the data processing apparatus may further include:
and a sending module 500, configured to execute a sending script to push the data written into the sending queue to a designated platform, and to update, in the database, the processing state of data that failed to send.
Executing the sending script pushes the data in the sending queue to the designated platform and updates the processing state of unsuccessfully sent data in the database, which adapts the method to practical scenarios in which the data serves as a basic data source for the designated platform.
Further, as shown in fig. 12, the data processing apparatus may further include:
an exception module 600, configured to write data that failed verification or failed to send into an exception queue, execute an exception script to process the data in the exception queue, and update the processing state of that data.
By writing data that failed verification or failed to send into the exception queue, executing the exception script to process it, and updating its processing state, data whose processing or sending failed because of a system exception can still be handled, and omissions in data processing are avoided.
The filtering script, the reading script, the processing script, the sending script and the exception script are all scheduled uniformly by a designated script.
Because a designated script schedules all the scripts uniformly, the system is convenient to use and highly controllable: the number of processes can be increased or decreased centrally, so the speed of reading, processing, updating and sending data is controlled by adjusting the process count, and data does not pile up. The data processing of each link is independent, so even if one link fails, the subsequent scripts can continue to run and process the data already in the queues.
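As an illustration only of this unified-scheduling idea, the Python sketch below stands in for the shell scheduler: a single entry point spawns a configurable number of identical consumer processes over one shared queue, so throughput is tuned by changing a single worker count. All names are assumptions; the patent does not prescribe an implementation language.

```python
import multiprocessing as mp
import queue

def consume(q, handle):
    """Drain the shared queue with `handle`; exit once the queue runs empty."""
    done = 0
    while True:
        try:
            item = q.get(timeout=0.1)
        except queue.Empty:
            return done
        handle(item)
        done += 1

def schedule(items, handle, workers=4):
    """Designated entry point: spawn `workers` identical consumer processes.

    Raising or lowering `workers` raises or lowers the consumption speed
    of this link without touching any other link's processes.
    """
    q = mp.Queue()
    for it in items:
        q.put(it)
    procs = [mp.Process(target=consume, args=(q, handle)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Note that on platforms where multiprocessing uses the spawn start method, `handle` must be a module-level function so it can be pickled.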
As shown in fig. 13, the filtering module 200 may include:
a query unit 201, configured to query a processing state of the data from a preset cache;
a writing unit 202, configured to not write the data into the database if the processing state of the data is a predetermined processing state;
and the recording unit 203 is configured to record the processing state of the data as to-be-processed if the processing state of the data is not queried in the cache, and write the data into a database.
Because the filtering module comprises the query unit, the writing unit and the recording unit, data whose processing state is the predetermined processing state is not written into the database again, which avoids data redundancy and makes data processing more efficient.
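A minimal sketch of these three units, with a plain dict standing in for the pika cache and a list for the database table; the field names (`shipment_id`, `scan_type`) and state strings are illustrative assumptions:

```python
# States that block a repeated import (the "predetermined processing states").
BLOCKING_STATES = {"processing_success", "to_be_processed", "processing"}

def filter_and_store(record: dict, cache: dict, db: list) -> bool:
    """Deduplicate one imported record; return True if it was written to db."""
    key = f"{record['shipment_id']}-{record['scan_type']}"
    state = cache.get(key)          # query unit: look up the processing state
    if state in BLOCKING_STATES:
        return False                # writing unit: duplicate, do not write
    cache[key] = "to_be_processed"  # recording unit: first import, mark pending
    db.append(record)
    return True
```

A second import of the same key is rejected because the cache now holds a blocking state for it.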
Further, as shown in fig. 14, the filtering module 200 may further include:
the filtering exception unit 204 is configured to rewrite, into the import queue, exception data that has failed in processing due to an exception occurring during data processing.
The abnormal data which is abnormal and causes processing failure during data processing is rewritten into the import queue, so that the data which is filtered and failed due to system abnormality can be processed, and data processing omission is avoided.
As an optional implementation, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification in the database is processing-failure.
Because data written into the database after deduplication is marked to-be-processed and data that fails verification is marked processing-failure, corresponding processing can be carried out according to the processing state of the data.
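The state codes can be sketched as simple constants; the numeric values 1, 2 and 3 are taken from the worked example later in the description (1 meaning success, 2 failure, 3 pending), and generalizing them into named constants is an assumption:

```python
# Processing-state codes as used in the cache in the worked example.
PROCESSING_SUCCESS = 1
PROCESSING_FAILURE = 2
TO_BE_PROCESSED = 3

def needs_work(state: int) -> bool:
    """Both pending and failed records are picked up again by the read scripts."""
    return state in (TO_BE_PROCESSED, PROCESSING_FAILURE)
```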
As shown in fig. 15, the reading module 300 may further include:
a first reading time unit 301, configured to read a start time from a first time file and execute the read script to read the data whose processing state is to-be-processed and whose warehousing time falls within the first duration after the start time, writing the data into the processing queue;
the first writing time unit 302 is configured to write, when each reading is finished, a reading finishing time into the first time file as a starting time of a next reading.
Reading the to-be-processed data whose warehousing time falls within the first duration after the start time and writing it into the processing queue writes the database data into the processing queue in batches, so the data can be processed as a pipeline and data processing efficiency is improved.
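The time-file handshake described above can be sketched as follows; the time format and single-line file layout are assumptions, since the patent only specifies that the read end time becomes the next start time:

```python
from datetime import datetime, timedelta
from pathlib import Path

TIME_FMT = "%Y-%m-%d %H:%M:%S"

def next_batch_window(time_file: Path, batch: timedelta = timedelta(minutes=5)):
    """Read the start time from the time file, compute the batch window, and
    persist the window end as the start time of the next read."""
    start = datetime.strptime(time_file.read_text().strip(), TIME_FMT)
    end = start + batch
    time_file.write_text(end.strftime(TIME_FMT))  # writing step: record end time
    return start, end
```

Each call advances the window by one batch, so restarting the script never re-reads or skips a time range.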
Further, as shown in fig. 16, the reading module 300 may further include:
a second reading time unit 303, configured to read a start time T1 from a second time file and execute the read script to read the data whose processing state is to-be-processed or processing-failure and whose warehousing time falls within the second duration t1 before the start time T1, writing the data into the processing queue; if (T2 - t1) is less than or equal to (T1 + t2), execution of the read script ends, where T2 is the current time and t2 is a third duration;
a second writing time unit 304, configured to write, when each reading ends, a reading end time into the second time file as a start time of a next reading.
By reading the data whose processing state is to-be-processed or processing-failure and whose warehousing time falls within the second duration t1 before the start time T1, and writing it into the processing queue, data that was missed or whose processing failed because of a system exception can still be handled.
The function implementation of each module in the data processing apparatus corresponds to the steps of the data processing method embodiments, and the functions and implementation processes of the modules are therefore not described in detail here.
The embodiment of the application also provides a data processing method, as shown in fig. 17, and the specific implementation manner is as follows:
First, data is imported from a user page, and after a preliminary check in the background, the data is written into the import queue scan_entry_new_import.
Second, the data is deduplicated and warehoused. Several processes that consume the import queue are started by a shell scheduling script. Specifically, the order state is queried in the cache pika by order number shipment_id and order type scan_type. If the order state is any of "processing success", "to be processed" or "processing", the import is not allowed, which avoids repeated imports. If no state information exists, the record is judged to be an initial import; the corresponding shipment_id-scan_type entry is added with state "to be processed" and written into the cache pika. The filtered information is then written into the t_scan_entitr_tbl table. Because of the nature of the queue, an exception may occur during processing; such abnormal data is rewritten into the scan_entry_new_import queue to await reprocessing.
Next, the data is read. A single process started by the shell reads the start time in time file a. Data is read in 5-minute batches according to the value of judge_tm (whose initial value is the warehousing time of the data), and the data is written into the processing queue test_scan_data_deal. After the batch has been fetched, its end time is written into the time file as the start time of the next read.
Further, a single repeated-read script can also be started by the shell to read the start time in time file b. Each run reads the data from half an hour earlier; if the current time minus half an hour is less than or equal to the start time plus 1 minute, the process terminates. Otherwise, the orders within the read window that have not been processed successfully are written into the processing queue test_scan_data_deal. After the batch has been fetched, its end time is written into the time file as the start time of the next read. In this way, data that was missed, or whose processing failed because of a system exception but has not been re-imported, can be processed.
Next, the data is processed. Multiple processes started by the shell consume the test_scan_data_deal queue and perform detailed business verification and batch processing on the data. For data that fails verification, the failure reason is recorded, the data states in the database are updated in batches with the error reason added, the value of judge_tm is changed to the current time, and the state in the cache pika is changed. If verification passes, the data is written into the send queue test_scan_data to await sending.
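As an illustration only, the verification-and-routing step might look like the sketch below; the verify rule, field names, and state strings are assumptions standing in for the actual business checks, and dicts stand in for the database and the pika cache:

```python
def verify(rec: dict):
    """Illustrative business check: every record must carry basic info."""
    return None if rec.get("basic_info") else "xx information error"

def process_batch(deal_queue: list, send_queue: list, db: dict, cache: dict, now: str):
    """Consume the deal queue: route passes to the send queue, record failures."""
    for rec in deal_queue:
        key = f"{rec['shipment_id']}-{rec['scan_type']}"
        error = verify(rec)
        if error:
            # Failed verification: store the reason, refresh judge_tm, sync cache.
            db[key] = {"state": "processing_failure", "error": error, "judge_tm": now}
            cache[key] = "processing_failure"
        else:
            send_queue.append(rec)  # passed: hand over to the send script
    deal_queue.clear()
```

Refreshing judge_tm on failure is what lets the half-hour catch-up reader pick the record up again in a later window.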
Finally, the data is sent. Multiple processes started by the shell consume the test_scan_data queue. The data is processed in batches; specifically, it is assembled according to the agreed joint-debugging format and sent to the designated platform. If sending fails, the error reason is recorded, the data state in the database is changed with the error reason added, judge_tm is changed to the current time, and the state in the cache pika is changed. If sending succeeds, the data state in the database and in the cache pika is changed to success.
Abnormal data in the above processing and sending steps may be handled as follows: if updating the database raises an error, the data is recorded into the exception queue testScanDataError for unified error handling. The shell starts multiple processes to consume testScanDataError, uniformly apply the database error updates, change the processing state, change judge_tm to the current time, and change the data state in the cache pika.
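A minimal sketch of this unified exception handling, again with dicts standing in for the database and the pika cache; field and state names are assumptions:

```python
def drain_error_queue(error_queue: list, db: dict, cache: dict, now: str):
    """Exception script: uniformly re-apply the updates that failed earlier."""
    while error_queue:
        rec = error_queue.pop(0)
        key = f"{rec['shipment_id']}-{rec['scan_type']}"
        db[key] = {"state": rec["state"], "error": rec.get("error"), "judge_tm": now}
        cache[key] = rec["state"]
```

Because every failed update lands in one queue, a single script retries them all, and no record silently keeps a stale state.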
Take an order numbered 4*9 as an example; its data is sent internally to another platform of the company as a basic data source.
First, the imported data for order 4*9 includes the order type and other basic information. The order types may include type 1 ("A type") and type 2 ("B type"), giving the keys 4*9-1 and 4*9-2. After import, the data enters the import queue scan_entry_new_import.
Second, the import queue scan_entry_new_import is consumed. With 4*9-2 as the key, the state is looked up in the cache pika; if no state information is found, the order is an initial import, and the key 4*9-2 is added with value 3, meaning "to be processed". If, instead, the lookup finds that 4*9-1 has value 1 (meaning "processing success"), the type-1 data of 4*9 is skipped and the type-2 data is written into the database.
Next, the read script starts, reads the type-2 data of order 4*9 within a certain time range, and writes it into the test_scan_data_deal queue.
Then the processing script starts and reads the 4*9 data in the test_scan_data_deal queue. The imported basic information is verified; if an error is found, the state of the 4*9 data in the database is updated to "processing failure" with the note "xx information error", and the value of 4*9-2 in the cache pika is changed to 2 (meaning "processing failure").
If the erroneous basic information was caused by a user change, the corrected information is re-imported and the above steps are executed again. The other basic data is verified, and if there is no error the data is written into the send queue test_scan_data.
Finally, the send script is started. The data in the send queue test_scan_data is assembled in the correct format and sent to the other platform of the company. If sending fails, the steps are repeated according to the failure reason; if sending succeeds, other data is processed.
The data processing flow may look complex and involves several scripts, but each link processes data independently, so even if one link has a problem, the normal operation of the other scripts on existing data is not affected. Each link's processing is single-purpose and efficient, problems are easy to trace, and the division of labor is clear. Even when applied to millions of records there is no limitation; in particular, if the logic verification in the processing script is not very complex, the hourly throughput is considerable, and the system is not only efficient and stable but also very fast.
Embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the methods of the various embodiments described above.
Fig. 18 is a schematic diagram of a computer device according to an embodiment of the present application. The computer device 6 comprises: a processor 60, a memory 61 and a computer program 62 stored in said memory 61 and executable on said processor 60. The steps of the various data processing method embodiments described above are implemented when the processor 60 executes the computer program 62. Alternatively, the processor 60, when executing the computer program 62, performs the functions of the modules/units of the various data processing apparatus embodiments described above.
Illustratively, the computer program 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used to describe the execution of the computer program 62 in the data processing apparatus/computer device 6.
The computer device 6 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The computer device 6 may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 18 is merely an example of the computer device 6 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 6 may also include input and output devices, network access devices, buses, and the like.
The processor 60 may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. The memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like. Further, the memory 61 may also include both an internal storage unit and an external storage device of the computer device 6. The memory 61 is used for storing the computer program and other programs and data required by the computer device. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above device may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A method of data processing, the method comprising:
importing data from outside and writing the data into an import queue;
executing filtering scripts to perform de-duplication filtering on the data in the import queue and then writing the data into a database;
the executing filtering script performs de-duplication filtering on the data in the import queue and writes the data into a database, and the executing filtering script comprises:
inquiring the processing state of the data from a preset cache;
if the processing state of the data is the preset processing state, the data is not written into a database;
if the processing state of the data is not queried in the cache, recording the processing state of the data as to-be-processed, and writing the data into a database;
and rewriting abnormal data which is abnormal during data processing and causes processing failure into the import queue.
2. The data processing method according to claim 1, characterized by further comprising:
executing a reading script to write the data in the database into a processing queue in batches;
and executing a processing script to check the data in the processing queue, writing the checked data into the sending queue, and updating the processing state of the data which does not pass the check in the database.
3. The data processing method according to claim 2, characterized by further comprising:
executing the sending script to push the data written into the sending queue to the appointed platform, and updating the processing state of the data which is not successfully sent in the database.
4. A data processing method according to claim 3, characterized in that the method further comprises:
writing the data in the database which fails verification or fails to send into an exception queue, executing an exception script to process the data in the exception queue, and updating the processing state of the data.
5. A data processing method according to claim 2 or 3, wherein the processing state of data written into the database after the deduplication filtering is to-be-processed, and the processing state of data that fails verification in the database is processing-failure.
6. The data processing method of claim 5, wherein executing the read script writes data in the database to the processing queue in batches, comprising:
reading a start time from a first time file, executing the read script to read the data whose processing state is to-be-processed and whose warehousing time falls within the first duration after the start time, and writing the data into the processing queue;
and when each reading is finished, writing the reading finishing time into the first time file as the starting time of the next reading.
7. The data processing method of claim 5, wherein executing the read script writes data in the database to the processing queue in batches, comprising:
reading a start time T1 from a second time file, executing the read script to read the data whose processing state is to-be-processed or processing-failure and whose warehousing time falls within the second duration t1 before the start time T1, and writing the data into the processing queue; if (T2 - t1) is less than or equal to (T1 + t2), ending execution of the read script, wherein T2 is the current time, and t2 is a third duration;
and when each reading is finished, writing the reading finishing time into the second time file as the starting time of the next reading.
8. A data processing apparatus, the apparatus comprising:
the input module is used for importing data from the outside and writing the data into the import queue;
the filtering module is used for executing filtering scripts to perform de-duplication filtering on the data in the import queue and then writing the data into a database;
the filter module includes:
the inquiring unit is used for inquiring the processing state of the data from a preset cache;
a writing unit, configured to not write the data into a database if the processing state of the data is a predetermined processing state;
a recording unit, configured to record the processing state of the data as to-be-processed if the processing state of the data is not queried in the cache, and write the data into a database;
and the filtering exception unit is used for rewriting the exception data which is abnormal during the data processing and causes processing failure into the import queue.
CN202010191386.4A 2020-03-18 2020-03-18 Data processing method and device Active CN111324783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010191386.4A CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010191386.4A CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Publications (2)

Publication Number Publication Date
CN111324783A CN111324783A (en) 2020-06-23
CN111324783B true CN111324783B (en) 2023-08-29

Family

ID=71169918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010191386.4A Active CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Country Status (1)

Country Link
CN (1) CN111324783B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107302569A (en) * 2017-06-08 2017-10-27 武汉火凤凰云计算服务股份有限公司 A kind of security monitoring Data acquisition and storage method of facing cloud platform
CN110019873A (en) * 2017-12-25 2019-07-16 深圳市优必选科技有限公司 Human face data processing method, device and equipment
WO2019169693A1 (en) * 2018-03-08 2019-09-12 平安科技(深圳)有限公司 Method for quickly importing data in batches, and electronic apparatus and computer-readable storage medium
CN110245011A (en) * 2018-03-08 2019-09-17 北京京东尚科信息技术有限公司 A kind of method for scheduling task and device
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9152642B2 (en) * 2012-12-21 2015-10-06 Zetta, Inc. Systems and methods for on-demand data storage
US8977596B2 (en) * 2012-12-21 2015-03-10 Zetta Inc. Back up using locally distributed change detection
US20180211046A1 (en) * 2017-01-26 2018-07-26 Intel Corporation Analysis and control of code flow and data flow


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Chao; Xu Ruzhi; Yang Feng. Multi-process data processing system based on message queues. Computer Engineering and Design. 2010, (13), full text. *

Also Published As

Publication number Publication date
CN111324783A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN103729442A (en) Method for recording event logs and database engine
CN110750592B (en) Data synchronization method, device and terminal equipment
CN110727539A (en) Method and system for processing exception in batch processing task and electronic equipment
RU2653254C1 (en) Method, node and system for managing data for database cluster
US10274919B2 (en) Method, device and computer program product for programming a plurality of control units
CN110941502A (en) Message processing method, device, storage medium and equipment
CN110781231A (en) Batch import method, device, equipment and storage medium based on database
CN108733671B (en) Method and device for archiving data history
CN109634989B (en) HIVE task execution engine selection method and system
CN108121774B (en) Data table backup method and terminal equipment
CN110019063B (en) Method for computing node data disaster recovery playback, terminal device and storage medium
US20110072153A1 (en) Apparatus, system, and method for device level enablement of a communications protocol
CN111324783B (en) Data processing method and device
CN113157491A (en) Data backup method and device, communication equipment and storage medium
JP2012089049A (en) Computer system and server
CN106971293A (en) A kind of business event based on activiti and flow separation method and system
CN101751311B (en) Request processing device, request processing system, and access testing method
CN111049913A (en) Data file transmission method and device, storage medium and electronic equipment
US20060184729A1 (en) Device, method, and computer product for disk management
US20220121390A1 (en) Accelerated non-volatile memory device inspection and forensics
CN113392085A (en) Distributed file batch processing method and platform
US10503722B2 (en) Log management apparatus and log management method
KR20130032151A (en) Flash memory device capable of verifying reliability using bypass path, and system and method of verifying reliability using that device
US20200349304A1 (en) Method, apparatus, device, and medium for implementing simulator
WO2019134238A1 (en) Method for executing auxiliary function, device, storage medium, and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant