CN111324783A - Data processing method and device - Google Patents


Info

Publication number
CN111324783A
CN111324783A
Authority
CN
China
Prior art keywords
data
processing
queue
time
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010191386.4A
Other languages
Chinese (zh)
Other versions
CN111324783B (en)
Inventor
倪艳
Current Assignee
Dongpu Software Co Ltd
Original Assignee
Dongpu Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Dongpu Software Co Ltd
Priority to CN202010191386.4A
Publication of CN111324783A
Application granted
Publication of CN111324783B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/903 Querying
    • G06F 16/90335 Query processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device, a computer device, and a computer-readable storage medium. The method comprises the following steps: importing data from an external source and writing it into an import queue; and executing a filtering script to de-duplicate the data in the import queue and write it into a database. The data processing method, device, equipment, and storage medium can solve the technical problem of low processing efficiency caused by data disorder in existing data processing workflows.

Description

Data processing method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a data processing method and device, a computer device, and a computer-readable storage medium.
Background
In existing data processing workflows, operators typically process and record into the database only the portion of the data belonging to whichever task is closest to its deadline. Because the work is manual, this approach is prone to data disorder, omissions, and processing delays caused by heavy workloads, non-standard manual operations, or simple oversight.
Disclosure of Invention
The invention aims to provide a data processing method and device, computer equipment and a computer readable storage medium, so as to solve the technical problem of low processing efficiency caused by data disorder in the existing data processing process.
The object of the invention is achieved by the following technical solutions:
a first aspect of the present invention provides a data processing method, the method comprising:
importing data from the outside and writing the data into an import queue;
and executing a filtering script to perform de-duplication filtering on the data in the import queue and writing the data into a database.
By importing data from an external source, writing it into an import queue, executing a filtering script to de-duplicate the data in the import queue, and writing the data into a database, frequent import of duplicate data can be avoided. Data disorder therefore does not arise during processing, which improves processing efficiency and saves human resources.
Optionally, the data processing method further includes:
executing a reading script to write data in the database into a processing queue in batches;
and executing the processing script to check the data in the processing queue, writing the data passing the check into the sending queue, and updating the processing state of the data failing the check in the database.
In this data processing method, each script and each queue is responsible for a different portion of the data, achieving a division of labor: the stages run independently of one another yet combine into a single pipeline, so the code runs quickly and stably and is easy to maintain. Human resources are thereby saved, the defects of non-standard manual operation are avoided, and a great deal of time is saved.
Optionally, the data processing method further includes:
and executing the sending script to push the data written into the sending queue to a specified platform, and updating the processing state of the data which is not sent successfully in the database.
The data written into the sending queue is sent to the designated platform by executing the sending script, and the processing state of the data which is sent unsuccessfully in the database is updated, so that the requirements of actual use scenes can be met, and the data is sent to the designated platform as a basic data source.
Optionally, the method further comprises:
and writing the data which is not verified in the database and the data which is not sent successfully into an exception queue, executing an exception script to process the data in the exception queue and updating the processing state of the data.
By writing the data which is not verified and sent unsuccessfully in the database into the abnormal queue, executing the abnormal script to process the data in the abnormal queue and update the processing state of the data, the data which is processed or sent unsuccessfully due to the system abnormality can be processed, and data processing omission is avoided.
Optionally, executing the filtering script to de-duplicate the data in the import queue and write it into the database includes:
querying the processing state of the data from a preset cache;
if the queried processing state of the data is a preset processing state, not writing the data into the database;
and if no processing state for the data is found in the cache, recording the processing state of the data as pending and writing the data into the database.
Through these steps, data whose processing state is already a preset processing state is not written into the database again, which avoids data redundancy and makes data processing more efficient.
Optionally, executing the filtering script to de-duplicate the data in the import queue and write it into the database further includes:
writing back into the import queue any data whose processing failed because an exception occurred during data processing.
Writing such failed data back into the import queue ensures that data whose processing failed because of a system exception is still processed, avoiding omissions in data processing.
Optionally, the processing state of data written into the database after de-duplication filtering is pending, and the processing state of data that fails verification in the database is processing-failed.
Because data written into the database after de-duplication is marked pending and data that fails verification is marked processing-failed, the appropriate handling can be chosen according to each record's processing state.
Optionally, executing the reading script to write data from the database into the processing queue in batches includes:
reading a start time from a first time file, and executing the reading script to read data whose warehousing time falls within a first duration after the start time and whose processing state is pending, writing the data into the processing queue;
and, each time reading ends, writing the reading end time into the first time file as the start time of the next read.
By reading pending data whose warehousing time falls within the first duration after the start time and writing it into the processing queue, pending data in the database can be moved into the processing queue in batches, which facilitates pipelined processing and improves data processing efficiency.
Optionally, executing the reading script to write data from the database into the processing queue in batches further includes:
reading a start time T1 from a second time file, executing the reading script to read data whose warehousing time falls within a second duration t1 before the start time T1 and whose processing state is pending or processing-failed, and writing the data into the processing queue; if (T2 - T1) ≤ (t1 + t2), ending execution of the reading script, where T2 is the current time and t2 is a third duration;
and, each time reading ends, writing the reading end time into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is pending or processing-failed, data that was missed or whose processing failed because of a system exception can still be handled.
A second aspect of the present invention provides a data processing apparatus, the apparatus comprising:
the recording module is used for importing data from the outside and writing the data into an import queue;
and the filtering module is used for executing a filtering script to perform de-duplication filtering on the data in the import queue and writing the data into a database.
By importing data from an external source, writing it into an import queue, executing a filtering script to de-duplicate the data in the import queue, and writing the data into a database, frequent import of duplicate data can be avoided. Data disorder therefore does not arise during processing, which improves processing efficiency and saves human resources.
Optionally, the data processing apparatus further includes:
the reading module is used for executing the reading script to write the data in the database into the processing queue in batches;
and the processing module is used for executing the processing script to verify the data in the processing queue, writing the data passing the verification into the sending queue, and updating the processing state of the data failing to be verified in the database.
In this data processing apparatus, each script and each queue is responsible for a different portion of the data, achieving a division of labor: the modules run independently of one another yet combine into a single pipeline, so the code runs quickly and stably and is easy to maintain. Human resources are thereby saved, the defects of non-standard manual operation are avoided, and a great deal of time is saved.
Optionally, the data processing apparatus further includes:
and the sending module is used for executing the sending script to push the data written into the sending queue to a specified platform and updating the processing state of the data which are not sent successfully in the database.
The data written into the sending queue is sent to the designated platform by executing the sending script, and the processing state of the data which is sent unsuccessfully in the database is updated, so that the requirements of actual use scenes can be met, and the data is sent to the designated platform as a basic data source.
Optionally, the data processing apparatus further includes:
and the exception module is used for writing the data which is not verified in the database and the data which is sent unsuccessfully into the exception queue, executing the exception script to process the data in the exception queue and updating the processing state of the data.
By writing the data which is not verified and sent unsuccessfully in the database into the abnormal queue, executing the abnormal script to process the data in the abnormal queue and update the processing state of the data, the data which is processed or sent unsuccessfully due to the system abnormality can be processed, and data processing omission is avoided.
Optionally, the filtering module comprises:
a query unit, used for querying the processing state of the data from a preset cache;
a writing unit, used for not writing the data into the database if the queried processing state of the data is a preset processing state;
and a recording unit, used for recording the processing state of the data as pending and writing the data into the database if no processing state for the data is found in the cache.
Because the filtering module comprises the query unit, the writing unit, and the recording unit, data whose processing state is already a preset processing state is not written into the database again, which avoids data redundancy and makes data processing more efficient.
Optionally, the filtering module further comprises:
an exception filtering unit, used for writing back into the import queue any data whose processing failed because an exception occurred during data processing.
Writing such failed data back into the import queue ensures that data whose processing failed because of a system exception is still processed, avoiding omissions in data processing.
Optionally, the processing state of data written into the database after de-duplication filtering is pending, and the processing state of data that fails verification in the database is processing-failed.
Because data written into the database after de-duplication is marked pending and data that fails verification is marked processing-failed, the appropriate handling can be chosen according to each record's processing state.
Optionally, the reading module comprises:
a first reading-time unit, used for reading a start time from the first time file and executing the reading script to read data whose warehousing time falls within the first duration after the start time and whose processing state is pending, writing the data into the processing queue;
and a first writing-time unit, used for writing the reading end time into the first time file as the start time of the next read each time reading ends.
By reading pending data whose warehousing time falls within the first duration after the start time and writing it into the processing queue, pending data in the database can be moved into the processing queue in batches, which facilitates pipelined processing and improves data processing efficiency.
Preferably, the reading module further comprises:
a second reading-time unit, used for reading a start time T1 from the second time file, executing the reading script to read data whose warehousing time falls within a second duration t1 before the start time T1 and whose processing state is pending or processing-failed, and writing the data into the processing queue; if (T2 - T1) ≤ (t1 + t2), execution of the reading script ends, where T2 is the current time and t2 is a third duration;
and a second writing-time unit, used for writing the reading end time into the second time file as the start time of the next read each time reading ends.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is pending or processing-failed, data that was missed or whose processing failed because of a system exception can still be handled.
A third aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the program.
A fourth aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method described above.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method according to another embodiment of the present invention;
FIG. 3 is a flow chart of a data processing method according to another embodiment of the present invention;
FIG. 4 is a flow chart illustrating a data processing method according to another embodiment of the present invention;
FIG. 5 is a schematic flow chart of step S200 in FIGS. 1-4;
FIG. 6 is another schematic flow chart of step S200 in FIGS. 1-4;
FIG. 7 is a schematic flow chart of step S300 in FIGS. 2-4;
FIG. 8 is another schematic flow chart of step S300 in FIGS. 2-4;
FIG. 9 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a data processing apparatus according to yet another embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a data processing apparatus according to yet another embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a data processing apparatus according to yet another embodiment of the present invention;
FIG. 13 is a schematic diagram of the filter module 200 of FIGS. 9-12;
FIG. 14 is another schematic illustration of the filter module 200 of FIGS. 9-12;
FIG. 15 is a schematic diagram of the structure of the read module 300 of FIGS. 10-12;
FIG. 16 is a schematic view of another structure of the read module 300 of FIGS. 10-12;
FIG. 17 is a flow chart illustrating a data processing method according to yet another embodiment of the present invention;
fig. 18 is a hardware configuration diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
The embodiment of the invention provides a data processing method. As shown in fig. 1, the data processing method includes:
in step S100, data is imported from the outside and written into the import queue. Specifically, the user configures fixed information required for data and selects the type and time of data to be generated, then imports the data required to be processed from the operation page, and then writes the data into an import queue.
Filtering step S200: a filtering script is executed to de-duplicate the data in the import queue and write it into the database. Specifically, the filtering script consumes the import queue, de-duplicates the data, and then writes it into the database, which avoids repeated import of duplicate data. The processing state of each record may also be recorded and used as the basis for de-duplication.
By importing data from an external source, writing it into an import queue, executing a filtering script to de-duplicate the data in the import queue, and writing the data into a database, frequent import of duplicate data can be avoided. Data disorder therefore does not arise during processing, which improves processing efficiency and saves human resources.
Optionally, as shown in fig. 2, the data processing method may further include:
and a reading step S300, executing a reading script to write the data in the database into a processing queue in batches. Data can be read and written into the processing queue according to the date, so that data reading and writing of the data needing to be processed are achieved.
Processing step S400: a processing script is executed to check the data in the processing queue, write data that passes the check into the sending queue, and update, in the database, the processing state of data that fails the check. Specifically, the data processing script consumes the processing queue and checks each pending record in further detail. Records that pass the check are written into the queue to be sent; for records that fail, the error reason is recorded and the processing state is updated in the database. Updates can be applied in batches, abnormal data can likewise be handled in batches with its failure reason recorded, and the failed records are finally written into the exception queue.
In this data processing method, each script and each queue is responsible for a different portion of the data, achieving a division of labor: the stages run independently of one another yet combine into a single pipeline, so the code runs quickly and stably and is easy to maintain. Human resources are thereby saved, the defects of non-standard manual operation are avoided, and a great deal of time is saved.
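Processing step S400 can be sketched as follows. This is a minimal illustration only: the in-memory lists standing in for the queues, the `db` dict standing in for the database, and the trivial non-empty-payload check are all assumptions, since the patent does not specify the check logic or the queue technology.

```python
# Minimal sketch of processing step S400. The queues, the db dict, and the
# check rule are illustrative stand-ins; the patent does not fix these details.

send_queue = []        # queue to be sent (consumed by the sending script)
exception_queue = []   # failed records, later handled by the exception script

def process_item(item, db):
    """Check one record from the processing queue and route it."""
    ok = bool(item.get("payload"))            # placeholder for the real check
    if ok:
        send_queue.append(item)               # check passed: enqueue for sending
    else:
        db[item["id"]]["state"] = "failed"    # check failed: update state in DB,
        item["error"] = "empty payload"       # record the error reason,
        exception_queue.append(item)          # and write to the exception queue
    return ok
```

A real deployment would back these queues with a message broker and apply the state updates in batches, as the description suggests.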
Optionally, as shown in fig. 3, the data processing method may further include:
and a sending step S500, executing a sending script to push the data written into the sending queue to a specified platform, and updating the processing state of the data which is not sent successfully in the database. In particular, a data send script may be executed to consume a send queue to send data to a specified platform or other desired platform/system.
The data written into the sending queue is sent to the designated platform by executing the sending script, and the processing state of the data which is sent unsuccessfully in the database is updated, so that the requirements of actual use scenes can be met, and the data is sent to the designated platform as a basic data source.
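Sending step S500 can be sketched as follows. The `push` callable is a hypothetical stand-in for the real platform API, which the patent does not specify, and the `send_failed` state name is an assumption.

```python
# Minimal sketch of sending step S500. The push callable stands in for the
# real platform API; state names are illustrative.

def drain_send_queue(send_queue, db, push):
    """Push queued records to the platform; mark and collect failures."""
    failed = []
    while send_queue:
        item = send_queue.pop(0)
        try:
            push(item)                               # send to the specified platform
        except Exception as exc:
            db[item["id"]]["state"] = "send_failed"  # update state in the database
            item["error"] = str(exc)                 # record the failure reason
            failed.append(item)                      # later written to the exception queue
    return failed
```

The returned `failed` list corresponds to the records that the exception step later re-processes.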
Further, as shown in fig. 4, the data processing method may further include:
and an exception step S600, writing the data which is not verified in the database and unsuccessfully sent data into an exception queue, executing an exception script to process the data in the exception queue and updating the processing state of the data.
By writing the data which is not verified and sent unsuccessfully in the database into the abnormal queue, executing the abnormal script to process the data in the abnormal queue and update the processing state of the data, the data which is processed or sent unsuccessfully due to the system abnormality can be processed, and data processing omission is avoided.
The filtering script, reading script, processing script, sending script, and exception script are all dispatched uniformly by a designated script; in particular, a shell script can be used to invoke them all.
Because all of the scripts are dispatched by one designated script, the method is convenient to use and highly controllable: the number of worker processes per stage can be increased or decreased centrally, which controls the speed of reading, processing, updating, and sending and ensures that data does not accumulate. Each processing link is independent of the others, so even if one link stops, the subsequent scripts continue to run and process the data already in their queues.
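The description suggests a shell script as the dispatcher; purely for illustration, the sketch below expresses the same idea in Python with `multiprocessing`, where the per-stage process counts are the tuning knob the text describes. The stage names, the counts, and the empty worker loop are all assumptions.

```python
# Illustrative dispatcher: one designated entry point starts a configurable
# number of worker processes per stage. Stage names, counts, and the empty
# worker body are assumptions; the patent itself suggests shell scripts.

from multiprocessing import Process

PROCESS_COUNTS = {    # raise or lower a count to speed up or slow down that link
    "filter": 2,
    "read": 1,
    "process": 4,
    "send": 2,
    "exception": 1,
}

def run_stage(name):
    """Worker body: each stage's loop would consume its own queue here."""
    pass

def dispatch(start=True):
    """Create (and optionally start) all worker processes."""
    workers = []
    for stage, count in PROCESS_COUNTS.items():
        for _ in range(count):
            p = Process(target=run_stage, args=(stage,))
            if start:
                p.start()
            workers.append(p)
    return workers
```

Because each stage talks only to its queues, stopping one stage's workers leaves the others free to drain the data already queued, matching the behavior described above.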
As shown in fig. 5, executing the filtering script to de-duplicate the data in the import queue and write it into the database may include:
a querying step S201, querying the processing state of the data from a preset cache;
a writing step S202, not writing the data into the database if the queried processing state of the data is a preset processing state;
and a recording step S203, recording the processing state of the data as pending and writing the data into the database if no processing state for the data is found in the cache.
Through these steps, data whose processing state is already a preset processing state is not written into the database again, which avoids data redundancy and makes data processing more efficient.
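Steps S201 to S203 can be sketched as follows. A plain dict stands in for the preset cache and a list for the database; the state names and field names are illustrative assumptions, not taken from the patent.

```python
# Sketch of de-duplication steps S201-S203. A dict stands in for the preset
# cache and a list for the database; state and field names are illustrative.

PENDING = "pending"
PRESET_STATES = {"pending", "processing", "sent"}   # states that block re-import

cache = {}       # data id -> processing state (the preset cache)
database = []    # records written after de-duplication

def filter_item(item):
    """Consume one record from the import queue; return True if stored."""
    state = cache.get(item["id"])                # S201: query the processing state
    if state in PRESET_STATES:                   # S202: known state -> do not write
        return False
    cache[item["id"]] = PENDING                  # S203: record the state as pending
    database.append(dict(item, state=PENDING))   # and write the record to the database
    return True
```

In production the cache would typically be a shared store such as Redis so that several filter processes can de-duplicate against the same state.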
Further, as shown in fig. 6, executing the filtering script to de-duplicate the data in the import queue and write it into the database may further include:
an exception filtering step S204, writing back into the import queue any data whose processing failed because an exception occurred during data processing.
Writing such failed data back into the import queue ensures that data whose processing failed because of a system exception is still processed, avoiding omissions in data processing.
As an optional implementation, the processing state of data written into the database after de-duplication filtering is pending, and the processing state of data that fails the check in the database is processing-failed.
Because data written into the database after de-duplication is marked pending and data that fails the check is marked processing-failed, the appropriate handling can be chosen according to each record's processing state.
As shown in fig. 7, executing the reading script to write data from the database into the processing queue in batches may include:
a first reading-time step S301, reading a start time from a first time file, and executing the reading script to read data whose warehousing time falls within a first duration after the start time and whose processing state is pending, writing the data into the processing queue. By way of example only, the first duration may be 3, 5, or 10 minutes; it may be set according to actual needs and is not limited to these examples.
A first writing-time step S302, each time reading ends, writing the reading end time into the first time file as the start time of the next read.
By reading pending data whose warehousing time falls within the first duration after the start time and writing it into the processing queue, pending data in the database can be moved into the processing queue in batches, which facilitates pipelined processing and improves data processing efficiency.
Further, as shown in fig. 8, executing the reading script to write data from the database into the processing queue in batches may further include:
a second reading-time step S303, reading a start time T1 from a second time file, executing the reading script to read data whose warehousing time falls within a second duration t1 before the start time T1 and whose processing state is pending or processing-failed, and writing the data into the processing queue; if (T2 - T1) ≤ (t1 + t2), execution of the reading script ends, where T2 is the current time and t2 is a third duration. By way of example only, the second duration t1 may be half an hour or an hour, and the third duration t2 may be 1, 2, or 3 minutes; both may be set according to actual needs and are not limited to these examples.
A second writing-time step S304, each time reading ends, writing the reading end time into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is pending or processing-failed, data that was missed or whose processing failed because of a system exception can still be handled.
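The retry read of steps S303 and S304, including the (T2 - T1) ≤ (t1 + t2) stop condition, can be sketched as follows. Times are plain integers in seconds, the durations use the example values from the text, and the field and state names are assumptions.

```python
# Sketch of the retry read, steps S303-S304. T1 is the start time from the
# second time file, t1 the look-back window, t2 the third duration; the
# concrete values and field names are illustrative.

T1_LOOKBACK = 1800   # t1: e.g. half an hour, in seconds
T2_GRACE = 60        # t2: e.g. 1 minute, in seconds

def retry_batch(db_rows, start_t1, now):
    """Return rows to re-queue, or None once the script should stop."""
    if (now - start_t1) <= (T1_LOOKBACK + T2_GRACE):
        return None   # (T2 - T1) <= (t1 + t2): end executing the reading script
    lo = start_t1 - T1_LOOKBACK
    return [r for r in db_rows
            if lo <= r["stored_at"] < start_t1
            and r["state"] in ("pending", "failed")]
```

The guard keeps the retry reader safely behind the first reader, so a record is only re-queued once it is old enough to have been missed or to have genuinely failed.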
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
The embodiment of the invention also provides a data processing device. As shown in fig. 9, the data processing apparatus may include:
the recording module 100 is used for importing data from the outside and writing the data into an import queue;
and the filtering module 200 is configured to execute a filtering script to perform deduplication filtering on the data in the import queue and write the data into the database.
Optionally, as shown in fig. 10, the data processing apparatus may further include:
and the reading module 300 is used for executing the reading script to write the data in the database into the processing queue in batches.
The processing module 400 is configured to execute the processing script to check the data in the processing queue, write the data that passes the check into the sending queue, and update the processing state of the data that fails the check in the database.
In the data processing apparatus, each script and each queue is responsible for a different part of the data, realizing division of labor and cooperation: each stage is independent of the others yet the stages are integrated into a whole, so the code runs fast, stably, and in a way that is easy to maintain. Therefore, human resources can be saved, defects caused by non-standard manual operation are avoided, and a large amount of time is saved.
Optionally, as shown in fig. 11, the data processing apparatus may further include:
the sending module 500 is configured to execute a sending script to push data written in the sending queue to a specified platform, and update a processing state of data that is sent unsuccessfully in the database.
By executing the sending script to push the data written into the sending queue to the designated platform, and updating the processing state of data that failed to send in the database, the requirements of actual usage scenarios can be met, and the data can be provided to the designated platform as a basic data source.
Further, as shown in fig. 12, the data processing apparatus may further include:
the exception module 600 is configured to write data that fails to be checked in the database and data that has failed to be sent into an exception queue, execute an exception script to process the data in the exception queue, and update a processing state of the data.
By writing the data that failed verification or failed to send in the database into the exception queue, executing the exception script to process the data in the exception queue, and updating its processing state, data whose processing or sending failed due to a system abnormality can be handled, avoiding omissions in data processing.
And the filtering script, the reading script, the processing script, the sending script and the abnormal script are all uniformly dispatched by a specified script.
Because all scripts are dispatched uniformly by a designated script, the scheme is convenient to use and highly controllable: the number of processes can be increased or decreased, and scheduling is unified. The reading, processing, updating, and sending speeds of the data are controlled by adjusting the number of processes, ensuring that data does not accumulate. The data processing of each link is independent of the others, so even if one link breaks, the subsequent scripts can continue to run and process the existing data in the queues.
As shown in fig. 13, the filtering module 200 may include:
the query unit 201 is configured to query a processing state of the data from a preset cache;
a writing unit 202, configured to not write the data into the database if the processing state of the data is found to be a predetermined processing state;
the recording unit 203 is configured to record the processing state of the data as to-be-processed if the processing state of the data is not queried in the cache, and write the data into the database.
Because the filtering module is constructed to comprise the query unit, the writing unit, and the recording unit, data whose processing state is the predetermined processing state cannot be written into the database repeatedly, so data redundancy is avoided and data processing is more efficient.
Further, as shown in fig. 14, the filtering module 200 may further include:
and the exception filtering unit 204 is used for writing exception data, whose processing failed due to an exception occurring during data processing, back into the import queue.
By writing the abnormal data whose processing failed due to an exception during data processing back into the import queue, data that failed because of a system abnormality can be reprocessed, avoiding omissions in data processing.
As an optional implementation manner, the processing state of the data written into the database after the deduplication filtering is to be processed, and the processing state of the data that fails to be checked in the database is processing failure.
Because the processing state of the data written into the database after the deduplication filtering is pending, and the processing state of the data that fails verification in the database is processing failure, corresponding processing can be performed according to the processing state of the data.
As shown in fig. 15, the reading module 300 may further include:
the first reading time unit 301 is configured to read a start time from a first time file, execute the reading script to read data whose warehousing time is within a first duration after the start time and whose processing state is pending, and write the data into a processing queue;
and a first writing time unit 302, configured to write the reading end time into the first time file as a start time of a next reading when each reading is ended.
By reading the data whose warehousing time is within the first duration after the start time and whose processing state is pending, and writing the data into the processing queue, the pending data in the database can be written into the processing queue in batches, which facilitates pipelined processing of the data and improves data processing efficiency.
Further, as shown in fig. 16, the reading module 300 may further include:
the second reading time unit 303 is configured to read a start time T1 from a second time file, execute the reading script to read data whose warehousing time is within a second duration t1 before the start time T1 and whose processing state is pending or processing failure, and write the data into a processing queue; if (T2 - t1) ≤ (T1 + t2), end execution of the reading script, wherein T2 is the current time, and t2 is the third duration;
and a second writing time unit 304, configured to write the reading end time into the second time file as a start time of a next reading when each reading is ended.
By reading the data whose warehousing time is within the second duration t1 before the start time T1 and whose processing state is pending or processing failure, and writing the data into the processing queue, data that was missed or whose processing failed due to a system abnormality can be handled.
The function implementation of each module in the data processing apparatus corresponds to each step in the data processing method embodiment, and the function and implementation process thereof are not described in detail herein.
An embodiment of the present invention further provides a data processing method, as shown in fig. 17, the specific implementation manner is as follows:
firstly, data is imported from a user page; after a preliminary check is carried out in the background, the data is written into the import queue scan_entering_new_import.
And secondly, the data is deduplicated and warehoused. Multiple processes are started through the shell scheduling script to consume the import queue. Specifically, the order state is queried in the cache pika according to the order number ship_id and the order type scan_typ. If the order state is any one of 'processing success', 'pending', and 'processing', the import is not allowed, which avoids repeated imports. If no state information exists, the import is determined to be the initial one; the corresponding ship_id-scan_typ entry is added with the state 'pending' and written into the cache pika. On this basis, the filtered information is written into the t_scan_entr_tbl table. Due to the particularity of the queue, an exception may occur during processing, and the exceptional data is written back into the scan_entering_new_import queue to await reprocessing.
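The deduplicate-and-warehouse step above can be sketched as follows. This is a hypothetical illustration: the pika cache is stood in for by a plain dict, the queue and table by lists, and the numeric state codes follow the worked example later in the text (1 for success, 3 for pending); the codes for 'failure' (2) and 'processing' (4) are assumptions.

```python
# Assumed state codes; 1 and 3 appear in the text's example, 2 and 4 are guesses.
SUCCESS, FAILURE, PENDING, PROCESSING = 1, 2, 3, 4

def filter_and_store(order, cache, db_table, import_queue):
    """Deduplicate one imported order against the cache, then warehouse it."""
    key = f"{order['ship_id']}-{order['scan_typ']}"
    # Success, pending, and processing all block re-import; a failed order
    # (FAILURE) is deliberately allowed back in for reprocessing.
    if cache.get(key) in (SUCCESS, PENDING, PROCESSING):
        return False                   # duplicate import: skip it
    try:
        cache[key] = PENDING           # initial import: mark as pending
        db_table.append(order)         # write into t_scan_entr_tbl
        return True
    except Exception:
        import_queue.append(order)     # requeue on error to await reprocessing
        raise
```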
Again, the data is read. A single process is started through the shell, and the start time is read from time file a. Five minutes of data are read each time, selected according to judge_tm (whose initial value is the warehousing time of the data). The data is written into the processing queue test_scan_data_deal. After the read data has been processed, the reading end time is written into the time file as the start time of the next reading.
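A minimal sketch of this incremental read pass, under assumptions: the time file stores an epoch timestamp, the database rows are dicts carrying state and judge_tm fields, and the processing queue is a list.

```python
import time

WINDOW = 5 * 60  # read 5 minutes of data per pass

def read_batch(time_file, rows, processing_queue, now=None):
    """One pass of the read script: move one 5-minute window of pending rows."""
    now = now if now is not None else time.time()
    with open(time_file) as f:
        start = float(f.read().strip())
    end = min(start + WINDOW, now)
    # Select pending rows whose judge_tm (initially the warehousing time)
    # falls inside the [start, end) window.
    batch = [r for r in rows
             if r["state"] == "pending" and start <= r["judge_tm"] < end]
    processing_queue.extend(batch)
    with open(time_file, "w") as f:    # end time becomes the next start time
        f.write(str(end))
    return batch
```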
Further, a single repeat-read script can be started through the shell to read the start time from time file b. Half an hour of data is read each time; if the current time minus half an hour is less than or equal to the start time plus 1 minute, the process terminates. Otherwise, the orders in that period whose processing has not succeeded are read and written into the processing queue test_scan_data_deal. After the read data has been processed, the reading end time is written into the time file as the start time of the next reading. In this way, data that was missed, or whose processing failed due to a system exception but was not re-imported, can be processed.
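The repeat-read pass might look as follows. The orientation of the half-hour window is ambiguous in the source, so this sketch assumes each pass covers the half hour starting at the saved start time and stops once that window comes within about a minute of the present; the row field names are assumptions.

```python
HALF_HOUR, ONE_MINUTE = 30 * 60, 60  # the t1 and t2 values from the description

def repeat_read(T1, now, rows, processing_queue):
    """One pass of the repeat-read script; returns (next_start, keep_going)."""
    if now - HALF_HOUR <= T1 + ONE_MINUTE:
        return T1, False               # window too close to now: end the script
    end = T1 + HALF_HOUR
    # Re-collect every row in the window that has not yet succeeded.
    batch = [r for r in rows
             if r["state"] != "success" and T1 <= r["judge_tm"] < end]
    processing_queue.extend(batch)     # re-enqueue missed / failed rows
    return end, True                   # end time becomes the next start time
```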
Again, the data is processed. Multiple processes are started through the shell to consume the test_scan_data_deal queue. The data is verified in detail against the business rules and processed in batches. For data that fails verification, the failure reason is recorded, the data states in the database are updated in batches with the error reason added, judge_tm is changed to the current time, and the state in the cache pika is changed. If the verification passes, the data is written into the sending queue test_scan_data to await sending consumption.
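The processing pass can be sketched like this; verify is a placeholder for the business-specific checks, and the database rows, cache, and state code 2 for 'processing failure' are assumptions carried over from the examples in the text.

```python
import time

def process_batch(processing_queue, send_queue, db, cache, verify):
    """Drain the processing queue: route passes to the send queue, record failures."""
    while processing_queue:
        item = processing_queue.pop(0)
        key = f"{item['ship_id']}-{item['scan_typ']}"
        ok, reason = verify(item)          # business-specific verification
        if ok:
            send_queue.append(item)        # passed: wait for sending consumption
        else:
            row = db[item["ship_id"]]      # failed: record reason and time
            row.update(state="failed", error=reason, judge_tm=time.time())
            cache[key] = 2                 # 2 = "processing failure" (assumed code)
```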
Finally, the data is sent. Multiple processes are started through the shell to consume the test_scan_data queue. The data can be processed in batches; specifically, it is assembled according to the joint-debugging format and sent to the designated platform. If sending fails, the error reason is recorded, the data state in the database is changed with the error reason added, judge_tm is changed to the current time, and the data state in the cache pika is changed; if sending succeeds, the database state is changed and the data state in the cache pika is changed to success.
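A sketch of the sending pass; send_fn stands in for the real push to the target platform, json.dumps approximates the joint-debugging assembly format, and the state codes (1 for success, 2 for failure) follow the worked example later in the text.

```python
import json, time

def send_batch(send_queue, db, cache, send_fn):
    """Drain the send queue, updating database and cache state either way."""
    sent = failed = 0
    while send_queue:
        item = send_queue.pop(0)
        key = f"{item['ship_id']}-{item['scan_typ']}"
        payload = json.dumps(item)         # assemble the agreed format
        row = db[item["ship_id"]]
        if send_fn(payload):               # push to the designated platform
            row["state"] = "success"
            cache[key] = 1                 # 1 = "processing success"
            sent += 1
        else:                              # record failure reason and time
            row.update(state="send_failed", judge_tm=time.time())
            cache[key] = 2
            failed += 1
    return sent, failed
```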
The abnormal data arising in the data-processing and data-sending steps can be handled as follows: if an abnormal error occurs while updating the database, the data can be recorded in the exception queue testScanDataError for unified error-update processing. Multiple processes are started through the shell to consume the exception queue testScanDataError; the database errors are updated uniformly, the processing state is changed, judge_tm is changed to the current time, and the data state in the cache pika is changed.
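The unified error-update over the exception queue might be sketched as below; the row fields and the state code 2 are assumptions consistent with the examples above.

```python
import time

def drain_error_queue(error_queue, db, cache):
    """Uniformly apply the error update for items parked in testScanDataError."""
    while error_queue:
        item = error_queue.pop(0)
        key = f"{item['ship_id']}-{item['scan_typ']}"
        # Mark the failure once, in one place, for every queued exception.
        db[item["ship_id"]].update(state="failed", judge_tm=time.time())
        cache[key] = 2
```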
In the following, the order number 4***9 is taken as an example; internally, it needs to be sent to another platform of the company as a basic data source.
First, the data of 4***9 is imported, specifically including the order type and the order's other basic information. The order type may include type 1 and type 2, denoted 'type A' and 'type B' and recorded as 4***9-1 and 4***9-2. After import, the data enters the import queue scan_entering_new_import.
Second, the import queue scan_entering_new_import is consumed. With 4***9-2 as the key, the state is queried in the cache pika; if no state information is found, the order number is treated as an initial import, and the key 4***9-2 is added with the value 3, i.e. the state 'pending'. If, instead, the query finds that the value of 4***9-1 is 1 (indicating 'processing success'), the processing of type 1 of 4***9 is skipped, and the type-2 data is written into the database.
Then, the reading script is started to read the data whose type under 4***9 is 2 within a certain time range and write it into the test_scan_data_deal queue.
Again, the processing script starts and reads the 4***9 data in the test_scan_data_deal queue. It checks the basic information; if an error is found, the state of 4***9 in the database is updated to 'processing failure', the note 'xx information error' is added, and the value of 4***9-2 in the cache pika is changed to 2 (indicating 'processing failure').
Basic information that became erroneous due to a user change is re-imported, and the above steps are executed again. The script also checks the other basic data and, if there is no error, writes the data into the sending queue test_scan_data.
Finally, the sending script is enabled. The data in the sending queue test_scan_data is assembled in the correct format and sent to the other platform of the company. If sending fails, the above steps are repeated according to the failure reason; if sending succeeds, the next data is processed.
The data processing flow may look complicated and involves many scripts, but because each link processes data independently, even if one link fails, the normal processing of existing data by the other scripts is unaffected. The data processing in each link is simple and efficient, problems are easy to locate, and the division of labor is clear. Even when applied to processing millions of records, the scheme is in no way limited; in particular, if the logic checks in the processing script are not very complicated, the hourly throughput is considerable, making the scheme not only efficient and stable but also very fast.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the methods of the above embodiments.
Fig. 18 is a schematic diagram of a computer device according to an embodiment of the present invention. The computer device 6 includes: a processor 60, a memory 61 and a computer program 62 stored in said memory 61 and executable on said processor 60. The processor 60, when executing the computer program 62, implements the steps in the various data processing method embodiments described above. Alternatively, the processor 60 implements the functions of the modules/units in the data processing apparatus embodiments described above when executing the computer program 62.
Illustratively, the computer program 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 62 in the data processing apparatus/computer device 6.
The computer device 6 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device 6 may include, but is not limited to, a processor 60 and a memory 61. Those skilled in the art will appreciate that fig. 18 is merely an example of the computer device 6 and does not limit the computer device 6, which may include more or fewer components than shown, combine some components, or use different components; for example, the computer device 6 may also include input/output devices, network access devices, buses, etc.
The Processor 60 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. The memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device. Further, the memory 61 may also include both an internal storage unit and an external storage device of the computer device 6. The memory 61 is used for storing the computer program and other programs and data required by the computer device. The memory 61 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method of data processing, the method comprising:
importing data from the outside and writing the data into an import queue;
and executing a filtering script to perform de-duplication filtering on the data in the import queue and writing the data into a database.
2. The data processing method of claim 1, further comprising:
executing a reading script to write data in the database into a processing queue in batches;
and executing the processing script to check the data in the processing queue, writing the data passing the check into the sending queue, and updating the processing state of the data failing the check in the database.
3. The data processing method of claim 2, further comprising:
and executing the sending script to push the data written into the sending queue to a specified platform, and updating the processing state of the data which is not sent successfully in the database.
4. The data processing method of claim 3, wherein the method further comprises:
and writing the data which is not verified in the database and the data which is not sent successfully into an exception queue, executing an exception script to process the data in the exception queue and updating the processing state of the data.
5. The data processing method according to any one of claims 1 to 4, wherein the executing the filtering script performs deduplication filtering on the data in the import queue and writes the data into the database, and comprises:
inquiring the processing state of the data from a preset cache;
if the processing state of the data is inquired to be a preset processing state, the data is not written into a database;
and if the processing state of the data is not inquired in the cache, recording the processing state of the data as pending, and writing the data into a database.
6. The data processing method of claim 5, wherein the executing filter script performs deduplication filtering on the data in the import queue and writes the data into a database, and further comprising:
and rewriting abnormal data which is processed in failure due to the abnormal during the data processing into the import queue.
7. The data processing method according to claim 2 or 3, wherein the processing status of the data written into the database after the deduplication filtering is pending, and the processing status of the data that fails to be checked in the database is processing failure.
8. The data processing method of claim 7, wherein the executing the read script batches data in the database into a processing queue, comprising:
reading a starting time from a first time file, executing the reading script to read data whose warehousing time is within a first duration after the starting time and whose processing state is pending, and writing the data into a processing queue;
and writing the reading end time into the first time file as the starting time of the next reading when the reading is ended each time.
9. The data processing method of claim 7, wherein the executing the read script batches data in the database into a processing queue, comprising:
reading a starting time T1 from a second time file, executing the reading script to read data whose warehousing time is within a second duration t1 before the starting time T1 and whose processing state is pending or processing failure, and writing the data into a processing queue; if (T2 - t1) ≤ (T1 + t2), ending execution of the reading script, wherein T2 is the current time, and t2 is a third duration;
and writing the reading end time into the second time file as the starting time of the next reading when the reading is ended each time.
10. A data processing apparatus, characterized in that the apparatus comprises:
the recording module is used for importing data from the outside and writing the data into an import queue;
and the filtering module is used for executing a filtering script to perform de-duplication filtering on the data in the import queue and writing the data into a database.
CN202010191386.4A 2020-03-18 2020-03-18 Data processing method and device Active CN111324783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010191386.4A CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010191386.4A CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Publications (2)

Publication Number Publication Date
CN111324783A true CN111324783A (en) 2020-06-23
CN111324783B CN111324783B (en) 2023-08-29

Family

ID=71169918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010191386.4A Active CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Country Status (1)

Country Link
CN (1) CN111324783B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140181039A1 (en) * 2012-12-21 2014-06-26 Zetta, Inc. Systems and methods for on-demand data storage
US20140181021A1 (en) * 2012-12-21 2014-06-26 Zetta, Inc. Back up using locally distributed change detection
CN107302569A (en) * 2017-06-08 2017-10-27 武汉火凤凰云计算服务股份有限公司 A kind of security monitoring Data acquisition and storage method of facing cloud platform
US20180211046A1 (en) * 2017-01-26 2018-07-26 Intel Corporation Analysis and control of code flow and data flow
CN110019873A (en) * 2017-12-25 2019-07-16 深圳市优必选科技有限公司 Human face data processing method, device and equipment
WO2019169693A1 (en) * 2018-03-08 2019-09-12 平安科技(深圳)有限公司 Method for quickly importing data in batches, and electronic apparatus and computer-readable storage medium
CN110245011A (en) * 2018-03-08 2019-09-17 北京京东尚科信息技术有限公司 A kind of method for scheduling task and device
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
景晗; 郑建生; 陈鲤文; 许朝威: "Massive network data processing based on MapReduce and HBase", no. 34 *
杨超; 徐如志; 杨峰: "A multi-process data processing system based on message queues", no. 13 *

Also Published As

Publication number Publication date
CN111324783B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN103729442A (en) Method for recording event logs and database engine
CN110753084B (en) Uplink data reading method, cache server and computer readable storage medium
US10274919B2 (en) Method, device and computer program product for programming a plurality of control units
CN110727539A (en) Method and system for processing exception in batch processing task and electronic equipment
CN111930489B (en) Task scheduling method, device, equipment and storage medium
CN106612330A (en) System and method supporting distributed multi-file importing
CN115934389A (en) System and method for error reporting and handling
CN108121774B (en) Data table backup method and terminal equipment
CN110019063B (en) Method for computing node data disaster recovery playback, terminal device and storage medium
US20150156340A1 (en) Information processing system, information processing apparatus, and program
CN113157491A (en) Data backup method and device, communication equipment and storage medium
CN112231403B (en) Consistency verification method, device, equipment and storage medium for data synchronization
CN113407376A (en) Data recovery method and device and electronic equipment
CN111049913B (en) Data file transmission method and device, storage medium and electronic equipment
CN109542860B (en) Service data management method based on HDFS and terminal equipment
CN111324783B (en) Data processing method and device
CN101751311B (en) Request processing device, request processing system, and access testing method
CN103559204A (en) Database operation request processing method, unit and system
US20060184729A1 (en) Device, method, and computer product for disk management
EP4208787A1 (en) Accelerated non-volatile memory device inspection and forensics
CN113342698A (en) Test environment scheduling method, computing device and storage medium
US10503722B2 (en) Log management apparatus and log management method
CN108460078B (en) Auxiliary function execution method and device, storage medium and terminal
CN116126587B (en) Unidirectional data transmission method, unidirectional data transmission device, electronic equipment, medium and program product
CN112269583B (en) Method for processing equipment operation abnormal file upgrade, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant