CN111324783B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN111324783B
CN111324783B (application CN202010191386.4A)
Authority
CN
China
Prior art keywords
data
processing
queue
database
writing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010191386.4A
Other languages
Chinese (zh)
Other versions
CN111324783A (en)
Inventor
倪艳 (Ni Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongpu Software Co Ltd
Original Assignee
Dongpu Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongpu Software Co Ltd filed Critical Dongpu Software Co Ltd
Priority to CN202010191386.4A priority Critical patent/CN111324783B/en
Publication of CN111324783A publication Critical patent/CN111324783A/en
Application granted granted Critical
Publication of CN111324783B publication Critical patent/CN111324783B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/903: Querying
    • G06F16/90335: Query processing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method and device, computer equipment, and a computer-readable storage medium. The method comprises the following steps: importing data from an external source and writing the data into an import queue; and executing a filtering script to deduplicate the data in the import queue and then writing the data into a database. The data processing method, device, equipment, and storage medium solve the technical problem of low processing efficiency caused by data disorder in the existing data processing process.

Description

Data processing method and device
Technical Field
The present application relates to the field of data processing technology, and in particular to a data processing method, a data processing device, computer equipment, and a computer-readable storage medium.
Background
In existing data processing, in order to meet the most imminent deadline, part of the data is usually processed first and entered into the database. Because of heavy workloads, irregular manual operation, or carelessness, this easily leads to data disorder, omissions, and data processing delays.
Disclosure of Invention
The application aims to provide a data processing method and device, computer equipment and a computer readable storage medium, so as to solve the technical problem of low processing efficiency caused by data disorder in the existing data processing process.
The application adopts the following technical solution:
a first aspect of the present application provides a data processing method, the method comprising:
importing data from an external source and writing the data into an import queue;
and executing a filtering script to deduplicate the data in the import queue and then writing the data into a database.
Importing data from an external source into the import queue, then executing the filtering script to deduplicate that data before writing it into the database, avoids repeatedly importing duplicate data. Data disorder therefore does not arise during processing, processing efficiency is improved, and human effort is saved.
Optionally, the data processing method further includes:
executing a reading script to write the data in the database into a processing queue in batches;
and executing a processing script to verify the data in the processing queue, write the verified data into a sending queue, and update the processing state, in the database, of data that fails verification.
In this data processing method, each script and each queue is responsible for a different portion of the data, and they cooperate through a division of labor: taken separately, each is an independent component; taken together, they form a seamless whole. The code therefore runs quickly, is stable, and is easy to maintain, which saves human effort, avoids the defects of irregular manual operation, and saves a great deal of time.
Optionally, the data processing method further includes:
executing a sending script to push the data written into the sending queue to a designated platform, and updating the processing state, in the database, of data that was not sent successfully.
Executing the sending script to push the data in the sending queue to the designated platform, and updating the processing state of unsuccessfully sent data in the database, adapts the method to actual usage scenarios in which the data serves as a basic data source for the designated platform.
Optionally, the method further comprises:
writing the data that fails verification and the data that is not sent successfully into an exception queue, executing an exception script to process the data in the exception queue, and updating the processing state of the data.
By writing the data that fails verification or is not sent successfully into the exception queue, executing the exception script to process that data, and updating its processing state, data left unprocessed or unsent because of a system abnormality can be handled, and data processing omissions are avoided.
Optionally, executing the filtering script to deduplicate the data in the import queue and then writing the data into the database includes:
querying the processing state of the data in a preset cache;
if the processing state of the data is a predetermined processing state, not writing the data into the database;
if the processing state of the data is not found in the cache, recording the processing state of the data as to-be-processed and writing the data into the database.
Through these steps, data whose processing state is already the predetermined processing state is not written into the database again, which both avoids data redundancy and makes data processing more efficient.
Optionally, executing the filtering script to deduplicate the data in the import queue and then writing the data into the database further includes:
rewriting, into the import queue, abnormal data whose processing failed because of an exception during processing.
Rewriting such abnormal data into the import queue allows data whose filtering failed because of a system abnormality to be processed, avoiding data processing omissions.
Optionally, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state, in the database, of data that fails verification is processing-failed.
Because the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification is processing-failed, corresponding processing can be performed according to the processing state of each datum.
Optionally, executing the read script to write the data in the database into the processing queue in batches includes:
reading a start time from a first time file, executing the read script to read data whose processing state is to-be-processed and whose warehousing time falls within a first duration after the start time, and writing the data into the processing queue;
and when each read ends, writing the read end time into the first time file as the start time of the next read.
By reading data whose warehousing time falls within the first duration after the start time and whose processing state is to-be-processed, and writing it into the processing queue, the to-be-processed data in the database can be written into the processing queue in batches, so that the data is processed as a pipeline and data processing efficiency is improved.
Optionally, executing the read script to write the data in the database into the processing queue in batches further includes:
reading a start time T1 from a second time file, executing the read script to read data whose processing state is to-be-processed or processing-failed and whose warehousing time falls within a second duration t1 before the start time T1, and writing the data into the processing queue; if (T2 - T1) <= (t1 + t2), ending execution of the read script, where T2 is the current time and t2 is a third duration;
and when each read ends, writing the read end time into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is to-be-processed or processing-failed, and writing it into the processing queue, data left unprocessed through omission or system abnormality can be processed.
A second aspect of the present application provides a data processing apparatus, the apparatus comprising:
an input module, configured to import data from an external source and write the data into the import queue;
and a filtering module, configured to execute a filtering script to deduplicate the data in the import queue and then write the data into the database.
Importing data from an external source into the import queue, then executing the filtering script to deduplicate that data before writing it into the database, avoids repeatedly importing duplicate data. Data disorder therefore does not arise during processing, processing efficiency is improved, and human effort is saved.
Optionally, the data processing apparatus further includes:
a reading module, configured to execute a read script to write the data in the database into the processing queue in batches;
and a processing module, configured to execute a processing script to verify the data in the processing queue, write the verified data into the sending queue, and update the processing state, in the database, of data that fails verification.
In this data processing device, each script and each queue is responsible for a different portion of the data, and they cooperate through a division of labor: taken separately, each is an independent component; taken together, they form a seamless whole. The code therefore runs quickly, is stable, and is easy to maintain, which saves human effort, avoids the defects of irregular manual operation, and saves a great deal of time.
Optionally, the data processing apparatus further includes:
and a sending module, configured to execute a sending script to push the data written into the sending queue to a designated platform and to update the processing state, in the database, of data that was not sent successfully.
Executing the sending script to push the data in the sending queue to the designated platform, and updating the processing state of unsuccessfully sent data in the database, adapts the device to actual usage scenarios in which the data serves as a basic data source for the designated platform.
Optionally, the data processing apparatus further includes:
an exception module, configured to write the data that fails verification and the data that is not sent successfully into the exception queue, execute an exception script to process the data in the exception queue, and update the processing state of the data.
By writing the data that fails verification or is not sent successfully into the exception queue, executing the exception script to process that data, and updating its processing state, data left unprocessed or unsent because of a system abnormality can be handled, and data processing omissions are avoided.
Optionally, the filtering module includes:
a querying unit, configured to query the processing state of the data in a preset cache;
a writing unit, configured not to write the data into the database if the processing state of the data is a predetermined processing state;
and a recording unit, configured to record the processing state of the data as to-be-processed and write the data into the database if the processing state of the data is not found in the cache.
Because the filtering module comprises the querying unit, the writing unit, and the recording unit, data whose processing state is already the predetermined processing state is not written into the database again, which both avoids data redundancy and makes data processing more efficient.
Optionally, the filtering module further comprises:
and a filtering-exception unit, configured to rewrite, into the import queue, abnormal data whose processing failed because of an exception during processing.
Rewriting such abnormal data into the import queue allows data whose filtering failed because of a system abnormality to be processed, avoiding data processing omissions.
Optionally, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state, in the database, of data that fails verification is processing-failed.
Because the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification is processing-failed, corresponding processing can be performed according to the processing state of each datum.
Optionally, the reading module includes:
a first read-time unit, configured to read a start time from the first time file and execute the read script to read data whose processing state is to-be-processed and whose warehousing time falls within a first duration after the start time, writing the data into the processing queue;
and a first write-time unit, configured to write, when each read ends, the read end time into the first time file as the start time of the next read.
By reading data whose warehousing time falls within the first duration after the start time and whose processing state is to-be-processed, and writing it into the processing queue, the to-be-processed data in the database can be written into the processing queue in batches, so that the data is processed as a pipeline and data processing efficiency is improved.
Optionally, the reading module further includes:
a second read-time unit, configured to read a start time T1 from the second time file and execute the read script to read data whose processing state is to-be-processed or processing-failed and whose warehousing time falls within a second duration t1 before the start time T1, writing the data into the processing queue; if (T2 - T1) <= (t1 + t2), execution of the read script ends, where T2 is the current time and t2 is a third duration;
and a second write-time unit, configured to write, when each read ends, the read end time into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is to-be-processed or processing-failed, and writing it into the processing queue, data left unprocessed through omission or system abnormality can be processed.
A third aspect of the application provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the program.
A fourth aspect of the application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
Drawings
The application will be further described with reference to the drawings and examples.
FIG. 1 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 3 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 4 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 5 is a schematic flow chart of step S200 in FIGS. 1-4;
FIG. 6 is another schematic flow chart of step S200 in FIGS. 1-4;
FIG. 7 is a flow chart of step S300 in FIGS. 2-4;
FIG. 8 is another flow chart of step S300 in FIGS. 2-4;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 11 is a schematic diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 13 is a schematic structural diagram of the filter module 200 of FIGS. 9-12;
FIG. 14 is another schematic structural diagram of the filter module 200 of FIGS. 9-12;
FIG. 15 is a schematic diagram of the structure of the read module 300 of FIGS. 10-12;
FIG. 16 is another schematic diagram of the read module 300 of FIGS. 10-12;
FIG. 17 is a flow chart of a data processing method according to yet another embodiment of the present application;
FIG. 18 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solution of the application, the following description is given by way of specific examples.
The embodiment of the application provides a data processing method. As shown in fig. 1, the data processing method includes:
and a recording step S100, wherein data is imported from the outside and written into the import queue. Specifically, the user configures fixed information required for data and selects the type and time of data to be generated, then imports data to be processed from the operation page, and then writes the data into the importation queue.
And a filtering step S200, wherein a filtering script is executed to deduplicate the data in the import queue before writing it into the database. Specifically, the filtering script consumes the data in the import queue, deduplicates it, and then stores it in the database, which prevents duplicate data from being imported repeatedly. The processing state of each datum may be recorded and used as the basis for deduplication.
Importing data from an external source into the import queue, then executing the filtering script to deduplicate that data before writing it into the database, avoids repeatedly importing duplicate data. Data disorder therefore does not arise during processing, processing efficiency is improved, and human effort is saved.
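As a minimal illustration, the recording step S100 and the filtering step S200 above can be sketched with in-memory stand-ins; the `import_queue` deque, the `database` dict, and the record fields are hypothetical and only model the structures the method describes.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the import queue and the database.
import_queue = deque()
database = {}  # record id -> {"payload": ..., "state": ...}

def import_data(records):
    """Recording step S100: write externally imported records into the import queue."""
    for record in records:
        import_queue.append(record)

def filter_script():
    """Filtering step S200: consume the import queue, drop duplicates, store the rest."""
    while import_queue:
        record = import_queue.popleft()
        if record["id"] in database:  # duplicate: already imported, skip it
            continue
        database[record["id"]] = {"payload": record["payload"], "state": "pending"}

import_data([{"id": 1, "payload": "a"},
             {"id": 2, "payload": "b"},
             {"id": 1, "payload": "a"}])  # record 1 arrives twice
filter_script()
```

After the run, the second copy of record 1 has been filtered out and only two rows reach the database, each marked pending.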
Optionally, as shown in fig. 2, the data processing method may further include:
and a reading step S300, wherein the reading script is executed to write the data in the database into the processing queue in batches. The data can be read according to the date and written into the processing queue, so that the data to be processed is read and written.
And a processing step S400, wherein a processing script is executed to check the data in the processing queue, the checked data is written into the sending queue, and the processing state of the data which is not checked in the database is updated. Specifically, the processing queue is consumed through the data processing script, and the data to be processed is checked in further detail. The verification is successful, and the queue to be sent is written; and (4) checking failure, recording error reasons and updating the processing state of the error reasons in the database. The updating can be processed in batches, abnormal data can be processed in batches, the reasons of the abnormal data are recorded, and finally the data are written into an abnormal queue.
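Processing step S400 can be sketched as follows, assuming a pluggable verification rule; the `processing_queue`, `send_queue`, and record layout are hypothetical stand-ins, not part of the patent.

```python
from collections import deque

# Hypothetical stand-ins: a processing queue of records read from the database,
# a send queue, and the database keyed by record id.
processing_queue = deque([{"id": 1, "payload": "a"},
                          {"id": 2, "payload": ""}])
send_queue = deque()
database = {1: {"state": "pending"}, 2: {"state": "pending"}}

def process_script(validate):
    """Processing step S400: verify each queued record; route passes to the send
    queue and record the failure reason for the rest in the database."""
    while processing_queue:
        record = processing_queue.popleft()
        ok, reason = validate(record)
        if ok:
            send_queue.append(record)
        else:
            database[record["id"]]["state"] = "failed"
            database[record["id"]]["error"] = reason

# Illustrative check: a record passes only if its payload is non-empty.
process_script(lambda r: (True, "") if r["payload"] else (False, "empty payload"))
```

Record 1 ends up in the send queue; record 2 is marked failed with its error reason, ready for the exception handling described later.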
In this data processing method, each script and each queue is responsible for a different portion of the data, and they cooperate through a division of labor: taken separately, each is an independent component; taken together, they form a seamless whole. The code therefore runs quickly, is stable, and is easy to maintain, which saves human effort, avoids the defects of irregular manual operation, and saves a great deal of time.
Optionally, as shown in fig. 3, the data processing method may further include:
and a sending step S500, wherein the sending script is executed to push the data written into the sending queue to the appointed platform, and the processing state of the data which is not successfully sent in the database is updated. In particular, a data transmission script may be executed to consume a transmission queue to transmit data to a designated platform or other desired platform/system.
The execution of the transmission script transmits the data written into the transmission queue to the appointed platform, and updates the processing state of the data which is not successfully transmitted in the database, so that the method can adapt to the requirement of an actual use scene, and the data is transmitted to the appointed platform as a basic data source thereof.
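Sending step S500 might be sketched as follows; the `push` callable stands in for the transport to the designated platform and is purely illustrative.

```python
from collections import deque

# Hypothetical stand-ins for the send queue, the database, and the platform push.
send_queue = deque([{"id": 1}, {"id": 2}])
database = {1: {"state": "verified"}, 2: {"state": "verified"}}

def send_script(push):
    """Sending step S500: push each queued record to the designated platform;
    mark records whose push failed so they can be picked up again later."""
    while send_queue:
        record = send_queue.popleft()
        if push(record):
            database[record["id"]]["state"] = "sent"
        else:
            database[record["id"]]["state"] = "send_failed"

# Illustrative transport: pretend the platform rejects record 2.
send_script(lambda r: r["id"] != 2)
```

The failed record keeps a distinct state (`send_failed` here), which is what lets the exception step find and reprocess it.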
Further, as shown in fig. 4, the data processing method may further include:
and an exception step S600, namely writing the data which is not checked to pass in the database and the data which is not successfully transmitted into an exception queue, executing an exception script to process the data in the exception queue and updating the processing state of the data.
By writing the data which is not checked and successfully transmitted in the database into the abnormal queue, executing the abnormal script to process the data in the abnormal queue and updating the processing state of the data, the data which is processed or is unsuccessfully transmitted due to the abnormal system can be processed, and the omission of data processing is avoided.
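A sketch of exception step S600 under the same in-memory assumptions; the `retry` callable and the state names are hypothetical choices, not dictated by the method.

```python
from collections import deque

# Hypothetical stand-ins: the database holds records that failed verification or
# sending; the exception script sweeps them into an exception queue and retries.
database = {
    1: {"state": "sent"},
    2: {"state": "failed"},
    3: {"state": "send_failed"},
}
exception_queue = deque()

def exception_script(retry):
    """Exception step S600: collect failed records, reprocess them, update states."""
    for rec_id, row in database.items():
        if row["state"] in ("failed", "send_failed"):
            exception_queue.append(rec_id)
    while exception_queue:
        rec_id = exception_queue.popleft()
        # An illustrative outcome: a successful retry becomes "sent",
        # an unsuccessful one is parked for manual inspection.
        database[rec_id]["state"] = "sent" if retry(rec_id) else "needs_review"

# Illustrative retry: the second attempt succeeds only for record 2.
exception_script(lambda rec_id: rec_id == 2)
```

Records that already succeeded are untouched; each failed record gets exactly one more pass and an updated state.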
The filtering script, the read script, the processing script, the sending script, and the exception script are all scheduled uniformly by a designated script. In particular, a shell script may be used to invoke all of the scripts uniformly.
Because the designated script schedules all of the scripts uniformly, the system is convenient to use and highly controllable: the number of processes can be increased or decreased uniformly to control the volume of data in flight. Adjusting the number of processes controls the speed at which data is read, processed, updated, and sent, so that data does not pile up. Each link of the pipeline processes its data independently, so even if one link fails, the downstream scripts continue to run and process the data already in their queues.
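The unified shell scheduling described above might look like the following sketch; the script names, per-script process counts, and `run_workers` helper are all hypothetical.

```shell
#!/bin/sh
# Hypothetical unified scheduler in the spirit of the designated shell script:
# launch a configurable number of worker processes per pipeline script, so
# throughput is tuned by changing one number per script.
FILTER_PROCS=2
PROCESS_PROCS=3

run_workers() {
    name=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # In a real deployment this line would background the worker, e.g.:
        #   python "$name.py" &
        echo "launch $name worker $i"
        i=$((i + 1))
    done
}

run_workers filter "$FILTER_PROCS"
run_workers process "$PROCESS_PROCS"
# wait   # would block until all backgrounded workers exit
```

Raising `PROCESS_PROCS` speeds up verification relative to filtering; because each stage only consumes its own queue, stopping one stage leaves the others running on the data already enqueued.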
As shown in fig. 5, executing the filtering script to deduplicate the data in the import queue and then write the data into the database may include:
a querying step S201, wherein the processing state of the data is queried in a preset cache;
a writing step S202, wherein, if the processing state of the data is a predetermined processing state, the data is not written into the database;
and a recording step S203, wherein, if the processing state of the data is not found in the cache, the processing state of the data is recorded as to-be-processed and the data is written into the database.
Through these steps, data whose processing state is already the predetermined processing state is not written into the database again, which both avoids data redundancy and makes data processing more efficient.
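Steps S201-S203 can be sketched as follows; the cache layout, the state names, and the `PREDETERMINED_STATES` set are illustrative assumptions, not dictated by the method.

```python
# Hypothetical sketch of steps S201-S203: the cache maps a record id to its
# processing state; a record whose cached state is one of the predetermined
# states is skipped, otherwise it is marked to-be-processed and stored.
PREDETERMINED_STATES = {"pending", "processed"}  # illustrative choice

cache = {1: "processed"}
database = {}

def filter_record(record):
    state = cache.get(record["id"])       # querying step S201
    if state in PREDETERMINED_STATES:     # writing step S202: skip known records
        return False
    cache[record["id"]] = "pending"       # recording step S203: mark and store
    database[record["id"]] = record
    return True

filter_record({"id": 1})   # state already cached: not written again
filter_record({"id": 2})   # state unknown: marked pending and stored
```

The cache lookup replaces a round trip to the database for every duplicate, which is what makes the deduplication cheap.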
Further, as shown in fig. 6, executing the filtering script to deduplicate the data in the import queue and then write the data into the database may further include:
a filtering-exception step S204, wherein abnormal data whose processing failed because of an exception during processing is rewritten into the import queue.
Rewriting such abnormal data into the import queue allows data whose filtering failed because of a system abnormality to be processed, avoiding data processing omissions.
As an alternative implementation, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state, in the database, of data that fails verification is processing-failed.
Because the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification is processing-failed, corresponding processing can be performed according to the processing state of each datum.
Wherein, as shown in fig. 7, executing the read script to write the data in the database into the processing queue in batches may include:
a first read-time step S301, wherein a start time is read from a first time file, and the read script is executed to read data whose processing state is to-be-processed and whose warehousing time falls within a first duration after the start time, writing the data into the processing queue. The first duration may be, for example, 3, 5, or 10 minutes; it may be set according to actual needs and is not limited to these examples.
And a first write-time step S302, wherein, when each read ends, the read end time is written into the first time file as the start time of the next read.
By reading data whose warehousing time falls within the first duration after the start time and whose processing state is to-be-processed, and writing it into the processing queue, the to-be-processed data in the database can be written into the processing queue in batches, so that the data is processed as a pipeline and data processing efficiency is improved.
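Steps S301-S302 can be sketched as follows; the time-file representation, the row layout, and the `FIRST_DURATION` value are hypothetical stand-ins for the persisted start time and the database rows.

```python
# Hypothetical sketch of steps S301-S302: the first time file persists the start
# time between runs; each run reads the pending rows whose warehousing time falls
# within FIRST_DURATION after the start time, then advances the stored time.
FIRST_DURATION = 300  # seconds; e.g. a 5-minute batch window

def read_batch(time_file, rows, queue, now):
    start = float(time_file["start"])        # stand-in for reading the time file
    end = min(start + FIRST_DURATION, now)   # never read past the present
    for row in rows:
        if row["state"] == "pending" and start <= row["stored_at"] < end:
            queue.append(row)
    time_file["start"] = end                 # S302: next run starts where this ended
    return end

queue = []
time_file = {"start": 0}
rows = [{"stored_at": 100, "state": "pending"},
        {"stored_at": 400, "state": "pending"}]
read_batch(time_file, rows, queue, now=600)
```

Persisting the end time as the next start time is what makes the batches contiguous: no row is read twice and no window is skipped, even across restarts.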
Further, as shown in fig. 8, executing the read script to write the data in the database into the processing queue in batches may further include:
a second read-time step S303, wherein a start time T1 is read from a second time file, and the read script is executed to read data whose processing state is to-be-processed or processing-failed and whose warehousing time falls within a second duration t1 before the start time T1, writing the data into the processing queue; if (T2 - T1) <= (t1 + t2), execution of the read script ends, where T2 is the current time and t2 is a third duration. For example only, the second duration t1 may be half an hour or an hour, and the third duration t2 may be 1, 2, or 3 minutes; both may be set according to actual needs and are not limited to these examples.
And a second write-time step S304, wherein, when each read ends, the read end time is written into the second time file as the start time of the next read.
By reading data whose warehousing time falls within the second duration t1 before the start time T1 and whose processing state is to-be-processed or processing-failed, and writing it into the processing queue, data left unprocessed through omission or system abnormality can be processed.
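Steps S303-S304, including the stop condition (T2 - T1) <= (t1 + t2), can be sketched as follows; the row layout and the state names are hypothetical, and the interpretation of the condition (stop once the retry window catches up with the present) is an assumption drawn from the surrounding text.

```python
# Hypothetical sketch of steps S303-S304: each run re-reads rows stored within
# the second duration t1 before the start time T1 whose state is still pending
# or failed; once the start time is within (t1 + t2) of the current time T2,
# the retry reader stops, since the regular reader already covers that window.
def retry_read(T1, T2, t1, t2, rows, queue):
    if (T2 - T1) <= (t1 + t2):
        return False                       # stop condition from the method
    for row in rows:
        if row["state"] in ("pending", "failed") and T1 - t1 <= row["stored_at"] < T1:
            queue.append(row)
    return True

queue = []
rows = [{"stored_at": 50, "state": "failed"},
        {"stored_at": 90, "state": "sent"}]
# Far enough in the past: the failed row is swept up for reprocessing.
ran = retry_read(T1=100, T2=10_000, t1=60, t2=120, rows=rows, queue=queue)
```

The retry reader thus trails behind the regular reader by at least t1 + t2, sweeping up rows that were missed or left in a failed state, without ever racing the live batch window.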
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present application.
The embodiment of the application also provides a data processing device. As shown in fig. 9, the data processing apparatus may include:
an input module 100, configured to import data from an external source and write the data into the import queue;
and a filtering module 200, configured to execute a filtering script to deduplicate the data in the import queue and then write the data into the database.
Optionally, as shown in fig. 10, the data processing apparatus may further include:
the reading module 300 is configured to execute a reading script to write data in the database into the processing queue in batches.
And a processing module 400, configured to execute a processing script to verify the data in the processing queue, write the data that passes verification into the sending queue, and update, in the database, the processing state of the data that fails verification.
In this data processing apparatus, each script and each queue is responsible for a different part of the data, cooperating with a clear division of labor: each is independent, yet together they form a single pipeline, so the code runs fast, is stable, and is easy to maintain. Human resources are saved, defects caused by irregular manual operation are avoided, and a great deal of time is saved.
Optionally, as shown in fig. 11, the data processing apparatus may further include:
and a sending module 500, configured to execute a sending script to push the data written into the sending queue to a designated platform, and to update, in the database, the processing state of data that failed to send.
Executing the sending script pushes the data in the sending queue to the designated platform and updates the processing state of unsuccessfully sent data in the database, which adapts the method to practical scenarios in which the data serves as a basic data source for the designated platform.
Further, as shown in fig. 12, the data processing apparatus may further include:
an exception module 600, configured to write data that failed verification or failed to send into an exception queue, execute an exception script to process the data in the exception queue, and update the processing state of that data.
By writing data that failed verification or failed to send into the exception queue, executing the exception script to process it, and updating its processing state, data whose processing or sending failed because of a system exception can still be handled, and omissions in data processing are avoided.
The filtering script, the reading script, the processing script, the sending script and the exception script are all scheduled uniformly by a designated script.
Because a designated script schedules all the scripts uniformly, the system is convenient to use and highly controllable: the number of processes can be increased or decreased centrally, so the speed of reading, processing, updating and sending data is controlled by adjusting the process count, and data does not pile up. The data processing of each link is independent, so even if one link fails, the subsequent scripts can continue to run and process the data already in the queues.
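As an illustration only of this unified-scheduling idea, the Python sketch below stands in for the shell scheduler: a single entry point spawns a configurable number of identical consumer processes over one shared queue, so throughput is tuned by changing a single worker count. All names are assumptions; the patent does not prescribe an implementation language.

```python
import multiprocessing as mp
import queue

def consume(q, handle):
    """Drain the shared queue with `handle`; exit once the queue runs empty."""
    done = 0
    while True:
        try:
            item = q.get(timeout=0.1)
        except queue.Empty:
            return done
        handle(item)
        done += 1

def schedule(items, handle, workers=4):
    """Designated entry point: spawn `workers` identical consumer processes.

    Raising or lowering `workers` raises or lowers the consumption speed
    of this link without touching any other link's processes.
    """
    q = mp.Queue()
    for it in items:
        q.put(it)
    procs = [mp.Process(target=consume, args=(q, handle)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Note that on platforms where multiprocessing uses the spawn start method, `handle` must be a module-level function so it can be pickled.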
As shown in fig. 13, the filtering module 200 may include:
a query unit 201, configured to query a processing state of the data from a preset cache;
a writing unit 202, configured to not write the data into the database if the processing state of the data is a predetermined processing state;
and the recording unit 203 is configured to record the processing state of the data as to-be-processed if the processing state of the data is not queried in the cache, and write the data into a database.
Because the filtering module comprises the query unit, the writing unit and the recording unit, data whose processing state is the predetermined processing state is not written into the database again, which avoids data redundancy and makes data processing more efficient.
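A minimal sketch of these three units, with a plain dict standing in for the pika cache and a list for the database table; the field names (`shipment_id`, `scan_type`) and state strings are illustrative assumptions:

```python
# States that block a repeated import (the "predetermined processing states").
BLOCKING_STATES = {"processing_success", "to_be_processed", "processing"}

def filter_and_store(record: dict, cache: dict, db: list) -> bool:
    """Deduplicate one imported record; return True if it was written to db."""
    key = f"{record['shipment_id']}-{record['scan_type']}"
    state = cache.get(key)          # query unit: look up the processing state
    if state in BLOCKING_STATES:
        return False                # writing unit: duplicate, do not write
    cache[key] = "to_be_processed"  # recording unit: first import, mark pending
    db.append(record)
    return True
```

A second import of the same key is rejected because the cache now holds a blocking state for it.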
Further, as shown in fig. 14, the filtering module 200 may further include:
the filtering exception unit 204 is configured to rewrite, into the import queue, exception data that has failed in processing due to an exception occurring during data processing.
The abnormal data which is abnormal and causes processing failure during data processing is rewritten into the import queue, so that the data which is filtered and failed due to system abnormality can be processed, and data processing omission is avoided.
As an optional implementation, the processing state of data written into the database after deduplication filtering is to-be-processed, and the processing state of data that fails verification in the database is processing-failure.
Because data written into the database after deduplication is marked to-be-processed and data that fails verification is marked processing-failure, corresponding processing can be carried out according to the processing state of the data.
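The state codes can be sketched as simple constants; the numeric values 1, 2 and 3 are taken from the worked example later in the description (1 meaning success, 2 failure, 3 pending), and generalizing them into named constants is an assumption:

```python
# Processing-state codes as used in the cache in the worked example.
PROCESSING_SUCCESS = 1
PROCESSING_FAILURE = 2
TO_BE_PROCESSED = 3

def needs_work(state: int) -> bool:
    """Both pending and failed records are picked up again by the read scripts."""
    return state in (TO_BE_PROCESSED, PROCESSING_FAILURE)
```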
As shown in fig. 15, the reading module 300 may further include:
a first reading time unit 301, configured to read a start time from a first time file and execute the read script to read the data whose processing state is to-be-processed and whose warehousing time falls within the first duration after the start time, writing the data into the processing queue;
the first writing time unit 302 is configured to write, when each reading is finished, a reading finishing time into the first time file as a starting time of a next reading.
Reading the to-be-processed data whose warehousing time falls within the first duration after the start time and writing it into the processing queue writes the database data into the processing queue in batches, so the data can be processed as a pipeline and data processing efficiency is improved.
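The time-file handshake described above can be sketched as follows; the time format and single-line file layout are assumptions, since the patent only specifies that the read end time becomes the next start time:

```python
from datetime import datetime, timedelta
from pathlib import Path

TIME_FMT = "%Y-%m-%d %H:%M:%S"

def next_batch_window(time_file: Path, batch: timedelta = timedelta(minutes=5)):
    """Read the start time from the time file, compute the batch window, and
    persist the window end as the start time of the next read."""
    start = datetime.strptime(time_file.read_text().strip(), TIME_FMT)
    end = start + batch
    time_file.write_text(end.strftime(TIME_FMT))  # writing step: record end time
    return start, end
```

Each call advances the window by one batch, so restarting the script never re-reads or skips a time range.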
Further, as shown in fig. 16, the reading module 300 may further include:
a second reading time unit 303, configured to read a start time T1 from a second time file and execute the read script to read the data whose processing state is to-be-processed or processing-failure and whose warehousing time falls within the second duration t1 before the start time T1, writing the data into the processing queue; if (T2 - t1) is less than or equal to (T1 + t2), execution of the read script ends, where T2 is the current time and t2 is a third duration;
a second writing time unit 304, configured to write, when each reading ends, a reading end time into the second time file as a start time of a next reading.
By reading the data whose processing state is to-be-processed or processing-failure and whose warehousing time falls within the second duration t1 before the start time T1, and writing it into the processing queue, data that was missed or whose processing failed because of a system exception can still be handled.
The function implementation of each module in the data processing apparatus corresponds to the steps of the data processing method embodiments, and the functions and implementation processes of the modules are therefore not described in detail here.
The embodiment of the application also provides a data processing method, as shown in fig. 17, and the specific implementation manner is as follows:
First, data is imported from a user page, and after a preliminary check in the background, the data is written into the import queue scan_entry_new_import.
Second, the data is deduplicated and warehoused. Several processes that consume the import queue are started by a shell scheduling script. Specifically, the order state is queried in the cache pika by order number shipment_id and order type scan_type. If the order state is any of "processing success", "to be processed" or "processing", the import is not allowed, which avoids repeated imports. If no state information exists, the record is judged to be an initial import; the corresponding shipment_id-scan_type entry is added with state "to be processed" and written into the cache pika. The filtered information is then written into the t_scan_entitr_tbl table. Because of the nature of the queue, an exception may occur during processing; such abnormal data is rewritten into the scan_entry_new_import queue to await reprocessing.
Next, the data is read. A single process started by the shell reads the start time in time file a. Data is read in 5-minute batches according to the value of judge_tm (whose initial value is the warehousing time of the data), and the data is written into the processing queue test_scan_data_deal. After the batch has been fetched, its end time is written into the time file as the start time of the next read.
Further, a single repeated-read script can also be started by the shell to read the start time in time file b. Each run reads the data from half an hour earlier; if the current time minus half an hour is less than or equal to the start time plus 1 minute, the process terminates. Otherwise, the orders within the read window that have not been processed successfully are written into the processing queue test_scan_data_deal. After the batch has been fetched, its end time is written into the time file as the start time of the next read. In this way, data that was missed, or whose processing failed because of a system exception but has not been re-imported, can be processed.
Next, the data is processed. Multiple processes started by the shell consume the test_scan_data_deal queue and perform detailed business verification and batch processing on the data. For data that fails verification, the failure reason is recorded, the data states in the database are updated in batches with the error reason added, the value of judge_tm is changed to the current time, and the state in the cache pika is changed. If verification passes, the data is written into the send queue test_scan_data to await sending.
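As an illustration only, the verification-and-routing step might look like the sketch below; the verify rule, field names, and state strings are assumptions standing in for the actual business checks, and dicts stand in for the database and the pika cache:

```python
def verify(rec: dict):
    """Illustrative business check: every record must carry basic info."""
    return None if rec.get("basic_info") else "xx information error"

def process_batch(deal_queue: list, send_queue: list, db: dict, cache: dict, now: str):
    """Consume the deal queue: route passes to the send queue, record failures."""
    for rec in deal_queue:
        key = f"{rec['shipment_id']}-{rec['scan_type']}"
        error = verify(rec)
        if error:
            # Failed verification: store the reason, refresh judge_tm, sync cache.
            db[key] = {"state": "processing_failure", "error": error, "judge_tm": now}
            cache[key] = "processing_failure"
        else:
            send_queue.append(rec)  # passed: hand over to the send script
    deal_queue.clear()
```

Refreshing judge_tm on failure is what lets the half-hour catch-up reader pick the record up again in a later window.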
Finally, the data is sent. Multiple processes started by the shell consume the test_scan_data queue. The data is processed in batches; specifically, it is assembled according to the agreed joint-debugging format and sent to the designated platform. If sending fails, the error reason is recorded, the data state in the database is changed with the error reason added, judge_tm is changed to the current time, and the state in the cache pika is changed. If sending succeeds, the data state in the database and in the cache pika is changed to success.
Abnormal data in the above processing and sending steps may be handled as follows: if updating the database raises an error, the data is recorded into the exception queue testScanDataError for unified error handling. The shell starts multiple processes to consume testScanDataError, uniformly apply the database error updates, change the processing state, change judge_tm to the current time, and change the data state in the cache pika.
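A minimal sketch of this unified exception handling, again with dicts standing in for the database and the pika cache; field and state names are assumptions:

```python
def drain_error_queue(error_queue: list, db: dict, cache: dict, now: str):
    """Exception script: uniformly re-apply the updates that failed earlier."""
    while error_queue:
        rec = error_queue.pop(0)
        key = f"{rec['shipment_id']}-{rec['scan_type']}"
        db[key] = {"state": rec["state"], "error": rec.get("error"), "judge_tm": now}
        cache[key] = rec["state"]
```

Because every failed update lands in one queue, a single script retries them all, and no record silently keeps a stale state.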
Take an order numbered 4*9 as an example; its data is sent internally to another platform of the company as a basic data source.
First, the imported data for order 4*9 includes the order type and other basic information. The order types may include type 1 ("A type") and type 2 ("B type"), giving the keys 4*9-1 and 4*9-2. After import, the data enters the import queue scan_entry_new_import.
Second, the import queue scan_entry_new_import is consumed. With 4*9-2 as the key, the state is looked up in the cache pika; if no state information is found, the order is an initial import, and the key 4*9-2 is added with value 3, meaning "to be processed". If, instead, the lookup finds that 4*9-1 has value 1 (meaning "processing success"), the type-1 data of 4*9 is skipped and the type-2 data is written into the database.
Next, the read script starts, reads the type-2 data of order 4*9 within a certain time range, and writes it into the test_scan_data_deal queue.
Then the processing script starts and reads the 4*9 data in the test_scan_data_deal queue. The imported basic information is verified; if an error is found, the state of the 4*9 data in the database is updated to "processing failure" with the note "xx information error", and the value of 4*9-2 in the cache pika is changed to 2 (meaning "processing failure").
If the erroneous basic information was caused by a user change, the corrected information is re-imported and the above steps are executed again. The other basic data is verified, and if there is no error the data is written into the send queue test_scan_data.
Finally, the send script is started. The data in the send queue test_scan_data is assembled in the correct format and sent to the other platform of the company. If sending fails, the steps are repeated according to the failure reason; if sending succeeds, other data is processed.
The data processing flow may look complex and involves several scripts, but each link processes data independently, so even if one link has a problem, the normal operation of the other scripts on existing data is not affected. Each link's processing is single-purpose and efficient, problems are easy to trace, and the division of labor is clear. Even when applied to millions of records there is no limitation; in particular, if the logic verification in the processing script is not very complex, the hourly throughput is considerable, and the system is not only efficient and stable but also very fast.
Embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the methods of the various embodiments described above.
Fig. 18 is a schematic diagram of a computer device according to an embodiment of the present application. The computer device 6 comprises: a processor 60, a memory 61 and a computer program 62 stored in said memory 61 and executable on said processor 60. The steps of the various data processing method embodiments described above are implemented when the processor 60 executes the computer program 62. Alternatively, the processor 60, when executing the computer program 62, performs the functions of the modules/units of the various data processing apparatus embodiments described above.
Illustratively, the computer program 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used to describe the execution of the computer program 62 in the data processing apparatus/computer device 6.
The computer device 6 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The computer device 6 may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 18 is merely an example of the computer device 6 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 6 may also include input and output devices, network access devices, buses, and the like.
The processor 60 may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. The memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like. Further, the memory 61 may also include both an internal storage unit and an external storage device of the computer device 6. The memory 61 is used for storing the computer program and other programs and data required by the computer device. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above device may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A method of data processing, the method comprising:
importing data from outside and writing the data into an import queue;
executing filtering scripts to perform de-duplication filtering on the data in the import queue and then writing the data into a database;
the executing filtering script performs de-duplication filtering on the data in the import queue and writes the data into a database, and the executing filtering script comprises:
inquiring the processing state of the data from a preset cache;
if the processing state of the data is the preset processing state, the data is not written into a database;
if the processing state of the data is not queried in the cache, recording the processing state of the data as to-be-processed, and writing the data into a database;
and rewriting abnormal data which is abnormal during data processing and causes processing failure into the import queue.
2. The data processing method according to claim 1, characterized by further comprising:
executing a reading script to write the data in the database into a processing queue in batches;
and executing a processing script to check the data in the processing queue, writing the checked data into the sending queue, and updating the processing state of the data which does not pass the check in the database.
3. The data processing method according to claim 2, characterized by further comprising:
executing the sending script to push the data written into the sending queue to the appointed platform, and updating the processing state of the data which is not successfully sent in the database.
4. A data processing method according to claim 3, characterized in that the method further comprises:
writing the data in the database which fails verification or fails to send into an exception queue, executing an exception script to process the data in the exception queue, and updating the processing state of the data.
5. A data processing method according to claim 2 or 3, wherein the processing state of data written into the database after the deduplication filtering is to-be-processed, and the processing state of data that fails verification in the database is processing-failure.
6. The data processing method of claim 5, wherein executing the read script writes data in the database to the processing queue in batches, comprising:
reading a start time from a first time file, executing the read script to read the data whose processing state is to-be-processed and whose warehousing time falls within the first duration after the start time, and writing the data into the processing queue;
and when each reading is finished, writing the reading finishing time into the first time file as the starting time of the next reading.
7. The data processing method of claim 5, wherein executing the read script writes data in the database to the processing queue in batches, comprising:
reading a start time T1 from a second time file, executing the read script to read the data whose processing state is to-be-processed or processing-failure and whose warehousing time falls within the second duration t1 before the start time T1, and writing the data into the processing queue; if (T2 - t1) is less than or equal to (T1 + t2), ending execution of the read script, wherein T2 is the current time, and t2 is a third duration;
and when each reading is finished, writing the reading finishing time into the second time file as the starting time of the next reading.
8. A data processing apparatus, the apparatus comprising:
the input module is used for importing data from the outside and writing the data into the import queue;
the filtering module is used for executing filtering scripts to perform de-duplication filtering on the data in the import queue and then writing the data into a database;
the filter module includes:
the inquiring unit is used for inquiring the processing state of the data from a preset cache;
a writing unit, configured to not write the data into a database if the processing state of the data is a predetermined processing state;
a recording unit, configured to record the processing state of the data as to-be-processed if the processing state of the data is not queried in the cache, and write the data into a database;
and the filtering exception unit is used for rewriting the exception data which is abnormal during the data processing and causes processing failure into the import queue.
CN202010191386.4A 2020-03-18 2020-03-18 Data processing method and device Active CN111324783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010191386.4A CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010191386.4A CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Publications (2)

Publication Number Publication Date
CN111324783A CN111324783A (en) 2020-06-23
CN111324783B true CN111324783B (en) 2023-08-29

Family

ID=71169918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010191386.4A Active CN111324783B (en) 2020-03-18 2020-03-18 Data processing method and device

Country Status (1)

Country Link
CN (1) CN111324783B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107302569A (en) * 2017-06-08 2017-10-27 武汉火凤凰云计算服务股份有限公司 A kind of security monitoring Data acquisition and storage method of facing cloud platform
CN110019873A (en) * 2017-12-25 2019-07-16 深圳市优必选科技有限公司 Human face data processing method, device and equipment
WO2019169693A1 (en) * 2018-03-08 2019-09-12 平安科技(深圳)有限公司 Method for quickly importing data in batches, and electronic apparatus and computer-readable storage medium
CN110245011A (en) * 2018-03-08 2019-09-17 北京京东尚科信息技术有限公司 A kind of method for scheduling task and device
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9152642B2 (en) * 2012-12-21 2015-10-06 Zetta, Inc. Systems and methods for on-demand data storage
US8977596B2 (en) * 2012-12-21 2015-03-10 Zetta Inc. Back up using locally distributed change detection
US20180211046A1 (en) * 2017-01-26 2018-07-26 Intel Corporation Analysis and control of code flow and data flow


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Chao; Xu Ruzhi; Yang Feng. Multi-process data processing system based on message queues. Computer Engineering and Design. 2010, (13), full text. *

Also Published As

Publication number Publication date
CN111324783A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN103729442A (en) Method for recording event logs and database engine
CN110750592B (en) Data synchronization method, device and terminal equipment
CN110727539A (en) Method and system for processing exception in batch processing task and electronic equipment
RU2653254C1 (en) Method, node and system for managing data for database cluster
US10274919B2 (en) Method, device and computer program product for programming a plurality of control units
CN110941502A (en) Message processing method, device, storage medium and equipment
CN110781231A (en) Batch import method, device, equipment and storage medium based on database
CN108733671B (en) Method and device for archiving data history
CN109634989B (en) HIVE task execution engine selection method and system
CN108121774B (en) Data table backup method and terminal equipment
CN110019063B (en) Method for computing node data disaster recovery playback, terminal device and storage medium
US20110072153A1 (en) Apparatus, system, and method for device level enablement of a communications protocol
CN111324783B (en) Data processing method and device
CN113157491A (en) Data backup method and device, communication equipment and storage medium
JP2012089049A (en) Computer system and server
CN106971293A (en) A kind of business event based on activiti and flow separation method and system
CN101751311B (en) Request processing device, request processing system, and access testing method
CN111049913A (en) Data file transmission method and device, storage medium and electronic equipment
US20060184729A1 (en) Device, method, and computer product for disk management
US20220121390A1 (en) Accelerated non-volatile memory device inspection and forensics
CN113392085A (en) Distributed file batch processing method and platform
US10503722B2 (en) Log management apparatus and log management method
KR20130032151A (en) Flash memory device capable of verifying reliability using bypass path, and system and method of verifying reliability using that device
US20200349304A1 (en) Method, apparatus, device, and medium for implementing simulator
WO2019134238A1 (en) Method for executing auxiliary function, device, storage medium, and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant