CN108733845B - Data processing method and device, computer equipment and storage medium

Info

Publication number
CN108733845B
Authority
CN
China
Prior art keywords
data
full link
processing
link record
record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810603070.4A
Other languages
Chinese (zh)
Other versions
CN108733845A (en)
Inventor
王炼
吕远方
曾庚卓
邱佳文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810603070.4A
Publication of CN108733845A
Application granted
Publication of CN108733845B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method, a data processing apparatus, a computer device, and a storage medium, and belongs to the technical field of the internet. The method includes: when each piece of data to be processed is selected, creating a full link record of the data, where the full link record is used for storing the identifier of the full link record and the processing result of the data in each processing step; in the process of processing the data, each time a processing step is executed, saving the processing result of that step in the full link record; and when a query instruction is received, displaying the full link record corresponding to the data, where the query instruction carries the identifier of the full link record. Because the processing result of each processing step is saved while the data is processed, forming the full link record of the data, incomplete records are avoided and centralized query is facilitated, and problems can subsequently be analyzed and located by querying the full link record of the data.

Description

Data processing method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
With the development of internet technology, the volume of network data keeps growing, including the data of every web page on every website, and how to process this network data has become very important.
At present, in the related art, when a computer device processes network data, a developer typically relies on personal experience to observe the processing results through logging in scripts: the developer selects the few most critical processing steps, and the computer device logs those steps and stores the logs. Subsequently, if an anomaly is encountered while the network data is being used, the problem is analyzed and located through the logs.
In the process of implementing the invention, the inventor finds that the related art has at least the following problems:
the processing of network data is recorded through logging, so the quality of the logs depends entirely on the developer's experience, and the logs are easily incomplete; for example, some steps may not be logged at all. In addition, data processing is distributed, so the logs are scattered across multiple machines and cannot be queried centrally.
Disclosure of Invention
The embodiments of the present invention provide a data processing method, a data processing apparatus, a computer device, and a storage medium, which can solve the problems in the related art that logs are not detailed and cannot be queried centrally. The technical solutions are as follows:
in one aspect, a data processing method is provided, and the method includes:
when each piece of data to be processed is selected, creating a full link record of the data, wherein the full link record is used for storing the identification of the full link record and the processing result of the data in each processing step;
in the process of processing the data, every time a processing step is executed, storing the processing result of the processing step into the full link record;
and when a query instruction is received, displaying a full link record corresponding to the data, wherein the query instruction carries an identifier of the full link record.
In one aspect, a data processing apparatus is provided, the apparatus comprising:
a creating module, configured to create a full link record of data when each piece of data to be processed is selected, where the full link record is used for storing the identifier of the full link record and the processing result of the data in each processing step;
a storage module, configured to save, each time a processing step is executed in the process of processing the data, the processing result of that step in the full link record;
and a display module, configured to display the full link record corresponding to the data when a query instruction is received, where the query instruction carries the identifier of the full link record.
In one aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction, at least one program, set of codes, or set of instructions is stored in the memory, and the at least one instruction, at least one program, set of codes, or set of instructions is loaded and executed by the processor to implement the operations performed by the above data processing method.
In one aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, code set, or set of instructions is stored, which is loaded and executed by a processor to implement operations performed by the data processing method as described above.
The technical solutions provided by the embodiments of the present invention bring at least the following beneficial effects:
in the process of processing data, the processing result of the data in each processing step is saved to form the full link record of the data, which avoids incomplete records. In addition, because the processing result of every processing step is stored in the full link record, centralized query is facilitated, and problems can subsequently be analyzed and located by querying the full link record of the data.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data processing framework provided by an embodiment of the invention;
FIG. 3 is a diagram of a delete record according to an embodiment of the present invention;
FIG. 4 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data display provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data processing job provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of a search interface provided by an embodiment of the present invention;
FIG. 8 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a computer device 1000 according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Before explaining the embodiments of the present disclosure in detail, some key terms related to the embodiments of the present disclosure are explained.
Full link: the full link refers to all steps of data processing that produce processing results, including the intermediate results of intermediate steps and the final result of the last step. By recording these results, each step of the data processing can be tracked.
Framework: a framework is a supporting structure, subject to certain constraints, that is designed to solve an open-ended problem. More components can be added to or plugged into this structure according to the specific problem, so that a complete solution can be built more quickly and conveniently.
Plug-in: a plug-in is a program written against an application programming interface that follows a certain specification. It can run only on the system platform specified by the program (possibly supporting multiple platforms at the same time) and cannot run independently of that platform.
Key-value database: a key-value database stores data as key-value pairs, like a Map in Java. The entire database can be understood as one large map in which each key corresponds to a unique value.
Job: a complete running flow that includes input, processing, and output, together with the job configuration (such as the number of retries and the exception handling mode). Each job has a unique identifier (job_id).
run_count: a job may be run multiple times, and run_count records which run of the job this is. The framework keeps the history of every run of each job, and the pair (job_id, run_count) locates the history of one particular run of a job.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention. The method is performed by a computer device, see fig. 1, the method comprising:
101. Create a data processing job, where the data processing job is used to instruct processing of different batches of data in different rounds, and the different batches of data have the same processing requirement.
The same processing requirement means that the same processing steps are performed on the data to achieve the same processing effect. For example, for web page data, the processing requirement may be a web page cleaning requirement or a web page validity check requirement for a certain website.
In embodiments of the present invention, the computer device may create one data processing job for one processing requirement. For example, the computer device may create a job for cleaning the web pages of a certain website, or a job for checking their validity. A data processing job may be run multiple times and therefore has different rounds, and in each round it instructs the processing of a different batch of data. For example, the first run (round 1) may instruct processing of a first batch of data, and the second run (round 2) may instruct processing of a second batch of data.
The inventors recognized that data processing generally performs many steps on the data. While a processing flow is being developed, the processing result of each step needs to be viewed and compared with the expected result to verify that each step is correct. After development is complete, if the final processing result of some data is found to be abnormal in actual operation, the processing result of each step needs to be checked to find the step that went wrong. Therefore, in the technical solution provided by the embodiments of the present invention, the computer device records the processing result of each step while processing the data, thereby forming the full link record of the data.
FIG. 2 is a schematic diagram of a data processing framework. As shown in FIG. 2, the data processing framework mainly includes three parts: input, processing, and output. The framework is responsible for connecting all the services in series and for providing basic services, including saving the processing results. In the "processing" part, a business plug-in implements the actual data processing. This part is the business logic layer, which varies with the usage scenario and performs the actual processing work. For example, a web page cleaning job uses a web page cleaning plug-in, and a web page validity check job uses a web page query plug-in.
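The split between a generic framework and scenario-specific plug-ins can be sketched as follows. This is a minimal illustration only: the registry, the decorator, the job type names, and the two toy plug-ins are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical plug-in type: takes one input item and returns its processed form.
Plugin = Callable[[str], str]

# Registry mapping a job type to the business plug-in it uses (names are illustrative).
PLUGINS: Dict[str, Plugin] = {}


def register(job_type: str) -> Callable[[Plugin], Plugin]:
    """Decorator that registers a plug-in under a job type."""
    def wrap(fn: Plugin) -> Plugin:
        PLUGINS[job_type] = fn
        return fn
    return wrap


@register("web_clean")
def clean_plugin(page: str) -> str:
    # Stand-in for real cleaning logic: collapse runs of whitespace.
    return " ".join(page.split())


@register("web_check")
def query_plugin(url: str) -> str:
    # Stand-in for a validity check: only inspects the URL scheme.
    return "valid" if url.startswith(("http://", "https://")) else "invalid"


def run_framework(job_type: str, inputs: Iterable[str]) -> List[str]:
    """Input -> processing (plug-in) -> output; the framework itself stays generic."""
    plugin = PLUGINS[job_type]
    return [plugin(item) for item in inputs]
```

Only the plug-in changes between the two job types; the input and output handling in `run_framework` is shared.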
102. When processing of the corresponding batch of data begins in each round, create a job history record of the data processing job for that round, where the job history record is used to save at least one of the identifier, the round, the start time, the end time, the processed data amount, and the job configuration of the data processing job.
The identifier of the data processing job may be a job ID (identification), such as a job number; the round may be a number indicating how many times the data processing job has been run; the start time is the time at which processing of the batch of data begins; the end time is the time at which processing of the batch of data completes; the processed data amount may include the total amount of processed data, and may also include the amount of data processed successfully, the amount of data that failed processing, and the amount of data skipped; the job configuration may include a job name and a job run period, for example, the job name may be "web page cleaning job" and the job run period may be one day, that is, the job runs once a day. Of course, other information, such as the job script and the running state, may also be saved in the job history record, which is not limited in the embodiments of the present invention.
In an embodiment of the present invention, when data needs to be processed, the computer device may start the data processing job, create a job history record of the data processing job for the current round, and save the identifier of the data processing job, the current round, the start time, and the job configuration in the job history record.
In one possible implementation, the computer device may store the job history records of the data processing job in a MySQL database. Because a job history record contains only raw statistical information and occupies little space, the computer device may keep the job history records permanently. Of course, the computer device may also clean up the job history records periodically, which is not limited in the embodiments of the present invention.
After creating the job history record, the computer device processes the batch of data to be processed in the current round. Taking a batch of web pages to be cleaned as an example, acquiring the batch of data may include obtaining a batch of crawled web pages from the database of a crawling server. For example, a website may publish new articles every day; the crawling server crawls dozens of new original web pages from the website each day and stores them in its database; and in the current round the computer device obtains those dozens of original web pages from the database of the crawling server and processes them as one batch of data.
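A job history table keyed by the pair (job_id, run_count) might look like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in for MySQL so that the example is self-contained; the table name, column names, and helper functions are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime


def open_history_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the history database and create the (hypothetical) job_history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_history (
               job_id     INTEGER NOT NULL,
               run_count  INTEGER NOT NULL,
               start_time TEXT    NOT NULL,
               end_time   TEXT,
               processed  INTEGER,
               config     TEXT,
               PRIMARY KEY (job_id, run_count)
           )"""
    )
    return conn


def start_round(conn: sqlite3.Connection, job_id: int, run_count: int, config: str) -> None:
    """Create the history record when a round begins (step 102)."""
    now = datetime.now().isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO job_history (job_id, run_count, start_time, config) VALUES (?, ?, ?, ?)",
        (job_id, run_count, now, config),
    )
    conn.commit()


def finish_round(conn: sqlite3.Connection, job_id: int, run_count: int, processed: int) -> None:
    """Fill in the end time and the processed data amount once the batch is done."""
    now = datetime.now().isoformat(timespec="seconds")
    conn.execute(
        "UPDATE job_history SET end_time = ?, processed = ? WHERE job_id = ? AND run_count = ?",
        (now, processed, job_id, run_count),
    )
    conn.commit()


def get_round(conn: sqlite3.Connection, job_id: int, run_count: int):
    """Look up one run of one job by the pair (job_id, run_count)."""
    return conn.execute(
        "SELECT job_id, run_count, start_time, end_time, processed, config "
        "FROM job_history WHERE job_id = ? AND run_count = ?",
        (job_id, run_count),
    ).fetchone()
```

The composite primary key mirrors the idea that one job has one history record per round.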
103. When any data in each batch of data is selected as the data to be processed, a full link record of the data is created, and the full link record is used for storing the identification of the full link record of the data and the processing result of the data in each processing step.
The identifier of the full link record of the data may be a globally unique ID that identifies the data processed in the current round, and the globally unique ID may be generated automatically by the computer device. For example, the globally unique ID may take the form cftflow_prod_73.223.00000000000000000180, where cftflow is a fixed prefix, prod indicates that the data was processed in the production (formal) environment, 73 is the identifier of the data processing job, 223 is the round, and 00000000000000000180 is the sequence number of this data within the results of this run.
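Building and parsing an identifier of this shape can be sketched as follows. The sketch assumes that the parts are joined by plain underscores and dots, that the environment name contains no underscore, and that the sequence number is zero-padded to a fixed width; the width of 20 and the function names are assumptions, not details confirmed by the patent.

```python
from typing import NamedTuple

PREFIX = "cftflow"  # fixed prefix taken from the example ID
SEQ_WIDTH = 20      # assumption: zero-padded width of the sequence number


class RecordId(NamedTuple):
    env: str
    job_id: int
    run_count: int
    seq: int


def make_id(env: str, job_id: int, run_count: int, seq: int) -> str:
    """Compose prefix, environment, job ID, round, and sequence number into one ID."""
    return f"{PREFIX}_{env}_{job_id}.{run_count}.{seq:0{SEQ_WIDTH}d}"


def parse_id(record_id: str) -> RecordId:
    """Split an ID back into its parts; raises ValueError on an unexpected prefix."""
    prefix, env, rest = record_id.split("_", 2)
    if prefix != PREFIX:
        raise ValueError(f"unexpected prefix: {prefix!r}")
    job_id, run_count, seq = rest.split(".")
    return RecordId(env, int(job_id), int(run_count), int(seq))
```

Because the job ID and the round are encoded in the ID itself, everything before the last dot identifies the batch that a record belongs to.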
In the embodiment of the present invention, for each round of the data processing job, when the computer device processes a batch of data corresponding to the round, a full link record of the data is created every time any data in the batch of data is selected. In addition to the identification of the full link record and the processing result of each processing step, the full link record may also store the original information (input information) of the data, including the URL (Uniform Resource Locator) and source code of the data. For example, the computer device may save the original information of the data in a full link record of the data when creating the full link record. In addition, the start processing time and the processing end time of the data may also be saved in the full link record, and the information recorded in the full link record is not specifically limited in the embodiment of the present invention.
In one possible implementation, the computer device may store the full link records of the data in a key-value database (e.g., LevelDB). For example, the computer device may use the identifier of a full link record as the key and the storage address of the full link record as the value; looking up the key in the key-value database yields the storage address, from which the corresponding full link record is retrieved.
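The key-to-address lookup can be illustrated with a small in-memory stand-in. This is only a sketch: a plain Python dict plays the role of the key-value database, a list plays the role of the storage that the addresses point into, and the class and method names are hypothetical.

```python
import json
from typing import Dict, List, Optional


class FullLinkStore:
    """In-memory stand-in for a key-value index over stored full link records."""

    def __init__(self) -> None:
        self._blocks: List[str] = []      # simulated storage; list index = address
        self._index: Dict[str, int] = {}  # key: record ID, value: storage address

    def put(self, record_id: str, record: dict) -> int:
        """Serialize the record, store it, and index its address under the ID."""
        self._blocks.append(json.dumps(record))
        address = len(self._blocks) - 1
        self._index[record_id] = address
        return address

    def get(self, record_id: str) -> Optional[dict]:
        """Resolve ID -> address -> record; returns None for an unknown ID."""
        address = self._index.get(record_id)
        if address is None:
            return None
        return json.loads(self._blocks[address])
```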
Taking the batch of data as a web page as an example, when a batch of web pages are processed, the computer device selects one web page from a plurality of web pages each time for processing, and creates a full link record for recording the processing result of the web page in the processing process after selecting one web page each time until the batch of web pages is completely processed.
It should be noted that, the above steps 101 to 102 are optional steps, and the data processing job may not be created every time, but only need to be created once for one processing requirement. Step 103 is one possible implementation of creating a full link record of each piece of data to be processed when the piece of data is selected. When any data in each batch of data is processed, the full link record of the data is created, so that the computer equipment can acquire the processing information of the data through the full link record, and the subsequent analysis and problem positioning are facilitated.
104. In the process of processing the data, each time a processing step is executed, the processing result of the processing step is saved in the full link record of the data.
In the embodiment of the present invention, for each data to be processed, after creating the full link record of the data, the computer device may execute a plurality of processing steps on the data, and after obtaining the processing result of the step after executing one processing step, the processing result of the step is saved in the full link record of the data.
Taking the batch of data as a batch of web pages to be cleaned as an example, the web page cleaning process generally comprises three steps, namely an extraction step, a cleaning step and a release step. Wherein, the extracting step is used for extracting the title, the author and the publication time of the webpage; the cleaning step is used for removing information except text in the webpage, such as webpage frames, comments, advertisements and the like; and the issuing step is used for issuing the webpage text obtained in the cleaning step.
Correspondingly, in the process of processing the currently selected webpage, every time a step is executed by the computer equipment, the processing result of the step is stored in the full link record of the webpage. For example, after the extracting step is performed, the computer device saves the title, author, and publication time of the web page as the processing result of the extracting step into the full link record; after the web page cleaning step is executed, the text of the web page is taken as the processing result of the web page cleaning step and is stored in the full link record; after the issuing step is executed, the issuing result is saved in the full link record as the processing result of the issuing step, wherein the issuing result may include success of issuing and failure of issuing.
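The per-step saving described above can be sketched as a small pipeline. The three step functions are deliberately simplified, hypothetical stand-ins (the extraction handles only the title, and publishing always succeeds), and a dict stands in for the store of full link records.

```python
import re
from typing import Callable, Dict, List, Tuple


def extract(page: dict) -> dict:
    # Hypothetical extraction: pull the <title> text out of the page source.
    match = re.search(r"<title>(.*?)</title>", page["source"], re.S)
    return {"title": match.group(1).strip() if match else ""}


def clean(page: dict) -> str:
    # Hypothetical cleaning: drop all tags and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", page["source"])
    return " ".join(text.split())


def publish(page: dict) -> int:
    # Hypothetical publishing: always reports success (0).
    return 0


STEPS: List[Tuple[str, Callable[[dict], object]]] = [
    ("extract", extract),
    ("clean", clean),
    ("publish", publish),
]


def process(record_id: str, page: dict, store: Dict[str, dict]) -> dict:
    """Create the full link record when the page is selected, then save every step result."""
    record = {"id": record_id, "input": dict(page), "steps": {}}
    store[record_id] = record               # created before any step runs
    for name, step in STEPS:
        record["steps"][name] = step(page)  # saved right after the step finishes
    return record
```

Because the record is written after every step rather than only at the end, a failure in a later step would still leave the earlier results available for inspection.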
It should be noted that, after processing a batch of data of the current round, the computer device may save the end time and the processing data amount in the job history created in step 102.
105. And when a query instruction is received, displaying the full link record corresponding to the data, wherein the query instruction carries the identification of the full link record of the data.
In the embodiments of the present invention, the computer device may provide a function for querying the full link record of data. The query instruction may be triggered by a user operation. For example, to make it easier to analyze and locate problems when an anomaly occurs, the computer device may provide a search interface. After data processing is complete, if an anomaly is encountered during subsequent use, the user may enter the identifier of the full link record of a piece of data in the input box of the search interface and click the query button to trigger a query instruction for that full link record.
In one possible implementation, displaying the full link record corresponding to the data when the query instruction is received includes: when the query instruction is received, the computer device obtains the identifier of the full link record from the query instruction; when the full link record of the data is found according to the identifier, the computer device displays the full link record of the data; and when the full link record of the data is not found according to the identifier, the computer device displays the full link record of other data that was processed in the same batch. For example, the computer device may provide a web interface that displays the full link record of the data, so that the user can see the result of each step at a glance.
This approach takes into account that the full link record of the data may not be found, for example because it has been deleted. In one possible implementation, for each round of the data processing job, each time a preset period elapses the computer device may delete the full link records of part of the data from the full link records of that round's batch of data.
When the computer device processes a large amount of data, it stores a large number of full link records, and each full link record is complete, so these records occupy a large amount of storage space. Full link records that are far from the current time are less likely to be viewed, and because the information in a full link record is relatively comprehensive, it occupies far more space than the raw statistical information in a job history record and places high demands on storage capacity. Therefore, the computer device may use a preset period as the data cleanup period and automatically clean up the full link records of part of the data in each period, which saves storage space.
In one possible implementation, the computer device may delete every other full link record from the full link records of the batch of data corresponding to each round. This approach borrows the rule of half-life from physics and automatically cleans up the full link records of part of the data at regular intervals. Taking a half-life of one week as an example, every week the computer device deletes the full link records of half of the data. Deleting every other record removes records uniformly, which reduces storage cost while keeping the remaining data evenly distributed.
Optionally, when only a preset number of full link records remain during deletion, the computer device may keep those records. For example, the preset number may be 1; accordingly, for each round, each time a preset period elapses the computer device deletes the full link records of part of the data from the full link records of that round's batch of data, until the full link record of only one piece of data remains. Of course, the preset number may also be greater than 1, which is not limited in the embodiments of the present invention. Keeping the full link record of at least one piece of data ensures that at least one full link record is always available for the user to view. FIG. 3 is a schematic diagram of deleting records: the computer device deletes the full link records of half of the data every week and finally keeps the full link record of only one piece of data.
For example, on January 1, 2018, the computer device processed 30 articles in round 50 of a web page cleaning job. When the computer device cleans up the records of round 50 on January 8, 2018, it deletes the records with sequence numbers 1, 3, 5, 7, …, 29 and keeps sequence numbers 0, 2, 4, 6, 8, 10, …, 30. When the records of round 50 are cleaned up on January 15, only sequence numbers 0, 4, 8, 12, 16 are retained. After the cleanup on January 22, only 0, 8, 16 remain. After January 29, only 0 and 16 are retained. One week later, only the record with sequence number 0 remains, and it is kept permanently.
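The weekly halving can be sketched as a function that is applied once per cleanup period. This is a minimal sketch under stated assumptions: the records are identified by sequence numbers only, the usage below assumes they run from 0 to 30 (the highest number kept after the first cleanup in the example), and the function name and the `keep_at_least` parameter are hypothetical.

```python
from typing import List


def halve(seqs: List[int], keep_at_least: int = 1) -> List[int]:
    """One cleanup period: keep every other remaining record, starting with the first.

    Deletion stops once only `keep_at_least` records remain, and a period is
    skipped if halving would leave fewer than that (an assumption of this sketch).
    """
    ordered = sorted(seqs)
    if len(ordered) <= keep_at_least:
        return ordered
    kept = ordered[::2]  # positions 0, 2, 4, ... survive; the records in between are deleted
    return kept if len(kept) >= keep_at_least else ordered


# Usage: repeated weekly cleanups of one round, assuming sequence numbers 0..30.
remaining = list(range(31))
history = []
for _ in range(6):
    remaining = halve(remaining)
    history.append(remaining)
```

Because the survivors are always taken at even positions of the sorted list, the remaining sequence numbers stay evenly spaced after every period.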
Correspondingly, displaying the full link record corresponding to the data when the query instruction is received includes: when a query instruction for the data is received and the full link record of the data has not been deleted, the computer device displays the full link record of the data; and when a query instruction for the data is received and the full link record of the data has been deleted, the computer device displays the full link record of other data in the same round of the same data processing job.
For example, if the data has been cleaned up, the computer device may use the identifier of the full link record, that is, the first half of the globally unique ID, to find and display the full link tracking data of the same round of the same job, which helps the user locate problems that the data has in common. Taking a web page whose full link record is identified as cftflow_prod_73.223.0000000000000000018 as an example, if the record of the web page has been cleaned up, the computer device can tell from the first half, "cftflow_prod_73.223", that the web page was processed in the production environment, that the job ID is 73, and that the round is 223. The computer device can then find the closest record in the same round, such as cftflow_prod_73.223.0000000000000000017, which can serve as a reference for analyzing the problem.
When the computer device receives an inquiry instruction of the full link record of any data, if the full link record of the data is deleted during inquiry, the computer device can display the full link records of other data in the same batch, and because the processing time of the data in the same batch is relatively close, if problems occur in the processing process, the steps of the problems occurring in the data are probably the same, so the computer device can analyze and locate the problems occurring in the processing process of the data by means of the full link records of the other data.
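The fallback to a neighbouring record can be sketched as follows. The sketch assumes that everything before the last dot of an identifier names the job and round, and that "closest" means the smallest difference in sequence number; the function name and the dict used as the record store are hypothetical.

```python
from typing import Dict, Optional, Tuple


def query(store: Dict[str, dict], record_id: str) -> Tuple[Optional[str], Optional[dict]]:
    """Return the requested record, or the nearest surviving record of the same round."""
    if record_id in store:
        return record_id, store[record_id]

    # The part before the last dot identifies the environment, job, and round.
    prefix, _, seq_text = record_id.rpartition(".")
    target = int(seq_text)

    candidates = [key for key in store if key.rpartition(".")[0] == prefix]
    if not candidates:
        return None, None  # nothing from this round survives

    nearest = min(candidates, key=lambda key: abs(int(key.rpartition(".")[2]) - target))
    return nearest, store[nearest]
```

Returning the identifier together with the record lets the caller show the user that a substitute record, rather than the requested one, is being displayed.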
Referring to fig. 4, a flowchart of a data processing method is provided, and in order to more clearly describe the above technical solution provided by the embodiment of the present invention, the following explains the above technical solution by taking data as a web page and taking a processing process as a web page cleaning example, with reference to the flowchart shown in fig. 4. For example, a website has good web pages, and a crawling server can crawl these web pages, but these web pages are displayed unfriendly on mobile terminals such as mobile phones, and therefore, the computer device needs to clean and sort these web pages into a simple mobile phone reading mode. As shown in fig. 5, a schematic diagram of data display is provided, the left diagram in fig. 5 is an original web page including a web page frame, comments, advertisements and texts, and the right diagram is a cleaned web page including only important information such as web page texts. After the crawling server downloads all the web pages of the website, the computer device can convert all the web pages into a mobile browsing format. To this end, the computer device may create a data processing job, the process flow of which is shown in FIG. 4. After the computer equipment starts the data processing operation, an operation history record can be created, the starting time and the operation configuration are saved, then all original web pages are input, and when one original web page is selected, a full link record is created to save the input information of the original web page. Further, step 1 is executed on the original webpage, that is, the title, author and publication time of the original webpage are extracted, and after step 1 is executed, the processing result of step 1, including the title, author and publication time of the original webpage, is saved in the full link record. 
Then, step 2 is executed on the original webpage, that is, the text of the original webpage is cleaned, and after step 2 is executed, the processing result of step 2, including the text of the original webpage, is stored in a full link record. And then, executing step 3 on the original webpage, namely, publishing the cleaned article, and after the step 3 is executed, storing the processing result of the step 3 in a full link record, including whether the publishing is successful or not. After one webpage is processed, the same processing process is carried out on the next webpage until the original webpage is completely processed, and information such as the end time and the number of processed webpages are stored in the operation history record.
The website may publish new articles every day, the crawling server may crawl dozens of new web pages every day, and the data processing job may run every day. Fig. 4 shows the flow of one run of the data processing job, in which the computer device may process dozens of web pages. Referring to fig. 6, a schematic diagram of a data processing job is provided. As shown in fig. 6, the web page cleaning requirement of a given website is a "job"; each daily cleaning is one run, counted by "run_count"; the processing of each web page is a "record"; and each step of that processing is a "step". A data processing job processes multiple pieces of data in a single run. For each run, the computer device maintains an overall history of the data processing job in that round, including the start time, end time, script and configuration of the job, the amount of data processed, the round number, and so on (corresponding to the job history record mentioned above in step 102). For example, the overall history of one round may be as follows:
Job ID: 73    Run count: 223
Run status: success    Input data volume: 998
Start time: 2018-03-29 09:04:13    End time: 2018-03-29 09:05:16
Data successfully processed: 52    Data failed in processing: 0
Data skipped: 946
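The job history shown above can be modeled as a small Python structure. This is a sketch under assumptions: the class and field names are hypothetical, the timestamps are read as 2018-03-29 09:04:13 and 2018-03-29 09:05:16, and treating "succeeded + failed + skipped = input volume" as an invariant is an inference from this one example rather than something the source states.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobHistory:
    """Hypothetical per-run job history record; field names are assumptions."""
    job_id: int
    run_count: int
    status: str
    input_count: int
    start_time: datetime
    end_time: datetime
    succeeded: int
    failed: int
    skipped: int

    def is_consistent(self) -> bool:
        # Assumed invariant: every input item is counted in exactly one bucket.
        return self.succeeded + self.failed + self.skipped == self.input_count

    def duration_seconds(self) -> float:
        return (self.end_time - self.start_time).total_seconds()


# Values taken from the example history in the text.
history = JobHistory(
    job_id=73, run_count=223, status="success", input_count=998,
    start_time=datetime(2018, 3, 29, 9, 4, 13),
    end_time=datetime(2018, 3, 29, 9, 5, 16),
    succeeded=52, failed=0, skipped=946,
)
```

With these values the three counts add up to the input volume (52 + 0 + 946 = 998), and the run lasts 63 seconds.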
The full link record for each web page may include: the job number, the round number, the globally unique ID of the web page processing, the start time and the end time of the web page processing, processing abnormality information (e.g., whether a processing abnormality occurred), input information such as the URL and source code of the web page, the processing result of step 1 (the extracted title, author, and publication time), the processing result of step 2 (the cleaned body text), and the processing result of step 3 (the publication result, where 0 indicates success).
The computer device may provide the processing results to the business side, which may record the globally unique ID of the data processing that produced them. For example, after a web page is cleaned, it is published to a specific application, and the content library of that application records the globally unique ID of the web page processing. When a problem occurs in the business, the globally unique ID can be used to quickly find the full link tracking data of the data in a key-value database, such as the URL and source code of the original web page, the job configuration at the time, and the processing result of each step, and the problem can be located by examining these records. Referring to fig. 7, a schematic diagram of a search interface is provided. As shown in fig. 7, a user may enter cftflow_prod_73.223.0000000000000000018 in the input box and then click the query button below it to trigger the computer device to display the full link record of the data.
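The lookup by globally unique ID can be sketched as follows. Two things here are assumptions rather than facts from the source: the example identifier is read without its stray spaces, and its layout is guessed to be a prefix followed by the job number, the round number and a zero-padded sequence number (which matches job 73 and run 223 in the history example but is not confirmed anywhere). A plain Python dict stands in for the key-value database, and parse_record_id is a hypothetical helper.

```python
from typing import Dict, NamedTuple, Optional


class RecordId(NamedTuple):
    prefix: str
    job_id: int
    run_count: int
    sequence: int


def parse_record_id(record_id: str) -> RecordId:
    """Split an ID assumed to look like '<prefix>_<job>.<run>.<sequence>'."""
    head, run, seq = record_id.rsplit(".", 2)
    prefix, job = head.rsplit("_", 1)
    return RecordId(prefix, int(job), int(run), int(seq))


# A dict stands in for the key-value database mentioned in the text.
kv_store: Dict[str, dict] = {}

example_id = "cftflow_prod_73.223.0000000000000000018"
kv_store[example_id] = {
    "input": {"url": "https://example.com/a"},   # illustrative content only
    "step3": {"code": 0},
}


def find_full_link_record(record_id: str) -> Optional[dict]:
    # A key-value lookup needs only the ID that the business side stored.
    return kv_store.get(record_id)


parsed = parse_record_id(example_id)
```

Because the business side stores only the ID, a single key lookup is enough to retrieve the whole trace; parsing the ID is needed only when the job and round have to be recovered from it.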
Storing the full link records of the data makes them easy to inspect during development, which improves development efficiency. After the business goes online, if a problem is found, the abnormal data can be traced immediately by querying its full link record; because the full link data is complete, the problem can be located quickly and accurately, so that the content processing logic can be optimized and product quality improved.
According to the method provided by the embodiment of the invention, the processing result of the data in each processing step is stored in the process of processing the data, so that the full link record of the data is formed, and the problem of incomplete data record is avoided. In addition, the processing result of each processing step is stored in the full link record, so that centralized query is facilitated, and the problem can be analyzed and positioned by querying the full link record of the data subsequently.
Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. Referring to fig. 8, the apparatus includes:
a creating module 801, configured to create a full link record of data when selecting one piece of data to be processed, where the full link record is used to store an identifier of the full link record and a processing result of the data in each processing step;
a saving module 802, configured to save a processing result of each processing step in the full link record when the processing step is executed in the process of processing the data;
a display module 803, configured to display the full link record corresponding to the data when a query instruction is received, where the query instruction carries an identifier of the full link record.
In one possible implementation, the display module 803 is configured to:
when the query instruction is received, acquiring the identifier of the full link record from the query instruction;
and when the full link record of the data is found according to the identifier of the full link record, displaying the full link record of the data.
In a possible implementation manner, the display module 803 is configured to display the full link records of other data in the same batch of processed data when the full link record of the data is not found according to the identifier of the full link record.
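The two display behaviors above (show the record if it is found, otherwise show other records from the same batch) can be sketched as one lookup function. The key layout (job_id, run_count, sequence) and the function name are assumptions made for illustration; the source does not say how the batch of a missing record is determined.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]  # (job_id, run_count, sequence): hypothetical layout


def query_with_fallback(store: Dict[Key, dict],
                        key: Key) -> Tuple[str, List[dict]]:
    """Return the requested record, or the other records of its batch."""
    if key in store:
        return "exact", [store[key]]
    job_id, run_count, _ = key
    # Fallback: records that share the job and round of the missing one.
    siblings = [rec for (j, r, _s), rec in sorted(store.items())
                if (j, r) == (job_id, run_count)]
    return "same_batch", siblings


store: Dict[Key, dict] = {
    (73, 223, 1): {"url": "a"},
    (73, 223, 3): {"url": "c"},
    (73, 224, 1): {"url": "x"},
}
hit = query_with_fallback(store, (73, 223, 3))
miss = query_with_fallback(store, (73, 223, 2))
```

Here the lookup for sequence 2 finds nothing, so it returns the two remaining records of job 73, round 223, and leaves out the record from round 224.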
In one possible implementation, the creating module 801 is further configured to create a data processing job, where the data processing job is used to instruct different batches of data to be processed in different rounds, and the different batches of data have the same processing requirement;
the creating module 801 is further configured to create a job history record of the data processing job in each round when the corresponding batch of data starts to be processed in that round, where the job history record is used to store at least one of an identifier, the round, a start time, an end time, a processed data amount, and a job configuration of the data processing job.
In one possible implementation, the creating module 801 is configured to perform the step of creating the full link record when any data in each batch of data is selected as the data to be processed.
In one possible implementation, referring to fig. 9, the apparatus further includes:
a deleting module 804, configured to delete the full link record of the partial data from the full link records of the batch of data in each round every time a preset period is reached.
In one possible implementation, the deleting module 804 is configured to delete every other full link record of data from the full link records of the batch of data for each round.
In a possible implementation manner, the display module 803 is configured to display the full link record of the data when the query instruction is received and the full link record of the data is not deleted.
In a possible implementation manner, the display module 803 is configured to display the full link records of other data from the same round of the same data processing job when the query instruction is received and the full link record of the data has been deleted.
In a possible implementation manner, the deleting module 804 is configured to, every time a preset period is reached, delete the full link records of partial data from the full link records of the batch of data of each round, until only one full link record remains for that round.
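The periodic deletion performed by the deleting module can be sketched as repeated thinning of one round's records. This is one reading of the text, not a confirmed implementation: the sketch assumes that "every other" record is deleted each period, that the records at even positions are the ones kept, and that thinning stops once a single record remains. The function name thin_once is hypothetical.

```python
from typing import List, Sequence


def thin_once(records: Sequence[str]) -> List[str]:
    """Delete every other record of one round, never going below one record."""
    if len(records) <= 1:
        return list(records)
    # Keep positions 0, 2, 4, ...; which half survives is an assumption.
    return list(records[::2])


# One round with 52 records, as in the job history example.
records = [f"record-{i}" for i in range(52)]
sizes = [len(records)]
while len(records) > 1:
    records = thin_once(records)   # one call per elapsed preset period
    sizes.append(len(records))
```

Under these assumptions a round of 52 records shrinks to 26, 13, 7, 4, 2 and finally 1 record over six periods, so older rounds take progressively less storage while still keeping at least one sample to fall back on.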
In the embodiment of the invention, the processing result of the data in each processing step is stored in the process of processing the data to form the full link record of the data, so that the problem of incomplete data record is avoided. In addition, the processing result of each processing step is stored in the full link record, so that centralized query is facilitated, and the problem can be analyzed and positioned by querying the full link record of the data subsequently.
It should be noted that the division into the above functional modules is used only as an example to illustrate how the data processing apparatus provided in the above embodiment processes data. In practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the data processing apparatus and the data processing method provided by the above embodiments belong to the same concept; the specific implementation process of the apparatus is detailed in the method embodiments and is not described herein again.
Fig. 10 is a schematic structural diagram of a computer device 1000 according to an embodiment of the present invention. The computer device 1000 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 1001 and one or more memories 1002, where the memory 1002 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1001 to implement the data processing method provided by the above method embodiments. The computer device 1000 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer readable storage medium is also provided, such as a memory including at least one instruction, at least one program, set of codes, or set of instructions that can be loaded and executed by a processor to perform the data processing method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random-Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method of data processing, the method comprising:
creating a data processing job, wherein the data processing job is used for indicating different batches of data to be processed in different rounds, and the different batches of data have the same processing requirement;
when a corresponding batch of data is processed at the beginning of each turn, creating a job history record of the data processing job in each turn, wherein the job history record is used for storing at least one of identification, turn, starting time, ending time, processing data quantity and job configuration of the data processing job;
when each piece of data to be processed is selected, creating a full link record of the data, wherein the full link record is used for storing the identification of the full link record and the processing result of the data in each processing step;
in the process of processing the data, every time a processing step is executed, storing the processing result of the processing step into the full link record;
when a query instruction is received, displaying a full link record corresponding to the data;
and when the full link record of the data is not inquired according to the identification of the full link record, displaying the full link record of the data except the data in the same batch of processed data.
2. The method of claim 1, wherein displaying the full link record corresponding to the data when the query instruction is received comprises:
when the query instruction is received, acquiring the identifier of the full link record from the query instruction;
and when the full link record of the data is inquired according to the identification of the full link record, displaying the full link record of the data.
3. The method of claim 1, wherein creating a full link record of the data each time a piece of data to be processed is selected comprises:
and when any data in each batch of data is selected as the data to be processed, the step of creating the full link record is executed.
4. The method of claim 1, further comprising:
and deleting the full link records of partial data from the full link records of a batch of data of each turn every time a preset period is reached.
5. The method of claim 4, wherein deleting the full link record of the partial data from the full link records of the batch of data for each round comprises:
and deleting a full link record of one data from every other full link record of the batch of data of each round.
6. The method according to claim 4, wherein displaying the full link record corresponding to the data when the query instruction is received comprises:
and when the query instruction is received and the full link record of the data is not deleted, displaying the full link record of the data.
7. The method according to claim 4, wherein displaying the full link record corresponding to the data when the query instruction is received comprises:
and when the query instruction is received and the full link records of the data are deleted, displaying the full link records of the data except the data in the data of the same round of the same data processing operation.
8. The method according to claim 4, wherein deleting the full link records of the partial data from the full link records of the batch of data of each round for a preset period for each round comprises:
and deleting the full link records of partial data from the full link records of a batch of data of each turn until one full link record of data is left for each turn every time a preset period is reached.
9. A data processing apparatus, characterized in that the apparatus comprises:
a creating module, configured to create a data processing job, wherein the data processing job is used for indicating that different batches of data are processed in different rounds, and the different batches of data have the same processing requirement; when a corresponding batch of data starts to be processed in each round, create a job history record of the data processing job in each round, wherein the job history record is used for storing at least one of identification, round, starting time, ending time, processing data quantity and job configuration of the data processing job; and when each piece of data to be processed is selected, create a full link record of the data, wherein the full link record is used for storing the identification of the full link record and the processing result of the data in each processing step;
the storage module is used for storing the processing result of each processing step into the full link record when executing each processing step in the process of processing the data;
the display module is used for displaying the full link record corresponding to the data when receiving a query instruction, wherein the query instruction carries the identification of the full link record; and when the full link record of the data is not inquired according to the identification of the full link record, displaying the full link record of the data except the data in the same batch of processed data.
10. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement a data processing method according to any one of claims 1 to 8.
11. A computer-readable storage medium, having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the data processing method of any one of claims 1 to 8.
CN201810603070.4A 2018-06-12 2018-06-12 Data processing method and device, computer equipment and storage medium Active CN108733845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810603070.4A CN108733845B (en) 2018-06-12 2018-06-12 Data processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108733845A CN108733845A (en) 2018-11-02
CN108733845B true CN108733845B (en) 2020-11-13

Family

ID=63929424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810603070.4A Active CN108733845B (en) 2018-06-12 2018-06-12 Data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108733845B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109600375B (en) * 2018-12-13 2021-07-16 锐捷网络股份有限公司 Message tracking method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577697A (en) * 2017-07-18 2018-01-12 阿里巴巴集团控股有限公司 A kind of data processing method, device and equipment
CN107818112A (en) * 2016-09-13 2018-03-20 腾讯科技(深圳)有限公司 A kind of big data analysis operating system and task submit method
CN107958021A (en) * 2017-10-27 2018-04-24 东软集团股份有限公司 Processing method, device, storage medium and the equipment of operation flow data

Also Published As

Publication number Publication date
CN108733845A (en) 2018-11-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant