CN115982441A - Data processing method and device - Google Patents

Info

Publication number
CN115982441A
Authority
CN
China
Prior art keywords
data
script program
data processing
data acquisition
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310102218.7A
Other languages
Chinese (zh)
Inventor
杜玉麟
辛五一
吴双
张子晴
魏萌
刘博勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Kingsoft Digital Network Technology Co Ltd
Original Assignee
Zhuhai Kingsoft Digital Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Kingsoft Digital Network Technology Co Ltd
Priority to CN202310102218.7A
Publication of CN115982441A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Stored Programmes (AREA)

Abstract

The application provides a data processing method and device. The data processing method comprises: on receiving a data acquisition instruction sent by a user, calling a data acquisition script program to acquire initial data from a third-party platform; and receiving a data processing script program edited by the user according to the initial data, and processing the initial data according to the data processing script program to obtain processed target data. Specifically, when the method is applied to a data processing platform, the platform calls the data acquisition script program to acquire the corresponding initial data after receiving a data acquisition instruction from the user and, on receiving the data processing script program, processes the initial data accordingly. This improves the data acquisition rate, and because the data is processed quickly by the corresponding data processing script program, the target data is obtained faster and the user experience is improved.

Description

Data processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method. The application also relates to a data processing device, a computing device and a computer readable storage medium.
Background
With the development of internet technology, data acquisition methods have gradually diversified. Crawler technology, as a relatively general-purpose data acquisition method, is characterized by high efficiency and high coverage.
However, because data acquired by a crawler varies in format and type, it often requires secondary operations such as data cleaning, which lowers efficiency and degrades the user experience.
Therefore, solving the problem of low data acquisition efficiency is of great significance.
Disclosure of Invention
In view of this, embodiments of the present application provide a data processing method to solve technical defects in the prior art. The embodiment of the application also provides a data processing device, a computing device and a computer readable storage medium.
According to a first aspect of embodiments of the present application, there is provided a data processing method, including:
under the condition of receiving a data acquisition instruction sent by a user, calling a data acquisition script program to acquire initial data from a third-party platform, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user;
and receiving a data processing script program edited by the user according to the initial data, and performing data processing on the initial data according to the data processing script program to obtain processed target data.
According to a second aspect of embodiments of the present application, there is provided a data processing apparatus including:
the data acquisition module is configured to call a data acquisition script program to acquire initial data from a third-party platform under the condition of receiving a data acquisition instruction sent by a user, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user;
and the data processing module is configured to receive a data processing script program edited by the user according to the initial data, perform data processing on the initial data according to the data processing script program, and obtain processed target data.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions that, when executed by the processor, implement the steps of the data processing method.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data processing method.
According to a fifth aspect of embodiments of the present application, there is provided a chip storing a computer program which, when executed by the chip, implements the steps of the data processing method.
The data processing method provided by the embodiment of the specification is applied to a data processing platform, and the initial data is acquired from a third-party platform by calling a data acquisition script program under the condition that a data acquisition instruction sent by a user is received, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user; and receiving a data processing script program edited by the user according to the initial data, and performing data processing on the initial data according to the data processing script program to obtain processed target data.
Specifically, after the data processing platform receives a data acquisition instruction sent by a user, the data acquisition script program is called to acquire corresponding initial data, and under the condition that the data processing script program is received, the data processing of the initial data is performed according to the data processing script program, so that the data acquisition rate is improved.
Drawings
Fig. 1 is a diagram illustrating a scene application of a data processing method according to an embodiment of the present application;
fig. 2 is a flowchart of a data processing method according to an embodiment of the present application;
fig. 3 is a processing flow chart of a data processing method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application.
First, the terms used in one or more embodiments of the present application are explained.
FreeMarker: a template engine, that is, a general-purpose tool that generates output text (HTML (HyperText Markup Language) web pages, e-mails, configuration files, source code, etc.) from templates and changing data.
URL (Uniform Resource Locator): a representation method for specifying the location of information on a web service program on the internet.
JSON (JavaScript Object Notation): a lightweight data-interchange format. Based on a subset of ECMAScript (the JavaScript specification established by the European Computer Manufacturers Association), it stores and represents data in a text format that is completely independent of any programming language.
XML (Extensible Markup Language): a subset of the Standard Generalized Markup Language (SGML) that can be used to tag data and define data types; it is a source language that allows users to define their own markup languages. XML offers good extensibility, separation of content and form, strict grammar requirements, and good value retention.
With the development of internet technology, data acquisition methods have gradually diversified. Crawler technology, as a relatively general-purpose data acquisition method, is characterized by high efficiency and high coverage.
However, the data returned by a crawler comes in multiple formats and is mostly redundant, so it must be parsed, cleaned, and converted into a uniform format before a user can conveniently use it.
Moreover, most crawler systems provide only a crawling function. Because crawler data varies in format and type, they offer no data processing function and save only the raw data, so a user must write very complex SQL (Structured Query Language) in a database in order to use the data, which is inefficient.
In order to solve the above technical problem, the present application provides a data processing method. The present application also relates to a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Fig. 1 is a scene application diagram of a data processing method according to an embodiment of the present application, which specifically includes the following contents:
as shown in fig. 1, fig. 1 includes a client 102, a data processing platform 104, and a user 106, wherein the client 102 includes but is not limited to a desktop computer, a notebook computer, a tablet computer, a mobile phone, and the like.
For ease of understanding, in the embodiments of this specification the client 102 is a notebook computer and the data processing platform 104 is a crawler data processing platform, and data acquisition (i.e., crawling) and processing based on the data processing method are described in detail. The crawler data processing platform uses FreeMarker to provide the user with script writing and script parsing functions, and the parsed scripts then perform data crawling and data processing operations through a corresponding server.
In a specific implementation, the user 106 writes a corresponding data crawling script in advance and sends it to the data processing platform 104 through the client 102, and the data processing platform 104 stores the data crawling script after receiving it.
After receiving a data crawling instruction that the user 106 triggers through the data crawling control of the client 102, the data processing platform 104 calls the corresponding data crawling script, crawls data from the third-party platform indicated by the instruction (such as a game explanation platform or an information search platform) to obtain initial data, and feeds the initial data back to the user 106 through the client 102. The user 106 then writes a data processing script on the client 102 according to the initial data and sends it to the data processing platform 104. On receiving the data processing script, the data processing platform 104 processes the initial data according to that script to generate target data and feeds the target data back to the user 106 through the client 102, completing the data crawling and processing operations.
The data processing method provided by the embodiments of this specification uses FreeMarker to offer the user a highly customizable data processing function. After the user writes a data crawling script or a data processing script on a page, FreeMarker calls the data processing script to process the crawler data, and URLs that match the processing rules are then either crawled again in a loop or stored in a database. This supports a variety of data formats and types, including common crawler data types such as HTML, JSON, and XML. The user can also observe processing results in real time through online debugging on the page, without querying a database. Because the crawler data processing tool is built on FreeMarker, functions can be added dynamically, so user requirements for new data formats or new processing can be met.
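As a rough illustration of handling several crawler data formats behind one entry point, the following Python sketch dispatches on a format label. The helper name `parse_payload`, the format labels, and the returned structures are hypothetical. The platform described here is built on FreeMarker, not Python, so this is only an analogy and not the platform's implementation.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_payload(fmt, text):
    """Parse raw crawler output into a plain Python structure (hypothetical helper)."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "xml":
        # Assumption: a flat document whose children are simple text elements.
        root = ET.fromstring(text)
        return {child.tag: child.text for child in root}
    if fmt == "html":
        collector = _LinkCollector()
        collector.feed(text)
        return {"links": collector.links}
    raise ValueError(f"unsupported format: {fmt}")
```

Supporting a new format in this sketch means adding one more branch, which loosely mirrors the idea that new formats or processing steps can be added without changing the rest of the pipeline.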
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present application, which specifically includes the following steps:
step 202: under the condition of receiving a data acquisition instruction sent by a user, calling a data acquisition script program to acquire initial data from a third-party platform, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user.
The data acquisition instruction can be understood as an instruction triggered by any user by clicking a corresponding control of the data processing platform. For example, the data acquisition instruction may be an instruction triggered by a user clicking a data acquisition control of the data processing platform.
The data acquisition script program may be understood as a script program that is generated by a user by editing in advance before clicking a data acquisition control to generate a data acquisition instruction, is embedded in the data processing platform, and performs corresponding data acquisition from a corresponding third-party platform, or the data acquisition script program may be a script program that is generated by a user writing on a corresponding page of the data processing platform or by a user touching a corresponding control of the page.
A third party platform may be understood to be any platform that presents data, such as a game interpretation platform, a code reference platform, and the like.
The initial data can be understood as data obtained by performing corresponding data acquisition from any platform with data displayed through a data acquisition script.
Specifically, in the case of receiving a data acquisition instruction sent by a user, invoking a data acquisition script program to acquire initial data from a third-party platform may be understood as: before a user triggers a data acquisition instruction through a data acquisition control of a data processing platform, the user pre-writes and generates a corresponding data acquisition script program, and sends the data acquisition script program to the data processing platform, so that the data acquisition script program is embedded into the data processing platform; and under the condition of receiving the data acquisition instruction, acquiring corresponding initial data from a third-party platform corresponding to the platform information of the third-party platform carried in the data acquisition instruction according to the data acquisition script program corresponding to the data acquisition instruction.
In practical application, the data acquisition script program may be a pre-configured script program shipped with the data processing platform itself. In that case the data acquisition instruction carries information such as the platform information and the data to be acquired, and the corresponding script program is called to acquire initial data from the third-party platform identified by the platform information. To let a user acquire specific data, the data acquisition script program may instead be edited by the user in advance and embedded into the data processing platform. The specific implementation is as follows:
under the condition of receiving a data acquisition instruction sent by a user, before calling a data acquisition script program to acquire initial data from a third-party platform, the method further comprises the following steps:
and receiving a data acquisition script program edited by a user and embedding the data acquisition script program.
Specifically, receiving a data acquisition script program edited by a user and embedding the data acquisition script program can be understood as that the user edits and generates the data acquisition script program according to data to be acquired and sends the generated data acquisition script program to a data processing platform, and the data processing platform receives the data acquisition script program sent by the user and embeds the data acquisition script program into the data processing platform.
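The embed-then-invoke flow described above can be sketched in Python as follows. The class `DataProcessingPlatform`, its method names, and the modelling of a script as a plain callable are all assumptions made for illustration; the source does not specify how scripts are stored or executed.

```python
from typing import Callable, Dict, List

# Assumption: a "script" is modelled as a callable that takes platform
# information and returns a list of records.
AcquisitionScript = Callable[[str], List[dict]]


class DataProcessingPlatform:
    """Toy stand-in for the platform: stores user scripts and runs them on demand."""

    def __init__(self) -> None:
        self._scripts: Dict[str, AcquisitionScript] = {}

    def embed_script(self, name: str, script: AcquisitionScript) -> None:
        # Receive a user-edited acquisition script and embed (store) it.
        self._scripts[name] = script

    def handle_acquisition_instruction(self, script_name: str, platform_info: str) -> List[dict]:
        # The instruction carries the platform information; the stored
        # script is called with it to obtain the initial data.
        if script_name not in self._scripts:
            raise KeyError(f"no embedded script named {script_name!r}")
        return self._scripts[script_name](platform_info)


def demo_script(platform_info: str) -> List[dict]:
    # Placeholder for real crawling: returns a fixed record tagged with its source.
    return [{"source": platform_info, "title": "example"}]
```

A caller would first embed the script with `embed_script("demo", demo_script)` and later trigger it with `handle_acquisition_instruction("demo", "game-platform")`.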
In practical application, the data acquisition script program may start from a general initial data acquisition script program written by developers. The user supplements and completes that script on the script generation page of the data processing platform to produce a complete data acquisition script program, which is then called to acquire initial data from the third-party platform when a data acquisition instruction sent by the user is received. Alternatively, the data acquisition script program may be generated automatically when the user opens the script generation page of the data processing platform and selects controls such as a preset third-party platform selection control and a data content acquisition control; the generated script is then called to acquire data from the corresponding third-party platform when a data acquisition instruction triggered by the user is received.
In addition, the platform information in the data acquisition instruction may indicate that the third-party platform includes at least two data display pages, that is, data must be acquired from multiple pages of one third-party platform. In that case the data of the next page is acquired based on the page from which data was previously acquired, which ensures complete acquisition and more accurate data from the third-party platform. The specific implementation is as follows:
the third party platform comprises at least two data display pages;
correspondingly, the calling data acquisition script program acquires initial data from a third-party platform, and the method comprises the following steps:
calling the data acquisition script program to acquire first initial data from a target data display page of the at least two data display pages, wherein the target data display page is any one of the at least two data display pages; and
calling the data acquisition script program to acquire second initial data from other data display pages of the at least two data display pages, wherein the other data display pages are display pages except the target data display page of the at least two data display pages;
and generating initial data according to the first initial data and the second initial data.
The data display page may be understood as a page that has any data content displayed in the third party platform, for example, the data display page may be a page that has a game description content displayed in the game platform.
Specifically, when data content needs to be acquired from at least two data display pages of the third-party platform, a data acquisition script program can be called to perform data crawling (i.e., data acquisition) from a first data display page (i.e., a target data display page) of the at least two data display pages, acquire first initial data of the target data display page, determine other data display pages except the target data display page according to the target data display page, perform data crawling from the other data display pages according to the data acquisition script program, and acquire second initial data corresponding to the other data display pages; and further generating initial data according to the first initial data of the target data display page and the second initial data of the other data display pages.
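The multi-page steps above can be sketched as follows. The function `acquire_from_pages`, the page-number addressing, and the in-memory `FAKE_PAGES` data are hypothetical; a real script would fetch pages over the network, and how "other" pages are derived from the target page is not specified in the source.

```python
from typing import Callable, Dict, List


def acquire_from_pages(fetch_page: Callable[[int], List[dict]], page_numbers: List[int]) -> List[dict]:
    """Fetch a target page first, then the remaining pages, and merge the results."""
    if len(page_numbers) < 2:
        raise ValueError("expected at least two data display pages")
    # Assumption: the first listed page is used as the target page.
    target, others = page_numbers[0], page_numbers[1:]
    first_initial = fetch_page(target)           # first initial data
    second_initial: List[dict] = []
    for page in others:                          # second initial data
        second_initial.extend(fetch_page(page))
    return first_initial + second_initial        # combined initial data


# Hypothetical in-memory "platform" with three pages of records.
FAKE_PAGES: Dict[int, List[dict]] = {
    1: [{"id": 1}],
    2: [{"id": 2}, {"id": 3}],
    3: [{"id": 4}],
}


def fake_fetch_page(page: int) -> List[dict]:
    return FAKE_PAGES.get(page, [])
```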
In practical application, when data content needs to be acquired from at least two data display pages of a third-party platform, the first of those pages may be used as the target data display page by default. Alternatively, pages may be designated explicitly, for example crawling only the last three pages of the third-party platform, so that the latest data content on the platform is acquired and acquisition is further accelerated.
It should be noted that data may be acquired from the third-party platform cyclically in response to the user clicking a cyclic data acquisition control, and the cycle ends when the returned data no longer contains matching URL information. That is, after the user clicks the cyclic data acquisition control of the data processing platform, the platform parses the data returned by each acquisition and determines whether it contains matching URL information. If so, the acquisition operation is performed again on the third-party platform corresponding to that URL information; if not, the cyclic acquisition operation ends.
In addition, during the cyclic acquisition operation, the corresponding data processing script program is executed each time data is acquired from the third-party platform, performing operations such as data cleaning on that data. This avoids the wait that would arise if all data were cleaned only after the cyclic crawling operation had finished.
During data acquisition, file data of the HTML, JSON, and XML types are typically acquired. Taking JSON as an example: after the data processing platform processes a response according to the user's data acquisition script program, if the JSON result returned by the requested URL contains a result that matches the user's script, the new URL is put back into the data acquisition queue for cyclic crawling. Cyclic acquisition ends when a new JSON result contains nothing that matches the user's script.
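A minimal Python sketch of this queue-driven loop is shown below. "Matching the user's script" is reduced here to the presence of a URL under a hypothetical `next` key; the function `crawl_loop`, the canned `FAKE_RESPONSES`, and the `seen` set (an added safeguard against revisiting URLs) are assumptions, not details from the source.

```python
import json
from collections import deque
from typing import Callable, List, Set


def crawl_loop(start_url: str, fetch: Callable[[str], str], url_key: str = "next") -> List[dict]:
    """Crawl JSON responses, re-queuing matched URLs until none remain."""
    queue = deque([start_url])
    seen: Set[str] = {start_url}   # added safeguard: never queue a URL twice
    results: List[dict] = []
    while queue:
        url = queue.popleft()
        payload = json.loads(fetch(url))
        results.append(payload)
        # Assumption: a "match" is a URL stored under url_key in the response.
        next_url = payload.get(url_key)
        if next_url and next_url not in seen:
            seen.add(next_url)
            queue.append(next_url)
    return results


# Hypothetical canned responses standing in for a third-party platform.
FAKE_RESPONSES = {
    "u1": '{"item": "a", "next": "u2"}',
    "u2": '{"item": "b", "next": "u3"}',
    "u3": '{"item": "c"}',
}


def fake_fetch_url(url: str) -> str:
    return FAKE_RESPONSES[url]
```

The loop stops on its own once a response carries no further URL, which corresponds to the termination condition described above.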
In addition, when data acquisition is performed according to the data acquisition script program, the data acquisition of the next data display page can be performed according to the previous data display page under the condition that the third-party platform includes at least two data display pages, and the data acquisition operation of the next third-party platform can be performed based on the previous third-party platform under the condition that the number of the third-party platforms is at least two, so that the accuracy and the comprehensiveness of the data acquisition are improved, and the specific implementation mode is as follows:
there are at least two third-party platforms;
correspondingly, the calling data acquisition script program acquires initial data from a third-party platform, and the method comprises the following steps:
calling the data acquisition script program to acquire first initial data from a target third-party platform of the at least two third-party platforms, wherein the target third-party platform is any one of the at least two third-party platforms; and
calling the data acquisition script program to acquire second initial data from other third-party platforms of the at least two third-party platforms, wherein the other third-party platforms are the platforms except the target third-party platform of the at least two third-party platforms;
and generating initial data according to the first initial data and the second initial data.
Specifically, under the condition that at least two third-party platforms are determined, a data acquisition script program can be called to perform data crawling (i.e., data acquisition) from a first third-party platform (i.e., a target third-party platform) of the at least two third-party platforms, to acquire first initial data of the target third-party platform, to determine other third-party platforms except the target third-party platform according to the target third-party platform, to perform data crawling from the other third-party platforms according to the data acquisition script program, and to acquire second initial data corresponding to the other third-party platforms; and further generating initial data according to the first initial data of the target third-party platform and the second initial data of the other third-party platforms.
For example, consider data acquisition where the third-party platforms are a game platform, a code platform, and an information search platform. When a data acquisition instruction triggered by a user through the data acquisition control of the data processing platform is received, the platform information carried in the instruction is determined to include the game platform, the code platform, and the information search platform. The game platform is randomly selected from the three as the target third-party platform, and the data acquisition script program is called to acquire first initial data from it. Of the remaining platforms, the code platform is taken as another third-party platform, and second initial data is acquired from it according to the data acquisition script program; data is acquired from the information search platform in the same way as from the code platform. The initial data is then generated from the first initial data and the second initial data corresponding to the code platform and the information search platform.
When the data processing platform acquires data from the third-party platform through the data acquisition script program, it can also generate a real-time data acquisition log and feed it back to the user. The user can then check the state of the acquisition, which provides real-time monitoring, and can use the log to debug the data acquisition script program in real time.
In the data processing method provided by the embodiments of this specification, the data acquisition script program edited in advance by the user is embedded into the data processing platform, so that when the user triggers a data acquisition instruction through the data acquisition control of the data processing platform, the platform calls the script program to acquire initial data from the third-party platform. This enables fast acquisition of data content and saves the user's time.
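The multi-platform example can be sketched in Python as below. The function `acquire_from_platforms`, the platform labels, and the choice to key the combined result by platform name are assumptions; the source only states that a target platform is selected at random and that the results are combined.

```python
import random
from typing import Callable, Dict, List, Optional


def acquire_from_platforms(
    platforms: List[str],
    script: Callable[[str], List[dict]],
    rng: Optional[random.Random] = None,
) -> Dict[str, List[dict]]:
    """Pick a target platform at random, crawl it, then crawl the others."""
    if len(platforms) < 2:
        raise ValueError("expected at least two third-party platforms")
    rng = rng or random.Random()
    target = rng.choice(platforms)
    initial = {target: script(target)}            # first initial data
    for platform in platforms:
        if platform != target:
            initial[platform] = script(platform)  # second initial data
    return initial                                # combined initial data


def demo_platform_script(platform: str) -> List[dict]:
    # Placeholder for real crawling: one record tagged with its source.
    return [{"source": platform}]
```

Keying the result by platform keeps the origin of each record visible, which may help later cleaning steps; a flat list would serve equally well if the origin is not needed.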
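One way to picture such a real-time acquisition log is with Python's standard `logging` module, as in the sketch below. The `ListHandler` class, the logger name, the message wording, and the `demo_fetch` function are hypothetical; the source does not describe the log format or how the log reaches the user.

```python
import logging
from typing import Callable, List


class ListHandler(logging.Handler):
    """Collects formatted log lines so they can be shown to the user."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def acquire_with_log(urls: List[str], fetch: Callable[[str], str], handler: logging.Handler) -> List[str]:
    """Fetch each URL, logging success or failure as it happens."""
    logger = logging.getLogger("acquisition_demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    data: List[str] = []
    try:
        for url in urls:
            try:
                data.append(fetch(url))
                logger.info("fetched %s", url)
            except Exception as exc:
                # A failed fetch is logged and skipped rather than aborting the run.
                logger.error("failed %s: %s", url, exc)
    finally:
        logger.removeHandler(handler)
    return data


def demo_fetch(url: str) -> str:
    # Hypothetical fetcher: fails for the URL "bad", upper-cases anything else.
    if url == "bad":
        raise RuntimeError("boom")
    return url.upper()
```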
Step 204: and receiving a data processing script program edited by the user according to the initial data, and performing data processing on the initial data according to the data processing script program to obtain processed target data.
The data processing script program may be understood as a script program for performing a data cleaning operation on data, for example, the data processing script program may be a script program for performing redundant data deletion on data, or may also be a script program for further screening data, and the like.
Specifically, receiving a data processing script program edited by the user according to the initial data, and performing data processing on the initial data according to the data processing script program to obtain processed target data, where the initial data is sent to the user after being obtained from the third-party platform through the data obtaining script program, the user performs editing generation of a corresponding data processing script program according to the initial data, and then sends the generated data processing script program to the data processing platform, and the data processing platform performs data processing on the initial data according to the data processing script program, deletes redundant data in the initial data, and obtains the processed target data when receiving the data processing script program edited by the user for the initial data.
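As an illustration of a data processing script that removes redundant data, the following Python sketch keeps only selected fields and drops duplicate records. The names `make_cleaning_script` and `process`, and the particular cleaning rules, are assumptions; the source leaves the content of the processing script to the user.

```python
from typing import Callable, Iterable, List

ProcessingScript = Callable[[List[dict]], List[dict]]


def make_cleaning_script(keep_fields: Iterable[str]) -> ProcessingScript:
    """Build a processing script that keeps selected fields and drops duplicates."""
    fields = tuple(keep_fields)

    def script(initial_data: List[dict]) -> List[dict]:
        seen = set()
        target_data: List[dict] = []
        for record in initial_data:
            cleaned = {k: record[k] for k in fields if k in record}
            # Assumption: field values are hashable, so the record can be
            # turned into a set key for duplicate detection.
            key = tuple(sorted(cleaned.items()))
            if cleaned and key not in seen:
                seen.add(key)
                target_data.append(cleaned)
        return target_data

    return script


def process(initial_data: List[dict], script: ProcessingScript) -> List[dict]:
    """Apply the user's processing script to the initial data to get target data."""
    return script(initial_data)
```

For example, `process(records, make_cleaning_script(["title"]))` would reduce each record to its `title` field and discard repeated titles.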
In practical application, before the initial data is sent to the user, the accuracy of the initial data obtained by the data acquisition script program needs to be determined; that is, the data that the data acquisition script program is expected to obtain is compared with the initial data actually obtained. If the data acquisition accuracy does not meet a preset condition, data acquisition failure information is generated and sent to the user, so that the user can subsequently debug and optimize the data acquisition script program based on the data acquisition failure information. This improves the accuracy of the data acquisition script program and further improves the user experience. The specific implementation is as follows:
before the receiving the data processing script program edited by the user according to the initial data, the method further comprises the following steps:
comparing the initial data with data acquisition information corresponding to the data acquisition script program;
and under the condition that the initial data and the data to be acquired corresponding to the data acquisition information are different according to the comparison result, generating data acquisition failure information, and feeding back the data acquisition failure information to the user.
The data acquisition information may be understood as information, determined by processing the data acquisition script program, that describes what the script program should acquire.
The data to be acquired can be understood as the data that corresponds to the data acquisition information and that the data acquisition script program is expected to acquire.
Specifically, according to a preset data comparison method, initial data acquired from a third-party platform through a data acquisition script program is compared with data acquisition information corresponding to the data acquisition script program to generate a corresponding comparison result, and further, under the condition that the initial data is determined to be different from data to be acquired corresponding to the data acquisition information based on the comparison result, data acquisition failure information is generated and fed back to a user, so that the user can perform optimized debugging on the data acquisition script program based on the data acquisition failure information. The user may be a user who needs to perform data acquisition and data processing, or a corresponding developer.
In practical applications, the preset data comparison method may be a method of comparing analysis results generated by analyzing the acquired data and the data to be acquired by a semantic analysis method, or may be a method of comparing the acquired initial data and the data to be acquired according to similarities and the like between the acquired initial data and the data to be acquired, and may be set according to practical applications, which is not limited in this specification.
When the preset data comparison method is used to judge whether the initial data and the data to be acquired corresponding to the data acquisition information are the same, a comparison result value and a corresponding threshold can be set. If the comparison result value between the initial data and the data to be acquired corresponding to the data acquisition information does not meet the threshold, the two are determined to be different, and corresponding data acquisition failure information is generated and fed back to the user.
For example, take the preset comparison method to be a similarity comparison method with a comparison threshold of 0.8. The initial data and the data to be acquired corresponding to the data acquisition information are compared according to the similarity comparison method. If the comparison result value (i.e., the similarity) is determined to be 0.6, the comparison result value does not satisfy the comparison threshold of 0.8; that is, the initial data differs from the data to be acquired corresponding to the data acquisition information. This indicates that an error exists in the data acquisition script program, so corresponding data acquisition failure information is generated and fed back to the user.
In addition, if the initial data is compared with the data acquisition information corresponding to the data acquisition script program and the comparison result shows that the initial data is the same as the data to be acquired corresponding to the data acquisition information, the data acquisition script program has no problem. The initial data acquired from the third-party platform through the data acquisition script program is then fed back to the user, so that the user can write the data processing script program according to the initial data. This improves the accuracy of data acquisition and further improves the user experience. The specific implementation is as follows:
after comparing the initial data with the data acquisition information corresponding to the data acquisition script program, the method includes:
and under the condition that the initial data and the data to be acquired corresponding to the data acquisition information are determined to be the same according to the comparison result, sending the initial data to the user.
Specifically, according to a preset data comparison method, the initial data acquired from the third-party platform through the data acquisition script program is compared with the data acquisition information corresponding to the data acquisition script program to generate a corresponding comparison result. If the comparison result shows that the initial data is the same as the data to be acquired corresponding to the data acquisition information, the initial data is fed back to the user, so that the user can write a data processing script program based on the initial data.
For example, take the preset comparison method to be a similarity comparison method with a comparison threshold of 0.8. The initial data and the data to be acquired corresponding to the data acquisition information are compared according to the similarity comparison method. If the comparison result value (that is, the similarity) is determined to be 0.9, the comparison result value meets the comparison threshold of 0.8; that is, the initial data is treated as the same as the data to be acquired corresponding to the data acquisition information. This means the data acquisition script program has at most a small error and does not need to be debugged and optimized, so the initial data acquired from the third-party platform through the data acquisition script program is fed back to the user, and the user can write a data processing script program based on the initial data.
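The threshold check in the two examples above can be sketched as follows. This is only an illustration under stated assumptions: the specification does not name a particular similarity measure, so Python's `difflib.SequenceMatcher` is swapped in as a stand-in, and treating a value equal to the threshold as "meeting" it is also an assumption. The function name `check_acquisition` and the returned dictionary layout are hypothetical.

```python
from difflib import SequenceMatcher

COMPARISON_THRESHOLD = 0.8  # the threshold used in the examples above


def similarity(initial_data: str, expected_data: str) -> float:
    """Stand-in similarity measure (assumption): difflib's ratio in [0, 1]."""
    return SequenceMatcher(None, initial_data, expected_data).ratio()


def check_acquisition(initial_data: str, expected_data: str,
                      threshold: float = COMPARISON_THRESHOLD) -> dict:
    """Compare acquired data with the data the script was expected to acquire."""
    score = similarity(initial_data, expected_data)
    if score >= threshold:  # assumption: equality counts as meeting the threshold
        # Treated as "the same": the initial data is sent on to the user.
        return {"status": "ok", "similarity": score, "data": initial_data}
    # Treated as "different": failure information is fed back instead.
    return {
        "status": "acquisition_failed",
        "similarity": score,
        "message": f"similarity {score:.2f} is below threshold {threshold}",
    }
```

With this sketch, a result of 0.6 would yield failure information, and a result of 0.9 would return the initial data, mirroring the two cases described above.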
In practical application, processing the initial data through the data processing script program to generate the processed target data generally involves screening out redundant data in the initial data. This avoids writing a complex data cleaning program (such as an SQL program), saves the user's programming time, and improves the user experience. The specific implementation is as follows:
the data processing the initial data according to the data processing script program to obtain processed target data includes:
and determining candidate data from the initial data according to the data processing script program, and deleting the candidate data from the initial data to obtain deleted target data.
Specifically, the data processing platform determines candidate data needing data deletion from the initial data according to the data processing script program, and then deletes the candidate data from the initial data to obtain the deleted target data.
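The candidate-determination and deletion steps can be sketched as below. The record layout and the rule passed as `is_candidate` are hypothetical examples; in the embodiments the rule would come from the user's data processing script program.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def delete_candidates(initial_data: List[Record],
                      is_candidate: Callable[[Record], bool]
                      ) -> Tuple[List[Record], List[Record]]:
    """Split the initial data into (target data, deleted candidate data)."""
    candidates: List[Record] = []
    target: List[Record] = []
    for record in initial_data:
        # The rule decides which records are redundant and must be deleted.
        (candidates if is_candidate(record) else target).append(record)
    return target, candidates


def has_empty_title(record: Record) -> bool:
    """Hypothetical redundancy rule: a record without a title is redundant."""
    return not record.get("title")
```

Returning the deleted candidates alongside the target data is a design choice of this sketch; it lets a user verify what the rule removed.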
In addition, in the process of performing data processing on the initial data through the data processing script program, the data processing operation needs to be judged to determine whether the data processing is successfully completed or not, and in the case of data processing failure, data processing failure information is generated and fed back to the user, so that the user can perform debugging optimization on the data processing script program. The specific implementation mode is as follows:
before the obtaining of the processed target data, the method further includes:
and under the condition that the initial data is failed to be processed according to the data processing script program, generating data processing failure information, and feeding back the data processing failure information to the user.
Specifically, the initial data is subjected to data processing according to the data processing script program to generate a corresponding processing result, the processing result is compared with a data processing result corresponding to the data processing script program, and when the processing result is determined to be different from the data processing result corresponding to the data processing script program, it is determined that the data processing script program has an error, the data processing script program needs to be debugged and optimized, data processing failure information is generated, and the data processing failure information is fed back to a user, so that the user can adjust and optimize the data processing script program.
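A minimal sketch of this failure handling follows. It assumes the data processing script can be modeled as a Python callable and that the expected processing result can be expressed as a `validate` predicate; both are illustrative assumptions, as are the function name and the returned dictionary keys.

```python
from typing import Any, Callable, Dict


def run_processing(script: Callable[[Any], Any], initial_data: Any,
                   validate: Callable[[Any], bool]) -> Dict[str, Any]:
    """Run the processing script and report failure information if it fails."""
    try:
        result = script(initial_data)
    except Exception as exc:
        # The script itself raised: report it so the user can debug the script.
        return {"ok": False,
                "error": f"processing raised {type(exc).__name__}: {exc}"}
    if not validate(result):
        # The script ran, but its output differs from the expected result.
        return {"ok": False,
                "error": "result does not match the expected processing result"}
    return {"ok": True, "target_data": result}
```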
It should be noted that the data processing script program performs data processing in a manner similar to the way the data acquisition script program performs data acquisition. For details of data processing that are not described above, refer to the specific implementation of data acquisition; they are not repeated here.
In the embodiment of the specification, the initial data acquired by calling the data acquisition script program from the third-party platform is fed back to the user, and then the data processing script program compiled by the user according to the initial data is received to perform data processing on the initial data, so that the target data is generated, the data is rapidly processed, the rate of acquiring the target data is further increased, and the experience of the user is improved.
The data processing method provided by the embodiment of the specification is applied to a data processing platform, and is used for calling a data acquisition script program to acquire initial data from a third-party platform under the condition of receiving a data acquisition instruction sent by a user, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user; and receiving a data processing script program edited by the user according to the initial data, and performing data processing on the initial data according to the data processing script program to obtain processed target data.
Specifically, by applying the method to the data processing platform, after the data processing platform receives a data acquisition instruction sent by a user, the data acquisition script program is called to acquire corresponding initial data, and under the condition of receiving the data processing script program, data processing of the initial data is performed according to the data processing script program, so that the data acquisition rate is improved, meanwhile, the data is rapidly processed through the corresponding data processing script program, the target data acquisition rate is further accelerated, and the user experience is improved.
Fig. 3 is a processing flow chart of a data processing method according to an embodiment of the present application, where fig. 3 specifically includes the following steps:
Step 302: the user writes a data crawling script.
The data crawling script may be understood as the data acquisition script program.
Step 304: embed the data crawling script into the crawler data processing platform.
Step 306: in response to a data crawling instruction, call the corresponding data crawling script through the crawler data processing platform to crawl data and acquire initial data.
Step 308: acquire and parse the data processing script.
Specifically, a data processing script written by the user for the initial data is received, and the script is parsed by the Freemarker tool in the crawler data processing platform.
Step 310: judge whether the parsing is successful.
If yes, go to step 312;
if not, go to step 318.
Step 312: perform data processing on the initial data according to the data processing script.
The data processing script can be understood as the data processing script program.
Specifically, a data cleaning operation is performed on the data acquired by the crawler according to the data processing script, and the data specified by the data processing script is screened out.
Step 314: judge whether the processing is successful.
If yes, go to step 316;
if not, go to step 320.
Step 316: generate target data.
Specifically, the data screened from the initial data is used as the target data, and the target data is fed back to the user.
Step 318: return parsing error information.
Step 320: return processing error information.
Step 322: judge whether cyclic crawling of data is required.
If yes, go to step 324;
if not, go to step 308.
Specifically, whether data needs to be crawled cyclically from the corresponding third-party platform is determined according to the crawling information carried in the data crawling instruction. The crawling information can be generated based on the user's operation of the corresponding cyclic crawling control of the crawler data processing platform.
Step 324: acquire and parse a new URL processing script.
The new URL processing script is written in advance by the user on the crawler data processing platform.
Step 326: judge whether the parsing is successful.
If yes, go to step 328;
if not, go to step 340.
Step 328: process the initial data according to the new URL processing script.
Specifically, the initial data is processed according to the new URL processing script, and corresponding URL information is extracted from the initial data.
Step 330: judge whether the processing is successful.
Step 332: judge whether qualifying URL information exists in the result.
If yes, go to step 306;
if not, go to step 334.
Step 334: finish the cyclic data crawling.
Step 336: return processing error information.
Step 338: return parsing error information.
The specific implementation of the above steps 302-338 is consistent with the specific implementation of the data processing method of the above embodiment, and will not be discussed in detail here, and details can be referred to the data processing method of the above embodiment.
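The overall crawl, process, and cyclic-crawl flow of Fig. 3 can be sketched roughly as follows. This is a simplified illustration, not the platform's implementation: the scripts are modeled as plain Python callables (so the Freemarker parsing decisions and error returns are omitted), and `fetch`, `process`, `extract_urls`, and the `max_pages` safety cap are all hypothetical names introduced here.

```python
from typing import Callable, Iterable, List, Optional


def crawl_and_process(
    start_url: str,
    fetch: Callable[[str], dict],
    process: Callable[[dict], list],
    extract_urls: Optional[Callable[[dict], Iterable[str]]] = None,
    max_pages: int = 100,
) -> List[list]:
    """Crawl a URL, process its data, and optionally follow extracted URLs."""
    targets: List[list] = []
    pending = [start_url]
    seen = set()
    while pending and len(seen) < max_pages:
        url = pending.pop(0)
        if url in seen:
            continue
        seen.add(url)
        initial = fetch(url)              # crawl: acquire initial data
        targets.append(process(initial))  # process: produce target data
        if extract_urls is None:
            break                         # cyclic crawling was not requested
        # Cyclic crawling: queue newly extracted URLs; the loop ends when
        # no further qualifying URL remains.
        pending.extend(u for u in extract_urls(initial) if u not in seen)
    return targets
```

The `seen` set and the `max_pages` cap are additions of this sketch to guard against pages that link back to each other; the specification does not describe such a guard.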
The embodiment of the specification supports processing of various data formats and types, such as the common crawler data types HTML, JSON, and XML. By following the documentation and writing only a small number of processing scripts, a user can realize data processing, cyclic URL crawling, and key data extraction, which improves the efficiency of data crawling and data processing. The user can create, configure, and debug crawler tasks directly online on the data processing platform and observe the crawler data processing results through the real-time log, so that writing complex SQL programs to process and extract data is avoided and the efficiency of data crawling and data processing is further improved.
Moreover, the data processing and extraction rules can be chosen with a high degree of freedom. The syntax of the Freemarker tool supports highly customized data extraction rules, and when a user has a new extraction rule requirement, the user can add a custom function at any time to meet it. Meanwhile, for the JSON and HTML types, results are parsed cyclically and automatically until no URL meeting the rule remains, and the JSON parsing type solves cyclic crawling for the common paging and offset paging types.
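The two paging styles named above can be illustrated with the following sketch of next-URL generation. The query parameter names `page`, `offset`, and `limit`, as well as the totals assumed to be reported by the JSON response, are hypothetical; the specification does not define them.

```python
from typing import Optional
from urllib.parse import urlencode


def next_page_url(base: str, page: int, total_pages: int) -> Optional[str]:
    """Common (page-number) paging: advance one page until the last page."""
    if page >= total_pages:
        return None  # no further URL: the cyclic crawl ends
    return f"{base}?{urlencode({'page': page + 1})}"


def next_offset_url(base: str, offset: int, limit: int,
                    total: int) -> Optional[str]:
    """Offset paging: advance the offset by the page size until exhausted."""
    next_offset = offset + limit
    if next_offset >= total:
        return None  # no further URL: the cyclic crawl ends
    return f"{base}?{urlencode({'offset': next_offset, 'limit': limit})}"
```

In both cases, returning `None` corresponds to the situation in which no URL meeting the rule remains and the cyclic crawl stops.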
Corresponding to the above method embodiment, the present application further provides an embodiment of a data processing apparatus, and fig. 4 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of the present application. As shown in fig. 4, the apparatus includes:
a data obtaining module 402, configured to, in a case that a data obtaining instruction sent by a user is received, invoke a data obtaining script program to obtain initial data from a third-party platform, where the data obtaining instruction carries platform information of the third-party platform, and the data obtaining script program is an embedded script program edited by the user;
and the data processing module 404 is configured to receive a data processing script program edited by the user according to the initial data, and perform data processing on the initial data according to the data processing script program to obtain processed target data.
Optionally, the apparatus further comprises:
a program embedding device configured to:
and receiving a data acquisition script program edited by a user and embedding the data acquisition script program.
Optionally, the data obtaining module 402 is further configured to:
calling the data acquisition script program to acquire first initial data from a target data display page of the at least two data display pages, wherein the target data display page is any one of the at least two data display pages; and
calling the data acquisition script program to acquire second initial data from other data display pages of the at least two data display pages, wherein the other data display pages are display pages of the at least two data display pages except the target data display page;
and generating initial data according to the first initial data and the second initial data.
Optionally, the data obtaining module 402 is further configured to:
calling the data acquisition script program to acquire first initial data from a target third-party platform of the at least two third-party platforms, wherein the target third-party platform is any one of the at least two third-party platforms; and
calling the data acquisition script program to acquire second initial data from other third-party platforms of the at least two third-party platforms, wherein the other third-party platforms are the platforms except the target third-party platform of the at least two third-party platforms;
and generating initial data according to the first initial data and the second initial data.
Optionally, the apparatus further comprises:
a data comparison failure module configured to:
comparing the initial data with data acquisition information corresponding to the data acquisition script program;
and under the condition that the initial data and the data to be acquired corresponding to the data acquisition information are different according to the comparison result, generating data acquisition failure information, and feeding back the data acquisition failure information to the user.
Optionally, the apparatus further comprises:
a data comparison success module configured to:
and under the condition that the initial data and the data to be acquired corresponding to the data acquisition information are determined to be the same according to the comparison result, sending the initial data to the user.
Optionally, the data processing module 404 is further configured to:
and determining candidate data from the initial data according to the data processing script program, and deleting the candidate data from the initial data to obtain deleted target data.
Optionally, the apparatus further comprises:
a data processing failure module configured to:
and under the condition that the initial data is failed to be processed according to the data processing script program, generating data processing failure information, and feeding back the data processing failure information to the user.
The data processing device provided by the embodiment of the specification is applied to a data processing platform, and is used for calling a data acquisition script program to acquire initial data from a third-party platform under the condition of receiving a data acquisition instruction sent by a user, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user; and receiving a data processing script program edited by the user according to the initial data, and performing data processing on the initial data according to the data processing script program to obtain processed target data.
Specifically, by applying the method to the data processing platform, after the data processing platform receives a data acquisition instruction sent by a user, the data acquisition script program is called to acquire corresponding initial data, and under the condition of receiving the data processing script program, data processing of the initial data is performed according to the data processing script program, so that the data acquisition rate is improved, meanwhile, the data is rapidly processed through the corresponding data processing script program, the target data acquisition rate is further accelerated, and the user experience is improved.
The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method. Further, the components in the device embodiment should be understood as functional blocks that must be created to implement the steps of the program flow or the steps of the method, and each functional block is not actually divided or separately defined. The device claims defined by such a set of functional modules are to be understood as a functional module framework for implementing the solution mainly by means of a computer program as described in the specification, and not as a physical device for implementing the solution mainly by means of hardware.
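Since the modules are functional blocks realized mainly by a computer program, the two core modules can be sketched as small Python classes. This is only an illustration: the class names, the modeling of each script program as a callable, and the merging of data from several pages or platforms by simple concatenation are all assumptions of this sketch.

```python
from typing import Callable, Iterable, List

Record = dict


class DataAcquisitionModule:
    """Sketch of data obtaining module 402: runs the embedded acquisition script."""

    def __init__(self, script: Callable[[str], List[Record]]) -> None:
        self.script = script  # stands in for the user-edited acquisition script

    def acquire(self, sources: Iterable[str]) -> List[Record]:
        initial: List[Record] = []
        for source in sources:
            # Data from each page or platform is merged into one initial data set.
            initial.extend(self.script(source))
        return initial


class DataProcessingModule:
    """Sketch of data processing module 404: applies the processing script."""

    def __init__(self, script: Callable[[List[Record]], List[Record]]) -> None:
        self.script = script  # stands in for the user-edited processing script

    def process(self, initial: List[Record]) -> List[Record]:
        return self.script(initial)
```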
Fig. 5 illustrates a block diagram of a computing device 500 provided according to an embodiment of the present application. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.
Computing device 500 also includes an access device 540, which enables computing device 500 to communicate via one or more networks 560. Examples of such networks include a Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of network interface (e.g., a network interface controller), wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of computing device 500 and other components not shown in FIG. 5 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 5 is for purposes of example only and is not intended to limit the scope of the present application. Other components may be added or replaced as desired by those skilled in the art.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or Personal Computer (PC). Computing device 500 may also be a mobile or stationary server.
Wherein processor 520 is configured to execute the computer-executable instructions of the data processing method.
The foregoing is a schematic diagram of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data processing method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
An embodiment of the present application further provides a chip, which stores a computer program, and the computer program implements the steps of the data processing method when executed by the chip.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (11)

1. A data processing method is applied to a data processing platform and is characterized by comprising the following steps:
under the condition of receiving a data acquisition instruction sent by a user, calling a data acquisition script program to acquire initial data from a third-party platform, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user;
and receiving a data processing script program edited by the user according to the initial data, and performing data processing on the initial data according to the data processing script program to obtain processed target data.
2. The data processing method according to claim 1, wherein before invoking the data acquisition script program to acquire the initial data from the third-party platform in case of receiving a data acquisition instruction sent by a user, the method further comprises:
and receiving a data acquisition script program edited by a user and embedding the data acquisition script program.
3. The data processing method of claim 1, wherein the third party platform comprises at least two data presentation pages;
correspondingly, the calling data acquisition script program acquires initial data from a third-party platform, and the method comprises the following steps:
calling the data acquisition script program to acquire first initial data from a target data display page of the at least two data display pages, wherein the target data display page is any one of the at least two data display pages; and
calling the data acquisition script program to acquire second initial data from other data display pages of the at least two data display pages, wherein the other data display pages are display pages of the at least two data display pages except the target data display page;
and generating initial data according to the first initial data and the second initial data.
4. The data processing method of claim 1, wherein there are at least two third-party platforms;
correspondingly, the calling data acquisition script program acquires initial data from a third-party platform, and the method comprises the following steps:
calling the data acquisition script program to acquire first initial data from a target third-party platform of the at least two third-party platforms, wherein the target third-party platform is any one of the at least two third-party platforms; and
calling the data acquisition script program to acquire second initial data from other third-party platforms of the at least two third-party platforms, wherein the other third-party platforms are the platforms except the target third-party platform of the at least two third-party platforms;
and generating initial data according to the first initial data and the second initial data.
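Claims 3 and 4 share one pattern: the same acquisition script is called once per source (a data display page or a third-party platform) and the partial results are combined into the initial data. A minimal sketch of that pattern follows. The names `acquire_initial_data` and `demo_script`, the record layout, and the use of simple concatenation as the way of "generating initial data" are assumptions made for illustration and are not taken from the patent.

```python
from typing import Callable, Iterable

# Hypothetical type: an acquisition script takes a source identifier
# (a page or a platform) and returns a list of records.
AcquisitionScript = Callable[[str], list[dict]]


def acquire_initial_data(script: AcquisitionScript,
                         sources: Iterable[str]) -> list[dict]:
    """Call the script once per source and concatenate the partial results."""
    initial_data: list[dict] = []
    for source in sources:
        # The first source plays the role of the "target" page or platform;
        # the remaining ones are the "other" pages or platforms.
        initial_data.extend(script(source))
    return initial_data


def demo_script(source: str) -> list[dict]:
    """Stand-in for a user-edited script; returns canned records, no network."""
    canned = {
        "page-1": [{"id": 1, "title": "a"}],
        "page-2": [{"id": 2, "title": "b"}],
    }
    return canned.get(source, [])


merged = acquire_initial_data(demo_script, ["page-1", "page-2"])
```

Because the script receives the source as a parameter, the same loop covers both the multi-page case of claim 3 and the multi-platform case of claim 4.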
5. The data processing method of claim 1, wherein before receiving the data processing script program edited by the user according to the initial data, the method further comprises:
comparing the initial data with data acquisition information corresponding to the data acquisition script program;
in a case where it is determined, according to the comparison result, that the initial data differs from the data to be acquired corresponding to the data acquisition information, generating data acquisition failure information and feeding the data acquisition failure information back to the user.
6. The data processing method of claim 5, wherein after comparing the initial data with the data acquisition information corresponding to the data acquisition script program, the method further comprises:
in a case where it is determined, according to the comparison result, that the initial data is the same as the data to be acquired corresponding to the data acquisition information, sending the initial data to the user.
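The comparison in claims 5 and 6 can be sketched as a validation step between acquisition and processing. The claims do not say what form the data acquisition information takes, so this sketch assumes it is a set of expected field names. `AcquisitionResult` and `check_initial_data` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionResult:
    ok: bool
    payload: object  # the initial data on success, a failure message otherwise


def check_initial_data(initial_data: list[dict],
                       expected_fields: set[str]) -> AcquisitionResult:
    """Compare each acquired record with the expected field names (assumed form
    of the data acquisition information)."""
    for index, record in enumerate(initial_data):
        if set(record) != expected_fields:
            message = (
                f"data acquisition failed: record {index} has fields "
                f"{sorted(record)}, expected {sorted(expected_fields)}"
            )
            # Claim 5: a mismatch yields failure information for the user.
            return AcquisitionResult(False, message)
    # Claim 6: a match means the initial data is sent on to the user.
    # An empty result passes this check; whether that should count as a
    # failure is a policy choice the claims do not settle.
    return AcquisitionResult(True, initial_data)


good = check_initial_data([{"id": 1, "title": "a"}], {"id", "title"})
bad = check_initial_data([{"id": 1}], {"id", "title"})
```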
7. The data processing method of claim 1, wherein performing data processing on the initial data according to the data processing script program to obtain the processed target data comprises:
determining candidate data from the initial data according to the data processing script program, and deleting the candidate data from the initial data to obtain the target data.
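The deletion step of claim 7 amounts to filtering: the processing script identifies candidate records, and the target data is whatever remains. The sketch below assumes the script can be expressed as a predicate over records. `delete_candidates` and the `price` field are illustrative names only.

```python
from typing import Callable


def delete_candidates(initial_data: list[dict],
                      is_candidate: Callable[[dict], bool]) -> list[dict]:
    """Return the initial data with every candidate record removed."""
    # A new list is built so that the initial data stays available,
    # for example for reporting what was deleted.
    return [record for record in initial_data if not is_candidate(record)]


rows = [
    {"id": 1, "price": 10},
    {"id": 2, "price": None},
    {"id": 3, "price": 7},
]
# Hypothetical user script: treat records with a missing price as candidates.
target_data = delete_candidates(rows, lambda record: record["price"] is None)
```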
8. The data processing method of claim 1, further comprising, before obtaining the processed target data:
in a case where processing the initial data according to the data processing script program fails, generating data processing failure information and feeding the data processing failure information back to the user.
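One way to realize the failure path of claim 8 is to run the user-edited processing script inside an exception handler and turn any error into a message for the user. The claims do not define what counts as a failure, so this sketch assumes it means the script raising an exception. The function name and the dictionary layout are hypothetical.

```python
from typing import Callable


def run_processing_script(script: Callable[[list[dict]], list[dict]],
                          initial_data: list[dict]) -> dict:
    """Run the script; report a failure instead of propagating the error."""
    try:
        target_data = script(initial_data)
    except Exception as exc:  # user-edited code may fail in arbitrary ways
        return {
            "status": "failed",
            "message": f"data processing failed: {type(exc).__name__}: {exc}",
        }
    return {"status": "ok", "target_data": target_data}


records = [{"id": 1}, {"id": 2}]
ok_result = run_processing_script(lambda rows: rows[:1], records)
# This script reads a field that does not exist, so it raises KeyError.
failed_result = run_processing_script(
    lambda rows: [row["missing"] for row in rows], records
)
```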
9. A data processing apparatus, characterized by comprising:
a data acquisition module configured to call a data acquisition script program to acquire initial data from a third-party platform upon receiving a data acquisition instruction sent by a user, wherein the data acquisition instruction carries platform information of the third-party platform, and the data acquisition script program is an embedded script program edited by the user;
and a data processing module configured to receive a data processing script program edited by the user according to the initial data, and to perform data processing on the initial data according to the data processing script program to obtain processed target data.
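The two modules of claim 9 could be laid out roughly as below. This is a sketch under simplifying assumptions: scripts are modelled as plain Python callables, platform information is a string, and the class and method names are hypothetical. The patent does not prescribe an interface.

```python
from typing import Callable, Optional


class DataAcquisitionModule:
    """Holds the embedded, user-edited acquisition script."""

    def __init__(self) -> None:
        self._script: Optional[Callable[[str], list[dict]]] = None

    def embed_script(self, script: Callable[[str], list[dict]]) -> None:
        # Corresponds to receiving and embedding the user-edited script.
        self._script = script

    def handle_instruction(self, platform_info: str) -> list[dict]:
        # The acquisition instruction carries the platform information.
        if self._script is None:
            raise RuntimeError("no data acquisition script has been embedded")
        return self._script(platform_info)


class DataProcessingModule:
    """Applies a user-edited processing script to the initial data."""

    def process(self, script: Callable[[list[dict]], list[dict]],
                initial_data: list[dict]) -> list[dict]:
        return script(initial_data)


acquisition = DataAcquisitionModule()
# Stand-in script: returns a canned record instead of contacting a platform.
acquisition.embed_script(lambda platform: [{"platform": platform, "value": 3}])
initial = acquisition.handle_instruction("platform-a")
target = DataProcessingModule().process(
    lambda rows: [{**row, "value": row["value"] * 2} for row in rows], initial
)
```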
10. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of the data processing method of any one of claims 1 to 8.
11. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 8.
CN202310102218.7A 2023-02-08 2023-02-08 Data processing method and device Pending CN115982441A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310102218.7A CN115982441A (en) 2023-02-08 2023-02-08 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310102218.7A CN115982441A (en) 2023-02-08 2023-02-08 Data processing method and device

Publications (1)

Publication Number Publication Date
CN115982441A (en) 2023-04-18

Family

ID=85972451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310102218.7A Pending CN115982441A (en) 2023-02-08 2023-02-08 Data processing method and device

Country Status (1)

Country Link
CN (1) CN115982441A (en)

Similar Documents

Publication Publication Date Title
US10990367B2 (en) Application development method, tool, and device, and storage medium
US10394925B2 (en) Automating web tasks based on web browsing histories and user actions
CN108984155B (en) Data processing flow setting method and device
US8826297B2 (en) Creating web services from an existing web site
WO2016082468A1 (en) Data graphing method, device and database server
CN113420201B (en) Cross-domain element positioning and tree generating method for browser RPA system
CN108984202B (en) Electronic resource sharing method and device and storage medium
CN111596902B (en) Method, device, equipment and storage medium for building front-end and back-end development framework
CN111506298A (en) Method for carrying out interface visual configuration based on JSON object
CN107615270A (en) A kind of man-machine interaction method and its device
CN112328219A (en) Service access processing method, device and system and computer equipment
CN103324567A (en) App engine debugging method and debugging system
CN113934632A (en) Code detection method and device
CN111221888A (en) Big data analysis system and method
CN112540925A (en) New characteristic compatibility detection system and method, electronic device and readable storage medium
CN115982441A (en) Data processing method and device
CN114153547B (en) Management page display method and device
CN113590564B (en) Data storage method, device, electronic equipment and storage medium
CN114546381A (en) Front-end page code file generation method and device, electronic equipment and storage medium
CN114356330A (en) Page configuration method and device, electronic equipment and storage medium
CN111782608A (en) Automatic file generation method and device, electronic equipment and storage medium
CN114338240B (en) Vulnerability scanning method and device
CN110780983A (en) Task exception handling method and device, computer equipment and storage medium
CN112527290A (en) Method and device for building page based on biological characteristic information
CN115378996B (en) Method, device, equipment and storage medium for data transmission between systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination