CN112988806A - Data processing method and device - Google Patents

Info

Publication number
CN112988806A
Authority
CN
China
Prior art keywords
data
script
user behavior
processing
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911302975.9A
Other languages
Chinese (zh)
Inventor
刘博�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201911302975.9A
Publication of CN112988806A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a data processing method and device, and relates to the technical field of computers. One embodiment of the method comprises: acquiring a data request, wherein the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs; calling the calculation script in real time according to the data request, so as to extract the user behavior data from the data table according to the data dimension by using the calculation script and process the user behavior data in real time; and outputting the processing result of the user behavior data. This embodiment shortens the feedback period of the data request and supports scenarios that require real-time delivery of data.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
With the development of computer technology, scenarios that involve processing and extracting big data, such as user behavior data, are increasingly common. In these scenarios, data delivery is generally accomplished by configuring timing tasks.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
After being triggered at the current trigger time, the timing task processes the existing data requests and stops once they have been processed. If a new data request is received after the timing task has finished executing and before the next trigger time, the new data request can only be processed at the next trigger time. This results in a long feedback period for the data request and makes it difficult to support scenarios that require real-time data processing and extraction.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for data processing, which can call a computation script in real time after receiving a data request, so as to process user behavior data in real time by using the computation script and output the corresponding processing result, thereby shortening the feedback period of the data request and supporting scenarios that require real-time delivery of data. In addition, a user portrait and/or an article portrait can be determined promptly according to the processing result of the user behavior data, which improves user stickiness and user experience.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of data processing.
The data processing method of the embodiment of the invention comprises the following steps: acquiring a data request, wherein the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs;
calling a calculation script in real time according to the data request so as to extract the user behavior data from a data table according to the data dimension by using the calculation script and process the user behavior data in real time;
and outputting the processing result of the user behavior data.
Optionally,
the data request further indicates a calling order of a plurality of computation scripts for processing the user behavior data;
calling the calculation script in real time according to the data request comprises:
calling the calculation scripts according to the calling order, so as to process the user behavior data according to the calling order.
Optionally,
calling the calculation scripts according to the calling order comprises:
generating threads respectively corresponding to the plurality of calculation scripts for processing the user behavior data in real time, and executing the threads in series and/or in parallel according to the calling order, so as to call the calculation scripts.
Optionally,
executing the threads in series comprises:
determining parent-child dependency relationships among the plurality of calculation scripts according to the calling order, and executing in series the threads corresponding to the calculation scripts that have parent-child dependency relationships.
Optionally,
the data request further indicates a requested task type;
calling the calculation script according to the data request comprises:
determining a processing time limit of the data request according to the task type, and calling the calculation script according to the processing time limit.
Optionally,
when a plurality of data requests are received, the execution priority of the data requests is determined according to the task types indicated by the data requests, and the calculation scripts are called according to the execution priority.
Optionally, the method further comprises:
determining a user portrait and/or an article portrait corresponding to the user behavior data according to the processing result of the user behavior data.
Optionally, the method further comprises:
monitoring the execution state of the called computing script, and stopping execution of the called computing script when its execution duration is greater than a first threshold and/or the memory resource it occupies is greater than a second threshold.
Optionally,
the data request further indicates a source identifier of the source system that input the data request;
when the called computing script fails to execute, the result of the execution failure is fed back to the source system according to the source identifier.
Optionally,
the data request further indicates a storage address of the data;
outputting the processing result of the user behavior data comprises:
storing the processing result of the user behavior data in a storage space corresponding to the storage address.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an apparatus for data processing.
The data processing device of the embodiment of the invention comprises a request acquisition module, a script calling module and a processing module, wherein:
the request acquisition module is used for acquiring a data request, wherein the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs;
the script calling module is used for calling a calculation script in real time according to the data request so as to extract the user behavior data from a data table according to the data dimension by using the calculation script and process the user behavior data in real time;
and the processing module is used for outputting the processing result of the user behavior data.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for data processing.
An electronic device for data processing according to an embodiment of the present invention includes: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement a method of data processing according to an embodiment of the present invention.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program that, when executed by a processor, implements a method of data processing of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: after the data request is received, the calculation script is called in real time, so that the user behavior data is processed in real time by the calculation script and the corresponding processing result is output, which shortens the feedback period of the data request and supports scenarios that require real-time delivery of data. In addition, a user portrait and/or an article portrait can be determined promptly according to the processing result of the user behavior data, which improves user stickiness and user experience.
Further effects of the above non-conventional optional implementations will be described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a method of data processing according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the main steps of another method of data processing according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the main steps of yet another method of data processing according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the main steps of yet another method of data processing according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the main steps of yet another method of data processing according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the main blocks of a data processing apparatus according to an embodiment of the present invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of the main steps of a method of data processing according to an embodiment of the invention.
As shown in fig. 1, the data processing method according to the embodiment of the present invention mainly includes the following steps S101 to S103:
Step S101: acquire a data request, wherein the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs.
The data processing method provided by the embodiment of the invention can be implemented on a data processing device that includes a driving execution engine, and the driving execution engine supports uploading and calling of computation scripts. Specifically, the driving execution engine may provide a "floor mart", that is, a repository for computation scripts, and big data developers may upload the computation scripts they develop to it. When a business system needs to process data, it sends a data request to the driving execution engine. The data request indicates the computation script for processing the data, and the driving execution engine drives the corresponding computation script to run in real time so as to process the user behavior data. It will be appreciated that different business systems may invoke their corresponding computation scripts by sending data requests to the driving execution engine.
In big data processing and extraction scenarios, user behavior data is generally stored in a data warehouse in the form of data tables. Therefore, so that the user behavior data can be extracted from the data table and then processed, the data request further indicates the data dimension of the data table to which the user behavior data belongs. It is understood that the data dimension corresponds to a computation script. For example, if computation script A is used for processing user behavior data in data table 1 and computation script B is used for processing user behavior data in data table 2, the data request indicates the data dimensions corresponding to each of these computation scripts. That is, when the data request indicates a plurality of computation scripts for processing data, it indicates, for each computation script, the data dimension of the data table to which the corresponding user behavior data belongs. In this example, the data request indicates which data dimensions in data table 1 are to be processed by computation script A and which data dimensions in data table 2 are to be processed by computation script B.
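As an illustration only, a data request of the kind described above might be represented as follows. The class names, field names, and the choice to model a data dimension as a string are all assumptions made for this sketch; the source does not specify a request format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScriptSpec:
    """One computation script plus the table and data dimensions it should read."""
    script_name: str             # hypothetical identifier of an uploaded script
    table: str                   # data table the user behavior data belongs to
    dimensions: Tuple[str, ...]  # data dimensions of that table to be processed


@dataclass(frozen=True)
class DataRequest:
    """A data request; it carries one ScriptSpec per indicated computation script."""
    scripts: Tuple[ScriptSpec, ...]


# Mirrors the example in the text: script A reads table 1, script B reads table 2.
request = DataRequest(
    scripts=(
        ScriptSpec("script_a", "data_table_1", ("dimension_1", "dimension_2")),
        ScriptSpec("script_b", "data_table_2", ("dimension_3",)),
    )
)

# Look up which data dimensions each script is asked to process.
dimensions_by_script = {s.script_name: s.dimensions for s in request.scripts}
```

Keeping the dimensions next to the script they belong to reflects the statement that the data dimension corresponds to a computation script.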
Step S102: call the calculation script in real time according to the data request, so as to extract the user behavior data from the data table according to the data dimension by using the calculation script and process the user behavior data in real time.
Because the data request indicates the data dimension of the data table to which the user behavior data belongs, when the data processing device calls the calculation script in real time, the calculation script can extract the user behavior data from the data table according to the data dimension and process it in real time. For example, when a task instance corresponding to the computation script is executed, the user behavior data of the corresponding data dimension may be extracted from Hive according to the data dimension indicated by the data request and then processed. Hive is a data warehouse tool based on Hadoop that can map structured data files to tables and provides SQL-like query functions.
Step S103: output the processing result of the user behavior data.
After receiving a data request, the data processing device including the driving execution engine may construct task instances according to the computation scripts that the data request indicates for processing user behavior data. Each computation script corresponds to one task instance, and each task instance may be bound to a unique identifier (hereinafter referred to as a task instance ID). A task corresponding to the data request may then be generated from the task instances respectively corresponding to the one or more computation scripts indicated by the data request.
Of course, the task instances may also be constructed before the data request is received, that is, the task instance corresponding to each computation script is generated in advance. After the data request is received, the task corresponding to the data request can be generated according to the computation scripts that the data request indicates for processing the data and the task instances corresponding to those computation scripts. Once the task corresponding to the data request has been generated, running the task calls the corresponding computation scripts, so that the data is processed by the computation scripts.
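The relationship between computation scripts, task instances, and tasks can be sketched as follows. This is a minimal illustration, assuming a random UUID serves as the task instance ID; the class and function names are hypothetical and not taken from the source.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskInstance:
    """One task instance per computation script, bound to a unique ID."""
    script_name: str
    # Assumption: a random UUID is used as the task instance ID.
    instance_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Task:
    """The task generated for one data request."""
    instances: List[TaskInstance]


def build_task(script_names):
    """Create one task instance for each computation script named in the request."""
    return Task([TaskInstance(name) for name in script_names])


task = build_task(["script_a", "script_b"])
ids = [instance.instance_id for instance in task.instances]
```

To pre-generate task instances before any request arrives, the same `TaskInstance` objects could be created once per uploaded script and looked up by script name when a request comes in.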
Before the user behavior data is extracted, a temporary data table may be constructed. After the user behavior data of the corresponding data dimension is extracted from Hive, it is stored in the temporary data table, and the data in the temporary data table is then processed to obtain the processing result of the user behavior data. Alternatively, the user behavior data of the corresponding dimension can be extracted directly from Hive and processed within Hive.
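The temporary-table approach can be sketched with an in-memory SQLite database standing in for Hive, which is a substitution made only so that the example is self-contained. The table name, column names, and the interpretation of data dimensions as columns are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse (the text uses Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE behavior (user_id INTEGER, action TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO behavior VALUES (?, ?, ?)",
    [(1, "click", "x"), (1, "buy", "x"), (2, "click", "y")],
)

# Extract only the requested dimensions (here: two columns) into a temporary table.
conn.execute("CREATE TEMP TABLE tmp_behavior AS SELECT user_id, action FROM behavior")

# Process the data in the temporary table: count recorded actions per user.
rows = conn.execute(
    "SELECT user_id, COUNT(*) FROM tmp_behavior GROUP BY user_id ORDER BY user_id"
).fetchall()
```

Processing the temporary table rather than the source table keeps the extraction step and the processing step separate, which matches the first of the two options described above.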
Because each calculation script corresponds to a task instance, when user behavior data of different data dimensions in the same data table needs to be processed, different data processing results can be obtained simply by changing the data dimensions indicated by the data request. For example, suppose the business system sends the data processing device a data request indicating data dimension 1 and data dimension 2 of data table A, obtains the corresponding data processing result, and then wants the data processing result for data dimension 3 and data dimension 4 of data table A. The business system only needs to change the data dimensions indicated by the data request, regenerate the data request accordingly, and send the new data request to the data processing device to obtain the data processing result for data dimension 3 and data dimension 4 of data table A. Therefore, for the same calculation script, changing the data dimension causes the calculation script to extract different user behavior data and produce different data processing results, from which different user portraits and/or article portraits can be obtained. For example, when the user_info table stores basic information of people aged 1 to 80, different result sets can be output by adjusting the age parameter, so as to filter user information for different age ranges. This can be implemented by at least the following statement: select name, age, sex from user_info where age in (data dimension).
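The user_info example can be made concrete with a parameterized query. The sketch below uses an in-memory SQLite database in place of Hive, and the sample rows and the helper name `select_by_age` are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_info (name TEXT, age INTEGER, sex TEXT)")
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?)",
    [("Ann", 25, "F"), ("Bob", 40, "M"), ("Cid", 67, "M")],
)


def select_by_age(ages):
    """Run the same statement for whatever age values the data request indicates."""
    ages = list(ages)
    placeholders = ", ".join("?" for _ in ages)
    sql = (
        "SELECT name, age, sex FROM user_info "
        f"WHERE age IN ({placeholders}) ORDER BY age"
    )
    return conn.execute(sql, ages).fetchall()


# The same script yields different result sets when only the parameter changes.
young = select_by_age(range(18, 31))
older = select_by_age(range(60, 81))
```

Only the argument passed to `select_by_age` changes between the two calls, which mirrors the point that a new data request with a different data dimension is enough to obtain a different result.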
Different data processing results can therefore be obtained conveniently by changing the data dimension. In the prior art, by contrast, when the data dimension needs to be changed, the timing task has to be temporarily stopped and the previously input data dimension changed, and the task with the changed data dimension can be executed only at the next trigger time of the timing task.
A task can be run by executing the task instances it includes in sequence. When the task includes a plurality of task instances, that is, when the data request indicates a plurality of calculation scripts for processing data, the task instances corresponding to these calculation scripts can be executed in sequence: after one task instance is executed successfully, the next task instance is executed. This continues until all task instances have been executed successfully and the data processing is complete, or until a task instance becomes abnormal, at which point the task stops running.
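The sequential execution described here can be sketched as a loop that stops at the first abnormal task instance. Representing a task instance as an (ID, callable) pair and treating any raised exception as an abnormality are assumptions of this sketch.

```python
def run_task(instances):
    """Execute (instance_id, script) pairs in order and stop at the first failure.

    Returns (completed_ids, failed_id); failed_id is None if every instance succeeded.
    """
    completed = []
    for instance_id, script in instances:
        try:
            script()
        except Exception:
            # An abnormal task instance stops the whole task.
            return completed, instance_id
        completed.append(instance_id)
    return completed, None


def ok():
    return None


def broken():
    raise RuntimeError("simulated script exception")


done, failed = run_task([("t1", ok), ("t2", broken), ("t3", ok)])
```

In this run the third instance is never executed, because the task stops as soon as the second instance fails.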
That is to say, in the process of running a task, the execution state of the called computing script may be monitored, and when the execution duration of the called computing script is greater than a first threshold and/or the memory resource occupied by the called computing script is greater than a second threshold, the called computing script is stopped from being executed.
When the execution duration of the called calculation script is greater than a first threshold, the running duration of the calculation script is too long; and when the memory resource occupied by the called calculation script is larger than a second threshold value, the calculation script occupies too much resource. These situations may be caused by an exception occurring in the computation script, and in order to avoid that the computation script occupies too many resources and affects the data processing efficiency, the called computation script may be stopped from being executed, that is, the task instance corresponding to the computation script is stopped from being executed.
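One way to enforce the first threshold (execution duration) is to run each script in a child process with a timeout. The sketch below assumes scripts are Python source strings, which the source does not state, and the function name is hypothetical. The second threshold (memory) is not covered here; on POSIX systems, applying `resource.setrlimit` in the child process is one possible approach.

```python
import subprocess
import sys


def run_script_with_limit(code, max_seconds):
    """Run Python source in a child process and stop it if it exceeds max_seconds.

    Returns "success", "failed", or "timeout".
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            timeout=max_seconds,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return "timeout"
    return "success" if completed.returncode == 0 else "failed"


fast = run_script_with_limit("print('done')", max_seconds=10)
slow = run_script_with_limit("import time; time.sleep(30)", max_seconds=0.5)
```

Running the script in a separate process means a runaway script can be stopped without affecting the monitoring process itself.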
Moreover, because each task instance has a unique task instance ID, when a task instance is abnormal, it can be located according to its task instance ID, and the result of the execution failure can be fed back to the source system that sent the data request. In other words, when a task instance does not execute successfully, that is, when it is stopped, the task instance has failed to execute, which means the computing script corresponding to the task instance has failed to execute. In this case, the failed task instance can be located according to the task instance ID, and the result of the computing script's execution failure can be fed back to the source system according to the source identifier indicated by the data request, which identifies the source system that input the data request. The task instance ID corresponding to the specific computing script can be included in the feedback, so that the source system can identify which computing script failed to execute.
According to the embodiment, in the execution process of the called calculation script, the data processing device can monitor the execution state of the calculation script to ensure the real-time performance and the accuracy of the task corresponding to the data request. When the execution of the computation script fails, the result of the execution failure of the computation script, which also characterizes the result of the data processing failure, can be fed back to the source system of the input data request.
Based on the foregoing embodiments, referring to fig. 2, a method for processing data provided by an embodiment of the present invention may include the following steps:
Step S201: obtain a data request, the data request indicating a computing script for processing the data and a source identifier of the source system that input the data request.
Step S202: call the calculation script according to the data request, and monitor the execution state of the called calculation script.
Step S203: when the execution duration of the called computing script is greater than a first threshold, stop executing the called computing script and determine that execution of the computing script has failed.
Step S204: feed back the result of the computing script's execution failure to the source system according to the source identifier, so as to inform the source system that the data processing has failed.
In addition, when the memory resource occupied by the called computing script is greater than the second threshold, the called computing script is also stopped from being executed, that is, the data processing method provided in the embodiment of the present invention may further include steps S301 to S304 shown in fig. 3:
Step S301: obtain a data request, the data request indicating a computing script for processing the data and a source identifier of the source system that input the data request.
Step S302: call the calculation script according to the data request, and monitor the execution state of the called calculation script.
Step S303: when the memory resource occupied by the called calculation script is greater than a second threshold, stop executing the called calculation script and determine that execution of the calculation script has failed.
Step S304: feed back the result of the computing script's execution failure to the source system according to the source identifier, so as to inform the source system that the data processing has failed.
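The feedback step shared by the two flows above can be sketched as follows. The in-process `outbox` dictionary is a stand-in for whatever channel actually reaches the source system, and the message fields are assumptions; the source only states that the failure result, and optionally the task instance ID, is fed back according to the source identifier.

```python
from collections import defaultdict

# Hypothetical in-process "mailboxes", one per source system identifier.
outbox = defaultdict(list)


def report_failure(source_id, task_instance_id, reason):
    """Route a failure result to the source system identified by source_id."""
    message = {
        "status": "failed",
        "task_instance_id": task_instance_id,  # lets the source system see which script failed
        "reason": reason,
    }
    outbox[source_id].append(message)
    return message


report_failure("business-system-1", "task-42", "execution duration exceeded first threshold")
report_failure("business-system-2", "task-43", "memory usage exceeded second threshold")
```

Keying the feedback on the source identifier ensures each business system receives only the failures of its own data requests.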
When the data request indicates a plurality of computing scripts for processing data, that is, when the task corresponding to the data request includes a plurality of task instances, the business system may further specify the execution order of the task instances. In other words, the data request may further indicate a calling order of the plurality of computing scripts, and the driving execution engine may call the computing scripts in that order, executing the task instance corresponding to each computing script accordingly, so that the data is processed according to the calling order.
When the calculation scripts are called, a thread may be generated for each of the calculation scripts and used to call it. With one thread per calculation script, all the threads can be executed in parallel to improve the calling efficiency of the calculation scripts and thus the data processing efficiency. However, if some computing scripts have parent-child dependency relationships, that is, the execution of one computing script depends on the execution result of another, the threads corresponding to those computing scripts can only be executed serially. The parent-child dependency relationships between computing scripts can be determined according to the calling order of the computing scripts indicated by the data request.
For example, suppose the computation scripts indicated by the data request for processing user behavior data are computation script A, computation script B, and computation script C, and the calling order indicated by the data request is that computation script A is called first and computation script B is called afterwards. Since computation script B is called after computation script A, the call of computation script B depends on the execution of computation script A, that is, there is a parent-child dependency relationship between computation script A and computation script B, and the threads corresponding to computation script A and computation script B need to be executed in series. Computation script C, however, has no ordering relationship with computation script A or computation script B in the calling order, so the thread corresponding to computation script C may be executed in parallel with the threads corresponding to computation script A and computation script B. That is, it may run at the same time as the thread corresponding to computation script A or at the same time as the thread corresponding to computation script B, which improves the efficiency of data processing.
Of course, in the specific implementation process, each thread may also be selected to be executed serially or in parallel according to the actual situation, for example, a thread corresponding to each computation script without parent-child dependency relationship is selected to be executed in a serial manner.
In a preferred embodiment of the present invention, the threads corresponding to the computation scripts having parent-child dependencies are executed serially, and the threads corresponding to the computation scripts having no parent-child dependencies are executed in parallel, so as to improve the efficiency of data processing as much as possible. Thus, as shown in fig. 4, the data processing method provided by the embodiment of the present invention may include the following steps S401 to S404:
Step S401: obtain a data request indicating a plurality of computation scripts for processing the data and a calling order of the plurality of computation scripts.
Step S402: generate threads respectively corresponding to the plurality of calculation scripts, and determine parent-child dependency relationships among the plurality of calculation scripts according to the calling order.
Step S403: execute in series the threads corresponding to calculation scripts that have parent-child dependency relationships, and execute in parallel the threads corresponding to calculation scripts that have no parent-child dependency relationships, so as to call the calculation scripts by using the threads and process the data according to the calling order.
Step S404: output the result of the data processing.
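The combination of serial and parallel execution in steps S401 to S404 can be sketched with a thread pool. As a simplification, this sketch runs one worker thread per dependency chain rather than one thread per script, which is a swapped-in arrangement and not the per-script threading described in the text; the script names and the `chains` structure are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

log = []
log_lock = threading.Lock()


def make_script(name):
    """Build a stand-in computation script that records when it ran."""
    def script():
        with log_lock:
            log.append(name)
    return script


scripts = {name: make_script(name) for name in ("A", "B", "C")}

# Calling order from the request: A before B; C is unordered relative to both.
chains = [["A", "B"], ["C"]]


def run_chain(chain):
    """Run the scripts of one dependency chain serially, parent before child."""
    for name in chain:
        scripts[name]()


# Independent chains run in parallel, one worker thread per chain.
with ThreadPoolExecutor(max_workers=len(chains)) as pool:
    list(pool.map(run_chain, chains))
```

Script B can never start before script A finishes because both run on the same worker, while script C may interleave with either of them.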
In addition, different business systems may send data requests of different task types according to different business scenarios. That is, a data request may indicate not only the computation scripts for processing data but also the requested task type. The data processing device may determine a processing time limit for the data request according to the task type and call the calculation script according to the processing time limit. When the data processing device receives a plurality of data requests, it determines the execution priority of the data requests according to the task types they indicate and calls the calculation scripts according to the execution priority.
The processing time limit corresponding to each task type can be configured in advance, for example, the data processing device can record the processing time limit corresponding to different task types in a data table mode, and after receiving a corresponding data request, according to the task type indicated by the data request, the computing script indicated by the data request is called on the premise of no later than the processing time limit.
The execution priority of the data request can be determined according to the processing time limit of the task type indicated by the data request, that is, the shorter the processing time limit is, the higher the execution priority of the data request is, namely, the data processing device can process the data request preferentially; accordingly, the longer the processing deadline is, the lower its execution priority is.
For example, a service system for video processing may send data requests for different task types according to actual service scenarios, such as a task type with priority on transmission efficiency and a task type with priority on definition. During video processing, the processing time limit is shorter for the task type with the priority on transmission efficiency, and the requirement on the processing time limit is not high for the task type with the priority on definition because the task type with the priority on definition focuses on image quality definition, so when the data processing device receives the data requests of the two task types at the same time, the data request corresponding to the task type with the priority on transmission efficiency is processed first, and then the data request corresponding to the task type with the priority on definition is processed.
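The relationship between task type, processing time limit, and execution priority can be sketched as follows. This is only an illustration under stated assumptions: the class RequestQueue, the table TIME_LIMITS, the task type names, and the time limit values are hypothetical and do not come from the embodiment. Requests with a shorter time limit are taken first, and requests with equal limits keep their arrival order.

```python
import heapq
from itertools import count

# Hypothetical pre-configured table: task type -> processing time limit (seconds).
TIME_LIMITS = {"transmission_first": 5, "definition_first": 60}


class RequestQueue:
    """Orders pending data requests by the time limit of their task type."""

    def __init__(self, time_limits):
        self._limits = time_limits
        self._heap = []
        self._seq = count()  # tie-breaker: equal limits keep arrival order

    def submit(self, request_id, task_type):
        limit = self._limits[task_type]
        # Shorter time limit means smaller key, which means higher priority.
        heapq.heappush(self._heap, (limit, next(self._seq), request_id))

    def next_request(self):
        """Return (request_id, time_limit) of the highest-priority request."""
        limit, _, request_id = heapq.heappop(self._heap)
        return request_id, limit
```

In the video example, a request of the definition-priority type that arrives first is still taken after a transmission-efficiency request that arrives later, because its time limit is longer.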
In addition, the data request can also indicate the storage address of the data, and after the data processing device performs data processing, the result after the data processing can be stored in the storage space corresponding to the storage address. Referring to fig. 5, the data processing method according to the embodiment of the present invention may include the following steps S501 to S503:
Step S501: obtaining a data request, where the data request indicates a calculation script for processing the data and a storage address of the data.
Step S502: calling the calculation script according to the data request, so as to process the user behavior data by using the calculation script.
Step S503: storing the processing result of the user behavior data in the storage space corresponding to the storage address.
For example, the data request indicates the address of a MySQL data table for storing the data result, and the MySQL data table may be located in the service system. The data processing apparatus may push the result of the data processing to that MySQL data table, and may further notify the service system, by means of a message queue, that the data processing result has been stored in the corresponding MySQL data table. A message queue is an application-to-application communication method.
When the data request does not indicate a storage address, the data processing result can be temporarily stored in Hive. When the service system needs to view the data processing result, the Presto API component can access Hive directly and quickly extract the result. Presto is a distributed query engine that does not store data itself but can access multiple data sources and supports cascading queries across data sources.
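The two storage paths described above can be summarized in a short sketch. Plain Python dictionaries and lists stand in for the MySQL data table, Hive, and the message queue, so no real database or messaging API is shown; the function name store_result and the request keys storage_address, source_id, and request_id are hypothetical.

```python
def store_result(request, result, tables, fallback_store, notifications):
    """Route a processing result depending on whether a storage address is given.

    tables:         dict standing in for data tables in the service system.
    fallback_store: dict standing in for temporary storage (Hive in the text).
    notifications:  list standing in for the message queue to the service system.
    Returns the storage address used, or None if the fallback store was used.
    """
    address = request.get("storage_address")
    if address is not None:
        # Push the result to the table named by the request ...
        tables.setdefault(address, []).append(result)
        # ... and tell the requesting system where it was stored.
        notifications.append({"source": request.get("source_id"), "stored_at": address})
        return address
    # No address given: keep the result until the service system asks for it.
    fallback_store[request["request_id"]] = result
    return None
```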
After the user behavior data is processed in real time, the corresponding user portrait and/or object portrait can be obtained promptly from the processing result, which makes it easier for the business system to adjust its business accordingly, for example by adjusting object recommendations based on the user portrait, thereby improving user stickiness and user experience.
In the process of implementing the data processing method provided by the embodiment of the invention, a plurality of record tables can be used to record different information, which facilitates both the execution of the data processing method and the subsequent troubleshooting of abnormal information. Specifically, a data task record table (cmg_data_task_record) may be used to record the task corresponding to a data request. The data task record table may record all task instances corresponding to one task; for example, if a task corresponds to 100 task instances, the data task record table contains 100 records, each corresponding to one task instance, that is, to one computation script. In addition, the task instance execution flow record table (cmg_exe_process_record) may be used to record the execution process and the execution result of each task instance, such as execution success or execution failure. The task information table (cmg_task) can be used to record the tasks corresponding to different data requests. The script task relationship table (cmg_script_task_rel) is used to record the relationships between the calculation scripts, such as their calling order or parent-child dependency relationships. The script configuration table (cmg_bdp_script) is used to record the configuration information of the scripts.
The mart information configuration table (cmg_bdp_market) is used to record the configuration information of the underlying marts that drive the computation scripts provided by the execution engine. For example, when there are a plurality of underlying marts, the mart information configuration table records which computation script resides in which mart, that is, the correspondence between computation scripts and marts, so that a computation script can be called according to this correspondence. The script uploading record table (cmg_upload_record) records information such as the upload time and version of each uploaded computation script. The data source configuration table (cmg_database) records configuration information of the business system, such as interface information and authority information, so that the data processing device can communicate with the business system. The external system information table (cmg_ext_system_info) may be used to record identification information of the service system in the data processing apparatus, such as the source identifier of the service system, so that a data processing result can be returned to the service system. In addition, the MQ information record table (cmg_queue_message) may be used to record the data processing results that the data processing device returns to the service system, for purposes such as data statistics and data investigation.
According to the data processing method provided by the embodiment of the invention, after the data request is received, the calculation script is called in real time, so that the user behavior data is processed in real time by the calculation script and the corresponding processing result is output, thereby shortening the feedback cycle of the data request and satisfying real-time data delivery scenarios. In addition, the user portrait and/or the article portrait can be determined promptly from the processing result of the user behavior data, which improves user stickiness and user experience.
In addition, the embodiment of the invention is based on parameterized computation scripts and allows the data dimension to be changed automatically, which helps improve data processing efficiency and the timeliness of data delivery, and satisfies quasi-real-time data delivery scenarios. Therefore, in addition to supporting data processing and data extraction in the big data T+N mode, the embodiment more flexibly supports configurable, parameterizable, quasi-real-time data processing scenarios such as data processing and data extraction.
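A parameterized computation script whose data dimension is supplied by the request might look like the following sketch. It uses Python's built-in sqlite3 module with an in-memory table purely as a stand-in for the real data table; the table name user_behavior, its columns, the set ALLOWED_DIMENSIONS, and the function run_script are all hypothetical. Because a column name cannot be passed as a bound SQL parameter, the sketch checks the requested dimension against a whitelist before placing it in the query text.

```python
import sqlite3

# Hypothetical dimensions (columns) that a data request is allowed to ask for.
ALLOWED_DIMENSIONS = {"user_id", "item_id", "channel"}


def run_script(conn, dimension):
    """Count user behavior records grouped by the requested data dimension."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dimension}")
    # The dimension is whitelisted above, so it is safe to interpolate here.
    sql = (
        f"SELECT {dimension}, COUNT(*) FROM user_behavior "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


# Small in-memory table so the sketch can be run as is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_behavior (user_id TEXT, item_id TEXT, channel TEXT)")
conn.executemany(
    "INSERT INTO user_behavior VALUES (?, ?, ?)",
    [("u1", "i1", "app"), ("u1", "i2", "web"), ("u2", "i1", "app")],
)
```

The same script serves different requests by changing only the dimension argument: calling run_script(conn, "user_id") groups by user, and run_script(conn, "channel") groups by channel, without editing the script itself.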
Fig. 6 is a schematic diagram of main blocks of a data processing apparatus according to an embodiment of the present invention.
As shown in fig. 6, the data processing apparatus 600 according to the embodiment of the present invention includes: a request acquisition module 601, a script calling module 602 and a processing module 603; wherein:
the request acquisition module 601 is configured to obtain a data request, where the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs;
the script calling module 602 is configured to call a calculation script in real time according to the data request, so as to extract the user behavior data from a data table according to the data dimension by using the calculation script, and process the user behavior data in real time;
the processing module 603 is configured to output a processing result of the user behavior data.
In an embodiment of the present invention, the data request further indicates a calling order of a plurality of computation scripts for processing the user behavior data, and the script calling module 602 is configured to call the computation scripts according to the calling order so as to process the user behavior data in that order.
In an embodiment of the present invention, the script calling module 602 is configured to generate threads respectively corresponding to the plurality of computation scripts for processing the user behavior data in real time, and to execute the threads in series and/or in parallel according to the calling order so as to call the computation scripts.
In an embodiment of the present invention, the script calling module 602 is configured to determine the parent-child dependency relationships among the plurality of computation scripts according to the calling order, and to serially execute the threads corresponding to the computation scripts that have a parent-child dependency.
In one embodiment of the invention, the data request further indicates a requested task type; the script calling module 602 is configured to determine a processing time limit of the data request according to the task type, and call the computation script according to the processing time limit.
In an embodiment of the present invention, when a plurality of data requests are received, the script calling module 602 is configured to determine the execution priority of the data requests according to the task types they indicate, and to call the computation scripts according to the execution priority.
In an embodiment of the present invention, the processing module 603 is further configured to determine a user portrait and/or an article portrait corresponding to the user behavior data according to a processing result of the user behavior data.
In an embodiment of the present invention, the processing module 603 is further configured to monitor an execution state of the called computing script, and when an execution duration of the called computing script is greater than a first threshold and/or a memory resource occupied by the called computing script is greater than a second threshold, stop executing the called computing script.
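The execution-duration part of this monitoring can be sketched with Python's standard subprocess module. The sketch is an assumption-laden illustration, not the monitoring logic of the embodiment: the function name run_with_time_limit and the returned status strings are hypothetical, the "first threshold" is modeled as the max_seconds argument, and the memory threshold is not covered.

```python
import subprocess
import sys


def run_with_time_limit(script_source, max_seconds):
    """Run a script in a child process; stop it if it runs longer than max_seconds."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", script_source],
            capture_output=True,
            text=True,
            timeout=max_seconds,  # plays the role of the first threshold
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return {"status": "stopped", "output": None}
    status = "success" if completed.returncode == 0 else "failed"
    return {"status": status, "output": completed.stdout.strip()}
```

Running the script in a separate process is a deliberate choice in this sketch: a Python thread cannot be forcibly stopped from outside, whereas a child process can be terminated once the threshold is exceeded.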
In an embodiment of the present invention, the data request further indicates a source identifier of a source system that inputs the data request, and the processing module 603 is further configured to, when it is monitored that the invoked computing script fails to execute, feed back a result of the computing script failing to execute to the source system according to the source identifier.
In an embodiment of the present invention, the data request further indicates a storage address of the data, and the processing module 603 is configured to store the processing result of the user behavior data in the storage space corresponding to the storage address.
According to the data processing device of the embodiment of the invention, after the data request is received, the calculation script is called in real time, so that the user behavior data is processed in real time by the calculation script and the corresponding processing result is output, thereby shortening the feedback cycle of the data request and satisfying real-time data delivery scenarios. In addition, the user portrait and/or the article portrait can be determined promptly from the processing result of the user behavior data, which improves user stickiness and user experience.
Fig. 7 shows an exemplary system architecture 700 of a data processing apparatus or a method of data processing to which embodiments of the invention may be applied.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 701, 702, 703 to interact with a server 705 over a network 704, to receive or send messages or the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 701, 702, and 703.
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 701, 702, and 703. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the data processing method provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the data processing apparatus is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read from it can be installed into the storage section 808 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a request acquisition module, a script calling module, and a processing module. The names of these modules do not in some cases constitute a limitation on the module itself, and for example, a request acquisition module may also be described as a "module that acquires a data request".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring a data request, wherein the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs; calling the calculation script in real time according to the data request so as to extract the user behavior data from the data table according to the data dimension by using the calculation script and process the user behavior data in real time; and outputting the processing result of the user behavior data.
According to the technical scheme of the embodiment of the invention, after the data request is received, the calculation script is called in real time, so that the user behavior data is processed in real time by the calculation script and the corresponding processing result is output, thereby shortening the feedback cycle of the data request and satisfying real-time data delivery scenarios. In addition, the user portrait and/or the article portrait can be determined promptly from the processing result of the user behavior data, which improves user stickiness and user experience.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method of data processing, comprising:
acquiring a data request, wherein the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs;
calling a calculation script in real time according to the data request so as to extract the user behavior data from a data table according to the data dimension by using the calculation script and process the user behavior data in real time;
and outputting the processing result of the user behavior data.
2. The method of claim 1, wherein the data request further indicates an order of invocation of a plurality of computing scripts for processing the user behavior data;
the real-time calling of the calculation script according to the data request to process the data in real time by using the calculation script comprises the following steps:
and calling the calculation script according to the calling sequence so as to process the user behavior data according to the calling sequence.
3. The method of claim 2, wherein said invoking said computing script according to said invocation order comprises:
and respectively generating threads corresponding to a plurality of calculation scripts for processing the user behavior data in real time, and executing the threads in series and/or in parallel according to the calling sequence so as to call the calculation scripts.
4. The method of claim 3, wherein executing each of the threads in series comprises:
and determining parent-child dependency relationships among the plurality of calculation scripts according to the calling sequence, and executing threads corresponding to the calculation scripts with the parent-child dependency relationships in series.
5. The method of claim 1, wherein the data request further indicates a requested task type;
the calling of the calculation script according to the data request comprises:
and determining the processing time limit of the data request according to the task type, and calling the calculation script according to the processing time limit.
6. The method of claim 5,
when a plurality of data requests are received, determining the execution priority of the data requests according to the task type indicated by the data requests, and calling a calculation script according to the execution priority.
7. The method of claim 1, further comprising: and determining a user portrait and/or an article portrait corresponding to the user behavior data according to the processing result of the user behavior data.
8. The method of claim 1, further comprising:
and monitoring the execution state of the called computing script, and stopping executing the called computing script when the execution duration of the called computing script is greater than a first threshold and/or the memory resource occupied by the called computing script is greater than a second threshold.
9. The method of claim 8, wherein the data request further indicates a source identification of a source system from which the data request was entered;
and when the called computing script fails to be executed, feeding back a result of the computing script failed to be executed to the source system according to the source identifier.
10. The method of claim 1, wherein the data request further indicates a storage address of the data;
the outputting of the processing result of the user behavior data comprises:
and storing the processing result of the user behavior data to a storage space corresponding to the storage address.
11. An apparatus for data processing, comprising: a request acquisition module, a script calling module and a processing module; wherein:
the request acquisition module is used for acquiring a data request, wherein the data request indicates a calculation script for processing user behavior data and a data dimension of a data table to which the user behavior data belongs;
the script calling module is used for calling a calculation script in real time according to the data request so as to extract the user behavior data from a data table according to the data dimension by using the calculation script and process the user behavior data in real time;
and the processing module is used for outputting the processing result of the user behavior data.
12. An electronic device for data processing, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN201911302975.9A 2019-12-17 2019-12-17 Data processing method and device Pending CN112988806A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911302975.9A CN112988806A (en) 2019-12-17 2019-12-17 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911302975.9A CN112988806A (en) 2019-12-17 2019-12-17 Data processing method and device

Publications (1)

Publication Number Publication Date
CN112988806A true CN112988806A (en) 2021-06-18

Family

ID=76342322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911302975.9A Pending CN112988806A (en) 2019-12-17 2019-12-17 Data processing method and device

Country Status (1)

Country Link
CN (1) CN112988806A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557470A (en) * 2015-09-24 2017-04-05 腾讯科技(北京)有限公司 data extraction method and device
CN107665233A (en) * 2017-07-24 2018-02-06 上海壹账通金融科技有限公司 Database data processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination