CN113128741A - Data processing method, device, system, equipment and readable storage medium - Google Patents


Info

Publication number
CN113128741A
CN113128741A
Authority
CN
China
Prior art keywords
data, prediction, features, data processing, model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010027839.XA
Other languages
Chinese (zh)
Inventor
张继海
肖文明
王智楠
王剑峰
裴勇泉
杨程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010027839.XA
Publication of CN113128741A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the disclosure disclose a data processing method, apparatus, system, device, and readable storage medium, wherein the data processing method includes the following steps: acquiring first data; extracting features from the first data through a first processing mode, and performing model training using the extracted features to generate a data processing model; processing the first data through a second processing mode to generate second data; and predicting using the features, the data processing model, and the second data. The scheme of the embodiments of the disclosure provides a multi-scenario universal framework for prediction and for operating prediction-based services; the framework can be applied to various scenarios of prediction and of operating prediction-based services, and a system applying the framework can save a large amount of development, operation, and maintenance costs.

Description

Data processing method, device, system, equipment and readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, apparatus, system, device, and readable storage medium.
Background
In recent years, with the rapid development of cloud technology, cloud computing has become ubiquitous in all kinds of business scenarios. However, many departments running non-core, customized businesses cannot afford the labor cost of building prediction capabilities, and services based on those predictions, for every scenario. For example, the e-commerce field, the content field, emerging video, news, and the like all suffer from a lack of behavioral data when making predictions and operating prediction-based services. In view of the above, there is an urgent need for a universal framework for prediction and for prediction-based services, so as to reduce development, operation, and maintenance costs.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure provide a data processing method, apparatus, system, device, and readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including:
acquiring first data;
extracting features from the first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model;
processing the first data through a second processing mode to generate second data;
predicting using the features, the data processing model, and the second data.
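The four claimed steps can be illustrated with a minimal, self-contained sketch. All function names below, the trivial per-record features, and the threshold "model" are hypothetical illustrations only; the claims leave the concrete processing modes and model family open.

```python
# Hypothetical sketch: a batch path (feature extraction + training) and a
# stream path (second data) feeding a shared prediction step.

def extract_features(records):
    # First processing mode (batch): derive simple per-record features.
    return [{"length": len(r), "value": sum(r)} for r in records]

def train_model(features):
    # Train a trivial "model": a threshold at the mean summed value.
    mean = sum(f["value"] for f in features) / len(features)
    return {"threshold": mean}

def stream_process(records):
    # Second processing mode (stream): yield records one at a time.
    for r in records:
        yield r

def predict(features, model, second_datum):
    # Predict whether a streamed record exceeds the learned threshold.
    return sum(second_datum) > model["threshold"]

first_data = [[1, 2], [3, 4], [5, 6]]
features = extract_features(first_data)                      # batch path
model = train_model(features)                                # model training
results = [predict(features, model, d) for d in stream_process(first_data)]
```

Under these assumptions, the threshold is the mean (7.0), so only the last record's prediction is true; the point is only the data flow, not the model quality.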
With reference to the first aspect, in a first implementation manner of the first aspect, the present disclosure further includes:
storing the features and the data processing model.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the present disclosure further includes:
updating the stored features and the data processing model according to the second data;
storing the updated features and the updated data processing model.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the predicting by using the feature, the data processing model, and the second data includes:
predicting an event triggered by the second data according to the stored features and the stored data processing model.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the present disclosure further includes:
storing the prediction result.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the present disclosure further includes:
providing a service according to the stored prediction result.
With reference to the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the providing a service according to the stored prediction result includes:
providing the stored prediction result to an application program.
With reference to the sixth implementation manner of the first aspect, the present disclosure provides, in a seventh implementation manner of the first aspect, the extracting features from the first data through a first processing manner, and performing model training using the extracted features to generate a data processing model, including:
updating features extracted from the first data and updating the data processing model with the updated features,
wherein said predicting using said features, said data processing model and said second data comprises:
predicting using the updated feature, the updated data processing model, and the second data to obtain a prediction result for the updated feature.
With reference to the seventh implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the obtaining the first data includes:
first data is acquired in response to selection of a cold start mode or a warm start mode.
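The mode-dependent acquisition step can be sketched as follows. The data shapes and the fallback policy (meta information only in cold start; meta information plus historical behavior logs in warm start) are assumptions for illustration; the claim only requires that acquisition respond to the selected mode.

```python
# Hypothetical illustration of acquiring first data in response to the
# selection of a cold start mode or a warm start mode.

def acquire_first_data(mode, meta_info, behavior_logs):
    if mode == "cold":
        # Cold start: no accumulated behavior data yet; use meta information.
        return list(meta_info)
    if mode == "warm":
        # Warm start: reuse meta information plus historical behavior logs.
        return list(meta_info) + list(behavior_logs)
    raise ValueError("mode must be 'cold' or 'warm'")

meta = ["item_meta_1", "item_meta_2"]
logs = ["click_log_1"]
cold_data = acquire_first_data("cold", meta, logs)
warm_data = acquire_first_data("warm", meta, logs)
```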
With reference to the seventh implementation manner or the eighth implementation manner of the first aspect, in a ninth implementation manner of the first aspect, the providing the stored prediction result to the application program includes:
providing the stored prediction result for the updated features to the application program.
With reference to the first aspect, in a tenth implementation manner of the first aspect, the extracting, by a first processing manner, features from the first data, and performing model training using the extracted features to generate a data processing model, further includes:
the data processing model is stored.
With reference to the first aspect, in an eleventh implementation manner of the first aspect, the extracting, by a first processing manner, features from the first data, and performing model training using the extracted features to generate a data processing model, further includes:
selecting a training subset or a training full set from a training candidate pool of the first data;
extracting features from samples in the selected training subset or training full set.
With reference to the eleventh implementation manner of the first aspect, in a twelfth implementation manner of the first aspect, the extracting features from the first data through the first processing manner, and performing model training using the extracted features to generate a data processing model, further includes:
inputting the extracted features into a selected data processing model to train the selected data processing model and obtain a trained data processing model.
With reference to the twelfth implementation manner of the first aspect, in a thirteenth implementation manner of the first aspect, the predicting using the feature, the data processing model, and the second data includes:
selecting a prediction subset or a prediction complete set from a prediction candidate pool of the first data;
extracting features from samples in the selected prediction subset or prediction complete set;
predicting the samples in the prediction subset or the prediction complete set by inputting the extracted features into the trained data processing model to obtain a first prediction result.
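The selection step above can be sketched with a simple sampler. The sampling policy (uniform random, controlled by a fraction parameter) is an assumption for illustration; the claim does not fix how the subset is chosen from the candidate pool.

```python
import random

# Hypothetical sketch: draw a prediction subset (fraction < 1.0) or the
# prediction complete set (fraction == 1.0) from a candidate pool.

def select_for_prediction(candidate_pool, fraction=1.0, seed=0):
    if fraction >= 1.0:
        return list(candidate_pool)  # prediction complete set
    k = max(1, int(len(candidate_pool) * fraction))
    return random.Random(seed).sample(candidate_pool, k)  # prediction subset

pool = [f"sample_{i}" for i in range(10)]
complete_set = select_for_prediction(pool)
subset = select_for_prediction(pool, fraction=0.3)
```

A fixed seed makes the subset reproducible, which is convenient when comparing offline prediction runs.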
With reference to the thirteenth implementation manner of the first aspect, in a fourteenth implementation manner of the first aspect, the predicting using the feature, the data processing model, and the second data further includes:
updating the features and the data processing model according to the second data;
predicting using the updated features and data processing model and the second data to obtain a second prediction result;
and updating the first prediction result by using the second prediction result.
With reference to the fourteenth implementation manner of the first aspect, in a fifteenth implementation manner of the first aspect, the second prediction result is a real-time prediction result, and the first prediction result is an offline prediction result.
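The step of updating the first (offline) prediction result with the second (real-time) prediction result can be sketched as a key-wise merge in which real-time scores overwrite offline scores. The item keys and scores below are invented for illustration; the claim does not prescribe a storage format.

```python
# Hypothetical merge: the real-time (second) prediction result updates the
# offline (first) prediction result for matching keys.

offline_predictions = {"item_a": 0.40, "item_b": 0.55, "item_c": 0.70}   # first result
realtime_predictions = {"item_b": 0.80}                                  # second result

# Later entries win, so real-time values take precedence over offline ones.
merged = {**offline_predictions, **realtime_predictions}
```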
With reference to the first aspect, in a sixteenth implementation manner of the first aspect, the first data includes log data generated by an application program.
With reference to the sixteenth implementation manner of the first aspect, in a seventeenth implementation manner of the first aspect, the first data further includes meta information.
With reference to the first aspect, in an eighteenth implementation manner of the first aspect, the first processing manner is a batch processing manner, and the second processing manner is a stream processing manner.
In a second aspect, an embodiment of the present disclosure provides a data processing apparatus, including:
an acquisition module configured to acquire first data;
a first processing module configured to extract features from the first data by a first processing means and perform model training using the extracted features to generate a data processing model;
a second processing module configured to process the first data by a second processing manner to generate second data;
a prediction module configured to make predictions using the features, the data processing model, and the second data.
In a third aspect, an embodiment of the present disclosure provides a data processing system, including:
the acquisition device is used for acquiring first data;
the first cloud platform is used for extracting features from the first data through a first processing mode and performing model training by using the extracted features to generate a data processing model;
the second cloud platform is used for processing the first data through a second processing mode to generate second data;
at least one computing device for making predictions using the features, the data processing model and the second data.
In a fourth aspect, an embodiment of the present disclosure provides a data processing method, including:
acquiring log data of an application program;
extracting features from the log data in a batch manner, and performing model training using the extracted features to generate a data processing model;
processing the log data in a stream processing mode to generate stream data;
predicting using the features, the data processing model, and the stream data;
storing the prediction results and providing the prediction results to the application.
In a fifth aspect, an embodiment of the present disclosure provides a data processing apparatus,
an acquisition module configured to acquire log data of an application program;
a batch processing module configured to extract features from the log data in a batch processing manner and perform model training using the extracted features to generate a data processing model;
a stream processing module configured to process the log data by a stream processing manner to generate stream data;
a prediction module configured to make predictions using the features, the data processing model, and the stream data;
a storage module configured to store a prediction result and provide the prediction result to the application program.
In a sixth aspect, an embodiment of the present disclosure provides a data processing system, including:
the acquisition device is used for acquiring log data of the application program;
a batch processing platform for extracting features from the log data in a batch processing manner and performing model training using the extracted features to generate a data processing model;
the stream processing platform is used for processing the log data in a stream processing mode to generate stream data;
at least one computing device for making predictions using the features, the data processing model, and the stream data;
a storage device configured to store the prediction result and provide the prediction result to the application program.
In a seventh aspect, an embodiment of the present disclosure provides an information recommendation method, including:
providing log data of an application program in response to a first instruction;
extracting features from the log data in a batch manner, and performing model training using the extracted features to generate a data processing model;
processing the log data in a stream processing mode to generate stream data;
predicting using the features, the data processing model, and the stream data;
storing a prediction result and providing the prediction result to the application;
presenting the prediction result through the application.
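The store-then-serve portion of the recommendation flow above can be sketched as follows. The in-memory store, user identifier, and item scores are hypothetical; in the described system the storage role would be played by the storage device, with the application presenting the returned ranking.

```python
# Illustrative sketch: prediction results are stored, then provided to the
# application as a ranked list for presentation.

prediction_store = {}  # stands in for the storage device

def store_prediction(user_id, scored_items):
    prediction_store[user_id] = scored_items

def provide_to_application(user_id, top_n=2):
    scored = prediction_store.get(user_id, {})
    # Rank stored items by predicted score, highest first.
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

store_prediction("user_1", {"news_1": 0.9, "news_2": 0.3, "news_3": 0.7})
recommendations = provide_to_application("user_1")
```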
In an eighth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein,
the memory is configured to store one or more computer instructions, where the one or more computer instructions are executed by the processor to implement the method according to any one of the first aspect, the first implementation manner to the eighteenth implementation manner of the first aspect, the fourth aspect, and the seventh aspect.
In a ninth aspect, an embodiment of the present disclosure provides a readable storage medium, on which computer instructions are stored, and the computer instructions, when executed by a processor, implement the method according to any one of the first aspect, the first implementation manner to the eighteenth implementation manner of the first aspect, the fourth aspect, and the seventh aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the technical scheme provided by the embodiment of the disclosure, first data are acquired; extracting features from the first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model; processing the first data through a second processing mode to generate second data; by utilizing the characteristics, the data processing model and the second data for prediction, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, by storing the characteristics and the data processing model, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, the stored characteristics and the data processing model are updated according to the second data; the updated characteristics and the data processing model are stored, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
According to the technical solution provided by the embodiment of the present disclosure, the predicting by using the features, the data processing model and the second data includes: and predicting the event triggered by the second data according to the stored characteristics and the data processing model, so that a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, a multi-scenario universal prediction and prediction-based service operation framework can be provided by storing the prediction result, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, the service is provided according to the stored prediction result, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
According to the technical scheme provided by the embodiment of the disclosure, the providing service according to the stored prediction result comprises the following steps: the stored prediction result is provided for an application program, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
According to the technical scheme provided by the embodiments of the present disclosure, extracting features from the first data through the first processing manner and performing model training using the extracted features to generate a data processing model includes: updating features extracted from the first data and updating the data processing model with the updated features, wherein the predicting with the features, the data processing model, and the second data comprises: predicting using the updated features, the updated data processing model, and the second data to obtain a prediction result for the updated features. A multi-scenario universal framework for prediction and for operating prediction-based services can thus be provided; the framework can be applied to various scenarios of prediction and of operating prediction-based services, and a system applying the framework can save a large amount of development, operation, and maintenance costs. In addition, by updating the features and the model in real time, the problems of prediction and recommendation in cold start or warm start scenarios, scenarios with unknown data distribution, and the like can be solved.
According to the technical scheme provided by the embodiments of the present disclosure, the acquiring the first data includes: acquiring the first data in response to selection of a cold start mode or a warm start mode. Prediction and prediction-based services can thus be operated in cold start or warm start scenarios; the framework can be applied to various scenarios of prediction and of operating prediction-based services, and a system applying the framework can save a large amount of development, operation, and maintenance costs. In addition, by updating the features and the model in real time, the problems of prediction and recommendation in cold start or warm start scenarios, scenarios with unknown data distribution, and the like can be solved.
According to the technical scheme provided by the embodiments of the present disclosure, the providing the stored prediction result to the application program includes: providing the stored prediction result for the updated features to the application program. Prediction and prediction-based services can thus be operated in cold start or warm start scenarios; the framework can be applied to various scenarios of prediction and of operating prediction-based services, and a system applying the framework can save a large amount of development, operation, and maintenance costs. In addition, personalized recommendation can be performed according to the prediction result.
According to the technical solution provided by the embodiment of the present disclosure, extracting features from the first data through the first processing manner, and performing model training using the extracted features to generate a data processing model, further includes: the data processing model is stored, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical solution provided by the embodiment of the present disclosure, extracting features from the first data through the first processing manner, and performing model training using the extracted features to generate a data processing model, further includes: selecting a training subset or a training full set from a training candidate pool of the first data; the characteristics are extracted from the samples in the selected training subset or the training complete set, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical solution provided by the embodiment of the present disclosure, extracting features from the first data through the first processing manner, and performing model training using the extracted features to generate a data processing model, further includes: the extracted features are input into the selected data processing model to train the selected data processing model to obtain the trained data processing model, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical solution provided by the embodiment of the present disclosure, the predicting by using the features, the data processing model and the second data includes: selecting a prediction subset or a prediction complete set from a prediction candidate pool of the first data; extracting features from samples in the selected prediction subset or prediction ensemble; by inputting the extracted features into the trained data processing model to predict the samples in the prediction subset or the prediction complete set to obtain a first prediction result, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical solution provided by the embodiment of the present disclosure, the predicting by using the features, the data processing model, and the second data further includes: updating the features and the data processing model according to the second data; predicting using the updated features and data processing model and the second data to obtain a second prediction result; the second prediction result is used for updating the first prediction result, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
According to the technical scheme provided by the embodiment of the disclosure, the second prediction result is a real-time prediction result, the first prediction result is an off-line prediction result, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, the first data comprises log data generated by an application program, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, the first data further comprises the meta information, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, the first processing mode is a batch processing mode, the second processing mode is a stream processing mode, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance cost.
According to the technical scheme provided by the embodiment of the disclosure, the acquisition module is configured to acquire first data; a first processing module configured to extract features from the first data by a first processing means and perform model training using the extracted features to generate a data processing model; a second processing module configured to process the first data by a second processing manner to generate second data; the prediction module is configured to predict by utilizing the characteristics, the data processing model and the second data, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, the acquisition device is used for acquiring first data; the first cloud platform is used for extracting features from the first data through a first processing mode and performing model training by using the extracted features to generate a data processing model; the second cloud platform is used for processing the first data through a second processing mode to generate second data; and the at least one computing device is used for predicting by utilizing the characteristics, the data processing model and the second data, so that a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
According to the technical scheme provided by the embodiments of the present disclosure, log data of an application program is acquired; features are extracted from the log data in a batch manner, and model training is performed using the extracted features to generate a data processing model; the log data is processed in a stream processing mode to generate stream data; prediction is performed using the features, the data processing model, and the stream data; and the prediction result is stored and provided to the application program. A multi-scenario universal prediction and recommendation framework can thus be provided; the framework can be applied to various recommendation scenarios, the recommendation result is updated in real time without the need to maintain an online server, and a system applying the framework can save a large amount of development, operation, and maintenance costs.
According to the technical scheme provided by the embodiment of the disclosure, the acquisition module is configured to acquire log data of an application program; a batch processing module is configured to extract features from the log data in a batch processing manner and perform model training using the extracted features to generate a data processing model; a stream processing module is configured to process the log data in a stream processing manner to generate stream data; a prediction module is configured to make predictions using the features, the data processing model, and the stream data; and the storage module is configured to store the prediction result and provide it to the application program. A multi-scenario universal prediction and recommendation framework can thus be provided; the framework can be applied to various recommendation scenarios, the recommendation result is updated in real time without the need to maintain an online server, and a system applying the framework can save a large amount of development, operation and maintenance cost.
According to the technical scheme provided by the embodiment of the disclosure, the acquisition device is used for acquiring log data of an application program; the batch processing platform is used for extracting features from the log data in a batch processing manner and performing model training using the extracted features to generate a data processing model; the stream processing platform is used for processing the log data in a stream processing manner to generate stream data; the at least one computing device is used for predicting using the features, the data processing model, and the stream data; and the storage device is configured to store the prediction result and provide it to the application program. A multi-scenario universal prediction and recommendation framework can thus be provided; the framework can be applied to various recommendation scenarios, the recommendation result is updated in real time without the need to maintain an online server, and a system applying the framework can save a large amount of development, operation and maintenance cost.
According to the technical scheme provided by the embodiment of the disclosure, log data of an application program is provided in response to a first instruction; features are extracted from the log data in a batch processing manner, and model training is performed using the extracted features to generate a data processing model; the log data is processed in a stream processing manner to generate stream data; prediction is performed using the features, the data processing model, and the stream data; the prediction result is stored and provided to the application program; and the application program presents the prediction result. A multi-scenario universal prediction and recommendation framework can thus be provided; the framework can be applied to various recommendation scenarios, the recommendation result is updated in real time without the need to maintain an online server, the prediction result is recommended to the user through the application program, and a system applying the framework can save a large amount of development, operation and maintenance cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a data processing method according to yet another embodiment of the present disclosure;
FIG. 4 shows a flow diagram of a data processing method according to yet another embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a data processing method according to yet another embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of an exemplary application scenario of a data processing method according to an embodiment of the present disclosure;
FIG. 7 illustrates a structural schematic diagram of an example of a big data processing cloud platform module in the exemplary application scenario illustrated in FIG. 6;
FIG. 8 illustrates a schematic diagram of an example of a real-time computing module in the exemplary application scenario shown in FIG. 6;
FIG. 9 is a schematic diagram illustrating an exemplary algorithm executed by a recall sub-module in a big data processing cloud platform module in the exemplary application scenario illustrated in FIG. 6;
FIG. 10 shows a schematic diagram of an exemplary algorithm executed by a model update sub-module in the real-time computation module in the exemplary application scenario shown in FIG. 6;
FIG. 11 shows a schematic diagram of an exemplary algorithm executed by an online prediction sub-module in the real-time computation module in the exemplary application scenario shown in FIG. 6;
FIG. 12 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 13 shows a block diagram of a data processing system, according to an embodiment of the present disclosure;
FIG. 14 shows a flow diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 15 shows a block diagram of a data processing apparatus according to another embodiment of the present disclosure;
FIG. 16 shows a block diagram of a data processing system, according to another embodiment of the present disclosure;
FIG. 17 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 18 is a schematic block diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of features, numbers, steps, actions, components, parts, or combinations thereof disclosed in the present specification, and are not intended to preclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof are present or added.
It should be further noted that the embodiments and the features in the embodiments of the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
According to the technical scheme provided by the embodiment of the disclosure, first data is acquired; features are extracted from the first data through a first processing mode, and model training is performed using the extracted features to generate a data processing model; the first data is processed through a second processing mode to generate second data; and prediction is performed using the features, the data processing model and the second data. A multi-scenario universal prediction and prediction-based service operation framework can thus be provided; the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance cost.
Fig. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the data processing method includes the following steps S110, S120, S130, and S140:
in step S110, first data is acquired.
In step S120, features are extracted from the first data by a first processing manner, and model training is performed using the extracted features to generate a data processing model.
In step S130, the first data is processed by the second processing manner to generate second data.
In step S140, a prediction is made using the features, the data processing model, and the second data.
In one embodiment of the present disclosure, the first data may be various data obtained from an application on which the relevant prediction is desired to be performed.
In one embodiment of the present disclosure, the first data includes application-generated log data.
According to the technical scheme provided by the embodiment of the disclosure, the first data comprises log data generated by an application program, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In an embodiment of the present disclosure, various means known or unknown in the art may be adopted to obtain or collect the log data generated by the application, which is not described in detail in the present disclosure.
In one embodiment of the disclosure, the application program can display information to the user and make logs such as user behaviors and the like flow back to the log acquisition module for use by the data processing platform, so as to form a virtuous circle of the whole system.
In an embodiment of the disclosure, during log collection, the application program records the logs required in its flows through embedded tracking points (e.g., via transparent transmission or by writing from an application module). The logs are recorded at the server side, so other modules do not need to pay attention to them, and the log collection system can collect, parse, and transmit the recorded logs to subsequent modules according to the configuration information.
In one embodiment of the present disclosure, the first data further includes meta information.
According to the technical scheme provided by the embodiment of the disclosure, the first data further comprises the meta information, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, the meta information may be meta information of log data, and may also be meta information of other data related to data processing. The first data including log data and meta information can be used to perform appropriate batch processing operations, thereby realizing extraction of off-line batch processing characteristics, generation of data processing models, and the like.
In one embodiment of the present disclosure, the first processing mode is a batch processing mode, and the second processing mode is a stream processing mode.
According to the technical scheme provided by the embodiment of the disclosure, the first processing mode is a batch processing mode, the second processing mode is a stream processing mode, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
In one embodiment of the present disclosure, in a batch processing technique, data is first stored and then analyzed. For example, the data may be divided into small blocks, which are processed in parallel in a distributed manner to produce intermediate results, and the intermediate results are finally combined to produce the final result. Tasks that require processing of large amounts of data are often best suited to batch processing. However, processing a large amount of data generally takes a long time, so batch processing is not suitable for cases with strict processing-time (latency) requirements.
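The split/process/combine flow described in this paragraph can be sketched as follows. The block size, the event-count aggregation, and all function names are illustrative assumptions for this sketch, not part of the disclosed scheme:

```python
from collections import Counter
from itertools import islice

def split_into_blocks(records, block_size):
    """Divide the stored data set into small blocks of data."""
    it = iter(records)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block

def process_block(block):
    """Produce an intermediate result for one block; in a distributed
    setup, each block could be handled by a different worker."""
    return Counter(block)

def batch_process(records, block_size=2):
    """Combine the per-block intermediate results into the final result."""
    final = Counter()
    for block in split_into_blocks(records, block_size):
        final += process_block(block)
    return dict(final)

log_events = ["click", "view", "click", "view", "click"]
print(batch_process(log_events))  # {'click': 3, 'view': 2}
```

Note that the whole data set must be available before the final result exists, which is why this mode suits large offline jobs rather than latency-sensitive ones.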
In one embodiment of the present disclosure, in a stream processing technique, the potential value of data is assumed to lie in its freshness, so stream processing should process the data and obtain results as quickly as possible. In this manner, data arrives as a stream, and the stream processing system performs computation on data whenever it enters the system. Compared with the batch mode, this is a distinctly different approach: stream processing does not operate on the entire data set, but on each data item transmitted through the system.
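As a contrast with the batch sketch, the per-item computation described here can be sketched as follows; the running event count and the generator-based interface are illustrative assumptions of this sketch:

```python
def stream_process(source):
    """Perform a calculation on each data item as it enters the system,
    yielding an up-to-date result after every item instead of waiting
    for the entire data set."""
    counts = {}
    for item in source:
        counts[item] = counts.get(item, 0) + 1
        yield dict(counts)  # a fresh result is available immediately

results = list(stream_process(["view", "click", "view"]))
print(results[-1])  # {'view': 2, 'click': 1}
```

Unlike the batch version, an intermediate result exists after every item, which is what makes the mode suitable when freshness matters.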
A data processing method according to another embodiment of the present disclosure is described below with reference to fig. 2.
Fig. 2 shows a flow diagram of a data processing method according to another embodiment of the present disclosure. The embodiment shown in fig. 2 includes step S210 in addition to steps S110 to S140 shown in the embodiment of fig. 1.
In step S210, the features and the data processing model are stored.
According to the technical scheme provided by the embodiment of the disclosure, by storing the features and the data processing model, a multi-scenario universal prediction and prediction-based service operation framework can be provided; the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs. In one embodiment of the present disclosure, a cloud online database may be used to store the features and the offline pre-trained data processing model (the model is optional) for subsequent computation.
In one embodiment of the present disclosure, the data processing model stored in step S210 is a trained data processing model, which can be used for prediction in subsequent real-time calculation.
A data processing method according to still another embodiment of the present disclosure is described below with reference to fig. 3.
Fig. 3 shows a flow chart of a data processing method according to yet another embodiment of the present disclosure. The embodiment shown in fig. 3 includes steps S310 and S320 in addition to steps S110 to S140 and S210 shown in the embodiment of fig. 2.
In step S310, the stored features and the data processing model are updated according to the second data.
In step S320, the updated features and data processing model are stored.
In one embodiment of the present disclosure, storing the updated features and data processing model may be regarded as merging a new data processing model with the existing one. In scenarios where personalized recommendation is needed, prediction and/or recommendation can be performed using the updated data processing model.
According to the technical scheme provided by the embodiment of the disclosure, the stored characteristics and the data processing model are updated according to the second data; the updated characteristics and the data processing model are stored, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
In one embodiment of the present disclosure, real-time calculation can be performed according to information (e.g., feedback information) collected in real time, and features, data processing models, and the like can be updated online, so that timeliness and business effects can be improved.
In one embodiment of the present disclosure, the cloud online database may be utilized to store updated features and data processing models for subsequent computing.
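Steps S310 and S320 (updating the stored features and model from the second data, then storing the result) might look like the sketch below. The dictionary-based store, the additive feature update, and the blend weight `alpha` are illustrative assumptions; a real system would read from and write to the cloud online database instead:

```python
def update_features(stored_features, stream_events):
    """S310 (feature side): fold statistics derived from the stream
    (the second data) into the stored features."""
    updated = dict(stored_features)
    for key, value in stream_events:
        updated[key] = updated.get(key, 0.0) + value
    return updated

def merge_models(old_weights, new_weights, alpha=0.3):
    """S310/S320 (model side): merge a new data processing model with the
    existing one; alpha controls how fast the online model tracks new data."""
    keys = set(old_weights) | set(new_weights)
    return {k: (1 - alpha) * old_weights.get(k, 0.0)
               + alpha * new_weights.get(k, 0.0)
            for k in keys}

features = update_features({"user42:clicks": 3.0}, [("user42:clicks", 1.0)])
model = merge_models({"w_pop": 1.0}, {"w_pop": 2.0, "w_new": 1.0}, alpha=0.5)
```

The blended model can then be written back (S320) and used for subsequent real-time prediction.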
In one embodiment of the present disclosure, step S140 includes: the second data-triggered event is predicted from the stored features and the data processing model.
According to the technical scheme provided by the embodiment of the disclosure, the prediction is carried out by utilizing the characteristics, the data processing model and the second data, and the method comprises the following steps: the event triggered by the second data is predicted according to the stored characteristics and the data processing model, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, the event triggered by the stream input may be predicted online using the updated (or not-yet-updated) features and data processing models stored in the cloud database, and the results may be transmitted to subsequent processing. In one embodiment of the present disclosure, the triggered event may refer to certain logic, and the trigger logic may be customized for the specific scenario. For example, in a scenario where an application makes recommendations based on predicted results, the trigger logic may be divided into two types: the first performs a corresponding update based on the primary recommendation item of the second data (e.g., an input stream), and the second performs a global result update.
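The two trigger types mentioned in this paragraph could be dispatched as in the following sketch; the event schema, the dot-product scoring, and the in-memory store (standing in for the cloud database) are assumptions of this illustration:

```python
def score(features, model, item):
    """Score one item against the stored model (simple linear scoring)."""
    return sum(model.get(f, 0.0) * v
               for f, v in features.get(item, {}).items())

def on_stream_event(event, store):
    """Apply the scenario-specific trigger logic to a stream-input event."""
    if event["type"] == "item_update":
        # type 1: update only the result for the primary recommendation item
        item = event["item"]
        store["results"][item] = score(store["features"], store["model"], item)
    elif event["type"] == "global_update":
        # type 2: recompute the entire result set
        for item in store["features"]:
            store["results"][item] = score(store["features"], store["model"], item)
    return store["results"]

store = {"features": {"a": {"pop": 2.0}, "b": {"pop": 1.0}},
         "model": {"pop": 0.5},
         "results": {}}
on_stream_event({"type": "item_update", "item": "a"}, store)  # updates only 'a'
on_stream_event({"type": "global_update"}, store)             # updates 'a' and 'b'
```

Type 1 keeps per-event latency low; type 2 trades latency for a globally consistent result set.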
A data processing method according to still another embodiment of the present disclosure is described below with reference to fig. 4.
Fig. 4 shows a flow chart of a data processing method according to yet another embodiment of the present disclosure. The embodiment shown in fig. 4 includes step S410 in addition to steps S110 to S140 shown in the embodiment of fig. 1.
In step S410, the prediction result is stored.
According to the technical scheme provided by the embodiment of the disclosure, a multi-scenario universal prediction and prediction-based service operation framework can be provided by storing the prediction result, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, the prediction results may be stored using a cloud database. In one embodiment of the present disclosure, the prediction results stored by the cloud database may include offline prediction results, and may also include online prediction results. In one embodiment of the present disclosure, the currently stored prediction may be updated with the new prediction.
A data processing method according to still another embodiment of the present disclosure is described below with reference to fig. 5.
The embodiment shown in fig. 5 includes step S510 in addition to steps S110 to S140 and S410 shown in the embodiment of fig. 4.
In step S510, a service is provided according to the stored prediction result.
According to the technical scheme provided by the embodiment of the disclosure, the service is provided according to the stored prediction result, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
In one embodiment of the present disclosure, the prediction results may be saved into a cloud database in an agreed format, and the stored prediction results may be used to provide services externally. In one embodiment of the present disclosure, the service may be provided externally through an API (application programming interface). In one embodiment of the present disclosure, the prediction results may be applied to various business scenarios, e.g., tracking, analysis, recommendation, etc.
In one embodiment of the present disclosure, step S510 includes: the stored prediction results are provided to the application.
According to the technical scheme provided by the embodiment of the disclosure, the service provision method based on the stored prediction result comprises the following steps: the stored prediction result is provided for an application program, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
In one embodiment of the present disclosure, providing the stored prediction result to the application program may be applied to various scenarios such as a recommended service scenario.
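Serving the stored prediction results to an application can reduce to a thin read path, as in the sketch below; the JSON payload shape, the in-memory table (standing in for the cloud database), and the function names are assumptions of this illustration:

```python
import json

# stands in for the cloud database holding results in the agreed format
STORED_PREDICTIONS = {"user42": ["itemA", "itemB", "itemC"]}

def get_recommendations(user_id, top_k=10):
    """API-style handler: look up the stored prediction result for a user
    and return it to the application as JSON."""
    items = STORED_PREDICTIONS.get(user_id, [])[:top_k]
    return json.dumps({"user": user_id, "items": items})

print(get_recommendations("user42", top_k=2))
# {"user": "user42", "items": ["itemA", "itemB"]}
```

Because the handler only reads precomputed results, no online model server needs to be maintained on the serving path.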
In an embodiment of the present disclosure, the features may include user features, item features, user-item association features, and the like; details are not described here, since feature design is not a core problem of the present technical solution.
In one embodiment of the disclosure, the first data is processed by the first processing mode to obtain features that can be used as input for model training; a data processing model is selected, and training is terminated once a desired criterion is met (e.g., the training error reaches a threshold).
In one embodiment of the present disclosure, step S120 includes: updating features extracted from the first data and updating the data processing model with the updated features; the step S140 includes: the prediction is made using the updated feature, the updated data processing model and the second data to obtain a prediction result for the updated feature.
In one embodiment of the present disclosure, updating the features extracted from the first data may refer to updating them with the streaming data, i.e., updating the features in real time. For example, the first data may include streaming data, and the features extracted from the first data may be updated with the streaming data.
According to the technical scheme provided by the embodiment of the disclosure, extracting features from first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model comprises the following steps: updating features extracted from the first data and updating the data processing model with the updated features, wherein predicting with the features, the data processing model, and the second data comprises: the updated features, the updated data processing model and the second data are used for prediction to obtain the prediction result aiming at the updated features, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance costs. In addition, the problems of prediction and recommendation of cold start or hot start scenes, scenes with unknown data distribution and the like can be solved by calculating real-time update characteristics and models in real time.
In one embodiment of the present disclosure, step S110 includes: first data is acquired in response to selection of a cold start mode or a warm start mode.
In one embodiment of the present disclosure, a cold start may refer to starting the data processing scheme of an embodiment of the present disclosure when prediction and/or recommendation is made for a new user, new object, new system, etc., and a warm start may refer to starting it when prediction and/or recommendation is made for an existing user, existing object, existing system, etc. The data processing scheme of the embodiment of the present disclosure can be implemented in either the cold-start or the warm-start case, according to actual conditions.
According to the technical scheme provided by the embodiment of the disclosure, obtaining the first data comprises: acquiring the first data in response to the selection of the cold start mode or the warm start mode. A framework for prediction and for operating prediction-based services can thus be provided in cold-start or warm-start scenarios; the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance cost. In addition, by updating the features and models in real time through real-time computation, the problems of prediction and recommendation in cold-start or warm-start scenarios, scenarios with unknown data distribution, and the like can be solved.
In one embodiment of the present disclosure, providing the stored prediction results to the application comprises: the stored prediction is provided to the application for the updated feature. Namely, the data processing model is updated by using the updated features according to the generated prediction results, so that new prediction results are obtained, and personalized recommendation can be performed according to the prediction results.
According to the technical scheme provided by the embodiment of the disclosure, providing the stored prediction result to the application program comprises: providing the stored prediction result to the application program according to the updated features. A framework for prediction and for operating prediction-based services can thus be provided in cold-start or warm-start scenarios; the framework can be applied to various such scenarios, and a system applying the framework can save a large amount of development, operation and maintenance cost. In addition, personalized recommendation can be performed according to the prediction result.
In one embodiment of the present disclosure, step S120 further includes: the data processing model is stored.
According to the technical scheme provided by the embodiment of the present disclosure, extracting features from first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model, further includes: the data processing model is stored, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, the data processing model generated by batch processing may be stored using cloud storage and continuously updated according to the batch scheduling time requirements. The big data processing cloud platform used for offline batch feature extraction, data processing model generation, and the like may include a data processing model computing framework; this framework can generate the data processing model using the features and other information obtained by batch processing, and the data processing model can be customized from an open-source data processing model. Accordingly, the data processing models stored in step S120 may include data processing models that have not yet been trained. In one embodiment of the present disclosure, the device performing the model-storing operation in step S120 is different from the device performing the feature- and model-storing operation in step S210: the former stores the model for offline batch processing, while the latter stores the features and the data processing model for use in real-time computation.
In one embodiment of the present disclosure, the trained data processing model may be input with the features to a cloud database for use in real-time computing (real-time prediction). In one embodiment, the online real-time prediction data processing model may be updated based on the offline data processing model or may be updated in real-time according to the second data, according to the specific scene requirements.
In one embodiment of the present disclosure, extracting features from first data by a first processing manner, and performing model training using the extracted features to generate a data processing model, further includes: selecting a training subset or a training full set from a training candidate pool of the first data; features are extracted from samples in the selected training subset or training ensemble.
According to the technical scheme provided by the embodiment of the present disclosure, extracting features from first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model, further includes: selecting a training subset or a training full set from a training candidate pool of the first data; the characteristics are extracted from the samples in the selected training subset or the training complete set, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, a training candidate pool may be generated from the first data. According to the system requirements, the candidate pool can be an item candidate pool (for example, recommendation scenes such as personalized item recommendation and similar item recommendation) or a user candidate pool (for example, push scenes and friend recommendation), and subsequent processing selects a subset or a full set from the candidate pool to perform calculation and recommendation of related tasks and the like.
In one embodiment of the present disclosure, given a prediction objective, a subset needs to be selected from the candidate pool. In some cases, the candidate pool is very large, containing tens of millions or even hundreds of millions of candidates, so a rough recall process using a specific algorithm is required. In some scenarios, the recall may include person-to-item recall, item-to-item recall, popular-item recall, and the like. The recall processing is not a core problem of the present solution and is not described in detail here.
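A rough recall over a very large candidate pool can be as simple as the popularity cut sketched below; the popularity score and the `heapq`-based top-k selection are one possible recall algorithm, chosen here only for illustration:

```python
import heapq

def rough_recall(candidate_pool, popularity, k):
    """Cheaply cut a huge candidate pool down to k items before the
    (expensive) model-based prediction stage sees them."""
    return heapq.nlargest(k, candidate_pool,
                          key=lambda c: popularity.get(c, 0))

pool = ["a", "b", "c", "d"]
popularity = {"a": 5, "b": 1, "c": 9, "d": 3}
print(rough_recall(pool, popularity, 2))  # ['c', 'a']
```

Person-to-item or item-to-item recalls would replace the popularity key with a similarity score, but the shape of the step (pool in, small subset out) stays the same.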
In one embodiment of the present disclosure, extracting features from first data by a first processing manner, and performing model training using the extracted features to generate a data processing model, further includes: the selected data processing model is trained by inputting the extracted features into the selected data processing model to obtain a trained data processing model.
According to the technical scheme provided by the embodiment of the present disclosure, extracting features from first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model, further includes: the extracted features are input into the selected data processing model to train the selected data processing model to obtain the trained data processing model, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, step S140 includes: selecting a prediction subset or a prediction full set from a prediction candidate pool of the first data; extracting features from the samples in the selected prediction subset or prediction full set; and predicting the samples in the prediction subset or prediction full set by inputting the extracted features into the trained data processing model, to obtain a first prediction result.
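A minimal sketch of this prediction flow follows, assuming a toy two-value feature scheme and a linear scorer standing in for the trained data processing model (both are hypothetical illustrations, not the disclosure's model):

```python
# Sketch of the offline prediction flow in step S140: select a prediction
# subset, extract features for each sample, and score each sample with a
# trained model. Feature scheme and model weights are hypothetical.

def extract_features(sample):
    # Toy feature vector: (item age in days, click count) -- hypothetical.
    return [sample["age_days"], sample["clicks"]]

def predict(model_weights, samples):
    results = {}
    for name, sample in samples.items():
        x = extract_features(sample)
        # A linear scorer stands in for the trained data processing model F.
        results[name] = sum(w * v for w, v in zip(model_weights, x))
    return results

prediction_subset = {
    "item1": {"age_days": 2, "clicks": 10},
    "item2": {"age_days": 30, "clicks": 3},
}
weights = [-0.1, 1.0]  # hypothetical: prefer fresh, frequently clicked items
first_result = predict(weights, prediction_subset)  # the first prediction result
```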
According to the technical scheme provided by the embodiment of the disclosure, the prediction is carried out by utilizing the features, the data processing model and the second data, and the method comprises the following steps: selecting a prediction subset or a prediction full set from a prediction candidate pool of the first data; extracting features from samples in the selected prediction subset or prediction full set; and by inputting the extracted features into the trained data processing model to predict the samples in the prediction subset or the prediction full set to obtain a first prediction result, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, a prediction candidate pool may be generated from the first data. According to the system requirements, the candidate pool can be an item candidate pool (for example, recommendation scenarios such as personalized item recommendation and similar-item recommendation) or a user candidate pool (for example, push scenarios and friend recommendation), and subsequent processing selects a subset or the full set from the candidate pool to perform calculation and recommendation of related tasks and the like. The prediction candidate pool may or may not be the same as the training candidate pool, depending on the situation. For example, if the timeliness of the candidate pool is not considered, the two pools tend to be the same; when timeliness is taken into account, the training candidate pool tends to contain the items or users that are currently in effect.
In one embodiment of the present disclosure, the samples in the prediction subset or the prediction ensemble are predicted using features extracted from the training candidate pool and a trained data processing model.
In one embodiment of the present disclosure, step S140 further includes: updating the features and the data processing model according to the second data; predicting using the updated features and the data processing model and the second data to obtain a second prediction result; and updating the first prediction result by using the second prediction result.
According to the technical scheme provided by the embodiment of the disclosure, the characteristics and the data processing model are updated according to the second data; predicting using the updated features and the data processing model and the second data to obtain a second prediction result; the second prediction result is used for updating the first prediction result, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance cost.
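The update of the first (offline) prediction result by the second (real-time) result can be sketched as follows; the "real-time score overrides the offline score" merge policy is a hypothetical illustration of one possible update rule:

```python
# Sketch of updating an offline (first) prediction result with a real-time
# (second) result computed from fresh stream data. The merge policy --
# real-time scores override offline scores for re-scored items -- is
# a hypothetical choice.

def update_result(first_result, second_result):
    merged = dict(first_result)   # start from the offline batch scores
    merged.update(second_result)  # real-time scores override them
    return merged

first = {"item1": 0.9, "item2": 0.4}   # offline batch prediction
second = {"item2": 0.7}                # re-scored from stream data
final = update_result(first, second)   # the served, updated result
```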
In one embodiment of the present disclosure, the second prediction is a real-time prediction and the first prediction is an offline prediction.
According to the technical scheme provided by the embodiment of the disclosure, the second prediction result is a real-time prediction result, the first prediction result is an off-line prediction result, a multi-scene universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenes, and a system applying the framework can save a large amount of development, operation and maintenance costs.
In one embodiment of the present disclosure, the second data may be stream data derived from the log data, and the log stream is subscribed to when real-time calculation is performed. Once data flows in, it is passed on and triggers the corresponding computation.
In one embodiment of the present disclosure, after the data stream is input, the features may be updated in the real-time computing device according to the data stream.
In one embodiment of the present disclosure, once the incoming data stream has accumulated to a certain extent, the stored data processing model may be updated.
An application example of the data processing method according to an embodiment of the present disclosure is described below with reference to fig. 6.
Fig. 6 shows a schematic diagram of an exemplary application scenario of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 6, the application scheme can be divided into the following parts:
(1) Application: the application module receives recommendation results provided by the on-cloud recommendation product platform through an API, displays information to the user, and returns logs of user behaviors and the like to the log collection module for use by the algorithm platform, thereby forming a virtuous circle of the whole system.
(2) Log collection: the application records the required logs through embedded tracking points, either by transparent transmission or by writing from the application module; the logs are recorded at the server side (other modules need not be concerned with this), and the log collection system collects, parses, and transmits the recorded logs to the following modules according to the configuration information.
(3) Big data processing cloud platform: one of the core modules of the application scheme. It persistently stores the log information transmitted by the log collection module and, combined with various meta-information, performs appropriate batch processing operations to realize extraction of offline batch features, generation of the data processing model, and the like. The big data processing cloud platform may include the following parts:
(3.1) Distributed storage and computation: mainly responsible for storing the persistent log information and part of the metadata as big data, and performing batch computation on that big data.
(3.2) Data processing model computation framework: generates the data processing model using the features and other information obtained by the preceding module's batch processing; the data processing model may be an open-source model or a customized one.
(3.3) Cloud storage (model): the data processing model generated by batch processing is stored on the cloud and can be continuously updated according to the batch scheduling schedule.
(4) Cloud database (features, data processing model): the cloud online database is used for storing the characteristics and offline pre-trained data processing models (the data processing models are optional) for the real-time computing module to use.
(5) Streaming data cloud platform: after log collection, besides persisting logs for offline batch processing, real-time stream data also needs to be generated for use by the real-time computation module.
(6) Real-time computation: one of the core modules in the architecture diagram. It performs real-time computation based on feedback and other information collected in real time, including updating the online features, the data processing model, and the like, thereby improving timeliness and service effect.
(7) Cloud database (results): the interface between the cloud online database and external applications. It synchronizes the initial result computed by offline batch processing (the big data processing cloud platform), continuously updates the result in the cloud online database through real-time computation based on real-time user feedback, and outputs the result externally.
The following describes how to provide prediction results by using a network video service scenario as an example.
(1) Application: the video application client, located on the terminal device.
(2) Log collection: a log collection product provided by the cloud service provider.
(3) Big data processing cloud platform: the data processing platform, storage computing platform, and cloud storage platform provided by the cloud service provider.
(4) Cloud database (features, data processing model): a cloud database provided by the cloud service provider.
(5) Streaming data cloud platform: a streaming data cloud platform provided by the cloud service provider, offering publishing, subscribing, and distribution of streaming data.
(6) Real-time computation: a real-time computing platform provided by the cloud service provider, on which algorithm modules can be developed.
(7) Cloud database (results): a cloud database provided by the cloud service provider, which stores the result in the agreed format and can output it externally in API form. For example, food programs personalized for the user of the terminal device are presented, i.e. recommended, to the user through the application.
The big data processing cloud platform in the application scenario shown in fig. 6 is further described below based on fig. 7.
Fig. 7 shows a schematic structural diagram of an example of a big data processing cloud platform module in the exemplary application scenario shown in fig. 6.
As shown in fig. 7, the big data processing cloud platform module is one of the core modules of the application scheme shown in fig. 6, and mainly includes two sub-modules, namely, a training module and a prediction module.
The training module includes the following sub-modules:
(1) Candidate pool (Pool) submodule: the training candidate pool may be denoted as P. According to the requirements of the recommendation system, the candidate pool may be an item candidate pool (for example, recommendation scenarios such as personalized item recommendation and similar-item recommendation) or a user candidate pool (for example, push scenarios, friend recommendation, and the like); the following modules all select a subset or the full set from this candidate pool to perform calculation and recommendation of related tasks and the like.
(2) Recall submodule: given a recommendation target, it can be assumed without loss of generality that related items are to be recommended to a user, so a subset needs to be selected from the candidate pool for pushing. However, the candidate pool is often very large, ranging up to hundreds of millions of entries, so a rough recall process performed by a specific algorithm is required. In general, the recall includes person-to-item recall, item-to-item recall, hot recall, etc.; this is not a core problem of the present technical solution and is therefore not described in detail. The subset finally produced from the candidate pool P is denoted S (the corresponding positive and negative sample labels are denoted Y).
(3) Feature submodule: once the data samples are recalled, feature extraction can be performed on each sample pair to facilitate the subsequent model training. The features generally include user features, item features, and user-item association features, which are not a core problem of the present technical solution and are therefore not described in detail here. The feature set finally produced for the subset S is denoted X.
(4) Model training submodule: the output of the feature submodule serves as the input of the model training submodule; a data processing model is selected, and training terminates when the expected goal is reached (for example, the training error falls below a threshold). That is, training obtains a mapping F from X to Y, i.e. F(X) = Y.
(5) Data processing model output submodule: after training, the obtained data processing model is stored in the cloud storage (model) module as one of the important inputs of the prediction module. In addition, the final data processing model may be input to the cloud database (features, model) module for use by the real-time computation module. In one embodiment, depending on the specific scenario requirements, the online data processing model may be updated based on the offline data processing model, or a data processing model with a different algorithm may be used.
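The training flow of the submodules above, recalling samples S with labels Y, extracting their features X, and fitting a mapping F so that F(X) approximates Y, can be sketched with a toy one-feature logistic model; the model choice, learning rate, and data are all hypothetical illustrations:

```python
# Minimal sketch of the training submodules: given recalled samples with
# labels Y and extracted one-dimensional features X, fit a mapping F so
# that F(X) ~ Y. The logistic model and learning rate are hypothetical.
import math

X = [0.0, 1.0, 2.0, 3.0]   # one feature value per recalled sample
Y = [0, 0, 1, 1]           # positive/negative sample labels

w, b = 0.0, 0.0
for _ in range(2000):                      # plain per-sample gradient descent
    for x, y in zip(X, Y):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        w -= 0.5 * (p - y) * x             # learning rate 0.5 (hypothetical)
        b -= 0.5 * (p - y)

def F(x):
    """The trained mapping F: feature -> predicted label."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0
```

In the disclosure's terms, `F` would then be serialized into cloud storage (model) for the prediction module, and optionally into the cloud database for the real-time computation module.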
The prediction module includes the following sub-modules:
(1) Candidate pool submodule: the prediction candidate pool, which may be an item candidate pool or a user candidate pool depending on the application scheme. The prediction candidate pool may be the same as or different from the training candidate pool depending on the situation (if the timeliness of the candidate pool is not considered, the two are often the same; when timeliness is taken into account, the training candidate pool often contains the currently effective items or users).
(2) Recall submodule: except that the input candidate pool is different, the basic principle is the same as that of the recall submodule in the training module and is not repeated here.
(3) Feature submodule: except that the input recalled samples are different, the basic principle is the same as that of the feature submodule in the training module and is not repeated here.
(4) Sample prediction submodule: taking the feature samples output by the preceding submodules, and combining them with the data processing model from the training module, prediction is performed on the samples according to the prediction formula (the mapping F).
(5) Result output submodule: the prediction result is written into the cloud database (results) module in the agreed format, and the service is provided externally in API form.
The real-time calculation module in the application scheme shown in fig. 6 is further described below based on fig. 8.
FIG. 8 shows a schematic diagram of an example of a real-time computing module in the exemplary application scenario shown in FIG. 6.
This module is one of the core modules of the application scheme shown in fig. 6 and mainly includes two submodules, namely an update module and an online prediction submodule.
The update module includes the following sub-modules:
(1) Stream input submodule: the returned logs are fed back to the streaming data cloud platform, and the real-time computation subscribes to the topic log stream; once data flows in, it is passed on and triggers the corresponding computation.
(2) Feature update submodule: after data flows in, the corresponding item or user features are updated in the real-time computation module according to the feedback data stream.
(3) Model update submodule: once the data inflow has accumulated to a certain extent, the data processing model can be updated.
The features and data processing model updated by the above modules can be written back into the cloud database (features, models) for use by subsequent modules.
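A minimal sketch of the update module's two behaviors, per-event feature updates and accumulation-triggered model updates, follows; the event schema, the click-count feature, and the batch size are hypothetical:

```python
# Sketch of the update module: each incoming log event updates the per-item
# features immediately, and once enough events accumulate the model is
# refreshed in one batch. Event shape and batch size are hypothetical.

class StreamUpdater:
    def __init__(self, batch_size=3):
        self.features = {}       # item -> click count (toy feature)
        self.buffer = []         # accumulated events awaiting a model update
        self.model_version = 0   # stands in for the stored model's state
        self.batch_size = batch_size

    def on_event(self, event):
        # (2) feature update: apply every event as soon as it arrives
        item = event["item"]
        self.features[item] = self.features.get(item, 0) + 1
        # (3) model update: only after enough events have accumulated
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.model_version += 1   # stands in for retraining/refreshing
            self.buffer.clear()

u = StreamUpdater(batch_size=3)
for e in [{"item": "a"}, {"item": "b"}, {"item": "a"}, {"item": "a"}]:
    u.on_event(e)
```

In the architecture of fig. 8, `features` and the refreshed model would be written back to the cloud database (features, models) after each update.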
The online prediction submodule comprises the following submodules:
(1) Stream input submodule: the same as the stream input submodule in the update module, i.e. the streaming data transmitted by the streaming data cloud platform.
(2) Online prediction submodule: using the features and the data processing model stored in the cloud database (features, models), online prediction is performed according to the logic triggered by the stream input, and the results are passed downstream. The trigger logic can be customized for the specific scenario and mainly falls into two types: the first performs a targeted update according to the main recommended item of the input stream, and the second performs a global result update.
The output result of the online prediction submodule can be stored in a cloud database (result) according to an agreed format, and the service is provided for the outside through an API mode.
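The two trigger types of the online prediction submodule can be sketched as follows; the scoring function and the event schema are hypothetical illustrations of the customizable trigger logic:

```python
# Sketch of the online prediction submodule's two trigger types: a targeted
# trigger re-scores only the item referenced by the incoming event, while a
# global trigger re-scores every item. Scoring rule and event shape are
# hypothetical.

def score(features, item):
    return features.get(item, 0)   # toy model: score = click count

def on_stream_event(event, features, results):
    if event.get("scope") == "global":
        # trigger type 2: global result update
        for item in features:
            results[item] = score(features, item)
    else:
        # trigger type 1: targeted update for the item carried by the event
        item = event["item"]
        features[item] = features.get(item, 0) + 1
        results[item] = score(features, item)
    return results

features = {"a": 2, "b": 5}
results = {"a": 2, "b": 5}
on_stream_event({"item": "a"}, features, results)        # targeted update
on_stream_event({"scope": "global"}, features, results)  # global update
```

The `results` dictionary corresponds to what would be written to the cloud database (results) and served over the API.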
Exemplary algorithms executed by some of the modules in the application scheme shown in fig. 6 are further described below based on fig. 9-11.
Fig. 9 shows a schematic diagram of an exemplary algorithm executed by a recall submodule in a big data processing cloud platform module in the exemplary application scenario shown in fig. 6.
FIG. 10 shows a schematic diagram of an exemplary algorithm executed by a model update sub-module in the real-time computation module in the exemplary implementation shown in FIG. 6.
FIG. 11 shows a schematic diagram of an exemplary algorithm executed by an online prediction sub-module in the real-time computation module in the exemplary implementation shown in FIG. 6.
Embodiments of the present disclosure are not limited to the exemplary algorithms shown.
In the application scheme shown in fig. 6, a content recommendation framework can be built according to the data processing technical scheme of the present disclosure. Content rich in behaviors can obtain a better result through batch processing on the big data processing cloud platform; for new content lacking behaviors, exploration loss is reduced through collection of real-time logs and real-time updating of the corresponding data processing model, and traffic is concentrated on related content to improve the effect. Without loss of generality, this universal content recommendation framework is suitable for other similar recommendation scenarios and can also be used for personalized scenarios. For a personalized scenario, the real-time computation module needs to update the corresponding user features and add them to the online prediction submodule for calculation, as in a waterfall-flow scenario, for example.
The application scheme shown in fig. 6 implements a general, real-time-computation-based, on-cloud recommendation product framework. Scenario recommendation problems such as cold start and unknown data distribution are solved by updating the features and the data processing model in real time, and the framework can be applied to various recommendation scenarios such as personalized recommendation, detail-page similar recommendation, and hot-search recommendation. In addition, the recommendation result is updated in real time through real-time computation without maintaining an online server. Moreover, building the recommendation system according to this cloud product framework saves a great deal of development, operation, and maintenance cost.
An example of a data processing apparatus according to an embodiment of the present disclosure is described below with reference to fig. 12.
Fig. 12 shows a block diagram of a data processing apparatus 1200 according to an embodiment of the present disclosure. As shown in fig. 12, the data processing apparatus 1200 includes an acquisition module 1210, a first processing module 1220, a second processing module 1230, and a prediction module 1240.
The acquisition module 1210 is configured to acquire first data.
The first processing module 1220 is configured to extract features from the first data through a first processing approach and perform model training using the extracted features to generate a data processing model.
The second processing module 1230 is configured to process the first data by a second processing manner to generate second data.
The prediction module 1240 is configured to make a prediction using the features, the data processing model, and the second data.
According to the technical scheme provided by the embodiment of the disclosure, the acquisition module is configured to acquire first data; a first processing module configured to extract features from the first data by a first processing means and perform model training using the extracted features to generate a data processing model; a second processing module configured to process the first data by a second processing manner to generate second data; the prediction module is configured to predict by utilizing the characteristics, the data processing model and the second data, a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 12 can be combined with the embodiments described with reference to fig. 1 to 11, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 11. For details, reference may be made to the description made above with reference to fig. 1 to 11, and details thereof are not repeated herein.
An example of a data processing system according to an embodiment of the present disclosure is described below with reference to FIG. 13.
FIG. 13 shows a block diagram of a data processing system 1300 according to an embodiment of the present disclosure. As shown in fig. 13, the data processing system 1300 includes an acquisition device 1310, a first cloud platform 1320, a second cloud platform 1330, and at least one computing device 1340.
The acquisition device 1310 is configured to acquire first data.
The first cloud platform 1320 is configured to extract features from the first data through a first processing manner, and perform model training using the extracted features to generate a data processing model.
The second cloud platform 1330 is configured to process the first data through a second processing manner to generate second data.
The at least one computing device 1340 is configured to make predictions using the features, the data processing model, and the second data.
According to the technical scheme provided by the embodiment of the disclosure, the acquisition device is used for acquiring first data; the first cloud platform is used for extracting features from the first data through a first processing mode and performing model training by using the extracted features to generate a data processing model; the second cloud platform is used for processing the first data through a second processing mode to generate second data; and the at least one computing device is used for predicting by utilizing the characteristics, the data processing model and the second data, so that a multi-scenario universal prediction and prediction-based service operation framework can be provided, the framework can be applied to various prediction and prediction-based service operation scenarios, and a system applying the framework can save a large amount of development, operation and maintenance costs.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 13 can be combined with the embodiments described with reference to fig. 1 to 11, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 11. For details, reference may be made to the description made above with reference to fig. 1 to 11, and details thereof are not repeated herein.
A data processing method according to an embodiment of the present disclosure is described below with reference to fig. 14.
Fig. 14 shows a flow chart of a data processing method according to another embodiment of the present disclosure. As shown in fig. 14, the data processing method includes the following steps S1410, S1420, S1430, S1440, and S1450:
in step S1410, log data of the application program is acquired.
In step S1420, features are extracted from the log data by batch processing, and model training is performed using the extracted features to generate a data processing model.
In step S1430, the log data is processed by a stream processing manner to generate stream data.
In step S1440, a prediction is made using the features, the data processing model, and the flow data.
In step S1450, the prediction result is stored and provided to the application program.
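Steps S1410 to S1450 can be strung together in one minimal sketch; the log shapes and the toy "training" and stream-processing rules inside are hypothetical stand-ins for the batch and stream platforms:

```python
# End-to-end sketch of steps S1410-S1450: take application logs, extract
# batch features and a toy "model", derive stream data, predict, and return
# the result for storage/serving. All data shapes are hypothetical.

def run_pipeline(log_data):
    # S1420: batch feature extraction + toy "training" (average click rate)
    features = {item: sum(evts) / len(evts) for item, evts in log_data.items()}
    model_bias = sum(features.values()) / len(features)
    # S1430: stream processing -- keep only the most recent event per item
    stream = {item: evts[-1] for item, evts in log_data.items()}
    # S1440: prediction combining features, the model, and the stream data
    prediction = {item: features[item] + stream[item] - model_bias
                  for item in log_data}
    # S1450: the result would be stored and provided to the application
    return prediction

logs = {"a": [1, 0, 1], "b": [0, 0, 1]}   # S1410: acquired log data
result = run_pipeline(logs)
```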
According to the technical scheme provided by the embodiment of the disclosure, log data of an application program is acquired; extracting features from the log data in a batch manner, and performing model training using the extracted features to generate a data processing model; processing the log data in a stream processing mode to generate stream data; predicting using the features, the data processing model, and the flow data; the prediction result is stored and provided for the application program, a multi-scene universal prediction and recommendation framework can be provided, the framework can be applied to various recommendation scenes, the recommendation result is updated in real time under the condition that an online server is not required to be maintained, and a system applying the framework can save a large amount of development, operation and maintenance cost.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 14 can be combined with the embodiments described with reference to fig. 1 to 13, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 13. For details, reference may be made to the description made above with reference to fig. 1 to 13, and details thereof are not repeated herein.
A data processing apparatus according to an embodiment of the present disclosure is described below with reference to fig. 15.
Fig. 15 shows a block diagram of a data processing apparatus 1500 according to another embodiment of the present disclosure. As shown in fig. 15, the data processing apparatus 1500 includes an acquisition module 1510, a batch processing module 1520, a stream processing module 1530, a prediction module 1540, and a storage module 1550.
The obtaining module 1510 is configured to obtain log data of an application.
The batch module 1520 is configured to extract features from the log data in a batch manner and perform model training using the extracted features to generate a data processing model.
The stream processing module 1530 is configured to process the log data by a stream processing manner to generate stream data.
The prediction module 1540 is configured to make predictions using the features, the data processing model, and the flow data.
The storage module 1550 is configured to store the prediction and provide the prediction to the application.
According to the technical scheme provided by the embodiment of the disclosure, the acquisition module is configured to acquire log data of an application program; a batch processing module configured to extract features from the log data in a batch processing manner and perform model training using the extracted features to generate a data processing model; a stream processing module configured to process the log data by a stream processing manner to generate stream data; a prediction module configured to make predictions using the features, the data processing model, and the flow data; the storage module is configured to store the prediction result and provide the prediction result to the application program, a multi-scenario universal prediction and recommendation framework can be provided, the framework can be applied to various recommendation scenarios, the recommendation result is updated in real time under the condition that an online server is not required to be maintained, and a system applying the framework can save a large amount of development, operation and maintenance cost.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 15 can be combined with the embodiments described with reference to fig. 1 to 14, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 14. For details, reference may be made to the description made above with reference to fig. 1 to 14, and details thereof are not repeated herein.
A data processing system according to an embodiment of the present disclosure is described below with reference to fig. 16.
FIG. 16 shows a block diagram of a data processing system 1600, according to another embodiment of the present disclosure. As shown in fig. 16, the data processing system 1600 includes a collection device 1610, a batch processing platform 1620, a stream processing platform 1630, at least one computing device 1640, and a storage device 1650.
The collection device 1610 is used for acquiring log data of an application program.
The batch platform 1620 is configured to extract features from the log data in a batch manner, and perform model training using the extracted features to generate a data processing model.
Stream processing platform 1630 is used to process the log data by stream processing to generate stream data.
At least one computing device 1640 is used for predicting using the features, the data processing model, and the flow data.
Storage 1650 is configured to store the prediction and provide the prediction to the application.
According to the technical scheme provided by the embodiment of the disclosure, the collection device is used for acquiring log data of an application program; the batch processing platform is used for extracting features from the log data in a batch processing manner and performing model training using the extracted features to generate a data processing model; the stream processing platform is used for processing the log data in a stream processing manner to generate stream data; the at least one computing device is used for predicting using the features, the data processing model, and the stream data; and the storage device is configured to store the prediction result and provide the prediction result to the application program. A multi-scenario universal prediction and recommendation framework can thus be provided, the framework can be applied to various recommendation scenarios, the recommendation result is updated in real time without maintaining an online server, and a system applying the framework can save a large amount of development, operation and maintenance cost.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 16 can be combined with the embodiment described with reference to fig. 1 to 15, so as to have the technical effects achieved by the embodiment described with reference to fig. 1 to 15. For details, reference may be made to the description made above with reference to fig. 1 to fig. 15, and details thereof are not repeated herein.
In one embodiment of the present disclosure, an information recommendation method includes:
providing log data of an application program in response to a first instruction;
extracting features from the log data in a batch manner, and performing model training using the extracted features to generate a data processing model;
processing the log data in a stream processing mode to generate stream data;
predicting using the features, the data processing model, and the flow data;
storing a prediction result and providing the prediction result to the application;
presenting the prediction result through the application.
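The end-to-end flow above can be sketched in a few lines of Python. This is a toy illustration under assumed names (`extract_features`, `train_model`, and so on) and a trivial popularity "model"; it is not the actual implementation of the disclosed framework:

```python
# Toy sketch of the recommendation pipeline: batch feature extraction,
# model training, stream processing, prediction, and result delivery.
# All function names and the popularity "model" are assumptions.
from collections import Counter

def extract_features(log_data):
    # Batch step: count how often each user clicked each item.
    feats = Counter()
    for entry in log_data:
        feats[(entry["user"], entry["item"])] += 1
    return feats

def train_model(features):
    # The "model" here is just per-item popularity derived from features.
    popularity = Counter()
    for (_user, item), n in features.items():
        popularity[item] += n
    return popularity

def to_stream(log_data):
    # Stream step: yield log entries one at a time.
    yield from log_data

def predict(features, model, stream):
    # Recommend the most popular item the streaming user has not clicked.
    results = {}
    for event in stream:
        seen = {i for (u, i) in features if u == event["user"]}
        candidates = [i for i, _ in model.most_common() if i not in seen]
        results[event["user"]] = candidates[0] if candidates else None
    return results

logs = [{"user": "u1", "item": "a"}, {"user": "u1", "item": "a"},
        {"user": "u2", "item": "b"}]
feats = extract_features(logs)
model = train_model(feats)
predictions = predict(feats, model, to_stream(logs))
```

The stored `predictions` would then be served back to the application for presentation, corresponding to the last two steps of the method.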
The information recommendation method of this embodiment can be implemented by a terminal device such as a mobile terminal. For example, when a user opens a shopping application on the mobile terminal, the mobile terminal provides log data of the application to a background process. The background may be located remotely, for example in the cloud, may be located on the mobile terminal, or may be partially remote and partially on the mobile terminal. After the prediction is completed, the prediction result is provided to the user through the application on the terminal device, for example, as a recommendation of products and/or services.
According to the technical solution provided by the embodiments of the present disclosure, log data of an application program is provided in response to a first instruction; features are extracted from the log data in a batch manner, and model training is performed using the extracted features to generate a data processing model; the log data is processed in a stream processing manner to generate stream data; prediction is performed using the features, the data processing model, and the stream data; the prediction result is stored and provided to the application; and the prediction result is presented through the application. A multi-scenario, universal prediction and recommendation framework can thereby be provided. The framework can be applied to various recommendation scenarios, the recommendation result is updated in real time without maintaining an online server, the prediction result is recommended to the user through the application program, and a system applying the framework can save a large amount of development, operation, and maintenance cost.
While the foregoing embodiments describe the internal functions and structure of the data processing apparatus, in one possible design the data processing apparatus may be implemented as an electronic device. As shown in FIG. 17, the electronic device 1700 may include a processor 1701 and a memory 1702.
The memory 1702 is used for storing a program that supports the data processing apparatus in executing the data processing method in any of the above embodiments, and the processor 1701 is configured to execute the program stored in the memory 1702.
In one embodiment of the present disclosure, the memory 1702 is used to store one or more computer instructions that are executed by the processor 1701 to perform the steps of:
acquiring first data;
extracting features from the first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model;
processing the first data through a second processing mode to generate second data;
predicting using the features, the data processing model, and the second data.
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1701 to perform the steps of:
storing the features and the data processing model.
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1701 to perform the steps of:
updating the stored features and the data processing model according to the second data;
storing the updated features and the updated data processing model.
In one embodiment of the present disclosure, the predicting using the features, the data processing model, and the second data includes:
predicting an event triggered by the second data based on the stored features and the stored data processing model.
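As a hedged illustration of this step, an event arriving on the stream can trigger a lookup of the stored features and a pass through the stored model. The linear "model" and the feature shapes here are assumptions made for the sketch only:

```python
# Hypothetical sketch: a stream event triggers a prediction against the
# stored features and stored model. The feature vectors and the toy
# linear model are illustrative assumptions.
stored_features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
stored_model = {"weights": [0.8, 0.2]}

def predict_event(event, features, model):
    # Look up the stored feature vector for the user behind the event,
    # then score it with the stored model.
    x = features[event["user"]]
    score = sum(w * xi for w, xi in zip(model["weights"], x))
    return {"user": event["user"], "score": score}

result = predict_event({"user": "u1"}, stored_features, stored_model)
```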
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1701 to perform the steps of:
storing the prediction result.
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1701 to perform the steps of:
providing a service based on the stored prediction result.
In one embodiment of the present disclosure, the providing a service according to the stored prediction result includes:
providing the stored prediction result to the application program.
In one embodiment of the present disclosure, the extracting features from the first data by a first processing manner and performing model training using the extracted features to generate a data processing model includes:
updating features extracted from the first data and updating the data processing model with the updated features,
wherein said predicting using said features, said data processing model and said second data comprises:
predicting using the updated features, the updated data processing model, and the second data to obtain a prediction result for the updated features.
In one embodiment of the present disclosure, the acquiring the first data includes:
acquiring the first data in response to selection of a cold start mode or a warm start mode.
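One plausible reading of the cold-start/warm-start choice, sketched with assumed mode names and state shapes: a cold start rebuilds from the full history, while a warm start reuses stored state and only acquires incremental data:

```python
# Illustrative sketch of the cold-start / warm-start choice. The mode
# names and the shape of the stored state are assumptions, not the
# patent's actual interface.
def acquire_first_data(mode, stored_state, full_history, new_logs):
    if mode == "cold":
        # No usable prior state: rebuild from the full history.
        return full_history
    if mode == "warm":
        # Prior features/model exist: only incremental logs are needed.
        assert stored_state is not None
        return new_logs
    raise ValueError(f"unknown mode: {mode}")

history = ["log1", "log2", "log3"]
fresh = ["log4"]
data = acquire_first_data("cold", None, history, fresh)
```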
In one embodiment of the present disclosure, the providing the stored prediction result to the application program includes:
providing the stored prediction result for the updated features to the application program.
In an embodiment of the present disclosure, the extracting features from the first data by a first processing manner, and performing model training using the extracted features to generate a data processing model, further includes:
storing the data processing model.
In an embodiment of the present disclosure, the extracting features from the first data by a first processing manner, and performing model training using the extracted features to generate a data processing model, further includes:
selecting a training subset or a training full set from a training candidate pool of the first data;
extracting features from samples in the selected training subset or training full set.
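The subset-or-full-set selection can be sketched as follows; the sampling strategy and the toy per-sample feature are illustrative assumptions, not the patent's method:

```python
# Sketch of selecting a training subset (or the full set) from a
# candidate pool and extracting per-sample features. Names and the
# length-based toy feature are assumptions.
import random

def select_training_samples(candidate_pool, subset_size=None, seed=0):
    if subset_size is None or subset_size >= len(candidate_pool):
        return list(candidate_pool)  # training full set
    rng = random.Random(seed)        # seeded for reproducibility
    return rng.sample(candidate_pool, subset_size)  # training subset

def extract_sample_features(samples):
    # Toy feature: the length of each log line.
    return [len(s) for s in samples]

pool = ["click item=a", "view item=b", "buy item=c", "click item=d"]
subset = select_training_samples(pool, subset_size=2)
features = extract_sample_features(subset)
```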
In an embodiment of the present disclosure, the extracting features from the first data by a first processing manner, and performing model training using the extracted features to generate a data processing model, further includes:
training a selected data processing model by inputting the extracted features into the selected data processing model to obtain a trained data processing model.
In one embodiment of the present disclosure, the predicting using the features, the data processing model, and the second data includes:
selecting a prediction subset or a prediction full set from a prediction candidate pool of the first data;
extracting features from samples in the selected prediction subset or prediction full set;
predicting the samples in the prediction subset or the prediction full set by inputting the extracted features into the trained data processing model to obtain a first prediction result.
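A minimal sketch of this offline (first) prediction pass, with a stand-in trained model and toy features, all of which are assumptions for illustration:

```python
# Sketch of the offline prediction: run the trained model over every
# sample in the prediction subset. The model and featurizer are toy
# stand-ins, not the disclosed implementation.
def trained_model(feature_vector):
    # Stand-in for the trained data processing model: sum of features.
    return sum(feature_vector)

def featurize(sample):
    # Toy features: string length and count of the letter "a".
    return [len(sample), sample.count("a")]

def batch_predict(prediction_subset):
    # First prediction result, keyed by sample.
    return {s: trained_model(featurize(s)) for s in prediction_subset}

first_predictions = batch_predict(["abc", "banana"])
```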
In an embodiment of the present disclosure, the predicting using the features, the data processing model, and the second data further comprises:
updating the features and the data processing model according to the second data;
predicting using the updated features, the updated data processing model, and the second data to obtain a second prediction result;
updating the first prediction result using the second prediction result.
In one embodiment of the present disclosure, the second prediction result is a real-time prediction result and the first prediction result is an offline prediction result.
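One simple way to realize updating the offline result with the real-time result, sketched under the assumption that both prediction results are keyed by the same identifiers:

```python
# Hedged sketch of combining predictions: the real-time (second)
# result overwrites the offline (first) result for any key it covers.
def merge_predictions(offline, realtime):
    merged = dict(offline)   # start from the offline (first) result
    merged.update(realtime)  # real-time (second) result takes priority
    return merged

offline = {"u1": "item_a", "u2": "item_b"}
realtime = {"u2": "item_c"}  # fresher signal for u2 only
final = merge_predictions(offline, realtime)
```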
In one embodiment of the present disclosure, the first data includes application-generated log data.
In one embodiment of the present disclosure, the first data further includes meta information.
In an embodiment of the present disclosure, the first processing method is a batch processing method, and the second processing method is a stream processing method.
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1701 to perform the steps of:
acquiring log data of an application program;
extracting features from the log data in a batch manner, and performing model training using the extracted features to generate a data processing model;
processing the log data in a stream processing mode to generate stream data;
predicting using the features, the data processing model, and the stream data;
storing the prediction results and providing the prediction results to the application.
Exemplary embodiments of the present disclosure also provide a computer storage medium for storing computer software instructions for the data processing apparatus described above, including a program for executing the method in any of the above embodiments, thereby providing the technical effects brought by the method.
FIG. 18 is a schematic block diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present disclosure.
As shown in fig. 18, the computer system 1800 includes a Central Processing Unit (CPU) 1801, which can execute various processes in the embodiments shown in the above-described drawings in accordance with a program stored in a Read-Only Memory (ROM) 1802 or a program loaded from a storage portion 1808 into a Random Access Memory (RAM) 1803. The RAM 1803 also stores various programs and data necessary for the operation of the system 1800. The CPU 1801, the ROM 1802, and the RAM 1803 are connected to one another via a bus 1804. An input/output (I/O) interface 1805 is also connected to the bus 1804.
The following components are connected to the I/O interface 1805: an input portion 1806 including a keyboard, a mouse, and the like; an output portion 1807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 1808 including a hard disk and the like; and a communication portion 1809 including a network interface card such as a LAN card or a modem. The communication portion 1809 performs communication processing via a network such as the Internet. A drive 1810 is also connected to the I/O interface 1805 as needed. A removable medium 1811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1810 as necessary, so that a computer program read therefrom is installed into the storage portion 1808 as needed.
In particular, according to embodiments of the present disclosure, the methods described above with reference to the figures may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods illustrated in the figures. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1809, and/or installed from the removable medium 1811.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer-readable storage medium stores one or more programs which are used by one or more processors to perform the methods described in the present disclosure, thereby providing technical effects brought by the methods.
The foregoing description presents only the preferred embodiments of the present disclosure and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with the features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (27)

1. A data processing method, comprising:
acquiring first data;
extracting features from the first data through a first processing mode, and performing model training by using the extracted features to generate a data processing model;
processing the first data through a second processing mode to generate second data;
predicting using the features, the data processing model, and the second data.
2. The method of claim 1, further comprising:
storing the features and the data processing model.
3. The method of claim 2, further comprising:
updating the stored features and the data processing model according to the second data;
storing the updated features and the updated data processing model.
4. The method of claim 3, wherein the predicting using the features, the data processing model, and the second data comprises:
predicting an event triggered by the second data based on the stored features and the stored data processing model.
5. The method of claim 1, further comprising:
storing the prediction result.
6. The method of claim 5, further comprising:
providing a service based on the stored prediction result.
7. The method of claim 6, wherein the providing a service based on the stored prediction result comprises:
providing the stored prediction result to the application program.
8. The method of claim 7, wherein extracting features from the first data by a first processing means and performing model training using the extracted features to generate a data processing model comprises:
updating features extracted from the first data and updating the data processing model with the updated features,
wherein said predicting using said features, said data processing model and said second data comprises:
predicting using the updated feature, the updated data processing model, and the second data to obtain a prediction result for the updated feature.
9. The method of claim 8, wherein the obtaining first data comprises:
acquiring the first data in response to selection of a cold start mode or a warm start mode.
10. The method of claim 8 or 9, wherein providing the stored prediction to the application comprises:
providing the stored prediction result for the updated features to the application program.
11. The method of claim 1, wherein extracting features from the first data by a first processing means and performing model training using the extracted features to generate a data processing model, further comprises:
storing the data processing model.
12. The method of claim 1, wherein extracting features from the first data by a first processing means and performing model training using the extracted features to generate a data processing model, further comprises:
selecting a training subset or a training full set from a training candidate pool of the first data;
extracting features from samples in the selected training subset or training full set.
13. The method of claim 12, wherein extracting features from the first data by a first processing means and performing model training using the extracted features to generate a data processing model, further comprises:
training a selected data processing model by inputting the extracted features into the selected data processing model to obtain a trained data processing model.
14. The method of claim 13, wherein the predicting using the features, the data processing model, and the second data comprises:
selecting a prediction subset or a prediction full set from a prediction candidate pool of the first data;
extracting features from samples in the selected prediction subset or prediction full set;
predicting the samples in the prediction subset or the prediction full set by inputting the extracted features into the trained data processing model to obtain a first prediction result.
15. The method of claim 14, wherein the predicting using the features, the data processing model, and the second data further comprises:
updating the features and the data processing model according to the second data;
predicting using the updated features, the updated data processing model, and the second data to obtain a second prediction result;
updating the first prediction result using the second prediction result.
16. The method of claim 15, wherein the second prediction result is a real-time prediction result and the first prediction result is an offline prediction result.
17. The method of claim 1, wherein the first data comprises application-generated log data.
18. The method of claim 17, wherein the first data further comprises meta information.
19. The method of claim 1, wherein the first processing mode is a batch processing mode and the second processing mode is a stream processing mode.
20. A data processing apparatus, comprising:
an acquisition module configured to acquire first data;
a first processing module configured to extract features from the first data by a first processing means and perform model training using the extracted features to generate a data processing model;
a second processing module configured to process the first data by a second processing manner to generate second data;
a prediction module configured to make predictions using the features, the data processing model, and the second data.
21. A data processing system, comprising:
the acquisition device is used for acquiring first data;
a first platform for extracting features from the first data by a first processing means and performing model training using the extracted features to generate a data processing model;
the second platform is used for processing the first data through a second processing mode to generate second data;
at least one computing device for making predictions using the features, the data processing model and the second data.
22. A data processing method, comprising:
acquiring log data of an application program;
extracting features from the log data in a batch manner, and performing model training using the extracted features to generate a data processing model;
processing the log data in a stream processing mode to generate stream data;
predicting using the features, the data processing model, and the stream data;
storing the prediction results and providing the prediction results to the application.
23. A data processing apparatus, comprising:
an acquisition module configured to acquire log data of an application program;
a batch processing module configured to extract features from the log data in a batch processing manner and perform model training using the extracted features to generate a data processing model;
a stream processing module configured to process the log data by a stream processing manner to generate stream data;
a prediction module configured to make predictions using the features, the data processing model, and the stream data;
a storage module configured to store a prediction result and provide the prediction result to the application program.
24. A data processing system, comprising:
the acquisition device is used for acquiring log data of the application program;
a batch processing platform for extracting features from the log data in a batch processing manner and performing model training using the extracted features to generate a data processing model;
the stream processing platform is used for processing the log data in a stream processing mode to generate stream data;
at least one computing device for making predictions using the features, the data processing model, and the stream data;
the storage device is used for storing a prediction result and providing the prediction result to the application program.
25. An information recommendation method, comprising:
providing log data of an application program in response to a first instruction;
extracting features from the log data in a batch manner, and performing model training using the extracted features to generate a data processing model;
processing the log data in a stream processing mode to generate stream data;
predicting using the features, the data processing model, and the stream data;
storing a prediction result and providing the prediction result to the application;
presenting the prediction result through the application.
26. An electronic device comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any of claims 1-19, 22, 25.
27. A readable storage medium having stored thereon computer instructions, which when executed by a processor, carry out the method of any one of claims 1-19, 22, 25.
CN202010027839.XA 2020-01-10 2020-01-10 Data processing method, device, system, equipment and readable storage medium Pending CN113128741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010027839.XA CN113128741A (en) 2020-01-10 2020-01-10 Data processing method, device, system, equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN113128741A true CN113128741A (en) 2021-07-16

Family

ID=76771563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010027839.XA Pending CN113128741A (en) 2020-01-10 2020-01-10 Data processing method, device, system, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113128741A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832913A (en) * 2017-10-11 2018-03-23 微梦创科网络科技(中国)有限公司 The Forecasting Methodology and system to monitoring data trend based on deep learning
CN109919685A (en) * 2019-03-18 2019-06-21 苏州大学 Customer churn prediction method, apparatus, equipment and computer readable storage medium
CN109933306A (en) * 2019-02-11 2019-06-25 山东大学 Mix Computational frame generation, data processing method, device and mixing Computational frame
CN110390425A (en) * 2019-06-20 2019-10-29 阿里巴巴集团控股有限公司 Prediction technique and device


Similar Documents

Publication Publication Date Title
CN109492772B (en) Method and device for generating information
CN108665064B (en) Neural network model training and object recommending method and device
CN107944481B (en) Method and apparatus for generating information
CN113763093B (en) Article recommending method and device based on user portrait
CN111125574A (en) Method and apparatus for generating information
CN111767466A (en) Recommendation information recommendation method and device based on artificial intelligence and electronic equipment
CN108932625B (en) User behavior data analysis method, device, medium and electronic equipment
CN104731861A (en) Method and device for pushing multimedia data
CN113946754A (en) User portrait based rights and interests recommendation method, device, equipment and storage medium
CN117391810A (en) Client information management system and method of Internet of things
CN113469752A (en) Content recommendation method and device, storage medium and electronic equipment
CN111782937A (en) Information sorting method and device, electronic equipment and computer readable medium
CN115564517A (en) Commodity recommendation method, prediction model training method and related equipment
CN111814050A (en) Tourism scene reinforcement learning simulation environment construction method, system, equipment and medium
CN111787042B (en) Method and device for pushing information
WO2022156589A1 (en) Method and device for determining live broadcast click rate
CN111738766A (en) Data processing method and device for multimedia information and server
CN112115354B (en) Information processing method, device, server and storage medium
WO2020211616A1 (en) Method and device for processing user interaction information
CN116975426A (en) Service data processing method, device, equipment and medium
CN113128741A (en) Data processing method, device, system, equipment and readable storage medium
CN116186395A (en) Resource recommendation and model training method and device, equipment and storage medium
CN114357242A (en) Training evaluation method and device based on recall model, equipment and storage medium
CN113946753A (en) Service recommendation method, device, equipment and storage medium based on position fence
CN114119078A (en) Target resource determination method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210716