CN115409573A - One-stop feature calculation and model training recommendation system intelligent management platform - Google Patents

Info

Publication number
CN115409573A
Authority
CN
China
Prior art keywords
data
user
commodity
platform
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211025747.3A
Other languages
Chinese (zh)
Inventor
朱战伟
王继云
罗萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingdang Fast Medicine Technology Group Co ltd
Original Assignee
Dingdang Fast Medicine Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingdang Fast Medicine Technology Group Co ltd filed Critical Dingdang Fast Medicine Technology Group Co ltd
Priority to CN202211025747.3A priority Critical patent/CN115409573A/en
Publication of CN115409573A publication Critical patent/CN115409573A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a recommendation system intelligent management platform for one-stop feature calculation and model training. The platform comprises a data source platform, a data management module, a feature engineering module, a model training platform, a model management platform, a feature management platform and an online prediction module, wherein the data source platform is connected with the data management module. The data source platform comprises user attribute data, commodity attribute data and user behavior data; the data management module is responsible for storing and extracting the user attribute data, commodity attribute data and user behavior data; the feature engineering module converts the user attribute data into user features and the commodity attribute data into commodity features, and divides the user behavior data into training samples and test samples; the model training platform is responsible for storing the different model training scripts, calling a model to train on the training samples, and testing its effect on the test samples; the model management platform is responsible for storing and managing the generated models; the online prediction module interfaces with the user and the back end.

Description

One-stop feature calculation and model training recommendation system intelligent management platform
Technical Field
The invention relates to the technical field of intelligent management, in particular to a recommendation system intelligent management platform for one-stop feature calculation and model training.
Background
In the prior art, practical work is hampered by complex data sources and by incomplete, inconsistent and abnormal data. Separate projects are then set up to process the same data, which can introduce redundancy and errors, and data leakage ("data crossing") easily occurs during feature engineering. As a result, a model may perform well in offline training yet perform poorly once applied online, so the features need to be managed in a unified way.
Disclosure of Invention
The invention aims to provide a recommendation system intelligent management platform for one-stop feature calculation and model training. The technical problem to be solved is how to uniformly manage and intelligently allocate the modules involved in a recommendation system, such as sample engineering, feature calculation, model training and online prediction, so as to ensure the working quality of the recommendation system and improve the stability and robustness of the system.
To overcome the defects of the prior art, the invention provides a recommendation system intelligent management platform for one-stop feature calculation and model training, which comprises a data source platform, a data management module, a feature engineering module, a model training platform, a model management platform, a feature management platform and an online prediction module; the data source platform comprises user attribute data, commodity attribute data and user behavior data; the data management module is responsible for storing and extracting the user attribute data, the commodity attribute data and the user behavior data; the feature engineering module operates on the extracted data, converting the user attribute data into user features and the commodity attribute data into commodity features, dividing the user behavior data into training samples and test samples, mapping the user features and the commodity features uniformly into the vector space of the training samples and the test samples according to user id and commodity id, and labeling the samples with a threshold of 0.5 on the score of the behavior data, wherein a sample scoring greater than or equal to 0.5 is labeled 1 and a sample scoring less than 0.5 is labeled 0; the model training platform is responsible for storing the different model training scripts, calling a model to train on the training samples, and then testing its effect on the test samples; the model management platform is responsible for storing and managing the generated models; the feature management platform is responsible for storing the generated user features and commodity features until they are called; the online prediction module interfaces with the user and the back end, and is used for receiving a user id and a recalled commodity id sequence transmitted from the front end, reading the user features corresponding to the user id and the commodity features corresponding to each commodity id from the feature management platform, concatenating them and mapping them into the vector space, calling the corresponding model through the model management platform, scoring each user-commodity pair to predict the user's interest value for the commodity, sorting by interest value, and outputting the commodity ranking result for the user.
Preferably, the user attribute data includes user basic data and user portrait data; the user basic data includes one or more of user id, gender, age, level, activity level, residence, mobile phone model, network signal, education level, marital status, fertility status, industry and occupation; and the user portrait data includes one or more of the user's consumption portrait data, behavior portrait data and interest portrait data.
Preferably, the behavior portrait covers one or more of browsing, liking, purchasing and reviewing.
Preferably, the commodity attribute data comprises one or more of commodity id, commodity name, commodity listing status, knowledge graph classification, b2c classification, b2b classification, whether imported, brand id, OTC flag, medicine type, commodity type, applicable population and sales volume.
Preferably, the user behavior data includes one or more of user id, commodity id, behavior type, behavior time, behavior duration and scene.
Preferably, the behavior types are divided into positive behaviors and negative behaviors, each with a corresponding weight; among the positive behaviors, the weight of clicking is 0.3, of collecting is 0.5, of a search click is 0.4, of commenting is 0.2, of sharing is 0.5, of liking is 0.8, of adding to the shopping cart is 0.8, and of consuming is 1; among the negative behaviors, the weight of exposure is 0.1, of removing from the shopping cart is 0.2, of a bad review is 0.1, and of a dislike is 0.1.
Preferably, data from the data source platform are imported into the data management module through HDFS, and the data format is csv, json, txt and/or excel; the ways of modifying a data source include adding, overwriting, updating and/or deleting; the preprocessing functions of the data management module include sampling, multi-table merging, filtering, column selection, row selection, null value processing, value replacement, data cleaning, merging, data editing, deduplication, sorting, aggregation analysis, row and column splitting and/or abnormal value processing.
Preferably, the functions of the feature engineering module include: feature selection, feature transformation, feature importance calculation, feature discretization, one-hot encoding, feature regularization, normalization, random forest feature selection, automatic feature combination and/or data set splitting.
Preferably, the model training platform configures classification algorithms for the ranking function of the recommendation system, specifically: a logistic regression classification algorithm, a decision tree algorithm, a random forest algorithm, a support vector machine algorithm, a gradient boosting tree algorithm, an XGBoost algorithm and/or a LightGBM algorithm; the model training platform provides the computing resources for the algorithms, receives the training samples to train the model, uses the test samples to verify the effect of the model, and finally outputs the trained model to the model management platform.
Preferably, the management of models by the model management platform specifically includes adding, replacing and/or deleting models.
Advantageous effects
Compared with the prior art, the invention has the beneficial effects that:
The recommendation system intelligent management platform for one-stop feature calculation and model training provided by the invention offers a complete, in-house platform management process and performs unified management and quality control of the recommendation ranking system; the online and offline features are managed by the feature management platform, which guarantees their consistency.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic structural diagram of a recommendation system intelligent management platform for one-stop feature calculation and model training according to the present invention.
Detailed Description
The present invention is described in more detail below to facilitate an understanding of the invention.
To address the unclear boundaries and unsmooth flow between the links of a recommendation system in the prior art, the invention establishes a one-stop intelligent platform for the recommendation system that uniformly manages and intelligently allocates the related modules, such as sample engineering, feature calculation, model training and online prediction, thereby ensuring the working quality of the recommendation system and improving the stability and robustness of the system. Online and offline processing share one set of features; unified management of the models keeps the features and the models consistent, and real-time updating ensures that the latest iterated model is available and gives a better service effect.
As shown in fig. 1, the recommendation system intelligent management platform for one-stop feature calculation and model training according to the present invention includes:
1. A data source platform: includes user attribute data, commodity attribute data and user behavior data.
(1) User attribute data: the user basic data include user id, gender, age, level, activity level, residence, mobile phone model, network signal, education level, marital status, fertility status, industry, occupation, etc.; the user portrait data comprise the user's consumption portrait data, behavior portrait (browsing, liking, purchasing, reviewing) and interest portrait data.
(2) Commodity attribute data: commodity id, commodity name, commodity listing status, knowledge graph classification, b2c classification, b2b classification, whether imported, brand id, OTC flag, medicine type, commodity type, applicable population, sales volume, etc.
(3) User behavior data: user id, commodity id, behavior type, behavior time, behavior duration and scene.
The behavior types are divided into positive behaviors and negative behaviors, each with a weight. Positive behaviors: click (0.3), collect (0.5), search click (0.4), comment (0.2), share (0.5), like (0.8), add to cart (0.8), consume (1).
Negative behaviors: exposure (0.1), remove from cart (0.2), bad review (0.1), dislike (0.1).
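The weights above can be collected into a lookup table. The description does not say how the weights of several behaviors on the same user-commodity pair are combined into one score, so the sketch below assumes the score is the largest weight observed for the pair; the names `BEHAVIOR_WEIGHTS` and `behavior_score` and the English keys are hypothetical.

```python
# Weights as listed in the description; the key names are illustrative.
BEHAVIOR_WEIGHTS = {
    # positive behaviors
    "click": 0.3,
    "collect": 0.5,
    "search_click": 0.4,
    "comment": 0.2,
    "share": 0.5,
    "like": 0.8,
    "add_to_cart": 0.8,
    "consume": 1.0,
    # negative behaviors
    "exposure": 0.1,
    "remove_from_cart": 0.2,
    "bad_review": 0.1,
    "dislike": 0.1,
}


def behavior_score(behaviors):
    """Score one user-commodity pair from its observed behavior types.

    Assumption (not stated in the source): the pair's score is the
    maximum weight among the behaviors observed for it.
    """
    if not behaviors:
        return 0.0
    return max(BEHAVIOR_WEIGHTS[b] for b in behaviors)
```

Under this assumption a pair with only an exposure scores 0.1, while a pair with a click and a share scores 0.5. A sum or a time-decayed combination would be equally compatible with the text.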
2. A data management module: responsible for storing and extracting the user attribute data, the commodity attribute data and the user behavior data.
Data from the data source platform are imported into the data management module through HDFS (Hadoop Distributed File System). The data format may be csv, json, txt, excel, etc.
The ways of modifying a data source include: add, overwrite, update, delete.
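The four modification modes can be illustrated on a toy in-memory table keyed by record id. The class name `DataSourceTable` and the exact semantics chosen for each mode (for example, that "add" never touches existing keys) are assumptions for illustration; the source only names the modes.

```python
class DataSourceTable:
    """Toy in-memory table keyed by record id (hypothetical helper)."""

    def __init__(self):
        self.rows = {}

    def add(self, records):
        # Add: insert only records whose key is not present yet.
        for key, row in records.items():
            self.rows.setdefault(key, dict(row))

    def overwrite(self, records):
        # Overwrite: discard the current contents and load the new batch.
        self.rows = {key: dict(row) for key, row in records.items()}

    def update(self, records):
        # Update: merge new field values into rows that already exist.
        for key, row in records.items():
            if key in self.rows:
                self.rows[key].update(row)

    def delete(self, keys):
        # Delete: drop the listed keys, ignoring keys that are absent.
        for key in keys:
            self.rows.pop(key, None)
```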
In data mining, massive raw data contain a large amount of incomplete, inconsistent and abnormal records, which seriously reduce the execution efficiency of data mining modeling and can even bias the mining results, so data preprocessing is particularly important.
Preprocessing functions for data: sampling (random sampling, weighted sampling, stratified sampling, downsampling, SMOTE), multi-table merging, filtering, column selection, row selection, null value processing, value replacement, data cleaning, merging, data editing, deduplication, sorting, aggregation analysis, row and column splitting, and abnormal value processing.
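A few of the listed preprocessing functions (row filtering, deduplication, null value processing and abnormal value processing) can be sketched on a list of dictionary rows. The function name `preprocess`, its parameters and the clipping rule for abnormal values are assumptions; the source does not specify how each function is implemented.

```python
def preprocess(rows, required, numeric_field, lower, upper, default):
    """Deduplicate, fill nulls, and clip outliers in a list of dict rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Row filtering: skip rows that lack a required key field.
        if any(row.get(field) is None for field in required):
            continue
        # Deduplication: keep the first row seen for each key tuple.
        key = tuple(row[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        value = row.get(numeric_field)
        # Null value processing: substitute the default.
        if value is None:
            value = default
        # Abnormal value processing: clip into [lower, upper].
        value = min(max(value, lower), upper)
        cleaned.append({**row, numeric_field: value})
    return cleaned
```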
3. A feature engineering module: operating on the extracted data, it converts the user attribute data into user features and the commodity attribute data into commodity features, divides the user behavior data into a training part and a test part, maps the user features and commodity features uniformly into the vector space of the training samples and test samples according to user id and commodity id, and labels the samples with a threshold of 0.5 on the score of the behavior data: samples scoring greater than or equal to 0.5 are labeled 1, and samples scoring less than 0.5 are labeled 0.
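The sample construction described here can be sketched as a join by id followed by thresholding and a split. The function names, the representation of features as plain lists, the 80/20 ratio and the seeded random shuffle are all assumptions; the source fixes only the 0.5 threshold and the existence of a training part and a test part.

```python
import random


def build_samples(behavior_scores, user_features, item_features, threshold=0.5):
    """Join features by id and attach a binary label per (user, item) pair."""
    samples = []
    for (user_id, item_id), score in behavior_scores.items():
        # Concatenate the user vector and the commodity vector.
        vector = user_features[user_id] + item_features[item_id]
        # Scores greater than or equal to the threshold are labeled 1.
        label = 1 if score >= threshold else 0
        samples.append((vector, label))
    return samples


def split_samples(samples, test_ratio=0.2, seed=0):
    """Shuffle with a fixed seed, then cut into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

For behavior logs with timestamps, splitting by time rather than at random is a common way to avoid leaking future behavior into training; the source does not say which is used.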
The functions of the feature engineering module include: feature selection, feature transformation, feature importance calculation, feature discretization, one-hot encoding, feature regularization, standardization, normalization, random forest feature selection, automatic feature combination and data set splitting.
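Four of these functions have simple closed forms and can be sketched in a few lines. The source does not define the terms, so the sketch assumes the usual readings: normalization as min-max rescaling to [0, 1], standardization as the z-score, and discretization as bucketing by fixed boundaries. All function names are hypothetical.

```python
import statistics


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_normalize(values):
    """Rescale values linearly into [0, 1]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def discretize(value, boundaries):
    """Map a numeric value to the index of its bucket."""
    return sum(value >= b for b in boundaries)
```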
4. A model training platform: responsible for storing the different model training scripts, calling a model to train on the training samples, and then testing its effect on the test samples.
For the ranking function of the recommendation system, classification algorithms are configured here, specifically: a logistic regression classification algorithm, a decision tree algorithm, a random forest algorithm, a support vector machine algorithm, a gradient boosting decision tree algorithm (GBDT), an XGBoost algorithm, and a LightGBM algorithm.
The platform provides the computing resources for the algorithms, receives the training samples to train the model, uses the test samples to verify the effect of the model, and finally outputs the trained model to the model management platform.
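The train-then-verify loop can be illustrated with a small logistic regression written in plain Python. This is only a stand-in for the classifiers named above: the batch gradient descent, the hyperparameters, the accuracy metric and all function names are assumptions, and a real deployment would more likely call a library implementation.

```python
import math


def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, epochs=200, lr=0.5):
    """Fit weights and bias by batch gradient descent on log loss."""
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for vector, label in samples:
            z = sum(w * x for w, x in zip(weights, vector)) + bias
            error = sigmoid(z) - label
            for i, x in enumerate(vector):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(model, vector):
    """Predicted probability that the label is 1."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, vector)) + bias)


def accuracy(model, samples):
    """Share of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(model, v) >= 0.5) == bool(y) for v, y in samples)
    return hits / len(samples)
```

The returned `(weights, bias)` pair plays the role of the trained model that is handed to the model management platform.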
5. A model management platform: responsible for storing the generated models and managing them, for example by adding, replacing and deleting models.
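The add, replace and delete operations map naturally onto a keyed registry. The class `ModelRegistry`, its in-memory dictionary and its error behavior are assumptions for illustration; the source does not describe how models are stored or versioned.

```python
class ModelRegistry:
    """In-memory stand-in for the model management platform (hypothetical)."""

    def __init__(self):
        self._models = {}

    def add(self, name, model):
        # Adding refuses to silently overwrite an existing model.
        if name in self._models:
            raise KeyError(f"model {name!r} already exists; use replace()")
        self._models[name] = model

    def replace(self, name, model):
        # Replacing requires that the model is already registered.
        if name not in self._models:
            raise KeyError(f"model {name!r} is not registered")
        self._models[name] = model

    def delete(self, name):
        del self._models[name]

    def get(self, name):
        # Used by online prediction to fetch the model to call.
        return self._models[name]
```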
6. A feature management platform: responsible for storing the generated user features and commodity features until they are called by other modules.
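A minimal sketch of such a store follows, assuming one feature vector per user id and per commodity id held in memory; the class and method names are hypothetical. The point of the design is that offline sample building and online prediction read the same vectors from one place.

```python
class FeatureStore:
    """Keeps one feature vector per user id and per commodity id."""

    def __init__(self):
        self._user = {}
        self._item = {}

    def put_user(self, user_id, vector):
        # Store a copy so later changes by the caller do not leak in.
        self._user[user_id] = list(vector)

    def put_item(self, item_id, vector):
        self._item[item_id] = list(vector)

    def get_user(self, user_id):
        return self._user[user_id]

    def get_item(self, item_id):
        return self._item[item_id]
```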
7. An online prediction module: the module that interfaces with the user and the back end. It receives a user id and a recalled commodity id sequence transmitted from the front end, reads the user features corresponding to the user id and the commodity features corresponding to each commodity id from the feature management platform, concatenates them and maps them into the vector space, calls the corresponding model through the model management platform, scores each user-commodity pair to predict the user's interest value for the commodity, sorts by score, and outputs the commodity ranking result for the user.
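The online flow (look up features, concatenate, score, sort) can be sketched as one function. The name `rank_items`, the use of plain dictionaries for the feature lookups and the `score_fn` callable standing in for the model are assumptions.

```python
def rank_items(user_id, recalled_item_ids, user_features, item_features, score_fn):
    """Return (item_id, score) pairs sorted by predicted interest, highest first."""
    user_vector = user_features[user_id]
    scored = []
    for item_id in recalled_item_ids:
        # Concatenate the user vector and the commodity vector into one input.
        vector = user_vector + item_features[item_id]
        scored.append((item_id, score_fn(vector)))
    # sorted() is stable, so commodities with equal scores keep the recall order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Here `score_fn` would be the prediction function of whichever model the model management platform returns for the request.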
The invention provides a complete, in-house platform management process; the source data are managed and processed in a unified way, which avoids redundant tasks and keeps the data processing results stable; the online and offline features are managed by the feature management platform, which guarantees their consistency.
The foregoing describes preferred embodiments of the present invention, but is not intended to limit the invention thereto. Modifications and variations of the embodiments disclosed herein may be made by those skilled in the art without departing from the scope and spirit of the invention.

Claims (10)

1. A recommendation system intelligent management platform for one-stop feature calculation and model training, characterized by comprising a data source platform, a data management module, a feature engineering module, a model training platform, a model management platform, a feature management platform and an online prediction module; the data source platform comprises user attribute data, commodity attribute data and user behavior data; the data management module is responsible for storing and extracting the user attribute data, the commodity attribute data and the user behavior data; the feature engineering module operates on the extracted data, converting the user attribute data into user features and the commodity attribute data into commodity features, dividing the user behavior data into training samples and test samples, mapping the user features and the commodity features uniformly into the vector space of the training samples and the test samples according to the user id and the commodity id, and labeling the samples with a threshold of 0.5 on the score of the behavior data, wherein a sample scoring greater than or equal to 0.5 is labeled 1 and a sample scoring less than 0.5 is labeled 0; the model training platform is responsible for storing the different model training scripts, calling a model to train on the training samples, and then testing its effect on the test samples; the model management platform is responsible for storing and managing the generated models; the feature management platform is responsible for storing the generated user features and commodity features until they are called; the online prediction module interfaces with the user and the back end, and is used for receiving a user id and a recalled commodity id sequence transmitted from the front end, reading the user features corresponding to the user id and the commodity features corresponding to each commodity id from the feature management platform, concatenating them and mapping them into the vector space, calling the corresponding model through the model management platform, scoring each user-commodity pair to predict the user's interest value for the commodity, sorting by interest value, and outputting the commodity ranking result for the user.
2. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 1, wherein the user attribute data comprises user basic data and user portrait data; the user basic data is one or more of user id, gender, age, level, activity level, residence, mobile phone model, network signal, education level, marital status, fertility status, industry and occupation; and the user portrait data is one or more of the user's consumption portrait data, behavior portrait and interest portrait data.
3. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 2, wherein the behavior portrait comprises one or more of browsing, liking, purchasing and reviewing.
4. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 1, wherein the commodity attribute data comprises one or more of commodity id, commodity name, commodity listing status, knowledge graph classification, b2c classification, b2b classification, whether imported, brand id, OTC flag, medicine type, commodity type, applicable population and sales volume.
5. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 1, wherein the user behavior data includes one or more of user id, commodity id, behavior type, behavior time, behavior duration and scene.
6. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 5, wherein the behavior types are divided into positive behaviors and negative behaviors, each with a corresponding weight; among the positive behaviors, the weight of clicking is 0.3, of collecting is 0.5, of a search click is 0.4, of commenting is 0.2, of sharing is 0.5, of liking is 0.8, of adding to the shopping cart is 0.8, and of consuming is 1; among the negative behaviors, the weight of exposure is 0.1, of removing from the shopping cart is 0.2, of a bad review is 0.1, and of a dislike is 0.1.
7. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 1, wherein data from the data source platform are imported into the data management module through HDFS, and the data format is csv, json, txt and/or excel; the ways of modifying a data source include adding, overwriting, updating and/or deleting; the preprocessing functions of the data management module include sampling, multi-table merging, filtering, column selection, row selection, null value processing, value replacement, data cleaning, merging, data editing, deduplication, sorting, aggregation analysis, row and column splitting and/or abnormal value processing.
8. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 1, wherein the functions of the feature engineering module include: feature selection, feature transformation, feature importance calculation, feature discretization, one-hot encoding, feature regularization, normalization, random forest feature selection, automatic feature combination and/or data set splitting.
9. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 1, wherein the model training platform configures classification algorithms for the ranking function of the recommendation system, specifically: a logistic regression classification algorithm, a decision tree algorithm, a random forest algorithm, a support vector machine algorithm, a gradient boosting tree algorithm, an XGBoost algorithm and/or a LightGBM algorithm; the model training platform provides the computing resources for the algorithms, receives the training samples to train the model, uses the test samples to verify the effect of the model, and finally outputs the trained model to the model management platform.
10. The recommendation system intelligent management platform for one-stop feature calculation and model training according to claim 1, wherein the management of models by the model management platform specifically includes adding, replacing and/or deleting models.
CN202211025747.3A 2022-08-25 2022-08-25 One-stop feature calculation and model training recommendation system intelligent management platform Pending CN115409573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211025747.3A CN115409573A (en) 2022-08-25 2022-08-25 One-stop feature calculation and model training recommendation system intelligent management platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211025747.3A CN115409573A (en) 2022-08-25 2022-08-25 One-stop feature calculation and model training recommendation system intelligent management platform

Publications (1)

Publication Number Publication Date
CN115409573A true CN115409573A (en) 2022-11-29

Family

ID=84160709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211025747.3A Pending CN115409573A (en) 2022-08-25 2022-08-25 One-stop feature calculation and model training recommendation system intelligent management platform

Country Status (1)

Country Link
CN (1) CN115409573A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116823408A (en) * 2023-08-29 2023-09-29 小舟科技有限公司 Commodity recommendation method, device, terminal and storage medium based on virtual reality
CN116823408B (en) * 2023-08-29 2023-12-01 小舟科技有限公司 Commodity recommendation method, device, terminal and storage medium based on virtual reality
CN117709914A (en) * 2024-02-05 2024-03-15 天津徙木科技有限公司 Post matching method and system
CN117709914B (en) * 2024-02-05 2024-05-10 台州徙木数字服务有限公司 Post matching method and system


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination