CN116485449A - Data prediction method and system applied to electronic commerce - Google Patents

Data prediction method and system applied to electronic commerce

Info

Publication number
CN116485449A
Authority
CN
China
Prior art keywords
data
flow data
user
value
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310446184.3A
Other languages
Chinese (zh)
Inventor
马靖航
石继刚
杨�远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Pinxing Huayue Trading Co ltd
Original Assignee
Guangzhou Pinxing Huayue Trading Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Pinxing Huayue Trading Co ltd
Priority to CN202310446184.3A
Publication of CN116485449A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G06F8/24 Object-oriented
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/1396 Protocols specially adapted for monitoring users' activity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data prediction method and system applied to electronic commerce. The method comprises: acquiring a flow data set for a period to be predicted, and extracting first data features and first user features of each piece of flow data in the flow data set; classifying each piece of flow data to obtain current flow data clusters; acquiring a historical flow data set, extracting second data features and second user features of each piece of flow data in the historical flow data set, and clustering the historical flow data to obtain historical flow data clusters; calculating the order success rate corresponding to each historical flow data cluster; and screening out target success rates whose corresponding similarity is greater than a set threshold, calculating the order volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster, and calculating the total order volume as the sum of the order volumes of the current flow data clusters. Applying the embodiments of the invention makes the predicted data more accurate.

Description

Data prediction method and system applied to electronic commerce
Technical Field
The invention relates to the technical field of electronic commerce, in particular to a data prediction method and system applied to electronic commerce.
Background
Electronic commerce is not limited by geography or time, and therefore places high demands on timeliness; for example, inventory data must respond accurately and quickly.
The prior art discloses a method for generating a reorder quantity suggestion for an online commodity, comprising: acquiring the flow and conversion rate of the commodity within a preset number of days after it goes online in the online store, and deriving from them the total flow and total conversion rate of the commodity over its whole life cycle, where the life cycle is defined by the flow of the commodity after going online meeting a preset flow threshold condition or by the time the commodity has been online meeting a preset time threshold condition; calculating the gross order volume of the commodity from the total flow and the total conversion rate; acquiring the return rate of the commodity, and generating sales volume data for the commodity from the return rate and the gross order volume; and generating a reorder quantity suggestion from the sales volume data.
In the prior art, the total sales volume of a commodity is calculated from the flow and conversion rate obtained within a preset number of days after the commodity goes online, so that a reorder quantity suggestion can be generated within that period. However, conversion rates differ across crowds and channels, so the reorder quantity calculated in the prior art is not accurate enough.
Disclosure of Invention
The invention aims to provide a data prediction method and a system applied to electronic commerce.
The invention solves the technical problems through the following technical scheme:
the invention provides a data prediction method applied to electronic commerce, which comprises the following steps:
acquiring a flow data set in a period to be predicted, and extracting first data features and first user features of each flow data in the flow data set, wherein the first data features comprise: one or a combination of an IP address, a login device type, a login port type, a data type; the first user characteristic comprises: one or a combination of user identification information and user behavior information;
classifying each piece of flow data according to the first data characteristics and the first user characteristics to obtain a current flow data cluster;
acquiring a historical flow data set, extracting second data features and second user features of each flow data in the historical flow data set, and classifying the historical flow data according to the second data features and the second user features to obtain a historical flow data cluster;
calculating the order success rate corresponding to each historical flow data cluster; and, according to the similarity between the first data features and the second data features and the similarity between the first user features and the second user features, screening out from the order success rates the target success rates whose corresponding similarity is greater than a set threshold, calculating the order volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster, and calculating the total order volume as the sum of the order volumes of the current flow data clusters.
Optionally, the acquiring the flow data set in the period to be predicted includes:
acquiring flow data in the period to be predicted, and performing authentication and cleaning on the flow data to obtain the flow data set.
Optionally, the classifying processing is performed on each piece of traffic data according to the first data feature and the first user feature to obtain a current traffic data cluster, including:
for each piece of flow data, marking the flow data by taking the first data characteristic and the first user characteristic as tag values to obtain marked flow data;
and obtaining the category number of the tag value corresponding to the first data characteristic, taking the category number as a K value, and carrying out clustering processing by using a K-means clustering algorithm to obtain the current flow data cluster.
Optionally, the obtaining the number of kinds of tag values corresponding to the first data feature includes:
splicing the first data characteristic and the first user characteristic into a tag value;
and performing de-duplication treatment on the tag value to obtain a de-duplicated tag value, and performing counting treatment on the de-duplicated tag value.
Optionally, the splicing the first data feature and the first user feature into the tag value includes:
splicing the first data characteristic before the first user characteristic into a first label value;
splicing the first data feature after the first user feature into a second tag value;
the first tag value and the second tag value are collected as tag values.
Optionally, the performing deduplication processing on the tag value includes:
adding each tag value to a tag value set and, for the current tag value, judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value;
if yes, deleting the current tag value;
if not, judging whether the first tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value, taking the next tag value as the current tag value, and returning to the step of judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value; if not, executing the next step;
for the current tag value, judging whether the second tag value corresponding to the current tag value is the same as the first tag value of any other tag value;
if yes, deleting the current tag value;
if not, judging whether the second tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value, taking the next tag value as the current tag value, and returning to the step of judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value; if not, retaining the current tag value.
Optionally, the classifying the historical traffic data according to the second data feature and the second user feature to obtain a historical traffic data cluster includes:
for each piece of flow data, marking the historical flow data by taking the second data characteristic and the second user characteristic as tag values to obtain marked historical flow data;
and obtaining the category number of the tag value corresponding to the second data characteristic, taking the category number as a K value, and carrying out clustering processing by using a K-means clustering algorithm to obtain the historical flow data cluster.
Optionally, the screening the target success rate with the corresponding similarity greater than the set threshold value from the order success rates according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature includes:
screening out a first success rate with the corresponding similarity larger than a set threshold value from the success rates of all orders according to the similarity of the first data characteristic and the second data characteristic;
screening second success rates with the corresponding similarity larger than a set threshold value from the order success rates according to the similarity of the first user characteristics and the second user characteristics;
and taking the set of the first success rate and the second success rate as a target success rate.
Optionally, the setting the set of the first success rate and the second success rate as the target success rate includes:
taking the success rate that the similarity of the first data characteristic and the second data characteristic and the similarity of the first user characteristic and the second user characteristic are larger than a set threshold value as a third success rate;
calculating the target success rate using the formula $T = w_1 a + w_2 b + w_3 c$, wherein
$T$ is the target success rate; $w_1$ is the weight corresponding to the first success rate; $a$ is the average value of the first success rate; $w_2$ is the weight corresponding to the second success rate; $b$ is the average value of the second success rate; $w_3$ is the weight corresponding to the third success rate; and $c$ is the average value of the third success rate.
The invention provides a data prediction system applied to electronic commerce, which comprises:
an acquisition module, configured to acquire a flow data set in a period to be predicted, and to extract first data features and first user features of each piece of flow data in the flow data set, wherein the first data features comprise one or a combination of an IP address, a login device type, a login port type, and a data type, and the first user features comprise one or a combination of user identification information and user behavior information;
a classification module, configured to classify each piece of flow data according to the first data features and the first user features to obtain current flow data clusters,
and further configured to acquire a historical flow data set, extract second data features and second user features of each piece of flow data in the historical flow data set, and classify the historical flow data according to the second data features and the second user features to obtain historical flow data clusters;
a calculation module, configured to calculate the order success rate corresponding to each historical flow data cluster; and, according to the similarity between the first data features and the second data features and the similarity between the first user features and the second user features, to screen out from the order success rates the target success rates whose corresponding similarity is greater than a set threshold, calculate the order volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster, and calculate the total order volume as the sum of the order volumes of the current flow data clusters.
Compared with the prior art, the invention has the following advantages:
The method clusters the flow data set in the period to be predicted to obtain current flow data clusters, and clusters the historical flow data to obtain historical flow data clusters; it then screens out the corresponding target success rates according to the similarity of the clustering results, calculates the corresponding order volume as the product of the target success rate and the current flow data cluster, and uses the order volume as the basis for adjusting inventory data, so that the predicted data is more accurate.
Drawings
FIG. 1 is a schematic flow chart of a data prediction method applied to electronic commerce according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a process for de-duplication of a current tag value in a data prediction method applied to electronic commerce according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data prediction system applied to electronic commerce according to an embodiment of the present invention.
Detailed Description
The following describes in detail the examples of the present invention, which are implemented on the premise of the technical solution of the present invention, and detailed embodiments and specific operation procedures are given, but the scope of protection of the present invention is not limited to the following examples.
Example 1
Fig. 1 is a flow chart of a data prediction method applied to electronic commerce according to an embodiment of the present invention, as shown in fig. 1, the method includes:
s101: acquiring a flow data set in a period to be predicted, and extracting first data features and first user features of each flow data in the flow data set, wherein the first data features comprise: one or a combination of an IP address, a login device type, a login port type, a data type; the first user characteristic comprises: one or a combination of user identification information and user behavior information;
Flow data in the period to be predicted is obtained, then authenticated and cleaned to obtain the flow data set. Specifically, data cleaning refers to detecting, correcting, or deleting erroneous, incomplete, incorrectly formatted, or redundant data in a database. Data cleaning not only corrects errors but also improves consistency among data coming from separate information systems. Dedicated data cleaning software can automatically inspect data files, correct erroneous data, and consolidate the data into a format that is consistent across the enterprise.
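The cleaning step can be illustrated with a minimal sketch. The record layout (dictionaries with `ip` and `user_id` keys) and the function name `clean_flow_data` are assumptions made for illustration only; the source does not specify how records are stored or which cleaning rules are applied, and the authentication step is not modelled here.

```python
def clean_flow_data(records, required_fields=("ip", "user_id")):
    """Drop incomplete and redundant records (hypothetical field names)."""
    seen = set()
    cleaned = []
    for record in records:
        # Incomplete data: a required field is missing or empty.
        if any(not record.get(field) for field in required_fields):
            continue
        # Redundant data: the same required-field values were already kept.
        key = tuple(record[field] for field in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


raw_records = [
    {"ip": "192.0.2.1", "user_id": "u1"},
    {"ip": "192.0.2.1", "user_id": "u1"},  # redundant copy
    {"ip": "", "user_id": "u2"},           # incomplete record
    {"ip": "192.0.2.7", "user_id": "u3"},
]
flow_data_set = clean_flow_data(raw_records)
print(len(flow_data_set))
```

A production pipeline would likely also correct malformed values rather than only dropping them; this sketch covers the deletion cases only.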
Each piece of traffic data in the traffic data set is processed as follows:
The information contained in each piece of flow data is parsed according to the storage position of each field and the content stored in it. For example, if the 10th to 30th characters of the flow data store the IP address, the 10th to 30th characters are extracted and checked to determine whether they form a valid IP address; once the check passes, those characters are taken as the IP address.
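The fixed-position extraction described above can be sketched as follows. The function name `extract_ip`, the space padding of the field, and the sample records are assumptions for illustration; the validity check uses Python's standard `ipaddress` module.

```python
import ipaddress


def extract_ip(record: str, start: int = 10, end: int = 30):
    """Return the IP address stored at 1-based character positions start..end.

    Returns None when the field does not hold a valid IPv4/IPv6 address.
    """
    field = record[start - 1:end].strip()  # assumes the field is space-padded
    try:
        return str(ipaddress.ip_address(field))
    except ValueError:
        return None


# Hypothetical record: 9 header characters, a 21-character IP field, then a payload.
good_record = "HDR000000" + "203.0.113.25".ljust(21) + "payload"
bad_record = "HDR000000" + "not-an-address".ljust(21) + "payload"

print(extract_ip(good_record))
print(extract_ip(bad_record))
```

The same pattern (slice by position, then validate) would apply to the other fields, with a field-specific check in place of `ipaddress.ip_address`.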
The other first data features are extracted in the same way. The login device type may include a computer, a mobile phone, other intelligent terminal devices, and the like; the login port type may include an HTTP proxy port, a SOCKS proxy port, an FTP proxy port, a Telnet proxy port, and the like; the data type may include a numerical type, a string type, and a date-time type.
The first user feature comprises one or a combination of user identification information and user behavior information, where the user behavior information may include login data, browsing data, like data, add-to-cart data, payment data, and the like.
S102: classifying each piece of flow data according to the first data characteristics and the first user characteristics to obtain a current flow data cluster;
for each piece of flow data, marking the flow data by taking the first data characteristic and the first user characteristic as tag values to obtain marked flow data:
the flow data 1 corresponds to the first data characteristic 1 and the first user characteristic 1;
the flow data 2 corresponds to the first data characteristic 2 and the first user characteristic 2;
the flow data 3 corresponds to the first data characteristic 1 and the first user characteristic 2;
the flow data 4 corresponds to the first data characteristic 1 and the first user characteristic 2;
the flow data 5 corresponds to the first data characteristic 3 and the first user characteristic 3;
Then, to guard against data protocols in which the storage positions of the first data feature and the first user feature are reversed, the first data feature may be spliced before the first user feature to form a first tag value, and the first data feature may be spliced after the first user feature to form a second tag value; the first tag value and the second tag value together serve as the tag values. For example:
the first tag value corresponding to the flow data 1 is: first data feature 1-first user feature 1;
the second tag value corresponding to the flow data 1 is: first user feature 1-first data feature 1;
the first tag value corresponding to the flow data 2 is: first data feature 2-first user feature 2;
the second tag value corresponding to the flow data 2 is: first user feature 2-first data feature 2;
the first tag value corresponding to the flow data 3 is: first data feature 1-first user feature 2;
the second tag value corresponding to the flow data 3 is: first user feature 2-first data feature 1;
the first tag value corresponding to the flow data 4 is: first data feature 1-first user feature 2;
the second tag value corresponding to the flow data 4 is: first user feature 2-first data feature 1;
Thus, each piece of flow data has tag values, and the number of tag values for each piece of flow data is 2.
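The splicing of the two tag values can be sketched as follows. The `-` separator mirrors the example above; the short codes such as `D1` and `U1` (standing for "first data feature 1" and "first user feature 1") and the function name `make_tag_values` are hypothetical.

```python
def make_tag_values(data_feature: str, user_feature: str, sep: str = "-"):
    """Return (first tag value, second tag value) for one piece of flow data."""
    first_tag = f"{data_feature}{sep}{user_feature}"   # data feature spliced first
    second_tag = f"{user_feature}{sep}{data_feature}"  # user feature spliced first
    return first_tag, second_tag


# Flow data 1 to 4 from the example, as (data feature, user feature) pairs.
flows = {1: ("D1", "U1"), 2: ("D2", "U2"), 3: ("D1", "U2"), 4: ("D1", "U2")}
tag_values = {flow_id: make_tag_values(d, u) for flow_id, (d, u) in flows.items()}

for flow_id, tags in tag_values.items():
    print(flow_id, tags)
```

Flow data 3 and 4 produce identical tag values, which is what the later de-duplication step relies on.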
Then, in embodiment 1 of the present invention, the tag values may be de-duplicated and the de-duplicated tag values counted. For example, because the tag values of flow data 3 and flow data 4 are identical, the tag values corresponding to flow data 3 and flow data 4 are de-duplicated so that only one copy remains, and the number of tag values remaining after de-duplication is taken as the number of categories.
The number of categories is taken as the K value, and clustering is performed using the K-means clustering algorithm to obtain the current flow data clusters.
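A minimal sketch of choosing K from the de-duplicated tag values and then running K-means is given below. Several things here are assumptions, not part of the source: the features are encoded as small integer codes so that distances can be computed, the `kmeans` function is a plain implementation of Lloyd's algorithm written for this example, and the centroids are initialised from the first K distinct points so the run is deterministic.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Deterministic initialisation: the first k distinct points.
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    k = len(centroids)  # guard in case fewer than k distinct points exist
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Flow data 1 to 5 encoded as (data feature code, user feature code).
encoded = [(1, 1), (2, 2), (1, 2), (1, 2), (3, 3)]
k = len(set(encoded))  # number of distinct tag values after de-duplication
labels = kmeans(encoded, k)
print(k, labels)
```

With K equal to the number of distinct tag values, records sharing a tag value (flow data 3 and 4 here) land in the same cluster. In practice a library implementation would normally replace the hand-written loop.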
Further, in order to reduce the K value and improve clustering efficiency, the number of kinds of tag values corresponding to the first data feature is obtained through the following de-duplication process.
Fig. 2 is a schematic diagram of a process of de-duplication of a current tag value in a data prediction method applied to electronic commerce according to an embodiment of the present invention, where as shown in fig. 2, the method includes:
S201: adding each tag value to a tag value set and, for the current tag value, judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value;
S202: if the judgment result of step S201 is yes, deleting the current tag value; for example, if the first tag values of flow data 1 and flow data 3 are the same, the tag value of flow data 1 or flow data 3 is deleted and only one tag value is retained, so that the first data feature is de-duplicated and the number of kinds of the first data feature can be obtained. After the deletion, the next tag value is taken as the current tag value, and execution returns to step S201;
S203: if the judgment result of step S201 is no, judging whether the first tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value; if not, executing S204;
S204: for the current tag value, judging whether the second tag value corresponding to the current tag value is the same as the first tag value of any other tag value;
S205: if the judgment result of step S204 is yes, deleting the current tag value;
S206: if the judgment result of step S204 is no, judging whether the second tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value, taking the next tag value as the current tag value after the deletion, and returning to step S201; otherwise, retaining the current tag value.
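Steps S201 to S206 can be sketched as a single pass over the tag value set. This is one possible reading, under two assumptions: each entry is held as a (first tag value, second tag value) pair, and "other tag values" means the entries still remaining in the set at the time of the check. The function name `deduplicate` is hypothetical.

```python
def deduplicate(tag_values):
    """Remove entries whose first or second tag value matches another entry."""
    remaining = list(tag_values)
    i = 0
    while i < len(remaining):
        first, second = remaining[i]
        others = remaining[:i] + remaining[i + 1:]
        other_firsts = {f for f, _ in others}
        other_seconds = {s for _, s in others}
        if first in other_firsts:        # S201 -> S202
            duplicate = True
        elif first in other_seconds:     # S203
            duplicate = True
        elif second in other_firsts:     # S204 -> S205
            duplicate = True
        elif second in other_seconds:    # S206
            duplicate = True
        else:
            duplicate = False            # S206: retain the current tag value
        if duplicate:
            del remaining[i]  # the next tag value becomes the current one
        else:
            i += 1
    return remaining


tags = [
    ("D1-U1", "U1-D1"),  # flow data 1
    ("D2-U2", "U2-D2"),  # flow data 2
    ("D1-U2", "U2-D1"),  # flow data 3
    ("D1-U2", "U2-D1"),  # flow data 4, same tag values as flow data 3
]
unique_tags = deduplicate(tags)
print(len(unique_tags))
```

Checking the first tag value against other entries' second tag values (and vice versa) is what makes the result insensitive to the order in which the two features were stored.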
S103: acquiring a historical flow data set, extracting second data features and second user features of each flow data in the historical flow data set, and classifying the historical flow data according to the second data features and the second user features to obtain a historical flow data cluster;
specifically, for each piece of flow data, marking the historical flow data by taking the second data characteristic and the second user characteristic as tag values to obtain marked historical flow data;
and obtaining the category number of the tag value corresponding to the second data characteristic, taking the category number as a K value, and carrying out clustering processing by using a K-means clustering algorithm to obtain the historical flow data cluster.
It can be understood that the method for marking and clustering the historical traffic data is the same as the process in step S102, and the embodiment of the present invention will not be described herein.
S104: calculating the order success rate corresponding to each historical flow data cluster; and, according to the similarity between the first data features and the second data features and the similarity between the first user features and the second user features, screening out from the order success rates the target success rates whose corresponding similarity is greater than a set threshold, calculating the order volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster, and calculating the total order volume as the sum of the order volumes of the current flow data clusters.
First, for each current flow data cluster, the similarity between the first data feature of each piece of flow data in that cluster and the second data feature of each historical flow data cluster is calculated, giving a plurality of similarity values. The similarity values greater than a set threshold are selected as first target similarities, and the order success rates of the historical flow data clusters corresponding to the first target similarities are taken as the first success rate.
Similarly, the similarity between the first user feature of each piece of flow data in the current flow data cluster and the second user feature of each historical flow data cluster is calculated, giving a plurality of similarity values. The similarity values greater than the set threshold are selected as second target similarities, and the order success rates of the historical flow data clusters corresponding to the second target similarities are taken as the second success rate.
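The threshold screening can be sketched as follows. The source does not name a similarity measure, so Jaccard similarity over sets of feature values is used here purely as an assumption; the threshold of 0.5, the dictionary layout, the feature values, and the success rates are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screen_success_rates(current_features, historical_clusters, threshold=0.5):
    """Return the success rates of historical clusters similar enough to the current one."""
    return [
        cluster["success_rate"]
        for cluster in historical_clusters
        if jaccard(current_features, cluster["features"]) > threshold
    ]


current_data_features = {"mobile", "http", "string"}
historical_clusters = [
    {"features": {"mobile", "http", "string"}, "success_rate": 0.12},  # similarity 1.0
    {"features": {"mobile", "http"}, "success_rate": 0.08},            # similarity 2/3
    {"features": {"computer", "ftp"}, "success_rate": 0.30},           # similarity 0.0
]
first_success_rate = screen_success_rates(current_data_features, historical_clusters)
print(first_success_rate)
```

The same function would be applied to the user features to obtain the second success rate; only the feature sets passed in change.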
The set of the first success rate and the second success rate is taken as the target success rate. The target success rate is then multiplied by the number of pieces of flow data in the current flow data cluster to obtain the order volume corresponding to that cluster. Finally, the order volumes of all current flow data clusters are summed to obtain the total order volume.
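The product-and-sum calculation can be sketched as below. Because the target success rate is described as a set that may hold several values, this sketch assumes it is reduced to a single number by taking the mean; that reduction, the dictionary layout, and all numbers are illustrative assumptions.

```python
from statistics import mean


def cluster_order_volume(target_rates, flow_count):
    """Order volume of one current cluster: (mean target rate) x (number of flow records)."""
    return mean(target_rates) * flow_count


current_clusters = [
    {"flow_count": 1000, "target_rates": [0.125, 0.0625]},
    {"flow_count": 640, "target_rates": [0.03125]},
]
volumes = [
    cluster_order_volume(c["target_rates"], c["flow_count"]) for c in current_clusters
]
total_order_volume = sum(volumes)
print(volumes, total_order_volume)
```

Keeping the per-cluster volumes before summing makes it possible to see which crowd or channel contributes most to the predicted total.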
In a specific implementation manner of the embodiment of the present invention, since the number of values included in the first success rate may be more than one, and similarly, the number of values included in the second success rate may be more than one, the step of taking the set of the first success rate and the second success rate as the target success rate may include:
taking the success rate that the similarity of the first data characteristic and the second data characteristic and the similarity of the first user characteristic and the second user characteristic are larger than a set threshold value as a third success rate;
calculating the target success rate using the formula $T = w_1 a + w_2 b + w_3 c$, wherein
$T$ is the target success rate; $w_1$ is the weight corresponding to the first success rate; $a$ is the average value of the first success rate; $w_2$ is the weight corresponding to the second success rate; $b$ is the average value of the second success rate; $w_3$ is the weight corresponding to the third success rate; and $c$ is the average value of the third success rate.
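The weighted combination T = w1·a + w2·b + w3·c can be worked through with a short sketch. The weights and success rates below are made-up values, since the source does not specify them; treating an empty group as contributing 0 is also an assumption.

```python
from statistics import mean


def target_success_rate(first, second, third, w1, w2, w3):
    """Weighted sum of the average first, second, and third success rates."""
    a = mean(first) if first else 0.0    # average of the first success rate
    b = mean(second) if second else 0.0  # average of the second success rate
    c = mean(third) if third else 0.0    # average of the third success rate
    return w1 * a + w2 * b + w3 * c


# Hypothetical inputs: the third group (both similarities above the threshold)
# is given the largest weight.
t = target_success_rate(
    first=[0.125, 0.25],
    second=[0.125],
    third=[0.25],
    w1=0.25, w2=0.25, w3=0.5,
)
print(t)
```

Here a = 0.1875, b = 0.125, and c = 0.25, so T = 0.25 × 0.1875 + 0.25 × 0.125 + 0.5 × 0.25 = 0.203125.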
Further, after the period to be predicted has elapsed, the flow data in the period to be predicted may be added to the historical flow data, and then step S101 may be performed for the next period to be predicted.
Example 2
Corresponding to embodiment 1 of the present invention, embodiment 2 of the present invention provides a data prediction system applied to electronic commerce.
Fig. 3 is a schematic structural diagram of a data prediction system applied to electronic commerce according to an embodiment of the present invention, where, as shown in fig. 3, the system includes:
the obtaining module 201 is configured to obtain a flow data set in a period to be predicted, and extract a first data feature and a first user feature of each piece of flow data in the flow data set, where the first data feature includes: one or a combination of an IP address, a login device type, a login port type, a data type; the first user characteristic comprises: one or a combination of user identification information and user behavior information;
the classification module 202 is configured to perform classification processing on each piece of traffic data according to the first data feature and the first user feature, so as to obtain a current traffic data cluster;
and is further configured to acquire a historical flow data set, extract a second data feature and a second user feature of each piece of flow data in the historical flow data set, and classify the historical flow data according to the second data feature and the second user feature to obtain a historical flow data cluster;
the calculating module 203 is configured to calculate the order success rate corresponding to each historical flow data cluster; to screen out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold, according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature; to calculate the transaction volume of each current flow data cluster as the product of the target success rate and the number of pieces of flow data in that cluster; and to calculate the total transaction volume as the sum of the transaction volumes of the current flow data clusters.
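The screening performed by the calculating module can be sketched as below. This section does not define the similarity measure, so the fraction of matching feature fields used here is purely an assumption, as are the dictionary layout, the field names, and the function names.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]


def field_similarity(a: Features, b: Features) -> float:
    """Assumed similarity: share of feature fields with identical values."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if a.get(k) == b.get(k))
    return matches / len(keys)


def screen_rates(
    current: dict, history: List[dict], threshold: float
) -> Tuple[List[float], List[float], List[float]]:
    """Return (first, second, third) groups of order success rates."""
    first, second, third = [], [], []
    for cluster in history:
        data_sim = field_similarity(current["data"], cluster["data"])
        user_sim = field_similarity(current["user"], cluster["user"])
        if data_sim > threshold:
            first.append(cluster["rate"])   # data features are similar
        if user_sim > threshold:
            second.append(cluster["rate"])  # user features are similar
        if data_sim > threshold and user_sim > threshold:
            third.append(cluster["rate"])   # both are similar
    return first, second, third


# Hypothetical current cluster and historical clusters.
current_cluster = {
    "data": {"device": "mobile", "port": "app"},
    "user": {"segment": "returning"},
}
historical_clusters = [
    {"data": {"device": "mobile", "port": "app"},
     "user": {"segment": "returning"}, "rate": 0.08},
    {"data": {"device": "mobile", "port": "web"},
     "user": {"segment": "returning"}, "rate": 0.03},
    {"data": {"device": "desktop", "port": "web"},
     "user": {"segment": "new"}, "rate": 0.01},
]
groups = screen_rates(current_cluster, historical_clusters, threshold=0.6)
```

The three returned groups correspond to the first, second, and third success rates, which can then be averaged and weighted to give the target success rate.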
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. A data prediction method applied to electronic commerce, the method comprising:
acquiring a flow data set in a period to be predicted, and extracting first data features and first user features of each flow data in the flow data set, wherein the first data features comprise: one or a combination of an IP address, a login device type, a login port type, a data type; the first user characteristic comprises: one or a combination of user identification information and user behavior information;
classifying each piece of flow data according to the first data characteristics and the first user characteristics to obtain a current flow data cluster;
acquiring a historical flow data set, extracting second data features and second user features of each flow data in the historical flow data set, and classifying the historical flow data according to the second data features and the second user features to obtain a historical flow data cluster;
calculating the order success rate corresponding to each historical flow data cluster; screening out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold, according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature; calculating the transaction volume of each current flow data cluster as the product of the target success rate and the number of pieces of flow data in that cluster; and calculating the total transaction volume as the sum of the transaction volumes of the current flow data clusters.
2. The method for predicting data applied to electronic commerce according to claim 1, wherein the acquiring the set of traffic data in the period to be predicted comprises:
acquiring flow data in the period to be predicted, and performing authentication and cleaning on the flow data to obtain the flow data set.
3. The method for predicting data applied to electronic commerce according to claim 1, wherein the classifying each piece of traffic data according to the first data feature and the first user feature to obtain a current traffic data cluster includes:
for each piece of flow data, marking the flow data by taking the first data characteristic and the first user characteristic as tag values to obtain marked flow data;
and obtaining the number of kinds of tag values corresponding to the first data feature, taking the number of kinds as the K value, and performing clustering using a K-means clustering algorithm to obtain the current flow data cluster.
4. A data prediction method applied to electronic commerce according to claim 3, wherein the obtaining the number of kinds of tag values corresponding to the first data feature comprises:
splicing the first data characteristic and the first user characteristic into a tag value;
and performing de-duplication treatment on the tag value to obtain a de-duplicated tag value, and performing counting treatment on the de-duplicated tag value.
5. The method for predicting data to be applied to electronic commerce according to claim 4, wherein the stitching the first data feature and the first user feature into the tag value comprises:
splicing the first data feature before the first user feature to form a first tag value;
splicing the first data feature after the first user feature to form a second tag value;
taking the set of the first tag value and the second tag value as the tag value.
6. The method for predicting data to be applied to electronic commerce according to claim 4, wherein the performing the de-duplication process on the tag value comprises:
adding each tag value to a tag value set and, for the current tag value, judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value;
if yes, deleting the current tag value;
if not, judging whether the first tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value, taking the next tag value as the current tag value, and returning to the step of judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value; if not, executing the next step;
for the current tag value, judging whether the second tag value corresponding to the current tag value is the same as the first tag value of any other tag value;
if yes, deleting the current tag value;
if not, judging whether the second tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value, taking the next tag value as the current tag value, and returning to the step of judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value; if not, retaining the current tag value.
7. The method for predicting data applied to electronic commerce according to claim 1, wherein the classifying the historical traffic data according to the second data feature and the second user feature to obtain the historical traffic data cluster comprises:
for each piece of historical flow data, marking the historical flow data by taking the second data feature and the second user feature as tag values to obtain marked historical flow data;
and obtaining the number of kinds of tag values corresponding to the second data feature, taking the number of kinds as the K value, and performing clustering using a K-means clustering algorithm to obtain the historical flow data cluster.
8. The method for predicting data applied to electronic commerce according to claim 1, wherein the step of screening out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature comprises:
screening out, from the order success rates, a first success rate whose corresponding similarity is greater than the set threshold according to the similarity between the first data feature and the second data feature;
screening second success rates with the corresponding similarity larger than a set threshold value from the order success rates according to the similarity of the first user characteristics and the second user characteristics;
and taking the set of the first success rate and the second success rate as a target success rate.
9. The method for predicting data applied to electronic commerce according to claim 8, wherein the step of taking the set of the first success rate and the second success rate as the target success rate comprises:
taking, as a third success rate, the order success rate for which both the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature are greater than the set threshold;
calculating the target success rate using the formula $t = w_1 a + w_2 b + w_3 c$, wherein
$t$ is the target success rate; $w_1$ is the weight corresponding to the first success rate; $a$ is the average value of the first success rate; $w_2$ is the weight corresponding to the second success rate; $b$ is the average value of the second success rate; $w_3$ is the weight corresponding to the third success rate; $c$ is the average value of the third success rate.
10. A data prediction system for use in electronic commerce, the system comprising:
an acquisition module, configured to acquire a flow data set in a period to be predicted and to extract a first data feature and a first user feature of each piece of flow data in the flow data set, wherein the first data feature comprises: one or a combination of an IP address, a login device type, a login port type, and a data type; and the first user feature comprises: one or a combination of user identification information and user behavior information;
a classification module, configured to classify each piece of flow data according to the first data feature and the first user feature to obtain a current flow data cluster;
and further configured to acquire a historical flow data set, extract a second data feature and a second user feature of each piece of flow data in the historical flow data set, and classify the historical flow data according to the second data feature and the second user feature to obtain a historical flow data cluster;
a calculation module, configured to calculate the order success rate corresponding to each historical flow data cluster; to screen out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold, according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature; to calculate the transaction volume of each current flow data cluster as the product of the target success rate and the number of pieces of flow data in that cluster; and to calculate the total transaction volume as the sum of the transaction volumes of the current flow data clusters.
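The tag splicing and de-duplication of claims 4 to 6, whose resulting count serves as the K value in claims 3 and 7, can be sketched as follows. This is a simplified illustration rather than the claimed procedure step for step: it keeps the first occurrence of a tag and skips later duplicates, whereas claim 6 deletes the current tag when it matches another one. The separator `SEP` and the function name `count_tag_kinds` are assumptions.

```python
from typing import Iterable, List, Tuple

SEP = "|"  # hypothetical separator; the claims do not specify one


def count_tag_kinds(
    pairs: Iterable[Tuple[str, str]]
) -> Tuple[int, List[Tuple[str, str]]]:
    """Count distinct (data feature, user feature) tags, ignoring order."""
    seen = set()
    kept: List[Tuple[str, str]] = []
    for data_feature, user_feature in pairs:
        # First tag value: data feature spliced before the user feature.
        first = f"{data_feature}{SEP}{user_feature}"
        # Second tag value: data feature spliced after the user feature.
        second = f"{user_feature}{SEP}{data_feature}"
        # A tag is a duplicate if either spliced form was already seen.
        if first in seen or second in seen:
            continue
        seen.update((first, second))
        kept.append((data_feature, user_feature))
    return len(kept), kept


# Hypothetical feature pairs extracted from flow data.
k, kept_tags = count_tag_kinds(
    [("pc", "vip"), ("vip", "pc"), ("pc", "vip"), ("mobile", "new")]
)
```

Comparing both spliced forms means two tags that differ only in the order of the data feature and the user feature count as one kind, which appears to be the intent of the four comparisons in claim 6.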
CN202310446184.3A 2023-04-23 2023-04-23 Data prediction method and system applied to electronic commerce Pending CN116485449A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310446184.3A CN116485449A (en) 2023-04-23 2023-04-23 Data prediction method and system applied to electronic commerce


Publications (1)

Publication Number Publication Date
CN116485449A (en) 2023-07-25

Family

ID=87220880


Country Status (1)

Country Link
CN (1) CN116485449A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20230725