CN116320626B - Method and system for calculating live broadcast heat of electronic commerce - Google Patents

Publication number
CN116320626B
Authority
CN
China
Prior art keywords
live broadcast
data
heat
live
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310525800.4A
Other languages
Chinese (zh)
Other versions
CN116320626A (en)
Inventor
王樱颐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xingyiteng Technology Electronics Co ltd
Original Assignee
Shenzhen Xingyiteng Technology Electronics Co ltd
Application filed by Shenzhen Xingyiteng Technology Electronics Co ltd
Priority to CN202310525800.4A
Publication of CN116320626A
Application granted
Publication of CN116320626B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44204Monitoring of content usage, e.g. the number of times a movie has been viewed, copied or the amount which has been watched
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4665Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms involving classification methods, e.g. Decision trees
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4667Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/47815Electronic shopping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/16Customisation or personalisation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a method and a system for calculating the live broadcast heat of e-commerce live streams. The method comprises: collecting live broadcast data and preprocessing it by data cleaning, de-duplication and downsampling; calculating a real-time live broadcast heat index from the live broadcast data with a weighted increment model; establishing a historical data analysis model with a support vector regression algorithm; determining which live broadcast content interests viewers; and predicting future live broadcast heat. The method and system calculate the real-time live broadcast heat index more accurately, making the index more representative and improving the marketing strategy and effect of e-commerce live broadcasts. All comment categories are ranked by interest level according to comment volume, so that the host can answer the most frequently asked questions first, which meets the needs of most viewers, improves live broadcast quality, and enables personalized customization of the live broadcast heat evaluation system.

Description

Method and system for calculating live broadcast heat of electronic commerce
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a system for calculating live broadcast heat of an electronic commerce.
Background
With the rapid growth of online e-commerce, live-stream e-commerce has quickly become a major trend in the industry. It combines traditional offline shopping with online shopping to provide a rich online shopping experience, and substantial resources are therefore being invested in it. One of its key indicators is live broadcast heat: accurate calculation and analysis of live broadcast heat makes it possible to evaluate live broadcast quality effectively, optimize marketing plans, and improve the purchase conversion rate of users.
However, calculating and analyzing live broadcast heat currently faces the following problems:
First, traditional live broadcast heat indices are coarse and limited; they cannot reflect the actual state of live broadcast heat and hardly meet the practical needs of live-stream e-commerce merchants.
Second, live-stream e-commerce involves a large volume of user inquiries and service work that require fast response and handling, which traditional manual processing can hardly manage.
Third, calculating and analyzing live broadcast heat consumes a great deal of time and labor, so real-time tracking and effective evaluation cannot be achieved. In view of this, a method and a system for calculating e-commerce live broadcast heat are provided to solve these problems of traditional heat calculation and analysis and to promote the development of the live-stream e-commerce industry.
Disclosure of Invention
The invention aims to provide a method and a system for calculating live broadcast heat of an electronic commerce, so as to solve the problems in the background technology.
In order to solve the above technical problems, one of the purposes of the present invention is to provide a method for calculating live broadcast heat of an electronic commerce, which includes the following steps:
S1, acquiring live broadcast data of an e-commerce live broadcast platform, wherein the live broadcast data comprise the live broadcast title, number of views, number of likes, number of comments and number of shares;
S2, preprocessing the collected live broadcast data by data cleaning, de-duplication and downsampling;
S3, calculating a real-time live broadcast heat index from the live broadcast data by adopting a weighted increment model;
S4, applying a support vector regression algorithm to establish a historical data analysis model and predict future live broadcast heat;
S5, determining the live broadcast content that interests viewers, establishing an analysis table of the content of interest, and realizing personalized customization of the live broadcast heat evaluation system according to the predicted future live broadcast heat.
Preferably, the step of collecting live broadcast data in the step S1 is to obtain live broadcast data of the live broadcast platform of the electronic commerce in real time through an API interface.
Preferably, the step of performing data cleansing on the live data in S2 includes the following steps:
removing spaces and blank lines from the data: stripping whitespace from both ends of each character string by using a strip function or a regular expression, and deleting blank lines;
processing outliers in the data;
processing missing values in the data: handling each missing value by an interpolation method or a deletion method;
the step S2 of carrying out data deduplication on the live broadcast data comprises the following steps:
detecting the unique attributes of the live broadcast data by using the drop_duplicates function in the Pandas library, and removing duplicate records;
the step of downsampling the live data in S2 includes the following steps:
defining a time interval;
for the data in each time interval, an average or median is used instead of the original data.
Preferably, the step S3 includes the following steps:
giving weight to the index of the live broadcast data;
and calculating the real-time live broadcast heat index according to the weight and the increment of each index.
Preferably, the weighted increment model is expressed as follows: a plurality of time periods $[t_i, t_{i+1}]$ are defined;
within the time period $[t_i, t_{i+1}]$, the heat index $H(t)$ of the live broadcast room is calculated by a weighted average method, namely: $H(t) = \alpha H(t-1) + (1-\alpha)\, W(t-1)\, W(t)$;
wherein $H(t-1)$ represents the heat index of the previous time period, $W(t-1)$ represents the weight of user interaction with the live broadcast room in the previous time period, $W(t)$ represents the weight of user interaction with the live broadcast room in the current time period, and $\alpha$ is a smoothing coefficient whose function is to smooth the historical weights.
Preferably, in the step S5, the live broadcast content that interests viewers is determined by a keyword classification and sorting algorithm comprising the following steps:
performing word segmentation on the comment content, so that each comment is divided into words;
creating a list of keywords, the keywords being grouped into different categories;
for each word in a comment, matching it against all words in the keyword list by using a fuzzy string matching algorithm, so as to find the comments associated with each category;
calculating, for each comment, a weight score for each associated category;
assigning each comment to its related categories, counting the number of occurrences of each category across all comments, and sorting by that count to obtain the categories viewers are most interested in.
Preferably, the keyword classification and sorting algorithm is expressed as follows:
for each comment $C_j$, a weight score $W_{i,j}$ is calculated for each category $T_i$ to represent the semantic relevance of the comment to that category, with the formula:
[formula for $W_{i,j}$ not reproduced in the source text]
wherein $m_i(c_{j,k})$ represents the weight, with respect to category $T_i$, of the $k$-th word in the $j$-th comment $C_j$, and $n$ represents the number of words in comment $C_j$;
for each category $T_i$, the weight scores associated with it across all comments are aggregated to represent the viewers' overall interest level $I_i$ in that category, with the formula:
[formula for $I_i$ not reproduced in the source text]
where $m$ represents the number of comments.
The second object of the present invention is to provide a system for calculating e-commerce live broadcast heat, which is applied to the method for calculating e-commerce live broadcast heat described in any one of the above, and includes a live broadcast data collection module, a data preprocessing module, a live broadcast heat calculation module, a prediction analysis module and a customization analysis module;
the live broadcast data collection module is used for collecting live broadcast data of the e-commerce live broadcast platform;
the data preprocessing module is used for preprocessing the collected live broadcast data by data cleaning, de-duplication and downsampling;
the live broadcast heat calculation module is used for calculating the live broadcast heat index in real time from the live broadcast data;
the prediction analysis module is used for establishing a historical data analysis model and predicting future live broadcast heat;
the customization analysis module is used for determining the live broadcast content that interests viewers, establishing an analysis table of the content of interest, and realizing personalized customization of the live broadcast heat evaluation system according to the predicted future live broadcast heat.
Compared with the prior art, the invention has the beneficial effects that:
1. Live broadcast data of the e-commerce live broadcast platform is collected and preprocessed by data cleaning, de-duplication and downsampling, and a weighted increment model then calculates the real-time live broadcast heat index from that data with higher accuracy. By analyzing and weighting multiple indicators such as the number of views, likes, comments and shares, the heat index becomes more representative and accurate. A support vector regression algorithm is applied to establish a historical data analysis model, the content that interests viewers is determined, and future live broadcast heat is predicted, giving enterprises a better basis for analysis and decision-making and improving the marketing strategy and effect of e-commerce live broadcasts;
2. All categories are ranked by interest level to obtain the categories viewers care about most, so the host can answer the most frequently asked questions first. This meets the needs of most viewers and improves live broadcast quality. Classifying the questions also prevents any of them from being overlooked, which helps raise live broadcast heat. In the next live broadcast, questions that were not answered in time can be answered separately, a live broadcast tailored to the content the audience is interested in can be planned, and personalized customization of the live broadcast heat evaluation system is achieved.
Drawings
FIG. 1 is an overall flow block diagram of Embodiment 1;
FIG. 2 is a flowchart of the weighted increment model of Embodiment 1;
FIG. 3 is a flowchart of the keyword classification and sorting algorithm of Embodiment 1.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIGS. 1-3, a first embodiment of the present invention provides a method for calculating e-commerce live broadcast heat, which includes the following steps:
S1, acquiring live broadcast data of an e-commerce live broadcast platform, wherein the live broadcast data comprise the live broadcast title, number of views, number of likes, number of comments and number of shares;
In step S1, the live broadcast data of the e-commerce live broadcast platform is acquired in real time through an API. An API (application programming interface) is a means of communication between software systems or between different software modules, and the live broadcast data held by the platform can be obtained through it. The acquisition works as follows: the e-commerce live broadcast platform provides an API to third-party developers, through which they can obtain data about a live broadcast room, such as the live broadcast title, number of views, number of likes, number of comments and number of shares; the developer writes a program that calls the API, sends a request, receives the returned data and stores it in a database or data warehouse. In this way the live broadcast data of the platform is obtained quickly and accurately, which gives more accurate and effective support for strategy formulation, live broadcast optimization and the prediction of future trends.
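The acquisition step can be sketched as follows. This is a minimal illustration only: the patent does not name a platform, an endpoint or a response schema, so the JSON field names (`room_id`, `title`, `views`, `likes`, `comments`, `shares`) and the helper names are assumptions made for the example.

```python
import json
from dataclasses import dataclass
from urllib.request import urlopen


@dataclass
class LiveRecord:
    """One snapshot of a live broadcast room (field names are hypothetical)."""
    room_id: str
    title: str
    views: int
    likes: int
    comments: int
    shares: int


def parse_live_payload(payload: str) -> LiveRecord:
    """Convert one JSON response body into a typed record."""
    raw = json.loads(payload)
    return LiveRecord(
        room_id=str(raw["room_id"]),
        title=raw["title"].strip(),
        views=int(raw["views"]),
        likes=int(raw["likes"]),
        comments=int(raw["comments"]),
        shares=int(raw["shares"]),
    )


def fetch_live_record(url: str, timeout: float = 5.0) -> LiveRecord:
    """Request one snapshot from a platform endpoint (the URL is an assumption)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_live_payload(response.read().decode("utf-8"))


# Hypothetical response body used instead of a real network call.
sample = (
    '{"room_id": 42, "title": " Spring sale ", "views": 1200, '
    '"likes": 310, "comments": 95, "shares": 12}'
)
record = parse_live_payload(sample)
```

Keeping parsing separate from the network request means the same record type can be filled from stored responses when the data is later loaded from a database or data warehouse.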
S2, preprocessing the collected live broadcast data by data cleaning, de-duplication and downsampling, which reduces the space the live broadcast data occupies and makes subsequent data analysis more accurate;
the step S2 of cleaning the live broadcast data comprises the following steps:
removing spaces and blank lines from the data: stripping whitespace from both ends of each character string by using a strip function or a regular expression, and deleting blank lines;
processing outliers in the data: for example, replacing an anomalous value of 0 in the data with NaN;
processing missing values in the data: missing values can be handled by an interpolation method or a deletion method, wherein interpolation methods include linear interpolation, Lagrange interpolation and the like, and deletion methods include deleting samples, deleting variables and the like, so that subsequent data can be identified correctly;
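The three cleaning steps above can be sketched in plain Python. The function names are illustrative, and the choice of linear interpolation (rather than Lagrange interpolation or deletion) is an assumption made to keep the example short.

```python
import math
from typing import List


def clean_text_rows(rows: List[str]) -> List[str]:
    """Strip surrounding whitespace and drop rows that are blank afterwards."""
    return [row.strip() for row in rows if row.strip()]


def zeros_to_nan(values: List[float]) -> List[float]:
    """Mark anomalous zeros as NaN so they are treated as missing values."""
    return [math.nan if v == 0 else v for v in values]


def interpolate_linear(values: List[float]) -> List[float]:
    """Fill interior NaN runs by linear interpolation between known neighbours.

    Leading or trailing NaNs have a neighbour on one side only and stay NaN.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


titles = clean_text_rows(["  Spring sale ", "", "   ", "Flash deal"])
views = interpolate_linear(zeros_to_nan([100.0, 0, 0, 400.0]))
```

Here `titles` keeps only the two non-blank titles, and the two zero view counts are replaced by values on the straight line between 100 and 400.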
the step S2 of de-duplicating the live broadcast data comprises the following steps:
detecting the unique attributes of the live broadcast data by using the drop_duplicates function in the Pandas library, and removing duplicate records;
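De-duplication on a set of unique attributes can be sketched without any library as below. The key fields `room_id` and `timestamp` are assumptions for illustration. With Pandas, `DataFrame.drop_duplicates(subset=[...])` plays the same role.

```python
from typing import Dict, List, Sequence


def drop_duplicate_records(
    records: List[Dict[str, object]], key_fields: Sequence[str]
) -> List[Dict[str, object]]:
    """Keep the first record for each combination of the key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


snapshots = [
    {"room_id": "42", "timestamp": 0, "likes": 10},
    {"room_id": "42", "timestamp": 0, "likes": 10},   # repeated record
    {"room_id": "42", "timestamp": 60, "likes": 14},
]
deduplicated = drop_duplicate_records(snapshots, ["room_id", "timestamp"])
```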
the step S2 of downsampling the live broadcast data comprises the following steps:
defining a time interval, for example 1 hour, 6 hours or 1 day;
for the data in each time interval, replacing the original data with its average or median. Downsampling may also simply discard data points, which reduces the amount of data to be examined and improves processing efficiency. When a business question requires analyzing data that spans a period of time, downsampling lowers the data density, which improves processing efficiency and yields a smoother series.
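A minimal sketch of interval-based downsampling follows, assuming each sample is a `(timestamp_in_seconds, value)` pair. The sample layout and function name are illustrative and not taken from the patent.

```python
from collections import defaultdict
from statistics import mean, median
from typing import List, Tuple


def downsample(
    samples: List[Tuple[int, float]],
    interval_seconds: int,
    use_median: bool = False,
) -> List[Tuple[int, float]]:
    """Replace all samples in each time interval by their mean or median."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each timestamp to the start of its interval.
        start = timestamp // interval_seconds * interval_seconds
        buckets[start].append(value)
    aggregate = median if use_median else mean
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


samples = [(0, 10), (1800, 20), (3600, 40), (5400, 60), (7000, 80)]
hourly_mean = downsample(samples, 3600)
hourly_median = downsample(samples, 3600, use_median=True)
```

With a one-hour interval the five samples collapse into two points, one per hour.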
S3, calculating the real-time live broadcast heat index from the live broadcast data by adopting a weighted increment model, which allows the index to be calculated more accurately. Analyzing and weighting multiple indicators such as the number of views, likes, comments and shares makes the live broadcast heat index more representative and accurate;
the step S3 comprises the following steps:
assigning a weight to each indicator of the live broadcast data;
calculating the real-time live broadcast heat index from the weight and the increment of each indicator.
Specifically, the weighted increment model is expressed as follows: a plurality of time periods $[t_i, t_{i+1}]$ are defined, for example with one minute or five minutes as one time period, and the heat index of the live broadcast room is calculated for each time period. Two basic indicators are counted for each time period:
the live broadcast duration (in seconds) within the current time period, which represents the degree of interaction between users and the host in that period;
the number of interactions within the current time period, including likes, comments and shares by users, which represents the degree of user interaction in that period;
within the time period $[t_i, t_{i+1}]$, the heat index $H(t)$ of the live broadcast room may be calculated by a weighted average method, namely: $H(t) = \alpha H(t-1) + (1-\alpha)\, W(t-1)\, W(t)$;
wherein $H(t-1)$ represents the heat index of the previous time period, $W(t-1)$ represents the weight of user interaction with the live broadcast room in the previous time period, $W(t)$ represents the weight of user interaction with the live broadcast room in the current time period, and $\alpha$ is a smoothing coefficient whose function is to smooth the historical weights;
the way the weights are calculated can be defined according to the actual business situation; for example, likes, comments and shares can be given different weights, and in practice the weights can be adjusted adaptively by a machine learning method to achieve a better result. The above formula of the weighted increment model yields the real-time heat index of the live broadcast room. Because the result is strongly affected by the preprocessing of the live broadcast data (cleaning, de-duplication and downsampling), careful preprocessing is important.
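The update rule $H(t) = \alpha H(t-1) + (1-\alpha)\, W(t-1)\, W(t)$ can be sketched as below. The patent leaves the definition of $W(t)$ to the business, so the per-action weights, the weighted-sum form of `interaction_weight` and the initial values are all assumptions made for illustration.

```python
from typing import Dict, List

# Illustrative per-action weights; the patent does not fix these values.
ACTION_WEIGHTS = {"likes": 1.0, "comments": 2.0, "shares": 3.0}


def interaction_weight(increments: Dict[str, int]) -> float:
    """W(t): weighted sum of the interaction increments in one time period."""
    return sum(
        weight * increments.get(action, 0)
        for action, weight in ACTION_WEIGHTS.items()
    )


def update_heat(
    prev_heat: float, prev_weight: float, curr_weight: float, alpha: float = 0.7
) -> float:
    """H(t) = alpha * H(t-1) + (1 - alpha) * W(t-1) * W(t)."""
    return alpha * prev_heat + (1 - alpha) * prev_weight * curr_weight


def heat_series(
    periods: List[Dict[str, int]],
    alpha: float = 0.7,
    initial_heat: float = 0.0,
    initial_weight: float = 1.0,
) -> List[float]:
    """Apply the update rule period by period and return every H(t)."""
    heat, prev_weight = initial_heat, initial_weight
    series = []
    for increments in periods:
        curr_weight = interaction_weight(increments)
        heat = update_heat(heat, prev_weight, curr_weight, alpha)
        series.append(heat)
        prev_weight = curr_weight
    return series


series = heat_series([{"likes": 2}, {"comments": 1}], alpha=0.5)
```

Because the rule multiplies two consecutive weights, raw interaction counts can grow the index quickly. Scaling $W(t)$ to a bounded range before the update is one possible way to keep periods comparable, though the patent does not prescribe it.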
S4, applying a support vector regression algorithm to establish a historical data analysis model and predict future live broadcast heat, which gives enterprises a better basis for analysis and decision-making and improves the marketing strategy and effect of e-commerce live broadcasts;
S5, determining the live broadcast content that interests viewers, establishing an analysis table of the content of interest, and realizing personalized customization of the live broadcast heat evaluation system according to the predicted future live broadcast heat;
Regarding step S4, the core idea of support vector regression (SVR) is to map the input space into a high-dimensional space and, based on the margin-maximization principle, find an optimal hyperplane there, thereby constructing a nonlinear regression prediction model. A historical data analysis model is built on this support vector regression machine and used to predict the future live broadcast heat index, providing effective support for the development of live broadcast services.
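For reference, the standard $\varepsilon$-insensitive SVR formulation from the general literature is given below. The patent does not state which variant, kernel or hyperparameters it uses, so the symbols here are assumptions: $x_i$ is a feature vector built from historical live broadcast data, $y_i$ is the observed heat index, $\varphi$ is the feature map into the high-dimensional space, $C$ is the regularization constant and $\varepsilon$ is the tube width.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad
  & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \\
\text{s.t.} \quad
  & y_i - w^{\top}\varphi(x_i) - b \le \varepsilon + \xi_i, \\
  & w^{\top}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i \ge 0, \quad \xi_i^{*} \ge 0, \qquad i = 1, \dots, N.
\end{aligned}
\qquad
f(x) = \sum_{i=1}^{N} \left( \alpha_i - \alpha_i^{*} \right) K(x_i, x) + b
```

Here $K(x_i, x) = \varphi(x_i)^{\top}\varphi(x)$ is the kernel and $\alpha_i, \alpha_i^{*}$ are the dual variables, which are unrelated to the smoothing coefficient $\alpha$ of the weighted increment model. The predicted future heat index is $f(x)$ evaluated on the features of the upcoming period.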
During an e-commerce live broadcast, viewers ask questions in the live broadcast room. Replying to them one by one is time-consuming and laborious for the host, and it is inconvenient to give a single answer to users who ask the same question. When there are many questions, the host cannot tell which ones are asked most often and therefore cannot answer them first, which causes viewer loss and lowers live broadcast heat. A second embodiment of the invention is therefore provided, which differs from the first embodiment in that, in step S5, the live broadcast content that interests viewers is determined by a keyword classification and sorting algorithm comprising the following steps:
performing word segmentation on the comment content, so that each comment is divided into words;
creating a list or dictionary of keywords, the keywords being grouped into different categories, such as "clean", "size", "time of use", etc.;
for each word in a comment, matching it against all words in the keyword list by using a fuzzy string matching algorithm, so as to find the comments associated with each category;
calculating, for each comment, a weight score for each associated category; for example, the weight score may be based on the number of times the keywords of a topic appear in the comment;
assigning each comment to its related categories, counting the number of occurrences of each category across all comments, and sorting by that count to obtain the categories viewers are most interested in.
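The steps above can be sketched with the standard library as follows. Several points are assumptions for illustration: comments are split on whitespace (Chinese comments would need a dedicated word segmenter), fuzzy matching uses `difflib.get_close_matches` with a similarity cutoff of 0.85, every matched word contributes a weight of 1 to its category, and the keyword dictionary is invented for the example.

```python
from collections import Counter, defaultdict
from difflib import get_close_matches
from typing import Dict, List, Tuple

# Hypothetical keyword dictionary: category -> keywords.
KEYWORDS = {
    "size": ["size", "large", "small"],
    "cleaning": ["clean", "wash", "stain"],
    "usage time": ["battery", "lasting", "duration"],
}


def tokenize(comment: str) -> List[str]:
    """Whitespace word segmentation with basic punctuation stripping."""
    words = (word.strip(".,!?").lower() for word in comment.split())
    return [word for word in words if word]


def comment_scores(comment: str, cutoff: float = 0.85) -> Dict[str, float]:
    """Score one comment per category: one point per fuzzily matched word."""
    scores = defaultdict(float)
    for word in tokenize(comment):
        for category, terms in KEYWORDS.items():
            if get_close_matches(word, terms, n=1, cutoff=cutoff):
                scores[category] += 1.0
    return dict(scores)


def rank_categories(comments: List[str]) -> List[Tuple[str, float]]:
    """Sum the per-comment scores and sort categories by total interest."""
    interest = Counter()
    for comment in comments:
        for category, score in comment_scores(comment).items():
            interest[category] += score
    return interest.most_common()


comments = [
    "What sizes do you have?",
    "Is it easy to wash and clean?",
    "Does the size run small?",
]
ranking = rank_categories(comments)
```

In this sample, "sizes" is matched to the keyword "size" by fuzzy matching, and the ranking places the size category ahead of cleaning, which tells the host which group of questions to answer first.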
Specifically, the keyword classification and sorting algorithm is expressed as follows:
for each comment $C_j$, a weight score $W_{i,j}$ is calculated for each category $T_i$ to represent the semantic relevance of the comment to that category, with the formula:
[formula for $W_{i,j}$ not reproduced in the source text]
wherein $m_i(c_{j,k})$ represents the weight, with respect to category $T_i$, of the $k$-th word in the $j$-th comment $C_j$, and $n$ represents the number of words in comment $C_j$;
for each category $T_i$, the weight scores associated with it across all comments are aggregated to represent the viewers' overall interest level $I_i$ in that category, with the formula:
[formula for $I_i$ not reproduced in the source text]
where $m$ represents the number of comments.
In S5, the host can answer questions in order of how many viewers asked them, which meets the needs of most viewers and improves live broadcast quality. Classifying the questions also prevents any of them from being overlooked, which helps raise live broadcast heat. In the next live broadcast, questions that were not answered in time can be answered individually, and the broadcast can be tailored to the content viewers are interested in.
The second object of the invention is to provide a system for calculating e-commerce live broadcast heat, which implements the method for calculating e-commerce live broadcast heat according to any one of the above, and comprises a live broadcast data collection module, a data preprocessing module, a live broadcast heat calculation module, a prediction analysis module and a customization analysis module;
the live broadcast data collection module is used for collecting live broadcast data of the e-commerce live broadcast platform;
the data preprocessing module is used for preprocessing the collected live broadcast data by data cleaning, de-duplication and downsampling;
the live broadcast heat calculation module is used for calculating a live broadcast heat index in real time from the live broadcast data;
the prediction analysis module is used for establishing a historical data analysis model and predicting future live broadcast heat;
the customization analysis module is used for determining the live broadcast content of interest to viewers, establishing an analysis table of that content, and personalizing the live broadcast heat evaluation system according to the predicted future live broadcast heat.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (5)

1. A method for calculating e-commerce live broadcast heat, characterized by comprising the following steps:
S1, acquiring live broadcast data of an e-commerce live broadcast platform, wherein the live broadcast data comprise live broadcast titles, view counts, like counts, comment counts and share counts;
S2, preprocessing the collected live broadcast data by data cleaning, de-duplication and downsampling;
S3, calculating a real-time live broadcast heat index from the live broadcast data using a weighted increment model;
S4, applying a support vector regression algorithm to establish a historical data analysis model and predict future live broadcast heat;
S5, determining the live broadcast content of interest to viewers, establishing an analysis table of that content, and personalizing the live broadcast heat evaluation system according to the predicted future live broadcast heat;
wherein, in S5, the content of interest to viewers is determined using a keyword classification and sorting algorithm comprising the following steps:
word segmentation processing is carried out on the comment content, and each comment is divided into words;
establishing a list containing keywords, and classifying the keywords into different classifications;
matching all words in the keyword list by using a fuzzy character string matching algorithm aiming at each word in the comments to find out comments associated with the classification;
calculating a weight score for each comment for the associated category;
classifying each comment into a related class respectively, counting the occurrence times of each class in all comments, and sorting according to the occurrence times to obtain the class most interesting to the viewer;
the expression of the keyword classification and sorting algorithm is as follows:
for each comment C_j, a weight score W_{i,j} is calculated for each category T_i, for representing the semantic relevance of the comment to each category, with the formula:
wherein m_i(c_{j,k}) represents the weight of the classification T_i corresponding to the k-th word in the j-th comment C_j, and n represents the number of words in the comment C_j;
for each category T_i, the weight score associated with it in all comments is counted, representing the overall interest level I_i of the viewer in that category, with the formula:
where m represents the number of comments.
2. The method for calculating e-commerce live broadcast heat according to claim 1, wherein acquiring the live broadcast data in S1 comprises acquiring live broadcast data of the e-commerce live broadcast platform in real time through an API.
3. The method for calculating e-commerce live broadcast heat according to claim 2, wherein cleaning the live broadcast data in S2 comprises the following steps:
removing spaces and blank lines from the data: removing whitespace at both ends of each string using a strip function or a regular expression, and deleting blank lines;
processing outliers in the data;
processing missing values in the data: missing values are handled by an interpolation method and a deletion method;
de-duplicating the live broadcast data in S2 comprises the following steps:
detecting the unique attributes of the live broadcast data using the drop_duplicates function in the Pandas library, and removing duplicate records;
downsampling the live broadcast data in S2 comprises the following steps:
defining a time interval;
for the data in each time interval, using the average or median in place of the original data.
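The preprocessing steps of S2 can be illustrated with a dependency-free Python sketch. The record layout (`room_id`, `ts` in seconds, `views`) is hypothetical, missing values are handled by deletion only, and outlier handling and interpolation are omitted. In pandas, the analogous calls would be `Series.str.strip`, `DataFrame.drop_duplicates` and `DataFrame.resample`.

```python
from statistics import mean


def clean(records):
    """Strip whitespace from string fields; drop blank records and records without a timestamp."""
    cleaned = []
    for rec in records:
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        if rec.get("ts") is None or all(v in ("", None) for v in rec.values()):
            continue
        cleaned.append(rec)
    return cleaned


def deduplicate(records, key=("room_id", "ts")):
    """Keep the first record seen for each unique key."""
    seen, unique = set(), []
    for rec in records:
        k = tuple(rec[field] for field in key)
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique


def downsample(records, interval, field="views"):
    """Replace the samples inside each time interval with their average."""
    buckets = {}
    for rec in records:
        start = rec["ts"] - rec["ts"] % interval  # start of the interval
        buckets.setdefault(start, []).append(rec[field])
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

Chaining the three functions as `downsample(deduplicate(clean(raw)), 60)` gives one averaged value per minute; swapping `mean` for `statistics.median` gives the median variant mentioned in the claim.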
4. The method for calculating e-commerce live broadcast heat according to claim 3, wherein S3 comprises the following steps:
assigning a weight to each index of the live broadcast data;
calculating the real-time live broadcast heat index from the weight and the increment of each index.
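One possible reading of the weighted increment calculation in S3 is a weighted sum of each index's change between two consecutive snapshots. The sketch below assumes that reading; the weight values and field names are hypothetical and are not specified in this section.

```python
# Hypothetical weights for the four indices; the actual values are not given here.
WEIGHTS = {"views": 1, "likes": 2, "comments": 3, "shares": 4}


def heat_index(previous, current, weights=WEIGHTS):
    """Weighted sum of each index's increment between two snapshots."""
    return sum(
        weight * (current[name] - previous[name])
        for name, weight in weights.items()
    )
```

Each snapshot is a dictionary of cumulative counts, so the function can be called once per sampling interval to track how quickly engagement is growing.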
5. A system for calculating e-commerce live broadcast heat, applying the method for calculating e-commerce live broadcast heat according to claim 1, characterized in that: the system comprises a live broadcast data collection module, a data preprocessing module, a live broadcast heat calculation module, a prediction analysis module and a customization analysis module;
the live broadcast data collection module is used for collecting live broadcast data of the e-commerce live broadcast platform;
the data preprocessing module is used for preprocessing the collected live broadcast data by data cleaning, de-duplication and downsampling;
the live broadcast heat calculation module is used for calculating a live broadcast heat index in real time from the live broadcast data;
the prediction analysis module is used for establishing a historical data analysis model and predicting future live broadcast heat;
the customization analysis module is used for determining the live broadcast content of interest to viewers, establishing an analysis table of that content, and personalizing the live broadcast heat evaluation system according to the predicted future live broadcast heat.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310525800.4A CN116320626B (en) 2023-05-11 2023-05-11 Method and system for calculating live broadcast heat of electronic commerce

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310525800.4A CN116320626B (en) 2023-05-11 2023-05-11 Method and system for calculating live broadcast heat of electronic commerce

Publications (2)

Publication Number Publication Date
CN116320626A (en) 2023-06-23
CN116320626B (en) 2023-11-14

Family

ID=86829074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310525800.4A Active CN116320626B (en) 2023-05-11 2023-05-11 Method and system for calculating live broadcast heat of electronic commerce

Country Status (1)

Country Link
CN (1) CN116320626B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116886998B (en) * 2023-07-19 2023-12-22 中教畅享(北京)科技有限公司 Interactive processing method for simulating live broadcast environment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763214A (en) * 2018-05-30 2018-11-06 河海大学 A kind of sentiment dictionary method for auto constructing for comment on commodity
KR20190078684A (en) * 2017-12-13 2019-07-05 한국과학기술원 Prefetching based cloud broker apparatus for live streaming and method thereof
CN112991017A (en) * 2021-03-26 2021-06-18 刘秀萍 Accurate recommendation method for label system based on user comment analysis
CN113901226A (en) * 2021-12-08 2022-01-07 阿里巴巴达摩院(杭州)科技有限公司 Real-time live broadcast data processing method and computer storage medium
CN114727125A (en) * 2022-03-28 2022-07-08 北京高途云集教育科技有限公司 Auxiliary live broadcasting method and device
CN115480832A (en) * 2021-05-27 2022-12-16 北京字节跳动网络技术有限公司 Live broadcast room service configuration method, device, equipment and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200007934A1 (en) * 2018-06-29 2020-01-02 Advocates, Inc. Machine-learning based systems and methods for analyzing and distributing multimedia content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
中文情感分析综述;王庆福;;电脑知识与技术(16);133-134 *

Also Published As

Publication number Publication date
CN116320626A (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant