CN117371876A - Index data analysis method and system based on keywords - Google Patents
Index data analysis method and system based on keywords
- Publication number
- CN117371876A (application number CN202311671020.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to the technical field of data analysis and discloses an index data analysis method and system based on keywords. The method comprises the following steps: acquiring initial user learning index data of a plurality of target users based on a cloud platform, and performing keyword index analysis to acquire the target user learning index data; classifying users through an EM algorithm to obtain a plurality of target user groups; carrying out mixed Weibull distribution parameter operation through a mixed Weibull distribution model to obtain a distribution scale parameter and a distribution shape parameter; constructing a variable set to obtain a target variable set; performing variable relation analysis through an initial Bayesian network to obtain target influence factors; and performing model parameter optimization through a particle swarm optimization algorithm to obtain an optimal network parameter combination, and performing network parameter updating on the initial Bayesian network through the optimal network parameter combination to obtain a target Bayesian network, thereby improving the analysis accuracy of index data.
Description
Technical Field
The present disclosure relates to the field of data analysis technologies, and in particular, to a method and a system for analyzing index data based on keywords.
Background
With the popularization of online education platforms, a large amount of learning data is recorded, including the learning behaviors, interaction patterns, and learning progress of students. These data contain important insights for education strategies, course design, and personalized learning paths. However, extracting valuable information from these large and complex data sets, so as to improve teaching methods and learning efficiency, remains a challenge in the field of educational technology.
Analyzing the learning index data of users can reveal the learning habits, preferences, and potential difficulties of students, which helps educators better understand and meet the needs of students. For example, by analyzing the attention and participation of students in a particular course, educators can adjust the educational program to follow the interests and learning patterns of the students more closely. However, the accuracy of existing solutions is low.
Disclosure of Invention
The present application provides a keyword-based index data analysis method and system, which improve the analysis accuracy of index data.
The first aspect of the present application provides a keyword-based index data analysis method, which includes:
Acquiring initial user learning index data of a plurality of target users based on a preset cloud platform, and performing keyword index analysis on the initial user learning index data to obtain target user learning index data;
user classification is carried out on the plurality of target users according to the target user learning index data through a preset EM algorithm, so that a plurality of target user groups are obtained;
inputting the target user learning index data into a preset mixed Weibull distribution model to perform mixed Weibull distribution parameter operation to obtain a distribution scale parameter and a distribution shape parameter;
according to the multiple target user groups, carrying out variable set construction on the distribution scale parameters and the distribution shape parameters to obtain a target variable set of each target user group;
inputting a target variable set of each target user group into a preset initial Bayesian network for variable relation analysis to obtain target influence factors of each target user group;
and carrying out model parameter optimization on the initial Bayesian network according to target influence factors of each target user group through a preset particle swarm optimization algorithm to obtain an optimal network parameter combination, and carrying out network parameter updating on the initial Bayesian network through the optimal network parameter combination to obtain a target Bayesian network.
A second aspect of the present application provides a keyword-based index data analysis system, the keyword-based index data analysis system comprising:
the acquisition module is used for acquiring initial user learning index data of a plurality of target users based on a preset cloud platform, and carrying out keyword index analysis on the initial user learning index data to obtain target user learning index data;
the classification module is used for carrying out user classification on the plurality of target users according to the target user learning index data through a preset EM algorithm to obtain a plurality of target user groups;
the operation module is used for inputting the target user learning index data into a preset mixed Weibull distribution model to carry out mixed Weibull distribution parameter operation so as to obtain a distribution scale parameter and a distribution shape parameter;
the construction module is used for constructing variable sets of the distribution scale parameters and the distribution shape parameters according to the plurality of target user groups to obtain target variable sets of each target user group;
the analysis module is used for inputting the target variable set of each target user group into a preset initial Bayesian network to perform variable relation analysis so as to obtain target influence factors of each target user group;
The updating module is used for carrying out model parameter optimization on the initial Bayesian network according to target influence factors of each target user group through a preset particle swarm optimization algorithm to obtain an optimal network parameter combination, and carrying out network parameter updating on the initial Bayesian network through the optimal network parameter combination to obtain a target Bayesian network.
According to the technical scheme, performing keyword index analysis on the user learning index data allows user behavior to be accurately identified and quantified. This improves the accuracy of the data analysis and makes the understanding of user behavior more thorough and specific. Applying the EM algorithm divides users efficiently into different groups, each with its own behavioral characteristics. Such classification helps identify different types of users and can be used to formulate targeted strategies, such as customized teaching methods. The mixed Weibull distribution model effectively models complex patterns of user behavior, especially for behavior data with multiple influencing factors, and can reveal deeper features of user behavior, such as the duration and frequency of learning activities. Inputting the variable sets into the Bayesian network identifies the relationships and interactions between the variables, which helps in understanding the factors that affect user behavior and supports the development of more effective strategies. Optimizing the parameters of the Bayesian network with a particle swarm optimization algorithm improves the performance of the model on complex data sets, so that the prediction and analysis results are more accurate and the analysis accuracy of the index data is further improved.
Drawings
FIG. 1 is a schematic diagram of one embodiment of a keyword-based index data analysis method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an embodiment of a keyword-based index data analysis system according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides an index data analysis method and system based on keywords, so that the analysis accuracy of index data is improved.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, the following describes a specific flow of an embodiment of the present application, referring to fig. 1, and one embodiment of a keyword-based index data analysis method in the embodiment of the present application includes:
Step 101, acquiring initial user learning index data of a plurality of target users based on a preset cloud platform, and performing keyword index analysis on the initial user learning index data to obtain target user learning index data;
it may be understood that the execution body of the present application may be a keyword-based index data analysis system, or may be a terminal or a server, which is not limited herein. The embodiment of the present application will be described by taking a server as an execution body.
Specifically, the learning indexes of a plurality of target users are first monitored on a preset cloud platform to obtain initial user learning index data. These data cover each user's online learning duration, activity frequency, choice of learning content, and the like, and reflect the user's learning habits and preferences. After the data are acquired, they are processed with a predefined keyword analysis function to calculate keyword frequency data. The keyword analysis function is based on the TF-IDF (term frequency-inverse document frequency) algorithm, in which the term frequency (TF) reflects how often a keyword occurs in the learning index data of a given user, and the inverse document frequency (IDF) measures how distinctive the keyword is across the whole data set. Specifically, the frequency of each keyword in the learning index data of an individual user is calculated and then multiplied by the inverse document frequency of that keyword over the learning index data of all users, which gives the weight of the keyword. In this way, the analysis identifies not only words that occur frequently in the data of a single user but also words that are distinctive within the whole data set, so that the learning characteristics and preferences of each user can be captured more accurately. Then, based on the keyword frequency data, the system performs association rule learning and analyzes the relationships between different keywords to reveal user behavior patterns. For example, if certain keywords are found to occur together frequently, this indicates that the corresponding learning content or activities are related to each other or are equally important to the user.
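The keyword weighting described above can be sketched as follows. This is a minimal illustration, not the keyword analysis function of the application: the function name `tf_idf`, the tokenised per-user records, and the use of the plain logarithmic IDF `log(N / df)` are all assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(user_docs):
    """Return {user: {keyword: weight}} for tokenised per-user records.

    Assumed weighting: TF = count / record length, IDF = log(N / df).
    """
    n_users = len(user_docs)
    # Document frequency: number of users whose record contains the keyword.
    df = Counter()
    for tokens in user_docs.values():
        df.update(set(tokens))
    weights = {}
    for user, tokens in user_docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[user] = {
            kw: (c / total) * math.log(n_users / df[kw])
            for kw, c in counts.items()
        }
    return weights

# Hypothetical keyword records for three users.
records = {
    "u1": ["video", "quiz", "video", "forum"],
    "u2": ["video", "quiz"],
    "u3": ["video", "reading"],
}
w = tf_idf(records)
```

In this toy data set "video" occurs for every user, so its IDF and therefore its weight is zero, while "forum" occurs only for `u1` and receives the largest weight in that user's record.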
Finally, according to the user behavior patterns obtained through association rule learning, the system further analyzes and screens the initial user learning index data to extract the key index data that reflect the learning habits and preferences of the target users. These data provide key information for subsequent user analysis, content recommendation, or personalized learning path design.
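The association rule learning mentioned in this step can be illustrated with pairwise support and confidence. The sketch below is an assumption about one simple way to do it; the function `pair_rules`, the session data, and the `min_support` threshold are hypothetical, and the application does not state which association rule algorithm it uses.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for keyword pairs."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))  # sort so each pair has one canonical order
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n  # share of sessions containing both keywords
        if support >= min_support:
            rules.append((a, b, support, c / item_count[a]))  # a -> b
            rules.append((b, a, support, c / item_count[b]))  # b -> a
    return rules

# Hypothetical keyword sets, one per learning session.
sessions = [
    ["video", "quiz"],
    ["video", "quiz", "forum"],
    ["video", "reading"],
    ["quiz", "video"],
]
rules = pair_rules(sessions)
```

Here only the pair ("quiz", "video") reaches the support threshold: it appears in 3 of 4 sessions, and every session containing "quiz" also contains "video", so the rule quiz -> video has confidence 1.0.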
Step 102, carrying out user classification on the plurality of target users according to the target user learning index data through a preset EM algorithm to obtain a plurality of target user groups;
Specifically, the EM algorithm is an iterative algorithm for maximum likelihood estimation of the parameters of probability models that contain hidden (latent) variables. In this embodiment, it is used to classify the users according to the target user learning index data and thereby obtain different user groups. First, category probabilities are calculated for the plurality of target users using the EM algorithm. The learning index data of each user are regarded as observed data, and the probability that each user belongs to each category is calculated from these data. In the E-step (expectation step) of the EM algorithm, the algorithm estimates the probability that each user belongs to each category under the current parameters. These probabilities can be understood as the posterior probability distribution of the hidden variables, which reflects how likely each user is to belong to each category under the current model parameters. The users are then classified based on the calculated category probabilities, resulting in a plurality of initial user groups. In this process, each user is assigned to the category with the highest probability, forming a preliminary division into user groups. Next, a parameter update calculation is performed on the target user learning index data through a preset parameter update function to optimize the classification performance of the model. This is the M-step (maximization step) of the EM algorithm, in which the algorithm updates the model parameters using the posterior probability distribution obtained in the previous step. The purpose of the parameter update function is to maximize the log-likelihood function of the observed data, taking into account the posterior probability of each observed data point under the current parameters.
In this process, θ^(new) denotes the updated parameters and θ^(old) denotes the parameters before the update; each iteration computes new parameter values from the previous parameters and the data. The classification performance of the model is then gradually refined by iteratively updating the parameters. In each iteration, the model parameters are adjusted according to the newly calculated probability distribution and the observed data. This process is repeated until the parameters converge to stable values, at which point the server obtains the target update parameters. Finally, based on the target update parameters, the initial user groups are further optimized to obtain the final target user groups. In this way, the EM algorithm can handle complex data sets containing hidden variables, provides a user classification tool for keyword-based index data analysis, and provides a basis for subsequent data analysis and decisions.
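The E-step and M-step described above can be sketched for a concrete model. The application does not state which mixture model underlies the classification, so the example assumes a one-dimensional Gaussian mixture over a single hypothetical index (daily learning hours); the function name `em_gmm_1d` and the quantile-based initialisation are likewise assumptions.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100, tol=1e-6):
    """EM for a 1-D Gaussian mixture; returns weights, means, variances, labels."""
    x = np.asarray(x, dtype=float)
    # theta_old: equal weights, means at evenly spaced quantiles, shared variance.
    weights = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var() + 1e-6)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each category for each user.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        weighted = weights * dens
        total = weighted.sum(axis=1, keepdims=True) + 1e-300
        resp = weighted / total
        # M-step: theta_new maximises the expected complete-data log-likelihood.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        # Stop when the observed-data log-likelihood no longer improves.
        ll = float(np.log(total).sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # Hard assignment: each user goes to the most probable category.
    return weights, mu, var, resp.argmax(axis=1)

# Synthetic learning hours for two hypothetical user groups.
rng = np.random.default_rng(0)
hours = np.concatenate([rng.normal(1.0, 0.2, 50), rng.normal(5.0, 0.3, 50)])
weights, mu, var, labels = em_gmm_1d(hours, k=2)
```

The responsibilities `resp` play the role of the posterior distribution of the hidden variables, and the final `argmax` corresponds to assigning each user to the category it most probably belongs to.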
Step 103, inputting target user learning index data into a preset mixed Weibull distribution model to carry out mixed Weibull distribution parameter operation, so as to obtain a distribution scale parameter and a distribution shape parameter;
First, mixture component analysis is performed on the target user learning index data to identify the different behavior patterns or user groups present in the data. The purpose is to decompose the overall data into subsets, each representing a particular user behavior pattern or characteristic. For example, different mixture components may represent different learning time preferences, course selection habits, or interaction styles. Then, the time distribution of user behavior is modeled on the target user learning index data through a preset mixed Weibull distribution model. The Weibull distribution is a probability distribution widely used in survival analysis, reliability analysis, and risk assessment, while the mixed Weibull distribution combines multiple Weibull distributions to better fit diverse data sets. The mixed Weibull distribution model is used to capture different characteristics of the time distribution of user behavior, such as the distribution of learning durations and the distribution of active time periods. Next, a distribution parameter operation function is used to calculate the distribution scale parameter and the distribution shape parameter. These parameters help describe the user behavior patterns: the distribution scale parameter (λ) describes the spread of the distribution, while the distribution shape parameter (β) describes its shape, such as its skewness. A specially designed function L is used, which combines the weight of each mixture component (π_k) with the scale parameter and shape parameter of that component to describe the observed time values (t_i). From this function, the mixed Weibull distribution parameters describing the entire data set can be calculated. Finally, a mixed Weibull distribution that accurately reflects the learning behavior characteristics of the target users is obtained.
These distribution parameters not only help to understand the learning behavior of the user, but also provide powerful support for subsequent data analysis, user grouping, personalized recommendation, etc.
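As an illustration of how the component weights π_k, scale parameters λ_k, and shape parameters β_k can be combined into a function L over the time values t_i, the sketch below evaluates the standard mixture log-likelihood L = Σ_i log Σ_k π_k f(t_i; λ_k, β_k), with f the two-parameter Weibull density. That L has exactly this form in the application is an assumption, and the function names and synthetic durations are hypothetical.

```python
import numpy as np

def weibull_pdf(t, scale, shape):
    """Two-parameter Weibull density with scale lambda and shape beta."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * np.exp(-(z ** shape))

def mixture_log_likelihood(t, weights, scales, shapes):
    """Log-likelihood of the times t under a mixed Weibull distribution."""
    t = np.asarray(t, dtype=float)[:, None]  # column vector: one row per t_i
    weights = np.asarray(weights, dtype=float)
    scales = np.asarray(scales, dtype=float)
    shapes = np.asarray(shapes, dtype=float)
    comp = weights * weibull_pdf(t, scales, shapes)  # pi_k * f_k(t_i)
    return float(np.log(comp.sum(axis=1)).sum())

# Synthetic session durations (minutes) drawn from two hypothetical components.
rng = np.random.default_rng(0)
durations = np.concatenate([10.0 * rng.weibull(1.5, 200), 60.0 * rng.weibull(3.0, 200)])
ll_true = mixture_log_likelihood(durations, [0.5, 0.5], [10.0, 60.0], [1.5, 3.0])
ll_off = mixture_log_likelihood(durations, [0.5, 0.5], [100.0, 600.0], [1.5, 3.0])
```

Parameters close to those that generated the data give a higher log-likelihood than badly mis-scaled ones, which is the quantity a parameter estimation routine (for example EM or a numerical optimizer) would maximize.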
Step 104, constructing variable sets from the distribution scale parameters and the distribution shape parameters according to the plurality of target user groups to obtain a target variable set of each target user group;
Specifically, initial feature data are first constructed for each target user group from the obtained distribution scale parameters and distribution shape parameters. The characteristics of each user group are defined using the parameters already calculated by the mixed Weibull distribution model. The distribution scale and shape parameters reflect specific aspects of the users' learning behavior, such as learning frequency and duration, which helps in understanding the behavior patterns of different user groups. Next, the initial feature data are normalized to obtain target feature data, which ensures the consistency and comparability of the data. Normalization typically involves scaling the data to a standard range, such as 0 to 1 or -1 to 1, or applying Z-score normalization, to eliminate the effects of different units and ranges. This step ensures that comparisons and combinations of different features are fair and valid. Then, feature clustering is performed on the target feature data using a preset feature extraction function to obtain the target variable set of each target user group. Feature clustering is a data mining technique that identifies groups of similar or related features in a large amount of data. The feature extraction function is based on a fuzzy clustering algorithm, whose core idea is to consider the degree of membership of each data point in the different groups. This approach differs from traditional hard clustering algorithms (e.g., k-means), which assign each data point to a single group. Fuzzy clustering allows data points to belong to multiple groups to varying degrees, thereby providing a more flexible and fine-grained view of the data.
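The normalization and fuzzy clustering described above can be sketched with min-max scaling followed by fuzzy c-means. The application names only "a fuzzy clustering algorithm", so the choice of fuzzy c-means, the fuzzifier `m = 2`, the function names, and the (scale, shape) feature rows are all assumptions for illustration.

```python
import numpy as np

def min_max_scale(x):
    """Scale each feature column to the range 0 to 1."""
    x = np.asarray(x, dtype=float)
    span = x.max(axis=0) - x.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.min(axis=0)) / span

def fuzzy_c_means(x, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return cluster centers and the membership matrix u (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Centers are membership-weighted means of the data points.
        centers = um.T @ x / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership is inversely related to the distance to each center.
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

# Hypothetical (scale, shape) feature rows.
features = np.array([[10, 1.2], [12, 1.3], [11, 1.1], [60, 3.0], [58, 2.9], [62, 3.2]])
scaled = min_max_scale(features)
centers, u = fuzzy_c_means(scaled, c=2)
labels = u.argmax(axis=1)
```

Unlike k-means, every row of `u` keeps a degree of membership in both clusters; taking the `argmax` is only needed when a single label per point is required.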
Step 105, inputting a target variable set of each target user group into a preset initial Bayesian network for variable relation analysis to obtain target influence factors of each target user group;
Specifically, the target variable set of each target user group is first input into a preset initial Bayesian network. A Bayesian network is a graphical model that represents probabilistic relationships between variables through nodes and directed edges. In this network, each node represents a variable, and each edge represents a probabilistic dependency between variables. The target conditional probabilities of each target variable set can be calculated by means of a probability inference function in the initial Bayesian network. This probability inference function is based on Bayes' theorem and infers the conditional probability of each variable in the target variable set by taking into account the conditional dependencies between the variables. The process involves calculating the conditional probability of each variable given the values of its parent nodes, which represent the other variables that affect it. Doing this for the whole target variable set yields an overall target conditional probability distribution that reflects the relationships and influences between the variables. Next, probability inference is performed for each target user group based on the resulting target conditional probabilities. The behavior patterns and features of the individual user groups are inferred from the conditional probability distribution of the Bayesian network. Probability inference helps the server understand how the values of other variables change given the values of some variables. This helps identify the main features of each user group and reveals the relationships between these features. Finally, influence factors are identified for each target user group according to the result of the probability inference. Building on the preceding analysis, the key influence factors of each user group are examined in further depth.
By identifying these factors, the server better understands the behavior and preferences of different user groups, thereby providing an important basis for formulating more accurate and efficient user service policies.
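The conditional probability calculation and Bayes' theorem inference described above can be shown on a three-node network. The structure, the variable names, and all probability values below are invented for illustration and are not taken from the application.

```python
from itertools import product

# Hypothetical binary network: long_sessions -> completes_course <- frequent_logins
p_long = {True: 0.4, False: 0.6}
p_freq = {True: 0.5, False: 0.5}
# Conditional probability table: P(completes = True | long_sessions, frequent_logins)
p_complete = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}

def joint(long_s, freq, complete):
    """Joint probability as the product of each node given its parents."""
    pc = p_complete[(long_s, freq)]
    return p_long[long_s] * p_freq[freq] * (pc if complete else 1.0 - pc)

def posterior_long_given_complete(complete=True):
    """P(long_sessions = True | completes_course) by enumeration (Bayes' theorem)."""
    num = sum(joint(True, f, complete) for f in (True, False))
    den = sum(joint(l, f, complete) for l, f in product((True, False), repeat=2))
    return num / den

post = posterior_long_given_complete(True)
```

Observing course completion raises the probability of long sessions from the prior 0.4 to 0.30 / 0.48 = 0.625, which is the kind of shift that marks a variable as an influence factor.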
Step 106, performing model parameter optimization on the initial Bayesian network according to the target influence factors of each target user group through a preset particle swarm optimization algorithm to obtain an optimal network parameter combination, and performing network parameter updating on the initial Bayesian network through the optimal network parameter combination to obtain the target Bayesian network.
Specifically, a preset particle swarm optimization (PSO) algorithm is used. Particle swarm optimization is a population-based optimization method that searches for optimal solutions in a complex search space by simulating the movement of a swarm of particles through the solution space. Based on the target influence factors of each target user group, the algorithm generates model parameter particles for the initial Bayesian network, yielding a plurality of initial network parameter particles. These particles represent parameter configurations of the Bayesian network, and the position of each particle corresponds to a particular combination of network parameters. The goal of the particles is to find the combination of network parameters that best explains the data of the target user groups. Next, the particles are initialized by determining the position and velocity of each initial network parameter particle. This step sets the initial state of each particle in the search space, where the position represents one candidate parameter combination of the Bayesian network and the velocity determines the direction and speed of the particle's search. Then, the position and velocity of the particles are updated by a preset particle update function to guide the particles toward the optimal solution. The update consists of two parts: a velocity update and a position update. The velocity update is affected by three factors: the current velocity of the particle (inertia), the best position the particle has found so far (individual learning component), and the best position found by the whole swarm (social learning component). This update mechanism lets the particles balance individual experience against swarm experience and thus explore the solution space effectively.
Finally, the optimal combination of network parameters found in this way is used to update the network nodes and conditional probability tables of the initial bayesian network, resulting in the target bayesian network. The optimized Bayesian network can reflect the behavior mode and the characteristics of the target user group more accurately, thereby providing more accurate and effective support for the analysis of index data based on keywords.
In this method, keyword index analysis of the user learning index data allows user behavior to be identified and quantified accurately. This improves the accuracy of the data analysis and makes the understanding of user behavior more thorough and specific. By applying the EM algorithm, the method can efficiently divide users into different groups, each with its own behavioral characteristics. Such classification helps identify different types of users and can also be used to formulate targeted strategies, such as customized teaching methods. By using the mixed Weibull distribution model, complex patterns of user behavior can be modeled effectively, especially for behavior data with multiple influencing factors. Such a model can reveal deeper features of user behavior, such as the duration and frequency of learning activities. By inputting the variable sets into the Bayesian network for analysis, the relationships and interactions between the variables can be identified comprehensively. Such analysis helps in understanding the various factors that affect user behavior and supports the development of more effective strategies. Optimizing the parameters of the Bayesian network with the particle swarm optimization algorithm can significantly improve the performance of the model. This optimization improves the accuracy and reliability of the model on complex data sets, so that the prediction and analysis results are more accurate and the analysis accuracy of the index data is further improved.
In a specific embodiment, the process of executing step 101 may specifically include the following steps:
(1) Based on a preset cloud platform, carrying out learning index monitoring on a plurality of target users to obtain corresponding initial user learning index data;
(2) Performing keyword frequency calculation on the initial user learning index data through a preset keyword analysis function to obtain keyword frequency data, wherein the keyword analysis function is as follows:
$$T_{t,d} = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \times \log\frac{N}{\left|\{d \in D : t \in d\}\right|}$$
where $t$ is a keyword, $d$ is initial user learning index data, $T_{t,d}$ is the keyword frequency data of the keyword $t$ in the initial user learning index data $d$, $f_{t,d}$ is the frequency of occurrence of the keyword $t$ in the initial user learning index data $d$, $\sum_{t' \in d} f_{t',d}$ is the total frequency of occurrence of all keywords in the initial user learning index data $d$, $N$ is the total number of initial user learning index data in the initial user learning index data set $D$, and $\left|\{d \in D : t \in d\}\right|$ is the number of initial user learning index data containing the keyword $t$;
(3) Performing association rule learning on the initial user learning index data according to the keyword frequency data to obtain a user behavior mode;
(4) Extracting keyword indexes from the initial user learning index data according to the user behavior pattern to obtain target user learning index data.
Specifically, first, the learning indexes of a plurality of target users are monitored through a preset cloud platform to obtain initial user learning index data. Such data typically includes records of the user's activities on the online learning platform, such as the time spent watching course videos, the frequency of participation in discussions, and the completion of assignments. These data provide the basis for subsequent analysis. Then, keyword frequency calculation is performed on the initial user learning index data through a preset keyword analysis function to obtain keyword frequency data. This function is based on the TF-IDF (term frequency-inverse document frequency) algorithm, which evaluates the importance of a word in a data set. Specifically, the term frequency (TF) part calculates how often a word occurs in an individual user's learning index data, while the inverse document frequency (IDF) part considers how many records in the entire data set contain the word, which helps distinguish words common to most records from distinctive ones. For example, if most users on a learning platform frequently watch a certain type of course video, that course type becomes a high-frequency keyword. Next, association rule learning is performed according to the obtained keyword frequency data to identify user behavior patterns. Association rule learning is a method of finding relationships between variables in a large data set. In this process, the algorithm analyzes the relevance between different keywords, for example, finding that users who watch a particular type of course also tend to participate in the related online discussion. In this way, the inherent patterns and trends in user behavior can be revealed, for example, that users who frequently participate in programming course discussions are often also active in courses on algorithms and data structures. 
Finally, keyword indexes are extracted from the initial user learning index data according to the identified user behavior patterns to obtain the target user learning index data. This step extracts the most representative and most predictive information from the data. For example, if the analysis results indicate that users participating in a particular discussion group generally perform better in the final exam, the frequency and activity level of participation in that discussion may be extracted as an important learning index. Likewise, if the viewing duration of certain course videos is found to be highly correlated with the course completion rate, the viewing data of these videos becomes a key learning index.
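The keyword frequency calculation described above can be sketched as follows. This is a minimal illustration, assuming each record of user learning index data has already been reduced to a list of keywords and assuming the plain TF-IDF form (relative frequency times the logarithm of the inverse record frequency). The function name `tf_idf` and the sample keywords are hypothetical and do not come from the application.

```python
import math
from collections import Counter

def tf_idf(records):
    """Score every keyword of every record.

    records: list of keyword lists, one list per user learning-index record.
    Returns one {keyword: score} dict per record.
    """
    n_records = len(records)
    # Number of records that contain each keyword at least once.
    doc_freq = Counter()
    for keywords in records:
        doc_freq.update(set(keywords))

    scores = []
    for keywords in records:
        counts = Counter(keywords)
        total = sum(counts.values())  # total keyword occurrences in this record
        scores.append({
            t: (f / total) * math.log(n_records / doc_freq[t])
            for t, f in counts.items()
        })
    return scores

# Hypothetical keyword lists extracted from three users' activity logs.
records = [
    ["video", "video", "quiz"],
    ["video", "forum"],
    ["video", "quiz", "quiz", "forum"],
]
scores = tf_idf(records)
```

Under this plain form, a keyword that occurs in every record (here `video`) receives a score of zero, because its inverse document frequency is log(1); smoothed IDF variants avoid this if that behavior is not wanted.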
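The association rule learning step can be illustrated with the two standard measures, support and confidence. The sketch below only mines rules between pairs of keywords, which is an assumption made for brevity; the application does not state which association rule algorithm is used. The function name `pair_rules`, the thresholds, and the sample keywords are hypothetical.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) between keyword pairs."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of users whose keyword set contains the whole itemset.
        return sum(itemset <= s for s in sets) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence of "lhs implies rhs".
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical keyword sets, one per user.
transactions = [
    {"programming_forum", "algorithms_course"},
    {"programming_forum", "algorithms_course", "video"},
    {"video"},
    {"programming_forum", "video"},
]
rules = pair_rules(transactions)
```

With this sample data the only rule that passes both thresholds is "algorithms_course implies programming_forum": every user who takes the algorithms course also posts in the programming forum, while the reverse holds for only two of three forum users.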
In a specific embodiment, the process of executing step 102 may specifically include the following steps:
(1) Carrying out class probability calculation on a plurality of target users through a preset EM algorithm to obtain class probability of each target user;
(2) User classification is carried out on a plurality of target users according to the category probability, and a plurality of initial user groups are obtained;
(3) Updating parameter calculation is carried out on the target user learning index data through a preset parameter updating function, updated parameters are obtained, and the parameter updating function is as follows:
$$\theta^{(new)} = \arg\max_{\theta} \sum_{i} \sum_{z} P\left(z \mid x_i, \theta^{(old)}\right) \log P\left(x_i, z \mid \theta\right)$$
where $\theta^{(new)}$ represents the updated parameters, $\theta^{(old)}$ represents the parameters before the update, $x_i$ is the i-th observed data point in the target user learning index data, $z$ is a hidden variable, $P\left(z \mid x_i, \theta^{(old)}\right)$ represents the posterior probability of the hidden variable $z$ given the observed data point $x_i$ under the parameters before the update, and $P\left(x_i, z \mid \theta\right)$ represents the joint probability of the observed data point $x_i$ and the hidden variable $z$ under the updated parameters;
(4) Iteratively updating the updated parameters to obtain target update parameters, and performing group optimization on the plurality of initial user groups according to the target update parameters to obtain a plurality of target user groups.
Specifically, the EM algorithm, i.e. the expectation-maximization algorithm, is a method for estimating the parameters of statistical models that contain hidden variables. In user classification, the EM algorithm helps the server estimate the category to which each user belongs from the user's behavioral data, even if these categories are not explicitly defined in advance. First, category probabilities are calculated for the plurality of target users by the EM algorithm. In this process, the algorithm randomly initializes the probability that each user belongs to each potential category. For example, in an online learning platform, these potential categories may be "beginner", "intermediate user" and "advanced user". The preliminary category probability assignment is based on the learning activities of the user, such as course viewing time, quiz scores, forum participation, and the like. Next, the algorithm enters an iterative process, continually updating these category probabilities so that they reflect the user's true category more accurately. Each iteration of the EM algorithm has two steps: the E step (expectation step) and the M step (maximization step). In the E step, the algorithm calculates the posterior probability that each user belongs to each category, based on the probability of the user's data under the current parameters. Then, in the M step, the algorithm updates the model parameters, i.e., the characteristics of each category, to maximize the likelihood of the observed data. This process is repeated until the category probabilities converge. Then, parameter update calculation is performed on the target user learning index data using the preset parameter update function. The core of this process is to optimize the model parameters so that they reflect the users' behavior and categories more accurately. The objective of the parameter update function is to find the parameter values that maximize the likelihood of the model. 
Likelihood here refers to the probability of the observed data given the model parameters. In this way, the model describes the user data as accurately as possible. Finally, the server obtains a more accurate user classification by iteratively updating the updated parameters. This process is dynamic: as user behavior changes and new data is added, the model is continually adjusted to fit the users' actual behavior. After these steps are completed, the server performs group optimization on the initial user groups according to the target update parameters, thereby obtaining a plurality of target user groups. The optimization process considers the current behavior of the users and can also adapt to trends in that behavior, providing a more accurate basis for user grouping and personalized recommendation on the platform.
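The E step and M step described above can be sketched for the simplest case. The example assumes a one-dimensional Gaussian mixture over a single learning index (weekly study hours), which is an illustrative choice; the application does not specify the component distribution used by its EM algorithm. The function names and the sample data are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gaussian_mixture(data, means, variances, weights, n_iter=50):
    """Run EM and return (means, variances, weights, responsibilities)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E step: posterior probability of each category for each data point
        # under the current (pre-update) parameters.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M step: parameters that maximize the expected complete-data
        # log-likelihood given those posteriors.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
            weights[j] = n_j / len(data)
    return means, variances, weights, resp

# Hypothetical weekly study hours of eight users.
hours = [1.0, 1.5, 2.0, 1.2, 9.0, 10.0, 11.0, 9.5]
means, variances, weights, resp = em_gaussian_mixture(
    hours, means=[0.0, 5.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
# Hard group assignment: the category with the highest posterior probability.
groups = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

The responsibilities in `resp` play the role of the category probabilities of each user, and taking the most probable category per user yields the user groups.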
In a specific embodiment, the process of executing step 103 may specifically include the following steps:
(1) Performing mixed component analysis on the target user learning index data to obtain a plurality of different mixed components;
(2) Modeling the user behavior time distribution of the target user learning index data through a preset mixed Weibull distribution model to obtain corresponding target mixed Weibull distribution;
(3) Carrying out mixed Weibull distribution parameter operation on the target mixed Weibull distribution according to the plurality of different mixed components through a preset distribution parameter operation function to obtain a distribution scale parameter and a distribution shape parameter, wherein the distribution parameter operation function is as follows: $\lambda$ is the scale parameter, $\beta$ is the shape parameter, $\lambda_k$ represents the distribution scale parameters of the different mixed components, $\beta_k$ represents the distribution shape parameters of the different mixed components, $\pi_k$ is the weight of the k-th mixed component, $t_i$ is the time value of the i-th data point in the target user learning index data, $K$ is the number of mixed components, and $w$ is the mixed distribution coefficient.
Specifically, first, mixed component analysis is performed on the target user learning index data to obtain a plurality of different mixed components. The purpose of the mixed component analysis is to decompose the user learning index data into a plurality of sub-components, each representing a particular pattern of user behavior. For example, in an online learning platform, these mixed components may represent different learning activities, course completion speeds, or interaction patterns. A clustering algorithm or another statistical analysis method may be applied to identify potential patterns in the data. Users can be divided into several different groups by analyzing behavioral data such as their learning time, course selection, and forum participation. Each group exhibits different learning characteristics and habits and thus constitutes one mixed component of the data. Then, the time distribution of user behavior in the target user learning index data is modeled through a preset mixed Weibull distribution model. The Weibull distribution is a flexible probability distribution that can be used in user behavior analysis to model the time distribution of user behavior, such as learning duration and course completion time. For example, some users tend to complete courses in a concentrated way within a short period of time (one form of the Weibull distribution), while others tend to spread completion over a longer period (another form of the Weibull distribution). By combining these different Weibull distribution forms, the mixed Weibull distribution describes the time distribution characteristics of overall user behavior more fully. Then, the parameters of the mixed Weibull distribution, including the scale parameter (λ) and the shape parameter (β) of the distribution, are calculated using a preset distribution parameter operation function. The scale parameter determines the extent of the distribution, while the shape parameter determines its shape. 
For example, for a distribution of learning times, the scale parameter reflects the characteristic learning time, and the shape parameter reflects how the learning times are spread around it. The distribution parameter operation function computes the overall mixed distribution from the weight (π_k) of each mixed component together with the scale and shape parameters of each component. Such a calculation takes into account not only the characteristics of the individual sub-components but also their relative importance in the overall data.
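How component weights, scale parameters and shape parameters combine into one mixed distribution can be sketched as follows. The sketch assumes the standard finite-mixture form, in which the mixture density is the weighted sum of two-parameter Weibull densities; this form is an assumption, since the exact distribution parameter operation function is not reproduced in the text above. All function names and parameter values are hypothetical.

```python
import math

def weibull_pdf(t, scale, shape):
    """Two-parameter Weibull density; t <= 0 is treated as zero density for simplicity."""
    if t <= 0.0:
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))

def weibull_mean(scale, shape):
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def mixture_pdf(t, weights, scales, shapes):
    """Weighted sum of the component densities (weights are assumed to sum to 1)."""
    return sum(w * weibull_pdf(t, lam, beta)
               for w, lam, beta in zip(weights, scales, shapes))

def mixture_log_likelihood(times, weights, scales, shapes):
    """Log-likelihood of observed positive time values under the mixture."""
    return sum(math.log(mixture_pdf(t, weights, scales, shapes)) for t in times)

# Hypothetical two-component mixture: short, concentrated sessions (60%)
# and long sessions spread over a wider range (40%).
weights = [0.6, 0.4]
scales = [2.0, 10.0]
shapes = [1.5, 3.0]

# Numerical check that the mixture density integrates to about 1 (midpoint rule).
step = 0.01
area = sum(mixture_pdf((i + 0.5) * step, weights, scales, shapes) * step
           for i in range(5000))
```

Fitting the weights, scales and shapes to data would typically maximize `mixture_log_likelihood`, for example with an EM procedure; that step is omitted here.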
In a specific embodiment, the process of executing step 104 may specifically include the following steps:
(1) Respectively constructing initial characteristic data of each target user group according to the distribution scale parameters and the distribution shape parameters;
(2) Normalizing the initial characteristic data to obtain target characteristic data;
(3) Carrying out feature clustering on the target feature data through a preset feature extraction function to obtain a target variable set of each target user group, wherein the feature extraction function is as follows:
where $V_{group}$ is the set of target variables, $v_{group}$ represents a variable, $u_{ij}$ represents the membership degree of the i-th data point to the j-th group in the target feature data, $p$ represents the weighting index in fuzzy clustering, and $x_{ij}$ represents the j-th feature value of the i-th data point in the target feature data.
Specifically, first, the initial feature data of each target user group is constructed according to the distribution scale parameters and the distribution shape parameters. These parameters are obtained from the preceding mixed Weibull distribution model and reflect the characteristics of the different user groups in the time distribution of their learning behavior. Next, the initial feature data is normalized. The purpose of normalization is to eliminate the effect of different units and value ranges among the features, so that the data is uniform and convenient for subsequent analysis. For example, if the scale parameter ranges from 0 to 100 and the shape parameter ranges from 0 to 10, comparing the two parameters directly is not meaningful. Through normalization, these parameters are converted to the same scale, for example the range from 0 to 1. In this way, each feature contributes comparably to the subsequent analysis. Then, feature clustering is performed on the target feature data using a preset feature extraction function. The purpose of feature clustering is to place users with similar features in the same group. The feature extraction function used here is based on a fuzzy clustering algorithm. Unlike conventional hard clustering algorithms (e.g., K-means), fuzzy clustering allows one data point to belong to several clusters with different membership degrees. In this function, the membership degree of each data point to each group is calculated from its distance to the center of that group. For example, if a user is very close to the center of the "frequent learner" group in terms of learning time, but closer to the center of the "diverse learner" group in terms of the diversity of learning content, then the user belongs to both groups at the same time, with different membership degrees. 
The advantage of this approach is that it provides a more flexible and detailed way to understand and characterize the user behavior, which means that the learning patterns and preferences of the user can be more accurately identified. For example, by this method, the server may find that some users, although not frequently logged into the learning platform, concentrate on learning for a longer period of time each time they log in, which behavior pattern is quite different from other users who are more frequent but have shorter learning times each time.
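The normalization and membership calculation can be sketched as below. The sketch assumes min-max normalization and the standard fuzzy c-means formulas, in which memberships depend on relative distances to the cluster centers and centers are membership-weighted means; the exact feature extraction function is not reproduced in the text above, so these formulas are an assumption. Function names and sample values are hypothetical.

```python
import math

def min_max_normalize(rows):
    """Rescale every feature column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def memberships(point, centers, p=2.0):
    """Fuzzy c-means membership of one point to every center (requires p > 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point coincides with at least one center: share membership among them.
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exponent = 2.0 / (p - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]

def update_centers(points, u, p=2.0):
    """New centers as means of the points weighted by membership ** p."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** p for row in u]
        total = sum(w)  # assumed non-zero: every cluster has some membership
        centers.append([sum(wi * pt[d] for wi, pt in zip(w, points)) / total
                        for d in range(dim)])
    return centers

# Hypothetical (scale, shape) feature rows for three user groups.
features = [[10.0, 1.0], [50.0, 3.0], [100.0, 10.0]]
normalized = min_max_normalize(features)
u = [memberships(pt, [[0.0, 0.0], [1.0, 1.0]]) for pt in normalized]
```

A full fuzzy c-means run would alternate `memberships` and `update_centers` until the centers stop moving.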
In a specific embodiment, the process of executing step 105 may specifically include the following steps:
(1) Inputting the target variable sets of each target user group into a preset initial Bayesian network, and respectively calculating the target conditional probability of each target variable set through a probability inference function in the initial Bayesian network, wherein the probability inference function is as follows:
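$$P\left(V_{group}\right) = \prod_{v_{group} \in V_{group}} P\left(v_{group} \mid Parents\left(v_{group}\right)\right)$$
where $v_{group}$ represents a variable, $V_{group}$ is the set of target variables, $Parents\left(v_{group}\right)$ represents the set of parent nodes of each variable in the set of target variables, $P\left(v_{group} \mid Parents\left(v_{group}\right)\right)$ represents the conditional probability of the variable $v_{group}$ given the values of its parent nodes, and $P\left(V_{group}\right)$ represents the target conditional probability;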
(2) According to the target conditional probability, probability inference is respectively carried out on each target user group, and a probability inference result of each target user group is obtained;
(3) And respectively carrying out influence factor identification on each target user group according to the probability inference result to obtain target influence factors of each target user group.
Specifically, first, the target variable set of each target user group is input into a preset initial Bayesian network. A Bayesian network is a probabilistic graphical model for representing the dependencies between variables and for making complex probabilistic inferences. Such a network represents variables as nodes, and directed edges between nodes represent the probabilistic relationships between the variables. In this process, the structure of each variable set is determined, i.e., which variables are parent nodes of which other variables. Next, the target conditional probability of each target variable set is calculated by the probability inference function in the Bayesian network. The probability inference function used here is based on Bayes' theorem and calculates the conditional probability of each variable given the values of its parent nodes. Further, probability inference is performed for each target user group based on the calculated target conditional probabilities, that is, the Bayesian network is queried to predict or infer the behavior exhibited by a user group. Finally, influence factor identification is performed for each target user group according to the probability inference results. The objective is to extract, from the analysis of the Bayesian network, the factors that influence each user group most strongly. For example, the server may find that for user groups seeking professional development, learning time is affected mainly by work pressure and available time, while for user groups who learn mainly out of interest, learning time is affected mainly by personal interests and learning resources.
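The factorization into per-variable conditional probabilities and a simple inference query can be sketched for a toy network. The network structure, variable names and probability values below are hypothetical and chosen only to mirror the work-pressure example; inference is done by brute-force enumeration, which is practical only for small networks and is not claimed to be the inference method of the application.

```python
from itertools import product

# Hypothetical structure: work_pressure -> long_study <- free_time.
PARENTS = {
    "work_pressure": (),
    "free_time": (),
    "long_study": ("work_pressure", "free_time"),
}
# P(variable is True | parent values); all variables are binary.
CPT = {
    "work_pressure": {(): 0.4},
    "free_time": {(): 0.5},
    "long_study": {
        (True, True): 0.5,
        (True, False): 0.1,
        (False, True): 0.9,
        (False, False): 0.4,
    },
}

def joint_probability(assignment):
    """Product over all variables of P(variable | its parents)."""
    prob = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob

def posterior(query_var, evidence):
    """P(query_var is True | evidence), summing the joint over hidden variables."""
    hidden = [v for v in PARENTS if v != query_var and v not in evidence]
    numerator = denominator = 0.0
    for q in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query_var] = q
            assignment.update(zip(hidden, values))
            prob = joint_probability(assignment)
            denominator += prob
            if q:
                numerator += prob
    return numerator / denominator

# How likely is high work pressure among users who study for long periods?
p_pressure = posterior("work_pressure", {"long_study": True})
```

Comparing such posteriors with the prior of each parent (0.4 for `work_pressure` here) is one simple way to rank which factors are most associated with a behavior.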
In a specific embodiment, the process of executing step 106 may specifically include the following steps:
(1) Generating model parameter particles for the initial Bayesian network according to target influence factors of each target user group by a preset particle swarm optimization algorithm to obtain a plurality of initial network parameter particles;
(2) Initializing a plurality of initial network parameter particles to obtain the particle position and the particle speed of each initial network parameter particle;
(3) Updating the particle positions and particle velocities of the plurality of initial network parameter particles along the search direction through a preset particle update function to obtain an optimal network parameter combination, wherein the particle update function comprises:
$$v_i^{t+1} = w\, v_i^{t} + c_1 r_1 \left(pbest_i - x_i^{t}\right) + c_2 r_2 \left(gbest_i - x_i^{t}\right)$$
$$x_i^{t+1} = x_i^{t} + v_i^{t+1}$$
where $v_i^{t+1}$ indicates the particle velocity of particle i at time t+1, $v_i^{t}$ represents the particle velocity of particle i at time t, $x_i^{t}$ and $x_i^{t+1}$ represent the particle position of particle i at times t and t+1, $w$ represents the inertia weight, $c_1$ and $c_2$ represent the acceleration constants, $r_1$ and $r_2$ represent random numbers, $pbest_i$ represents the historical best position of the initial network parameter particle, and $gbest_i$ represents the global optimum position;
(4) Updating the network nodes and the conditional probability table of the initial Bayesian network with the optimal network parameter combination to obtain the target Bayesian network.
Specifically, first, model parameter particles are generated for the initial Bayesian network through a preset particle swarm optimization algorithm according to the target influence factors of each target user group. Each particle represents one set of parameter settings of the Bayesian network. For example, in analyzing user behavior on an online learning platform, these parameters relate to the factors that affect the users' learning duration and activity, such as course difficulty and the users' basic knowledge level. The particles are placed randomly in the search space, and the position of each particle represents one combination of network parameters. Next, the initial network parameter particles are initialized, which includes determining the position and velocity of each particle. The position of a particle represents the current parameter setting of the Bayesian network, while the velocity represents the direction and magnitude of the change in the parameter setting. The initialization phase provides the starting point of each particle's search. For example, if the initial position of a particle corresponds to network parameters associated with high user activity, the particle starts searching for a better parameter combination from that point. The positions and velocities of the particles are then updated with a preset particle update function. The update function in the particle swarm optimization algorithm is based on three components: the current velocity of the particle (inertia), the best position the particle has found so far (individual optimal solution), and the best position found by the whole swarm (global optimal solution). The velocity update of a particle depends on the balance of these three components, which allows the swarm to balance exploration (global search) against exploitation (local search). 
For example, if an individual best location of a particle corresponds to a particularly effective set of bayesian network parameters, and a global best location corresponds to another set of parameters, the next location of the particle will be weighted between the two best locations. By continuously updating the position and speed of each particle, the particle swarm algorithm searches the whole parameter space for the optimal combination of network parameters. In this process, each particle adjusts its search path based on its own experience and the experience of the population. As the iteration proceeds, the entire population of particles will typically gradually approach the optimal solution. Finally, after the algorithm converges and finds the optimal combination of network parameters, the parameters are used to update the network nodes and the conditional probability table of the initial Bayesian network, thereby obtaining the target Bayesian network. This optimized network more accurately reflects the behavior pattern of the target user population. For example, in the context of an online learning platform, the final bayesian network will accurately reveal complex relationships between course difficulty, user interaction frequency, and learning outcome.
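The velocity and position updates can be sketched in a few lines. In this illustration the loss function is a hypothetical stand-in (squared distance to a fixed target parameter vector); in the setting described above it would instead score how well a candidate set of Bayesian network parameters explains the user group data. The parameter values `w=0.7`, `c1=c2=1.5`, the search range and all function names are assumptions, not values from the application.

```python
import random

def step(x, v, pbest, gbest, w, c1, c2, r1, r2):
    """One-dimensional velocity and position update for a single particle."""
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new

def pso(loss, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize loss over dim parameters; returns (best position, best loss, history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=pbest_loss.__getitem__)
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    history = [gbest_loss]  # best loss after each iteration
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                pos[i][d], vel[i][d] = step(
                    pos[i][d], vel[i][d], pbest[i][d], gbest[d],
                    w, c1, c2, rng.random(), rng.random(),
                )
            current = loss(pos[i])
            if current < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], current
                if current < gbest_loss:
                    gbest, gbest_loss = pos[i][:], current
        history.append(gbest_loss)
    return gbest, gbest_loss, history

# Hypothetical stand-in loss: squared distance to a target parameter vector.
TARGET = [0.3, 0.7]

def loss(params):
    return sum((a - b) ** 2 for a, b in zip(params, TARGET))

best, best_loss, history = pso(loss, dim=2)
```

Because conditional probability table entries must stay within [0, 1] and sum to 1 per parent configuration, a practical version would also clip or re-normalize particle positions after each update; that constraint handling is left out of this sketch.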
The keyword-based index data analysis method in the embodiments of the present application is described above. The keyword-based index data analysis system in the embodiments of the present application is described below. Referring to fig. 2, an embodiment of the keyword-based index data analysis system includes:
the acquisition module 201 is configured to acquire initial user learning index data of a plurality of target users based on a preset cloud platform, and perform keyword index analysis on the initial user learning index data to obtain target user learning index data;
the classification module 202 is configured to classify the plurality of target users according to the target user learning index data by using a preset EM algorithm, so as to obtain a plurality of target user groups;
the operation module 203 is configured to input the target user learning index data into a preset mixed Weibull distribution model to perform mixed Weibull distribution parameter operation, so as to obtain a distribution scale parameter and a distribution shape parameter;
a construction module 204, configured to construct a variable set of the distribution scale parameter and the distribution shape parameter according to the multiple target user groups, so as to obtain a target variable set of each target user group;
the analysis module 205 is configured to input a target variable set of each target user group into a preset initial Bayesian network to perform variable relationship analysis, so as to obtain a target influence factor of each target user group;
the updating module 206 is configured to perform model parameter optimization on the initial Bayesian network according to the target influence factors of each target user group through a preset particle swarm optimization algorithm to obtain an optimal network parameter combination, and to update the network parameters of the initial Bayesian network with the optimal network parameter combination to obtain a target Bayesian network.
Through the cooperation of the above modules, keyword index analysis of the user learning index data allows user behavior to be identified and quantified accurately. This improves the accuracy of the data analysis and makes the understanding of user behavior more thorough and specific. By applying the EM algorithm, the system can efficiently divide users into different groups, each with its own behavioral characteristics. Such classification helps identify different types of users and can also be used to formulate targeted strategies, such as customized teaching methods. By using the mixed Weibull distribution model, complex patterns of user behavior can be modeled effectively, especially for behavior data with multiple influencing factors. Such a model can reveal deeper features of user behavior, such as the duration and frequency of learning activities. By inputting the variable sets into the Bayesian network for analysis, the relationships and interactions between the variables can be identified comprehensively. Such analysis helps in understanding the various factors that affect user behavior and supports the development of more effective strategies. Optimizing the parameters of the Bayesian network with the particle swarm optimization algorithm can significantly improve the performance of the model. This optimization improves the accuracy and reliability of the model on complex data sets, so that the prediction and analysis results are more accurate and the analysis accuracy of the index data is further improved.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.
Claims (8)
1. The index data analysis method based on the keywords is characterized by comprising the following steps of:
acquiring initial user learning index data of a plurality of target users based on a preset cloud platform, and performing keyword index analysis on the initial user learning index data to obtain target user learning index data;
user classification is carried out on the plurality of target users according to the target user learning index data through a preset EM algorithm, so that a plurality of target user groups are obtained;
inputting the target user learning index data into a preset mixed Weibull distribution model to perform mixed Weibull distribution parameter operation to obtain a distribution scale parameter and a distribution shape parameter;
According to the multiple target user groups, carrying out variable set construction on the distribution scale parameters and the distribution shape parameters to obtain a target variable set of each target user group;
inputting a target variable set of each target user group into a preset initial Bayesian network for variable relation analysis to obtain target influence factors of each target user group;
and carrying out model parameter optimization on the initial Bayesian network according to target influence factors of each target user group through a preset particle swarm optimization algorithm to obtain an optimal network parameter combination, and carrying out network parameter updating on the initial Bayesian network through the optimal network parameter combination to obtain a target Bayesian network.
2. The keyword-based index data analysis method of claim 1, wherein the acquiring initial user learning index data of a plurality of target users based on a preset cloud platform and performing keyword index analysis on the initial user learning index data to obtain target user learning index data comprises:
based on a preset cloud platform, carrying out learning index monitoring on a plurality of target users to obtain corresponding initial user learning index data;
performing keyword frequency calculation on the initial user learning index data through a preset keyword analysis function to obtain keyword frequency data, wherein the keyword analysis function is as follows:
$$\mathrm{TFIDF}(t,d)=\frac{f_{t,d}}{\sum_{t'\in d} f_{t',d}}\times\log\frac{N}{n_t}$$
where $t$ is a keyword, $d$ is initial user learning index data, $\mathrm{TFIDF}(t,d)$ is the keyword frequency data for keyword $t$ in the initial user learning index data $d$, $f_{t,d}$ is the frequency of occurrence of the keyword $t$ in the initial user learning index data $d$, $\sum_{t'\in d} f_{t',d}$ is the total frequency of occurrence of all keywords in the initial user learning index data $d$, $N$ is the total number of initial user learning index data in the initial user learning index data set, and $n_t$ is the number of initial user learning index data containing the keyword $t$;
performing association rule learning on the initial user learning index data according to the keyword frequency data to obtain a user behavior mode;
and extracting keyword indexes from the initial user learning index data according to the user behavior mode to obtain target user learning index data.
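The variable definitions in claim 2 (frequency of a keyword in one record, total keyword frequency in that record, number of records, number of records containing the keyword) read like a TF-IDF weighting. A minimal Python sketch under that assumption follows; the function name `keyword_frequency`, the list-of-keyword-lists record layout, and the sample records are hypothetical and not taken from the patent.

```python
import math
from collections import Counter


def keyword_frequency(docs, keyword):
    """Return one TF-IDF-style score per record for `keyword`.

    docs: list of keyword lists, one list per learning index record
          (hypothetical layout chosen for this sketch).
    """
    n_docs = len(docs)
    # Number of records that contain the keyword at least once.
    n_containing = sum(1 for d in docs if keyword in d)
    if n_containing == 0:
        return [0.0] * n_docs
    idf = math.log(n_docs / n_containing)
    scores = []
    for d in docs:
        counts = Counter(d)
        total = sum(counts.values())
        # Term frequency: share of this keyword among all keywords in the record.
        tf = counts[keyword] / total if total else 0.0
        scores.append(tf * idf)
    return scores


# Hypothetical records: each list holds the keywords logged for one user session.
records = [
    ["video", "quiz", "video"],
    ["forum", "quiz"],
    ["video", "forum", "forum", "forum"],
]
scores = keyword_frequency(records, "video")
```

Records that never mention the keyword score zero, and a keyword present in every record also scores zero because the logarithm of 1 vanishes; whether the patent applies any smoothing term is not stated, so none is used here.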
3. The keyword-based index data analysis method of claim 1, wherein the classifying the plurality of target users according to the target user learning index data by a preset EM algorithm to obtain a plurality of target user groups comprises:
carrying out category probability calculation on the plurality of target users through a preset EM algorithm to obtain the category probability of each target user;
according to the category probability, classifying the plurality of target users to obtain a plurality of initial user groups;
performing update parameter calculation on the target user learning index data through a preset parameter update function to obtain updated parameters, wherein the parameter update function is as follows:
$$\theta^{new}=\arg\max_{\theta}\sum_{i}\sum_{z}P\left(z\mid x_i,\theta^{old}\right)\log P\left(x_i,z\mid\theta\right)$$
where $\theta^{new}$ represents the updated parameters, $\theta^{old}$ represents the parameters before the update, $x_i$ is the i-th observed data point in the target user learning index data, $z$ is a hidden variable, $P(z\mid x_i,\theta^{old})$ represents the posterior probability of the hidden variable $z$ given the observed data point $x_i$ under the pre-update parameters, and $P(x_i,z\mid\theta)$ represents the joint probability of the observed data point $x_i$ and the hidden variable $z$ under the updated parameters;
and carrying out iterative updating on the updated parameters to obtain target updating parameters, and carrying out group optimization on the plurality of initial user groups according to the target updating parameters to obtain a plurality of target user groups.
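The E-step (category probabilities) and M-step (parameter update) of claim 3 can be sketched as follows. The claim does not name the component distribution, so this sketch assumes a one-dimensional, two-component Gaussian mixture purely for illustration; `em_two_groups`, the initialization rule, and the sample values are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution (assumed component model)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_two_groups(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM and label each point."""
    mu = [min(xs), max(xs)]      # crude initial means
    sigma = [1.0, 1.0]
    pi = [0.5, 0.5]              # mixing weights
    for _ in range(iters):
        # E-step: posterior probability of each hidden group for each point.
        resp = []
        for x in xs:
            w = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
            pi[k] = nk / len(xs)
    # Assign each user to the group with the larger posterior probability.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return mu, sigma, pi, labels


# Hypothetical one-dimensional learning index values for eight users.
xs = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
mu, sigma, pi, labels = em_two_groups(xs)
```

A fixed iteration count stands in for a convergence test here; a real implementation would more likely stop when the log-likelihood change falls below a tolerance.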
4. The keyword-based index data analysis method of claim 1, wherein the step of inputting the target user learning index data into a preset mixed Weibull distribution model to perform mixed Weibull distribution parameter operation to obtain a distribution scale parameter and a distribution shape parameter includes:
performing mixed component analysis on the target user learning index data to obtain a plurality of different mixed components;
modeling the user behavior time distribution of the target user learning index data through a preset mixed Weibull distribution model to obtain corresponding target mixed Weibull distribution;
carrying out mixed Weibull distribution parameter operation on the target mixed Weibull distribution according to the plurality of different mixed components through a preset distribution parameter operation function to obtain a distribution scale parameter and a distribution shape parameter, wherein the distribution parameter operation function is as follows:
$$f(t_i;\lambda,\beta,w)=\sum_{k=1}^{K}w_k\,\frac{\beta_k}{\lambda_k}\left(\frac{t_i}{\lambda_k}\right)^{\beta_k-1}\exp\left[-\left(\frac{t_i}{\lambda_k}\right)^{\beta_k}\right]$$
where $\lambda$ is the scale parameter, $\beta$ is the shape parameter, $\lambda_k$ represents the distribution scale parameters of the different mixed components, $\beta_k$ represents the distribution shape parameters of the different mixed components, $w_k$ is the weight of the k-th mixed component, $t_i$ is the time value of the i-th data point in the target user learning index data, $K$ is the number of mixed components, and $w$ is the mixed distribution coefficient.
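As a rough illustration of claim 4, the sketch below evaluates a mixed Weibull density and its log-likelihood, assuming each mixed component follows the standard two-parameter Weibull density with its own scale and shape. The function names and the numeric values are hypothetical; a parameter search (for example by EM or numerical optimization) would maximize `log_likelihood` over the scales, shapes, and weights.

```python
import math


def weibull_pdf(t, lam, beta):
    """Two-parameter Weibull density with scale `lam` and shape `beta`."""
    if t <= 0:
        return 0.0  # support is t > 0; also avoids 0 ** negative for beta < 1
    z = t / lam
    return (beta / lam) * z ** (beta - 1) * math.exp(-(z ** beta))


def mixture_pdf(t, weights, lams, betas):
    """Weighted sum of component densities; weights are assumed to sum to 1."""
    return sum(w * weibull_pdf(t, l, b) for w, l, b in zip(weights, lams, betas))


def log_likelihood(ts, weights, lams, betas):
    """Log-likelihood of positive time values under the mixture."""
    return sum(math.log(mixture_pdf(t, weights, lams, betas)) for t in ts)


# Hypothetical two-component mixture: short sessions and long sessions.
weights, lams, betas = [0.5, 0.5], [1.0, 2.0], [1.0, 1.0]
density_at_2 = mixture_pdf(2.0, weights, lams, betas)
ll = log_likelihood([0.5, 1.0, 2.0], weights, lams, betas)
```

With shape 1 each component reduces to an exponential density, which makes the example values easy to check by hand.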
5. The keyword-based index data analysis method of claim 1, wherein the performing variable set construction on the distribution scale parameter and the distribution shape parameter according to the plurality of target user groups to obtain a target variable set of each target user group includes:
respectively constructing initial characteristic data of each target user group according to the distribution scale parameters and the distribution shape parameters;
normalizing the initial characteristic data to obtain target characteristic data;
performing feature clustering on the target feature data through a preset feature extraction function to obtain a target variable set of each target user group, wherein the feature extraction function is as follows:
$$V=\{v_j\},\qquad v_j=\frac{\sum_{i}u_{ij}^{\,p}\,x_{ij}}{\sum_{i}u_{ij}^{\,p}}$$
where $V$ is the target variable set, $v_j$ represents the variables, $u_{ij}$ represents the membership degree of the i-th data point to the j-th group in the target characteristic data, $p$ represents the weighting index in fuzzy clustering, and $x_{ij}$ represents the j-th characteristic value of the i-th data point in the target characteristic data.
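The membership degrees and weighting index named in claim 5 suggest a fuzzy c-means style computation. The sketch below assumes the target variables are membership-weighted feature averages, as in the usual fuzzy c-means center update; this is an interpretation, and `fuzzy_centers` together with the sample data is hypothetical.

```python
def fuzzy_centers(data, membership, p=2.0):
    """Membership-weighted feature averages, one vector per group.

    data:       rows of normalized feature values, one row per data point
    membership: membership[i][j] is the degree to which point i belongs to group j
    p:          fuzzy weighting index (p > 1)
    """
    n_groups = len(membership[0])
    n_features = len(data[0])
    centers = []
    for j in range(n_groups):
        # Raise memberships to the power p so strong memberships dominate.
        weights = [membership[i][j] ** p for i in range(len(data))]
        total = sum(weights)
        centers.append([
            sum(w * row[f] for w, row in zip(weights, data)) / total
            for f in range(n_features)
        ])
    return centers


# Hypothetical normalized (scale, shape) features for three users, two groups.
data = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
membership = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]
centers = fuzzy_centers(data, membership)
```

In a full fuzzy c-means loop the memberships would then be recomputed from the distances to these centers and the two steps alternated until they stabilize.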
6. The keyword-based index data analysis method of claim 1, wherein the inputting the target variable set of each target user group into a preset initial bayesian network for variable relation analysis to obtain the target influencing factor of each target user group comprises:
inputting a target variable set of each target user group into a preset initial Bayesian network, and respectively calculating target conditional probability of each target variable set through a probability inference function in the initial Bayesian network, wherein the probability inference function is as follows:
$$P\left(V_{group}\right)=\prod_{v\in V_{group}}P\left(v\mid Pa(v)\right)$$
where $v$ represents a variable, $V_{group}$ is the target variable set, $Pa(v)$ represents the set of parent nodes of each variable in the target variable set, $P(v\mid Pa(v))$ represents the conditional probability of variable $v$ given the values of its parent nodes, and $P(V_{group})$ represents the target conditional probability;
according to the target conditional probability, probability inference is respectively carried out on each target user group, and a probability inference result of each target user group is obtained;
and respectively carrying out influence factor identification on each target user group according to the probability inference result to obtain target influence factors of each target user group.
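The probability inference step of claim 6 relies on the way a Bayesian network factorizes over its nodes: the probability of a full assignment is the product of each variable's conditional probability given its parents. A small sketch follows; the network structure, variable names, and conditional probability tables are invented for illustration.

```python
def joint_probability(assignment, parents, cpts):
    """Multiply P(v | parents(v)) over every variable in the assignment."""
    prob = 1.0
    for var, value in assignment.items():
        # Look up the parent values this variable is conditioned on.
        parent_values = tuple(assignment[p] for p in parents[var])
        prob *= cpts[var][parent_values][value]
    return prob


# Hypothetical network: the user group influences both Weibull parameters.
parents = {"group": (), "scale": ("group",), "shape": ("group",)}
cpts = {
    "group": {(): {"A": 0.6, "B": 0.4}},
    "scale": {("A",): {"high": 0.7, "low": 0.3}, ("B",): {"high": 0.2, "low": 0.8}},
    "shape": {("A",): {"high": 0.5, "low": 0.5}, ("B",): {"high": 0.1, "low": 0.9}},
}
p = joint_probability({"group": "A", "scale": "high", "shape": "low"}, parents, cpts)
```

Summing this product over all assignments of the unobserved variables gives marginal and conditional probabilities, which is one way the per-group inference results could be obtained on a network this small.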
7. The keyword-based index data analysis method of claim 1, wherein the performing model parameter optimization on the initial bayesian network according to the target influencing factors of each target user group through a preset particle swarm optimization algorithm to obtain an optimal network parameter combination, and performing network parameter update on the initial bayesian network through the optimal network parameter combination to obtain a target bayesian network comprises:
generating model parameter particles of the initial Bayesian network according to target influence factors of each target user group by a preset particle swarm optimization algorithm to obtain a plurality of initial network parameter particles;
carrying out particle initialization on the plurality of initial network parameter particles to obtain the particle position and the particle velocity of each initial network parameter particle;
updating the particle position and the particle velocity of the plurality of initial network parameter particles in the search direction through a preset particle update function to obtain the optimal network parameter combination, wherein the particle update function is as follows:
$$v_i(t+1)=w\,v_i(t)+c_1r_1\left(pbest_i-x_i(t)\right)+c_2r_2\left(gbest_i-x_i(t)\right)$$
where $v_i(t+1)$ indicates the particle velocity of particle i at time t+1, $v_i(t)$ represents the particle velocity of particle i at time t, $x_i(t)$ is the particle position of particle i at time t, $w$ represents the inertia weight, $c_1$ and $c_2$ represent the acceleration constants, $r_1$ and $r_2$ represent random numbers, $pbest_i$ represents the historical best position of the initial network parameter particle, and $gbest_i$ represents the global optimum position;
and updating the network nodes and the conditional probability table of the initial Bayesian network through the optimal network parameter combination to obtain a target Bayesian network.
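The particle swarm search of claim 7 can be sketched with the standard velocity and position updates (inertia term plus attraction toward the personal best and the global best). In the sketch below the objective is a stand-in quadratic, whereas in the claimed method it would score a candidate set of Bayesian network parameters; `pso_minimize`, its default constants, and the objective are all hypothetical.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over `dim` real parameters with a basic PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def objective(x):
    """Stand-in loss with its minimum at 0.3 in every coordinate."""
    return sum((xi - 0.3) ** 2 for xi in x)


best, best_val = pso_minimize(objective, dim=2)
```

For conditional probability table entries, a practical objective would also need to keep each candidate inside the valid range, for instance by clipping and renormalizing before scoring; that constraint handling is omitted here.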
8. A keyword-based index data analysis system, comprising:
the acquisition module is used for acquiring initial user learning index data of a plurality of target users based on a preset cloud platform, and carrying out keyword index analysis on the initial user learning index data to obtain target user learning index data;
the classification module is used for carrying out user classification on the plurality of target users according to the target user learning index data through a preset EM algorithm to obtain a plurality of target user groups;
the operation module is used for inputting the target user learning index data into a preset mixed Weibull distribution model to carry out mixed Weibull distribution parameter operation so as to obtain a distribution scale parameter and a distribution shape parameter;
the construction module is used for constructing variable sets of the distribution scale parameters and the distribution shape parameters according to the plurality of target user groups to obtain target variable sets of each target user group;
the analysis module is used for inputting the target variable set of each target user group into a preset initial Bayesian network to perform variable relation analysis so as to obtain target influence factors of each target user group;
the updating module is used for carrying out model parameter optimization on the initial Bayesian network according to target influence factors of each target user group through a preset particle swarm optimization algorithm to obtain an optimal network parameter combination, and carrying out network parameter updating on the initial Bayesian network through the optimal network parameter combination to obtain a target Bayesian network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311671020.7A CN117371876B (en) | 2023-12-07 | 2023-12-07 | Index data analysis method and system based on keywords |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117371876A (en) | 2024-01-09 |
CN117371876B (en) | 2024-04-02 |
Family
ID=89391391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311671020.7A Active CN117371876B (en) | Index data analysis method and system based on keywords | 2023-12-07 | 2023-12-07 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117371876B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814342A (en) * | 2020-07-16 | 2020-10-23 | 中国人民解放军空军工程大学 | Complex equipment reliability hybrid model and construction method thereof |
CN112733273A (en) * | 2021-01-14 | 2021-04-30 | 齐齐哈尔大学 | Method for determining Bayesian network parameters based on genetic algorithm and maximum likelihood estimation |
CN114314228A (en) * | 2020-09-29 | 2022-04-12 | 思维实创(哈尔滨)科技有限公司 | Calculation method of elevator maintenance period model based on big data |
WO2022179241A1 (en) * | 2021-02-24 | 2022-09-01 | 浙江师范大学 | Gaussian mixture model clustering machine learning method under condition of missing features |
CN115115159A (en) * | 2021-09-03 | 2022-09-27 | 电子科技大学 | TF-IDF and fuzzy Bayesian network-based risk prediction method |
CN115169579A (en) * | 2022-06-29 | 2022-10-11 | 上海浦东发展银行股份有限公司 | Method and device for optimizing machine learning model parameters and storage medium |
Non-Patent Citations (1)
Title |
---|
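Wang Xiaoying et al.: "Parameter estimation and classification under a mixture of normal and Rayleigh distributions", Mathematical Modeling and Its Applications, vol. 05, no. 03, pages 25 - 30 * |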
Also Published As
Publication number | Publication date |
---|---|
CN117371876B (en) | 2024-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||