CN106951471B - SVM-based label development trend prediction model construction method - Google Patents

Info

Publication number
CN106951471B
Authority
CN
China
Prior art keywords
label
labels
days
tag
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710127478.4A
Other languages
Chinese (zh)
Other versions
CN106951471A (en)
Inventor
傅晨波
郑永立
李诗迪
宣琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201710127478.4A
Publication of CN106951471A
Application granted
Publication of CN106951471B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for constructing an SVM (support vector machine)-based label development trend prediction model comprises the following steps: (1) preprocessing the data set: collecting the post data of a website and removing irrelevant information; (2) selecting sample labels: counting the frequency of each label in the two years after it first appears, and extracting a popular label set and a non-popular label set; (3) constructing a weighted undirected network of labels; (4) extracting label feature data, including the network features and related attribute features of each label, as training and test data; (5) training on the data with a support vector machine (SVM) to construct a label popularity trend prediction model. The method takes the correlation among labels into account, classifies the future development trend of labels by combining attribute features with network features, and achieves higher precision in predicting potentially popular labels. The method helps guide users to select reasonable labels and helps website builders provide higher-quality labels.

Description

SVM-based label development trend prediction model construction method
Technical Field
The invention relates to data mining and data analysis technologies, in particular to a construction method of a label development trend prediction model based on an SVM (support vector machine).
Background
With the rapid development of networks, more and more people exchange information online. However, the flood of information makes it difficult for users to screen content quickly and efficiently, and network tags emerged to address this problem. A tag is composed of keywords closely related to the content; it helps people describe and classify content conveniently and facilitates information retrieval and sharing.
Meanwhile, the development trend and classification prediction of tags are attracting more and more attention. The popularity trend of a new tag after it is proposed often represents the popularity trend of a hotspot or direction in its field, which is a matter of great concern to website communities. For a website, effective trend prediction and tag recommendation for new tags can promote the development of topics or emerging fields. For users, searching content according to the popularity trend of tags helps them accurately identify the development trend of the current field.
At present, tags for a piece of information are selected mainly according to the textual relevance between the information and the tag, the attributes of the information publisher, and the like. This approach has several disadvantages: (1) the potential popularity trend of new tags is neglected; (2) the correlation between tags is ignored; (3) unpopular content leads to unpopular tags, making the information difficult to search effectively; (4) only a few features are considered, so the selection of some tags tends to be one-sided.
Therefore, so that users can better select tags when publishing information and choose tags with potential popularity wherever possible, the invention provides a method for constructing an SVM (support vector machine)-based label development trend prediction model, which solves the following two basic problems: (1) extracting network features and related attribute features at the initial stage of label formation to quantitatively characterize the development trend of the label; (2) predicting the future development trend of a new label.
Disclosure of Invention
To improve a website's management of network community tags and its prediction of the development trend of new tags, and to overcome the shortcomings of current tag popularity prediction, the invention provides a method for constructing an SVM (support vector machine)-based label development trend prediction model, which not only uses the network features among labels but also extracts the early-stage attribute features of the labels for training and prediction.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A method for constructing an SVM (support vector machine)-based label development trend prediction model comprises the following steps:
Step 1: data preprocessing: collect the information content of a website community and the corresponding label data, sort the data by time, and take the data from N days after the community was formed, so that the label network of the community has preliminarily formed;
Step 2: sample label selection: compute and sort the community label frequencies from the data set, take the top α% of labels as popular labels, and denote the set $U_{pop}$; from the remaining labels, select labels that form a temporal contrast with the popular labels as non-popular labels;
Step 3: label network construction: labels that appear in the same information content are considered related, so a connecting edge is formed between every two of them. Traversing all information in the website community yields a weighted undirected label network graph $G_{Tag}$, in which the nodes are the new labels, the edges are the relationships between labels, and the edge weight is the number of times the two labels appear together;
Step 4: feature extraction: for the labels in the sample label set $U = \{U_{pop}, U_{unpop}\}$, extract the network features and attribute features over the M days after each label was first created, and build the sample training data set;
Step 5: adopt the support vector machine (SVM) as the machine learning classifier, select a kernel function, train an SVM-based popular label prediction model, and obtain the test precision by ten-fold cross-validation.
Further, in the step 1, data after N days is selected as preprocessed data, wherein the selection of N follows the rule: it is ensured that the first 10% of the tag data in the web site has been generated within N days, i.e. the tag network in the web site has been preliminarily formed.
Further, in step 2, the sample label data are selected as follows. The labels are sorted in descending order of frequency. The labels in the top α% of the sorted set are taken as popular labels, and their set is denoted $U_{pop}$; all labels in the bottom β% are taken as the non-popular candidate set, denoted $Q_{unpop}$. For each popular label $t_{pop} \in U_{pop}$, the label $t_{unpop}$ whose creation time is closest to that of $t_{pop}$ and which satisfies $t_{unpop} \in Q_{unpop}$ is selected as a non-popular label, forming a contrast with the popular label data; the set of these labels is denoted $U_{unpop}$.
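The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_samples`, the input layout (label name mapped to a frequency and a creation date), the rounding of the percentages, and the choice to use each non-popular candidate at most once are all hypothetical and are not specified by the patent.

```python
from datetime import date


def select_samples(tags, alpha=5.0, beta=85.0):
    """Pick popular labels and time-matched non-popular labels.

    tags maps a label name to (frequency, creation_date).
    Returns (u_pop, u_unpop) as two lists of label names.
    """
    # Sort label names by frequency, most frequent first.
    ranked = sorted(tags, key=lambda name: tags[name][0], reverse=True)
    n_pop = max(1, round(len(ranked) * alpha / 100))
    n_unpop = max(1, round(len(ranked) * beta / 100))
    u_pop = ranked[:n_pop]                             # top alpha %
    candidates = set(ranked[-n_unpop:]) - set(u_pop)   # Q_unpop: bottom beta %

    u_unpop = []
    for t_pop in u_pop:
        if not candidates:
            break
        created = tags[t_pop][1]
        # Closest creation time; ties are broken by name for determinism.
        match = min(candidates,
                    key=lambda name: (abs((tags[name][1] - created).days), name))
        u_unpop.append(match)
        candidates.remove(match)  # assumption: each candidate is used once
    return u_pop, u_unpop


# Hypothetical toy data: 20 labels, one created per day.
toy = {f"t{i}": (100 - i, date(2010, 1, 1 + i)) for i in range(20)}
u_pop, u_unpop = select_samples(toy)
print(u_pop, u_unpop)
```

Pairing each popular label with a non-popular label of similar age keeps the two classes balanced and comparable in time, which appears to be the intent of the temporal contrast described in the text.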
Further, in step 4, the network features of the labels are extracted with M = 30. The network features mainly include:
1) Relative degree centrality within 30 days after the new label is proposed. The degree $D_i$ of label $t_i$ is calculated after removing isolated nodes:
$$D_i = \sum_{j=1}^{N} a_{ij},$$
where $N$ denotes the total number of labels in the network and $a_{ij}$ denotes an element of the network adjacency matrix: $a_{ij} = 1$ if labels $t_i$ and $t_j$ share a connecting edge, and $a_{ij} = 0$ otherwise.
The degree centrality feature of label $t_i$ is its relative degree centrality in the network:
$$DC_i = \frac{D_i}{N-1},$$
where $D_i$ denotes the degree of label $t_i$;
2) Neighbor mean degree centrality within 30 days after the new label is proposed. The neighbor mean degree $NC_i$ of label $t_i$ is calculated as follows:
$$NC_i = \frac{1}{N_{neighbor}} \sum_{j:\, a_{ij}=1} D_j,$$
where $N_{neighbor}$ denotes the number of neighbor nodes of label $t_i$ and the sum is the sum of the degrees of its neighbor nodes;
3) Relative closeness centrality within 30 days after the new label is proposed. The closeness centrality feature of label $t_i$ is likewise its relative closeness centrality:
$$CC_i = \frac{1}{d_i},$$
where $d_{ij}$ denotes the distance between label $t_i$ and label $t_j$, and $d_i$ denotes the average geodesic distance from label $t_i$ to the other label nodes, i.e. the mean of $d_{ij}$ over $j$;
4) Eigenvector centrality within 30 days after the new label is proposed. The eigenvector centrality of label $t_i$ is calculated as follows:
$$x_i = \eta \sum_{j=1}^{N} a_{ij} w_{ij} x_j, \tag{5}$$
where $\eta$ is a proportionality constant and $A = (a_{ij} w_{ij})$ is the weighted network adjacency matrix, in which $w_{ij}$ denotes the weight of the edge between labels $t_i$ and $t_j$, with $w_{ij} = w_{ji}$. Letting $x = [x_1\ x_2\ \dots\ x_N]^T$, equation (5) can be written in matrix form:
$$x = \eta A x, \tag{6}$$
where $x$ is the eigenvector of matrix $A$ corresponding to its eigenvalue of largest modulus, $\eta^{-1}$, and is called the eigenvector centrality;
5) Node clustering coefficient within 30 days after the new label is proposed. The clustering coefficient of label $t_i$ is calculated as follows:
$$C_i = \frac{E_i}{k_i(k_i-1)/2} = \frac{2E_i}{k_i(k_i-1)},$$
where $E_i$ denotes the number of edges that actually exist among the $k_i$ neighbor label nodes of label $t_i$, and $k_i(k_i-1)/2$ denotes the maximum number of edges that could exist among these $k_i$ neighbor nodes.
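As a rough illustration of the degree-based features and the clustering coefficient listed above, the following Python sketch computes them for one label on a toy graph. The function name `degree_features`, the adjacency-set representation, and the $N-1$ normalization of the relative degree are assumptions made for this example; the patent text itself only fixes the definitions of $D_i$, $NC_i$, $E_i$ and $k_i$.

```python
def degree_features(adj, node):
    """Degree-based features of `node` in an undirected graph.

    adj maps each label to the set of labels it co-occurs with.
    """
    # Isolated nodes are removed before counting N, as the text describes.
    active = [n for n, nbrs in adj.items() if nbrs]
    n_total = len(active)
    nbrs = adj[node]
    k = len(nbrs)                                   # degree D_i
    rel_degree = k / (n_total - 1) if n_total > 1 else 0.0
    # Mean degree of the neighbours (NC_i).
    nbr_mean = sum(len(adj[j]) for j in nbrs) / k if k else 0.0
    # E_i: edges that actually exist among the neighbours.
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
    return {"degree": k, "relative_degree": rel_degree,
            "neighbor_mean_degree": nbr_mean, "clustering": clustering}


# Hypothetical toy network: "e" is isolated and is ignored when counting N.
toy_adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
print(degree_features(toy_adj, "a"))
```

For label "a" in this toy graph, the three neighbors share one edge out of three possible, so the clustering coefficient is 1/3.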
In step 4, the attribute features extracted are:
4.1) the number of all answers to the questions containing the label within 30 days after the new label is proposed;
4.2) the average number of answers and the average number of questions previously posted by all questioners and respondents who participated in the label within the 30 days;
4.3) the average question-answer response time of the label within 30 days;
4.4) the number of all users participating in the label within 30 days, i.e. the sum of the questioners and the respondents of the questions;
4.5) the average word count of all questions containing the label within 30 days;
4.6) the number of likes of all questions containing the label within 30 days.
The average question-answer response time of a label is calculated as follows. Consider the questions containing label $t_i$ within the 30 days, the number of answers to the $s$-th such question, the creation time of the $s$-th question, and the creation time of its $v$-th answer. The response time difference between each answer and its question is computed, and the differences over all questions and answers are averaged to give the average response time.
[Equation image not reproduced: average response time]
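A minimal sketch of this averaging step is given below. Because the original formula is not available in this text, the normalization is an assumption: the example averages over all question-answer pairs, and the function name `average_response_time` and the input layout are hypothetical.

```python
from datetime import datetime


def average_response_time(questions):
    """Mean delay in seconds between each question and each of its answers.

    questions is a list of (question_time, [answer_times]) tuples.
    Returns None when no question has an answer.
    """
    # One time difference per (question, answer) pair.
    diffs = [(answered - asked).total_seconds()
             for asked, answers in questions
             for answered in answers]
    return sum(diffs) / len(diffs) if diffs else None


# Hypothetical toy data for one label within its first 30 days.
toy_questions = [
    (datetime(2010, 1, 1, 10, 0),
     [datetime(2010, 1, 1, 10, 30), datetime(2010, 1, 1, 11, 0)]),
    (datetime(2010, 1, 2, 12, 0), [datetime(2010, 1, 2, 12, 10)]),
]
print(average_response_time(toy_questions))
```

An alternative reading would first average within each question and then across questions; the two differ when questions have different numbers of answers.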
In step 5, the two-class support vector machine (SVM) model is constructed as follows:
First, the Gaussian (RBF) kernel is selected as the kernel function; that is, the inner product of samples $t_i$ and $t_j$ in the feature space is computed in the original sample space through the function $k(t_i, t_j)$:
$$k(t_i, t_j) = \exp\left(-\frac{\lVert t_i - t_j \rVert^2}{2\delta^2}\right),$$
where $\delta$ denotes the bandwidth of the Gaussian kernel.
Then the optimal parameter values of the SVM model are found by grid search, ten-fold cross-validation is performed, and the results of multiple runs are averaged to obtain the precision of the SVM-based label popularity trend prediction model.
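For concreteness, the Gaussian kernel can be evaluated as in the short sketch below. It assumes the common form $\exp(-\lVert t_i - t_j\rVert^2 / (2\delta^2))$ with bandwidth $\delta$; the function names `rbf_kernel` and `gram_matrix` and the toy feature vectors are illustrative only.

```python
import math


def rbf_kernel(x, y, delta=1.0):
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * delta ** 2))


def gram_matrix(samples, delta=1.0):
    """Kernel value for every pair of samples (symmetric, ones on the diagonal)."""
    return [[rbf_kernel(s, t, delta) for t in samples] for s in samples]


# Hypothetical feature vectors for three labels.
toy_samples = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
for row in gram_matrix(toy_samples, delta=1.0):
    print(row)
```

A larger bandwidth makes distant samples look more similar, which is why $\delta$ is one of the parameters tuned during the grid search.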
The invention has the beneficial effects that: compared with the prior art, the SVM-based label development trend prediction model can predict the development trend of the newly appeared label, the problem of neglecting the newly appeared cold label in label recommendation is solved, and the label recommendation is more reasonable and effective.
Drawings
FIG. 1 is a flow chart of the programming of the present invention;
FIG. 2 is a construction process of a label trend prediction model based on SVM in the invention.
Detailed Description
The following detailed description of embodiments of the invention is provided in connection with the accompanying drawings.
Referring to FIG. 1 and FIG. 2, the invention provides a method for constructing an SVM (support vector machine)-based label development trend prediction model. The method is illustrated with a case analysis on a Stack Overflow data set, whose raw data include the creation time of each post, the post ID, the user ID, the post labels, and similar information. Taking the labels of questions as the example in this patent, we extract the first creation time of each label, the ID of the user who proposed it, information on its neighbor labels, and the like.
The invention is divided into the following five steps:
Step 1: screen and preprocess the data set;
Step 2: select the sample label data;
Step 3: construct the label network;
Step 4: extract the feature data of the sample labels;
Step 5: construct and train the SVM-based label popularity trend prediction model.
In step 1, the specific operation is as follows: select the information content of the website and the corresponding label data, starting from 3 months after the website was established, by which time the label network of the website has preliminarily formed; then count the frequency of the newly appearing labels and sort them;
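The preprocessing step might look like the following sketch. The post layout (dictionaries with `"created"` and `"tags"` keys), the function name `preprocess`, and the rule that a label counts as newly appearing only if it was not used before the cutoff are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def preprocess(posts, site_start, n_days):
    """Keep posts from `n_days` after the site start and count new labels.

    posts is a list of dicts with hypothetical keys "created" (a date)
    and "tags" (a list of label names).
    """
    cutoff = site_start + timedelta(days=n_days)
    # Labels already in use before the cutoff are not "newly appearing".
    old_tags = {t for p in posts if p["created"] < cutoff for t in p["tags"]}
    kept = sorted((p for p in posts if p["created"] >= cutoff),
                  key=lambda p: p["created"])
    freq = Counter(t for p in kept for t in p["tags"] if t not in old_tags)
    return kept, freq


# Hypothetical toy posts; roughly 3 months is taken as 90 days here.
toy_posts = [
    {"created": date(2008, 8, 5), "tags": ["python", "java"]},
    {"created": date(2008, 11, 1), "tags": ["python", "pandas"]},
    {"created": date(2008, 12, 1), "tags": ["pandas", "numpy"]},
    {"created": date(2008, 11, 15), "tags": ["pandas"]},
]
kept, freq = preprocess(toy_posts, date(2008, 8, 1), 90)
print(freq.most_common())
```

`Counter.most_common()` returns the labels already sorted by descending frequency, which is the ordering that the sample selection in step 2 starts from.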
In step 2, the sample label data are screened as follows:
First, the popular label samples are selected: the labels are sorted in descending order of frequency, and the labels in the top 5% of the sorted set are taken as popular labels, their set being denoted $U_{pop}$.
Second, the non-popular label samples are selected: the labels in the bottom 85% of the sorted set are taken as the non-popular candidate set, denoted $Q_{unpop}$. For each popular label $t_{pop} \in U_{pop}$, the label $t_{unpop}$ whose creation time is closest to that of $t_{pop}$ and which satisfies $t_{unpop} \in Q_{unpop}$ is taken as a non-popular label, forming a temporal contrast; the set of these labels is denoted $U_{unpop}$. Finally, $U = \{U_{pop}, U_{unpop}\}$ is taken as the sample label data;
In step 3, the label network is constructed as follows: traverse all information content of the community data; if two labels appear in the same information record, they are considered connected, i.e. there is a connecting edge between them. This yields the weighted undirected label network $G_{Tag}$, in which the weight of an edge is the number of times the two labels appear together.
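The co-occurrence rule can be sketched as below, with the network stored as a dictionary from unordered label pairs to weights. The function name `build_tag_network` and the representation are illustrative choices, not part of the patent.

```python
from collections import defaultdict
from itertools import combinations


def build_tag_network(posts):
    """Weighted undirected co-occurrence network.

    posts is a list of label lists, one per information record.
    Returns {(label_a, label_b): weight} with label_a < label_b.
    """
    weights = defaultdict(int)
    for tags in posts:
        # Sorting makes (a, b) and (b, a) map to the same undirected edge;
        # set() ignores a label repeated within one record.
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy records.
toy_records = [["python", "pandas", "numpy"], ["python", "pandas"], ["java"]]
print(build_tag_network(toy_records))
```

A record with a single label adds no edge, so such a label stays isolated unless it co-occurs with another label elsewhere.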
In step 4, the label network feature data are extracted as shown in FIG. 1: for each label $t_i$ in the sample label set $U = \{U_{pop}, U_{unpop}\}$, the network features on the weighted undirected network $G_{Tag}$ are extracted for the M days following the time at which the label was first proposed, with M = 30. The specific operation is as follows:
1) Relative degree centrality within 30 days after the new label is proposed. The degree $D_i$ of label $t_i$ is calculated after removing isolated nodes:
$$D_i = \sum_{j=1}^{N} a_{ij},$$
where $N$ denotes the total number of labels in the network and $a_{ij}$ denotes an element of the network adjacency matrix: $a_{ij} = 1$ if labels $t_i$ and $t_j$ share a connecting edge, and $a_{ij} = 0$ otherwise.
The degree centrality feature of label $t_i$ is its relative degree centrality in the network:
$$DC_i = \frac{D_i}{N-1},$$
where $D_i$ denotes the degree of label $t_i$.
2) Neighbor mean degree centrality within 30 days after the new label is proposed. The neighbor mean degree $NC_i$ of label $t_i$ is calculated as follows:
$$NC_i = \frac{1}{N_{neighbor}} \sum_{j:\, a_{ij}=1} D_j,$$
where $N_{neighbor}$ denotes the number of neighbor nodes of label $t_i$ and the sum is the sum of the degrees of its neighbor nodes.
3) Relative closeness centrality within 30 days after the new label is proposed. The closeness centrality feature of label $t_i$ is likewise its relative closeness centrality:
$$CC_i = \frac{1}{d_i},$$
where $d_{ij}$ denotes the distance between label $t_i$ and label $t_j$, and $d_i$ denotes the average geodesic distance from label $t_i$ to the other label nodes, i.e. the mean of $d_{ij}$ over $j$.
4) Eigenvector centrality within 30 days after the new label is proposed. The eigenvector centrality of label $t_i$ is calculated as follows:
$$x_i = \eta \sum_{j=1}^{N} a_{ij} w_{ij} x_j, \tag{14}$$
where $\eta$ is a proportionality constant and $A = (a_{ij} w_{ij})$ is the weighted network adjacency matrix, in which $w_{ij}$ denotes the weight of the edge between labels $t_i$ and $t_j$, with $w_{ij} = w_{ji}$. Letting $x = [x_1\ x_2\ \dots\ x_N]^T$, equation (14) can be written in matrix form:
$$x = \eta A x, \tag{6}$$
where $x$ is the eigenvector of matrix $A$ corresponding to its eigenvalue of largest modulus, $\eta^{-1}$, and is called the eigenvector centrality.
5) Node clustering coefficient within 30 days after the new label is proposed. The clustering coefficient of label $t_i$ is calculated as follows:
$$C_i = \frac{E_i}{k_i(k_i-1)/2} = \frac{2E_i}{k_i(k_i-1)},$$
where $E_i$ denotes the number of edges that actually exist among the $k_i$ neighbor nodes of label $t_i$, and $k_i(k_i-1)/2$ denotes the maximum number of edges that could exist among these $k_i$ neighbor nodes.
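The two path- and spectrum-based features in the list above (closeness and eigenvector centrality) can be sketched with the standard library alone. Several choices here are assumptions rather than the patent's own specification: geodesic distance is taken as unweighted hop count, closeness is the reciprocal of the mean distance to reachable nodes, and the eigenvector is found by power iteration scaled so that its largest entry is 1. The function names and the toy star graph are hypothetical.

```python
import math
from collections import deque


def closeness(weights, source):
    """Reciprocal of the mean hop distance from `source` to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:                       # breadth-first search
        u = queue.popleft()
        for v in weights[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reachable = len(dist) - 1
    return reachable / sum(dist.values()) if reachable else 0.0


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration for x = eta * A x on a weighted undirected graph.

    weights maps each node to {neighbor: edge_weight}, stored symmetrically.
    """
    x = {n: 1.0 for n in weights}
    for _ in range(iterations):
        # Iterating with A + I (the `x[n] +` term) keeps the same leading
        # eigenvector but avoids oscillation on bipartite graphs.
        nxt = {n: x[n] + sum(w * x[m] for m, w in weights[n].items())
               for n in weights}
        top = max(nxt.values())
        nxt = {n: v / top for n, v in nxt.items()}   # scale so the max is 1
        if max(abs(nxt[n] - x[n]) for n in weights) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical toy network: a star with hub "hub" and three leaves.
star = {
    "hub": {"p": 1, "q": 1, "r": 1},
    "p": {"hub": 1}, "q": {"hub": 1}, "r": {"hub": 1},
}
print(closeness(star, "hub"), closeness(star, "p"))
print(eigenvector_centrality(star))
```

On the star graph the hub has closeness 1 and each leaf 0.6, and each leaf's eigenvector centrality is $1/\sqrt{3}$ of the hub's, matching the leading eigenvalue $\sqrt{3}$ of the adjacency matrix.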
In step 4, the attribute features of the sample label data are extracted. For each label $t_i \in U$, the features for the 30 days following its first proposal time are extracted as follows:
4.1) collect the set of all questions that contain label $t_i$ within the 30 days after it is proposed;
4.2) for this question set, find the set of all questioners, the set of all respondents to the questions, and the total number of likes over all the questions;
4.3) for all questioners and respondents of the questions containing label $t_i$, compute the average number of answers and the average number of questions they had posted before the current time;
4.4) compute the average number of likes per question for label $t_i$ and the number of participants of the label, i.e. the sum of the number of respondents and the number of questioners;
4.5) compute the average answer response time of the questions corresponding to the label within 30 days. Consider the questions containing label $t_i$ within the 30 days, the number of answers to the $s$-th such question, the creation time of the $s$-th question, and the creation time of its $v$-th answer. The response time difference between each answer and its question is computed, and the differences over all questions and answers are averaged to give the average response time.
[Equation image not reproduced: average response time]
In step 5, the SVM-based label popularity trend prediction model is constructed and trained as follows. First, the Gaussian (RBF) kernel is selected as the kernel function; that is, the inner product of samples $t_i$ and $t_j$ in the feature space is computed in the original sample space through the function $k(t_i, t_j)$:
$$k(t_i, t_j) = \exp\left(-\frac{\lVert t_i - t_j \rVert^2}{2\delta^2}\right),$$
where $\delta$ denotes the bandwidth of the Gaussian kernel;
Then the optimal parameter values of the SVM model are found by grid search, and ten-fold cross-validation is performed: the data are randomly divided into 10 parts, each part is used in turn as the test sample, and the remaining 9 parts are used as training samples, yielding the SVM-based label popularity trend prediction model.
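The ten-fold split and the grid search described above can be outlined as follows. This sketch deliberately leaves the SVM itself out: `score_fn` is a hypothetical callback that would train and evaluate a classifier for one parameter combination, and the parameter names `C` (penalty) and `delta` (kernel bandwidth) are assumptions, since the patent does not list which parameters are searched.

```python
import random
from itertools import product


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]       # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) over all parameter combinations.

    score_fn(params) is assumed to return the mean cross-validated accuracy.
    """
    best = None
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = score_fn(params)
        if best is None or score > best[1]:
            best = (params, score)
    return best


# Stand-in scorer for demonstration only; a real one would train an SVM on
# each training fold and average the accuracy over the ten test folds.
def toy_score(params):
    return -abs(params["C"] - 10) - abs(params["delta"] - 0.5)


print(grid_search({"C": [1, 10, 100], "delta": [0.1, 0.5, 1.0]}, toy_score))
print([len(test) for _, test in k_fold_indices(25)])
```

In practice a library implementation would normally replace both helpers; the point of the sketch is only to make the order of operations explicit: every parameter combination is scored by ten-fold cross-validation, and the best-scoring combination is kept.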
As described above, by constructing the label network and extracting the network features and attribute features of each label within the 30 days after it is first proposed, an SVM-based prediction model of the future development trend of labels is constructed. The model provides reasonable predictions for newly appearing labels on a website and is of significance for future label recommendation and the propagation of knowledge.

Claims (5)

1. A construction method of a label development trend prediction model based on an SVM is characterized by comprising the following steps:
Step 1: data preprocessing: collect the information content of a website community and the corresponding label data, sort the data by time, and take the data from N days after the community was formed, so that the label network of the community has preliminarily formed;
Step 2: sample label selection: compute and sort the community label frequencies from the data set, take the top α% of labels as popular labels, and denote the set $U_{pop}$; from the remaining labels, select labels that form a temporal contrast with the popular labels as non-popular labels;
Step 3: label network construction: labels that appear in the same information content are considered related, and a connecting edge is formed between every two of them; traversing all the information yields a weighted undirected label network graph $G_{Tag}$, in which the nodes are the new labels, the edges are the relationships between labels, and the edge weight is the number of times the two labels appear together;
Step 4: feature extraction: for the labels in the sample label set $U = \{U_{pop}, U_{unpop}\}$, extract the network features and attribute features over the M days after each label was first created, and build the sample training data set;
in step 4, the network features of the sample labels are extracted with M = 30, and the network features comprise:
1) Relative degree centrality within 30 days after the new label is proposed. The degree $D_i$ of label $t_i$ is calculated after removing isolated nodes:
$$D_i = \sum_{j=1}^{N} a_{ij},$$
where $N$ denotes the total number of labels in the network and $a_{ij}$ denotes an element of the network adjacency matrix, i.e. $a_{ij} = 1$ if labels $t_i$ and $t_j$ share a connecting edge, and $a_{ij} = 0$ otherwise;
the degree centrality feature of label $t_i$ is its relative degree centrality in the network:
$$DC_i = \frac{D_i}{N-1},$$
where $D_i$ denotes the degree of label $t_i$;
2) Neighbor mean degree centrality within 30 days after the new label is proposed. The neighbor mean degree $NC_i$ of label $t_i$ is calculated as follows:
$$NC_i = \frac{1}{N_{neighbor}} \sum_{j:\, a_{ij}=1} D_j,$$
where $N_{neighbor}$ denotes the number of neighbor nodes of label $t_i$ and the sum is the sum of the degrees of its neighbor nodes;
3) Relative closeness centrality within 30 days after the new label is proposed. The closeness centrality feature of label $t_i$ is likewise its relative closeness centrality:
$$CC_i = \frac{1}{d_i},$$
where $d_{ij}$ denotes the distance between label $t_i$ and label $t_j$, and $d_i$ denotes the average geodesic distance from label $t_i$ to the other label nodes, i.e. the mean of $d_{ij}$ over $j$;
4) Eigenvector centrality within 30 days after the new label is proposed. The eigenvector centrality of label $t_i$ is calculated as follows:
$$x_i = \eta \sum_{j=1}^{N} a_{ij} w_{ij} x_j, \tag{5}$$
where $\eta$ is a proportionality constant and $A = (a_{ij} w_{ij})$ is the weighted network adjacency matrix, in which $w_{ij}$ denotes the weight of the edge between labels $t_i$ and $t_j$, with $w_{ij} = w_{ji}$; letting $x = [x_1\ x_2\ \dots\ x_N]^T$, equation (5) can be written in matrix form:
$$x = \eta A x, \tag{6}$$
where $x$ is the eigenvector of matrix $A$ corresponding to the eigenvalue $\eta^{-1}$, and is called the eigenvector centrality;
5) Node clustering coefficient within 30 days after the new label is proposed. The clustering coefficient of label $t_i$ is calculated as follows:
$$C_i = \frac{E_i}{k_i(k_i-1)/2} = \frac{2E_i}{k_i(k_i-1)},$$
where $E_i$ denotes the number of edges that actually exist among the $k_i$ neighbor label nodes of label $t_i$, and $k_i(k_i-1)/2$ denotes the maximum number of edges that could exist among these $k_i$ neighbor nodes;
Step 5: adopt the support vector machine (SVM) as the machine learning classifier, select a kernel function, train an SVM-based label popularity trend prediction model, and obtain the model result by ten-fold cross-validation.
2. The construction method of the SVM-based label development trend prediction model as claimed in claim 1, wherein: in the step 1, data after N days is selected as preprocessed data, wherein the selection of N follows the following rule: it is ensured that the first 10% of the tag data in the web site has been generated within N days, i.e. the tag network in the web site has been preliminarily formed.
3. The construction method of the SVM-based label development trend prediction model according to claim 1 or 2, characterized in that: in step 2, the sample label data are selected as follows: the labels are sorted in descending order of frequency; the labels in the top α% of the sorted set are taken as popular labels, and their set is denoted $U_{pop}$; all labels in the bottom β% are taken as the non-popular candidate set, denoted $Q_{unpop}$; for each popular label $t_{pop} \in U_{pop}$, the label $t_{unpop}$ whose creation time is closest to that of $t_{pop}$ and which satisfies $t_{unpop} \in Q_{unpop}$ is selected as a non-popular label, as a contrast to the popular labels, and the set of these labels is denoted $U_{unpop}$.
4. The method for constructing the SVM-based label development trend prediction model according to claim 1 or 2, wherein in step 4 the attribute features of the sample labels are extracted, the attribute features comprising:
4.1) the number of all answers to the questions containing the new label within 30 days after the new label is proposed;
4.2) the average number of answers and the average number of questions previously posted by all questioners and respondents who participated in the label within the 30 days;
4.3) the average question-answer response time of the label within 30 days, calculated as follows: consider the questions containing label $t_i$ within the 30 days, the number of answers to the $s$-th such question, the creation time of the $s$-th question, and the creation time of its $v$-th answer; the response time difference between each answer and its question is computed, and the differences over all questions and answers are averaged to give the average response time;
[Equation image not reproduced: average response time]
4.4) the number of all users participating in the label within 30 days, i.e. the sum of the questioners and the respondents of the questions;
4.5) the average word count of all questions containing the label within 30 days;
4.6) the number of likes of all questions containing the label within 30 days.
5. The construction method of the SVM-based label development trend prediction model according to claim 1 or 2, characterized in that: in the step 5, the support vector machine SVM binary model is constructed by the following process:
first, the selection of the kernel function is determined using a Gaussian kernel RBF, i.e., sample tiAnd tjInner products between feature spaces using them in the original sample space through a function k (t)i,tj) Calculated, the expression is as follows:
Figure FDA0002373497950000042
where δ represents the bandwidth of the gaussian kernel;
and then the optimal parameter values of the SVM model are searched through a grid search algorithm with ten-fold cross-validation, and the test is repeated multiple times and the results are averaged to obtain the precision index of the SVM-based label prevalence trend prediction model.
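The procedure in claim 5 can be sketched with scikit-learn as follows. This is a hedged illustration rather than the patented implementation: it assumes the usual Gaussian form k(t_i, t_j) = exp(-||t_i - t_j||^2 / (2δ^2)), so that scikit-learn's `gamma` equals 1 / (2δ^2). The candidate grids, the penalty parameter `C`, the feature standardization, the accuracy score and the function names `gaussian_kernel` and `tune_rbf_svm` are all assumptions introduced for the example.

```python
import math

from sklearn.datasets import make_classification
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     StratifiedKFold, cross_val_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def gaussian_kernel(t_i, t_j, delta):
    """Gaussian (RBF) kernel with bandwidth delta between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(t_i, t_j))
    return math.exp(-sq_dist / (2.0 * delta ** 2))


def tune_rbf_svm(X, y, deltas=(0.5, 1.0, 2.0, 4.0), costs=(0.1, 1.0, 10.0),
                 repeats=3, seed=0):
    """Grid-search an RBF SVM with ten-fold CV, then average repeated CV scores."""
    param_grid = {
        "svc__C": list(costs),
        # Convert each bandwidth delta to scikit-learn's gamma = 1 / (2 * delta^2).
        "svc__gamma": [1.0 / (2.0 * d ** 2) for d in deltas],
    }
    pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    ten_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=ten_fold, scoring="accuracy")
    search.fit(X, y)

    # Repeat the ten-fold evaluation with different shuffles and average the scores.
    repeated = RepeatedStratifiedKFold(n_splits=10, n_repeats=repeats,
                                       random_state=seed)
    scores = cross_val_score(search.best_estimator_, X, y, cv=repeated,
                             scoring="accuracy")
    return search.best_params_, float(scores.mean())


if __name__ == "__main__":
    # Synthetic stand-in for the six label features; not real label data.
    X, y = make_classification(n_samples=120, n_features=6, n_informative=4,
                               n_redundant=0, random_state=0)
    best_params, mean_accuracy = tune_rbf_svm(X, y)
    print(best_params, round(mean_accuracy, 3))
```

Because the averaged score is computed on the same data used to choose the parameters, it is somewhat optimistic; a held-out set or nested cross-validation would give a stricter estimate.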
CN201710127478.4A 2017-03-06 2017-03-06 SVM-based label development trend prediction model construction method Active CN106951471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710127478.4A CN106951471B (en) 2017-03-06 2017-03-06 SVM-based label development trend prediction model construction method

Publications (2)

Publication Number Publication Date
CN106951471A CN106951471A (en) 2017-07-14
CN106951471B CN106951471B (en) 2020-05-05

Family

ID=59466669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710127478.4A Active CN106951471B (en) 2017-03-06 2017-03-06 SVM-based label development trend prediction model construction method

Country Status (1)

Country Link
CN (1) CN106951471B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107544944B (en) * 2017-09-04 2020-06-02 江西理工大学 Graph theory-based support vector machine kernel function selection method and application thereof
CN107644268B (en) * 2017-09-11 2021-08-03 浙江工业大学 Open source software project incubation state prediction method based on multiple features
CN108681585A (en) * 2018-05-14 2018-10-19 浙江工业大学 A kind of construction method of the multi-source transfer learning label popularity prediction model based on NetSim-TL
CN108764537B (en) * 2018-05-14 2021-11-23 浙江工业大学 A-TrAdaboost algorithm-based multi-source community label development trend prediction method
CN110413657B (en) * 2019-07-11 2021-08-17 东北大学 Average response time evaluation method for seasonal non-stationary concurrency
CN112988978B (en) * 2021-04-27 2024-03-26 河南金明源信息技术有限公司 Case trend analysis system in important field of public service litigation
CN113220855B (en) * 2021-05-27 2022-07-22 浙江大学 Computer technology field development trend analysis method based on IT technical question-answering website
CN113869609A (en) * 2021-10-29 2021-12-31 北京宝兰德软件股份有限公司 Method and system for predicting confidence of frequent subgraph of root cause analysis
CN114580588B (en) * 2022-05-06 2022-08-12 江苏省质量和标准化研究院 UHF RFID group tag type selection method based on probability matrix model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183887A (en) * 2015-09-28 2015-12-23 北京奇虎科技有限公司 Data processing method based on browser and browser device
CN105550275A (en) * 2015-12-09 2016-05-04 中国科学院重庆绿色智能技术研究院 Microblog forwarding quantity prediction method
CN105654122A (en) * 2015-12-28 2016-06-08 江南大学 Spatial pyramid object identification method based on kernel function matching
CN105787049A (en) * 2016-02-26 2016-07-20 浙江大学 Network video hotspot event finding method based on multi-source information fusion analysis
CN106447505A (en) * 2016-09-26 2017-02-22 浙江工业大学 Implementation method for effective friend relationship discovery in social network
CN106446191A (en) * 2016-09-30 2017-02-22 浙江工业大学 Logistic regression based multi-feature network popular tag prediction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9875237B2 (en) * 2013-03-14 2018-01-23 Microsoft Technology Licensing, Llc Using human perception in building language understanding models

Also Published As

Publication number Publication date
CN106951471A (en) 2017-07-14

Similar Documents

Publication Publication Date Title
CN106951471B (en) SVM-based label development trend prediction model construction method
CN110880019B (en) Method for adaptively training target domain classification model through unsupervised domain
CN110532379B (en) Electronic information recommendation method based on LSTM (long short-term memory) user comment sentiment analysis
Yang et al. Tag-based expert recommendation in community question answering
Wang et al. A data-driven network analysis approach to predicting customer choice sets for choice modeling in engineering design
CN111523055B (en) Collaborative recommendation method and system based on agricultural product characteristic attribute comment tendency
CN108415913A (en) Crowd's orientation method based on uncertain neighbours
CN109582875A (en) A kind of personalized recommendation method and system of online medical education resource
CN110737805B (en) Method and device for processing graph model data and terminal equipment
CN107391577B (en) Work label recommendation method and system based on expression vector
Gu et al. [Retracted] Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management
CN115809376A (en) Intelligent recommendation method based on big teaching data
Caruana et al. Mining citizen science data to predict prevalence of wild bird species
CN109783805A (en) A kind of network community user recognition methods and device
CN116738066A (en) Rural travel service recommendation method and device, electronic equipment and storage medium
CN111581435A (en) Video cover image generation method and device, electronic equipment and storage medium
CN109597944B (en) Single-classification microblog rumor detection model based on deep belief network
CN112835960B (en) Data analysis method and system for digital exhibition
CN116910628B (en) Creator expertise portrait assessment method and system
CN116633639B (en) Network intrusion detection method based on unsupervised and supervised fusion reinforcement learning
CN104516873A (en) Method and device for building emotion model
CN112632275B (en) Crowd clustering data processing method, device and equipment based on personal text information
CN112364171A (en) Novel knowledge graph entity portrait method
CN114218445A (en) Anomaly detection method based on dynamic heterogeneous information network representation of metagraph
CN104317912B (en) Image meaning automatic marking method based on neighborhood and learning distance metric

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant