CN106776868A - A restaurant rating prediction method based on a multiple linear regression model - Google Patents
A restaurant rating prediction method based on a multiple linear regression model
- Publication number
- CN106776868A
- Authority
- CN
- China
- Prior art keywords
- restaurant
- comment
- data
- star
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Complex Calculations (AREA)
Abstract
A restaurant rating prediction method based on a multiple linear regression model, belonging to the field of data mining. Feature values are extracted by analyzing factors such as the content of a user's review text, the length of the review, the sentiment value of the review, the restaurant's current average star rating, and the characteristics of the user. A multiple linear regression model is then used to obtain the relationship between each feature and the star rating the user finally gives. The method of the invention is: select features from a data set and build a prediction model by linear regression. By analyzing the content of the reviewer's text, the length of the review, the sentiment value of the review, the restaurant's current average star rating, the characteristics of the reviewer, and similar factors, the invention obtains the relationship between each factor and the star rating the reviewer finally gives, so that the star rating a not-yet-rated restaurant is likely to receive can be inferred.
Description
Technical field
The present invention relates to data mining and data analysis technology, and in particular to a restaurant rating prediction method based on a multiple linear regression model.
Background art
A star rating is an overall evaluation of a restaurant, and it depends to a large extent on reviewers' subjective evaluations of the restaurant. The star rating a reviewer will give can therefore be predicted by analyzing the review text: analyzing the content of the reviewer's text, the length of the review, the sentiment value of the review, the restaurant's current average star rating, the characteristics of the reviewer, and similar factors yields the relationship between each factor and the star rating the reviewer finally gives.
Linear regression is an important algorithm in the field of data mining. Given a data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i = (x_{i1}; x_{i2}; \ldots; x_{id})$ and $y_i \in \mathbb{R}$, it tries to learn a linear model that predicts the real-valued output label as accurately as possible.
With the sharp increase in data volume, the reviews that users leave on UGC (User Generated Content) websites, together with other objective conditions, form the basis of the ratings on those sites. From these data the star rating of a restaurant can be predicted, and linear regression is the usual method for doing so. Simple linear regression measures the degree to which a single independent variable influences the dependent variable.
Summary of the invention
To overcome the poor reliability of existing restaurant rating prediction approaches, the present invention proposes a restaurant rating prediction method based on a multiple linear regression model. On UGC websites, users rate and review merchants based on their own experience, and each user leaves a review after rating. The length of each user's review text, the sentiment it carries, the restaurant's current star rating, and the characteristics of the user all affect the rating the user gives. The rating a user finally gives is directly related to the review the user writes, so the rating (that is, the star rating) can be predicted to some extent by analyzing the features of the review. The method selects a number of indicators from the restaurant website, namely the features the website provides directly plus the subjectivity and polarity obtained by semantic analysis, and builds a linear regression equation, thereby providing a formula that can be used to predict a restaurant's star rating.
The technical solution adopted by the present invention to solve the technical problem is as follows:
A restaurant star rating evaluation method based on linear regression, comprising the following steps:
S1: Crawl data from a restaurant website and analyze it, finally obtaining three related data tables: user, business, and review;
S2: Extract the relevant user review data from the review table and analyze the sentiment polarity and subjectivity of the review text, the sentiment polarity being positive, neutral, or negative;
S3: From the features provided by the website and the subjectivity and polarity obtained by semantic analysis, and taking into account the influence of both the user and the restaurant on rating prediction, select the required feature variables;
S4: Import the relevant data tables into a database, obtain the data set of the selected feature variables with SQL statements, and split the data set into several smaller data sets;
S5: Perform a Cronbach's alpha reliability analysis on the resulting data and take the more reliable data as the analysis sample: select the data sets whose alpha coefficient exceeds a preset threshold; if no such data set exists, return to S3;
S6: Construct the theoretical model: assume that the relationship between each independent variable and the dependent variable is linear, thereby establishing a multiple linear regression model, and run the multiple linear regression with a software tool to obtain the results;
S7: Test the model. The first indicator is the goodness of fit: a fit that reaches the set fit threshold is regarded as a very good fit. The second indicator is the DW (Durbin-Watson) test. The t-test significance of each variable is compared with the significance threshold, the variables are screened against the set indicators, and the regression equation is obtained; if the desired model cannot be obtained, return to S3;
S8: Run the model and perform collinearity diagnostics by checking the VIF (variance inflation factor). If the VIF is below the threshold, no collinearity is judged to exist between the independent variables; otherwise principal component analysis is needed to handle the collinearity problem. The residuals are then analyzed; if they do not meet the requirements, return to S3;
S9: If the requirements of the above steps are met, the linear regression model fits the data set. Using the resulting linear regression equation together with user and restaurant information, the star rating of a restaurant that does not yet have one is obtained.
The technical idea of the invention is as follows. Multiple linear regression has several independent variables, also called regressors. Given the feature variables that influence a restaurant's rating, the corresponding rating can be predicted by linear regression.
In a multiple regression model, statistical diagnostics must also be run on the model, typically residuals, leverage values, studentized residuals, and influential cases (Cook's distance), and the model is optimized according to the corresponding statistics. Regression requires numeric data, so nominal data must be converted to binary data; for this reason a semantic analysis is performed on the user reviews.
On review websites, users can review and rate the restaurants they have visited. The reviews they write largely determine the final rating, and users looking for a restaurant tend to pay close attention to its rating. The review text is closely related to the star rating the user gives. A user's review is natural language, so when analyzing the review text, a Python natural language package is used to obtain the length of the review text and the sentiment value of the review. In a review, users inevitably use adjectives that describe sentiment, as well as adverbs and punctuation marks that express its intensity. By capturing this set of key vocabulary, the sentiment contained in a review can be quantified, so that the user's sentiment can be turned into data. The Natural Language Toolkit is a Python library that applies academic language technology to text data sets. It yields two attributes of a user's review: polarity (positive, neutral, or negative) and subjectivity.
The beneficial effects of the invention are as follows: by analyzing the content of the reviewer's text, the length of the review, the sentiment value of the review, the restaurant's current average star rating, the characteristics of the reviewer, and similar factors, the relationship between each factor and the star rating the reviewer finally gives is obtained, so that the star rating a not-yet-rated restaurant is likely to receive can be inferred.
Brief description of the drawings
Fig. 1 is a flow chart of the regression modeling steps of the restaurant star rating evaluation method based on a linear regression model;
Fig. 2 is a histogram of the standardized residuals;
Fig. 3 is a scatter plot of standardized predicted values against standardized residuals;
Fig. 4 is a normal Q-Q plot of the regression standardized residuals.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 4, a restaurant star rating evaluation method based on a linear regression model is described. This patent takes the users and restaurants on Yelp as an example: the raw data record information about each restaurant, the characteristics of each user, and the text of each user's reviews, and the corresponding features are used for modeling and analysis of restaurant star ratings.
The following embodiment describes the invention in detail with reference to the drawings. As shown in Fig. 1, the invention comprises the following steps:
S1: Crawl data from a restaurant website and analyze it, finally obtaining three related data tables: user, business, and review;
S2: Extract the relevant user review data from the review table and analyze the sentiment polarity and subjectivity of the review text, the sentiment polarity being positive, neutral, or negative;
S3: From the features provided by the website and the subjectivity and polarity obtained by semantic analysis, and taking into account the influence of both the user and the restaurant on rating prediction, select the required feature variables;
S4: Import the relevant data tables into a database, obtain the data set of the selected feature variables with SQL statements, and split the data set into several smaller data sets;
S5: Perform a Cronbach's alpha reliability analysis on the resulting data, remove interfering data, and take the more reliable data as the analysis sample: select the data sets whose alpha coefficient is greater than 0.5 (the preset threshold is 0.5); if no such data set exists, return to S3;
S6: Construct the theoretical model: assume that the relationship between each independent variable and the dependent variable is linear, thereby establishing a multiple linear regression model; here the multiple linear regression is run with a software tool to obtain the results;
S7: Test the model. The first indicator is the goodness of fit: a fit of 60% (the fit threshold is 60%) is regarded as a very good fit. The second indicator is the DW (Durbin-Watson) test. The t-test significance of each variable is compared with 0.05 (the significance threshold is 0.05), the variables are screened against the set indicators, and the regression equation is obtained; if the desired model cannot be obtained, return to S3;
S8: Run the model and perform collinearity diagnostics, mainly by looking at the VIF (variance inflation factor). If the VIF is less than 5 (the threshold is 5), no collinearity is judged to exist between the independent variables; otherwise principal component analysis is needed to handle the collinearity problem. The residuals are then analyzed; if they do not meet the requirements, return to S3;
S9: If the requirements of the above steps are met, the linear regression model fits the data set. Using the resulting linear regression equation together with user and restaurant information, the star rating of a restaurant that does not yet have one can be obtained.
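The branching between steps S5 to S9 can be sketched as a small decision function. This is only an illustrative sketch: the thresholds 0.5, 60%, and 5 come from the embodiment text, while the function name, argument names, and return labels are hypothetical, and the DW and t-test checks of S7 are left out for brevity.

```python
# Thresholds taken from the embodiment text; all names are hypothetical.
ALPHA_MIN = 0.5   # S5: Cronbach's alpha must be greater than this
FIT_MIN = 0.60    # S7: a fit of 60% counts as a very good fit
VIF_MAX = 5.0     # S8: a VIF below this means no collinearity


def next_step(alpha, fit, vifs, residuals_ok):
    """Return the step the workflow moves to after the checks in S5 to S8."""
    if alpha <= ALPHA_MIN:
        return "S3"        # S5 failed: go back and re-select features
    if fit < FIT_MIN:
        return "S3"        # S7 failed: the desired model was not obtained
    if any(v >= VIF_MAX for v in vifs):
        return "S8-PCA"    # collinearity found: apply principal component analysis
    if not residuals_ok:
        return "S3"        # S8 residual analysis failed
    return "S9"            # all checks passed: use the regression equation


print(next_step(alpha=0.8, fit=0.7, vifs=[1.2, 2.0], residuals_ok=True))
```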
In step S1, on UGC websites users rate and review merchants based on their own experience, and each user leaves a review after rating. The length of each user's review text, the sentiment it carries, the restaurant's current star rating, and the characteristics of the user all affect the rating the user gives. The rating a user finally gives is directly related to the review the user writes, so the rating (that is, the star rating) can be predicted to some extent by analyzing the features of the review. Three data tables, user, business, and review, are crawled from the restaurant website. The user table holds user information, including the user's number of fans, the user's average star rating, and the user's review count. The business table holds restaurant information, including the restaurant's review count and star rating. The review table holds review information, including the review's cool, funny, and useful counts, the review's star rating, and the review text;
In step S2, users use adjectives that describe sentiment in their reviews, along with adverbs and punctuation marks that express the intensity of the sentiment; by capturing this set of key vocabulary, the sentiment contained in a review can be quantified. Because regression requires numeric data, nominal data must be converted to binary data, so an analysis is performed on the user reviews. The relevant user review data are extracted from the review table, and semantic analysis is used to obtain the sentiment polarity (positive, neutral, or negative) and the subjectivity of the review text;
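The kind of per-review features described in step S2 can be illustrated with a toy, lexicon-based sketch. It is not the patent's analysis: the tiny word lists, the scoring rule, and all names below are hypothetical stand-ins for a real natural language package, and serve only to show how letter count, distinct-word count, polarity, and subjectivity might be derived from a review text.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "delicious", "friendly", "amazing"}
NEGATIVE = {"bad", "slow", "rude", "bland", "terrible"}
INTENSIFIERS = {"very", "really", "extremely"}  # adverbs expressing intensity


def review_features(text):
    """Return length and sentiment features for one review text."""
    words = re.findall(r"[a-z']+", text.lower())
    score, opinion_words, boost = 0.0, 0, 1.0
    for word in words:
        if word in INTENSIFIERS:
            boost = 1.5          # strengthen the next opinion word
            continue
        if word in POSITIVE:
            score += boost
            opinion_words += 1
        elif word in NEGATIVE:
            score -= boost
            opinion_words += 1
        boost = 1.0              # an intensifier only affects the next word
    # Polarity lies in [-1, 1]; subjectivity is the share of opinion words.
    polarity = score / (1.5 * opinion_words) if opinion_words else 0.0
    subjectivity = opinion_words / len(words) if words else 0.0
    label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    return {
        "letter_count": sum(ch.isalpha() for ch in text),
        "unique_word_count": len(set(words)),
        "exclamations": text.count("!"),
        "polarity": polarity,
        "subjectivity": subjectivity,
        "label": label,
    }


print(review_features("The food was really delicious and the staff were friendly!"))
```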
In step S3, the indicators that reflect the user's own perception, the objective indicators the merchant already has, and the features obtained from semantic analysis are considered together, and 13 key features that influence restaurant ratings are selected: review cool count, review funny count, review useful count, polarity, subjectivity, number of letters in the review, number of distinct words in the review, restaurant review count, restaurant star rating, review star rating, user fans, user average star rating, and user review count;
In step S4, the data in the user, business, and review tables are imported into a database, and SQL statements are then used to obtain a summary table of the 13 desired indicators. The summary table is then exported and randomly split into 20 parts;
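Step S4 can be sketched with Python's built-in sqlite3 module. The schema, column names, and sample rows below are assumptions made for illustration (the patent does not give them), and the sketch assumes the sentiment and length features from S2 have already been stored as columns of the review table. The toy data is split into 3 parts rather than 20.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the three tables named in the text.
conn.executescript("""
CREATE TABLE user (user_id TEXT PRIMARY KEY, fans INTEGER,
                   average_stars REAL, review_count INTEGER);
CREATE TABLE business (business_id TEXT PRIMARY KEY,
                       review_count INTEGER, stars REAL);
CREATE TABLE review (review_id TEXT PRIMARY KEY, user_id TEXT, business_id TEXT,
                     cool INTEGER, funny INTEGER, useful INTEGER, stars INTEGER,
                     polarity REAL, subjectivity REAL,
                     letter_count INTEGER, unique_words INTEGER);
""")
conn.executemany("INSERT INTO user VALUES (?, ?, ?, ?)",
                 [("u1", 3, 4.1, 12), ("u2", 0, 3.2, 5)])
conn.executemany("INSERT INTO business VALUES (?, ?, ?)",
                 [("b1", 40, 4.0), ("b2", 7, 2.5)])
conn.executemany(
    "INSERT INTO review VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [(f"r{i}", "u1" if i % 2 else "u2", "b1" if i < 3 else "b2",
      i % 3, 0, i, 1 + i % 5, 0.1 * i, 0.5, 100 + i, 20 + i)
     for i in range(6)])

# One row per review, carrying the 13 indicators listed in step S3.
rows = conn.execute("""
SELECT r.cool, r.funny, r.useful, r.polarity, r.subjectivity,
       r.letter_count, r.unique_words,
       b.review_count, b.stars, r.stars,
       u.fans, u.average_stars, u.review_count
FROM review AS r
JOIN user AS u ON u.user_id = r.user_id
JOIN business AS b ON b.business_id = r.business_id
""").fetchall()


def split_random(data, k, seed=0):
    """Shuffle the rows and deal them into k parts of near-equal size."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


parts = split_random(rows, k=3)
print(len(rows), [len(p) for p in parts])
```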
In step S5, a reliability analysis is performed on the review length and review sentiment values extracted into the 20 tables; the reliability of the data is measured with Cronbach's reliability coefficient (Cronbach's alpha). The formula for Cronbach's alpha, in its standard form, is:
$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K} \sigma_i^2}{\sigma_T^2}\right)$$
where $K$ is the number of items, $\sigma_i^2$ is the variance of item $i$, and $\sigma_T^2$ is the variance of the total score.
Combined with the F test, the data are screened and interfering data removed, to avoid the difficulties that processing a large volume of data would cause for the model, and the more reliable data are taken as the analysis sample. When the alpha coefficient is greater than 0.5, the data set is reliable and the method proceeds to the next step. Otherwise, go to S3;
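As a rough illustration of the reliability check in step S5, the sketch below computes Cronbach's alpha in its standard textbook form (item variances against the variance of the summed score) and compares it with the 0.5 threshold from the text. The function name and the sample scores are hypothetical, and the patent's own formula is not reproduced in the source text, so this should be read as an assumption about the form used.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns of equal length.

    items[i][j] is the score of observation j on item i.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*items)]   # per-observation total
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for two items over four observations.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
print(alpha, "reliable" if alpha > 0.5 else "go back to S3")
```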
In step S6, the model is established. The rating model takes the star rating as the dependent variable, and review cool count, review funny count, review useful count, polarity, subjectivity, number of letters in the review, number of distinct words in the review, restaurant review count, restaurant star rating, user fans, user average star rating, and user review count as the independent variables. The general multiple linear regression model is used:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon,$$
where $y$ is the dependent variable and $x_1, x_2, \ldots, x_p$ are $p$ independent variables that can be measured accurately and controlled. The dependent variable $y$ is determined by two parts: one part is the random error term $\varepsilon$, and the other is the linear function of the $p$ independent variables, $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$. Here $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are $p + 1$ unknown parameters; $\beta_0$ is called the regression constant and $\beta_1, \beta_2, \ldots, \beta_p$ are called the partial regression coefficients. Together they determine the specific form of the linear relationship between the dependent variable $y$ and the independent variables $x_1, x_2, \ldots, x_p$. $\varepsilon$ is a random variable;
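To make the model in step S6 concrete, the following minimal sketch estimates $\beta_0, \ldots, \beta_p$ by ordinary least squares, solving the normal equations with Gauss-Jordan elimination in plain Python. The patent only says that a tool is used for the regression, so the solver, the function name, and the toy data are assumptions made for illustration.

```python
def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    X is a list of rows of predictor values; returns [b0, b1, ..., bp].
    """
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    for col in range(n):                         # Gauss-Jordan with pivoting
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(n):
            if k != col:
                factor = M[k][col] / M[col][col]
                M[k] = [a - factor * b for a, b in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]
coefficients = fit_ols(X, y)
print(coefficients)
```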
In step S7, multiple linear regression is run on the model. Compared with R squared, the adjusted R squared better reflects how well the data are fitted; in general, 60% is regarded as a very good fit. The DW (Durbin-Watson) statistic is used to judge positive or negative correlation. The DW formula, in its standard form, is:
$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
where $e_t$ is the residual of the $t$-th observation.
A DW value below 2 indicates positive correlation and a value above 2 indicates negative correlation; a DW statistic approximately equal to 2 indicates that the data have no serial correlation, that is, no autocorrelation. Using the significance of the t test, independent variables with a significance greater than 0.05 are considered to have no significant effect on the model, while the other independent variables have a significant effect. Independent variables whose coefficients are too small are also left out, and the regression equation is obtained. The data can then be visualized, which shows more intuitively how appropriate the model is. For example, in the standardized residual histogram of Fig. 2 the residuals tend toward a normal distribution, indicating that the regression model is reasonable and appropriate. In the scatter plot of standardized predicted values against standardized residuals of Fig. 3, the residuals are not randomly scattered, indicating that there is still some room for optimization. In the normal Q-Q plot of Fig. 4, the fitted curve is close to the actual curve, indicating a good fit;
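Two of the checks in step S7 can be written down directly. The sketch below computes the adjusted R squared and the Durbin-Watson statistic in their standard textbook forms; the function names and sample values are hypothetical, and the DW formula is assumed to be the standard one because the patent's own formula is not reproduced in the source text.

```python
def adjusted_r2(y, y_hat, p):
    """Adjusted R squared for n observations and p independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Residuals that alternate in sign give DW above 2 (negative correlation);
# residuals that come in runs give DW below 2 (positive correlation).
print(durbin_watson([1, -1, 1, -1]), durbin_watson([1, 1, -1, -1]))
print(adjusted_r2([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], p=1))
```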
In step S8, the model is run and collinearity diagnostics are performed, mainly by looking at the VIF (variance inflation factor). If the VIF is less than 5, no collinearity is judged to exist between the independent variables. If two variables are strongly collinear, they can be merged into one, because the two independent variables reflect the same content and strong collinearity affects the matrix computations. If the VIF is greater than 5, the model has collinearity and must be optimized for it. The simplest way to detect multicollinearity is to compute the correlation coefficients between the variables of the model and test each correlation coefficient for significance. Here principal component analysis is used to handle the collinearity problem. Principal component analysis aggregates strongly collinear indicators into a single indicator, reducing dimensionality, and factor analysis is carried out. Generally, a component with an eigenvalue greater than 1 is taken as a principal component, with more than 60% being the requirement for a component to qualify as a principal component, and only one principal component is selected. Multiple linear regression is then run again and the corresponding indicators are analyzed. The residuals are then analyzed; if they do not meet the requirements, return to step S3 and rearrange the data;
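The VIF check in step S8 can be sketched as follows, using the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The helper names and toy data are hypothetical, and the small least-squares solver is included only to keep the example self-contained; the principal component analysis step is not shown.

```python
def _r_squared(X, y):
    """R squared of an intercept-plus-X least-squares fit of y."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    for c in range(n):                      # Gauss-Jordan on the normal equations
        p = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for k in range(n):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    beta = [M[i][n] / M[i][i] for i in range(n)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def vif(X):
    """Variance inflation factor of every column of X (a list of rows)."""
    factors = []
    for j in range(len(X[0])):
        target = [row[j] for row in X]
        others = [[v for i, v in enumerate(row) if i != j] for row in X]
        r2 = _r_squared(others, target)
        factors.append(float("inf") if r2 >= 1 else 1 / (1 - r2))
    return factors


# Two predictors with correlation 0.8, so each VIF is 1 / (1 - 0.64).
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
values = vif(X)
print(values, all(v < 5 for v in values))
```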
In step S9, if the requirements of the above steps are met, the linear regression model fits the data set. Using the resulting linear regression equation together with user and restaurant information, the star rating of a restaurant that does not yet have one can be obtained.
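Applying the fitted equation in step S9 amounts to evaluating the linear function on a new feature vector. In the sketch below the coefficients and feature values are made up, and the clamping to a 1 to 5 range and rounding to half stars are assumptions added for illustration; the patent does not specify how the raw prediction is mapped to a displayed star rating.

```python
def predict_stars(coefficients, features):
    """Evaluate b0 + b1*x1 + ... + bp*xp and map it to a half-star rating.

    The 1-to-5 clamp and half-star rounding are illustrative assumptions.
    """
    b0, *betas = coefficients
    raw = b0 + sum(b * x for b, x in zip(betas, features))
    half_stars = round(raw * 2) / 2
    return min(5.0, max(1.0, half_stars))


# Hypothetical coefficients for two features (not values from the patent).
print(predict_stars([1.0, 0.5, 0.2], [4.0, 3.0]))
```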
The above is an introduction to an embodiment of the restaurant rating prediction method based on a multiple linear regression model, applied to the Yelp restaurant platform. The invention selects the features provided by the restaurant website together with the subjectivity and polarity obtained by semantic analysis and applies a multiple linear regression model; the final prediction results are good and meet the requirements of practical use. The embodiment is illustrative only and not restrictive. Those skilled in the art will understand that many changes, modifications, and even equivalents may be made within the spirit and scope defined by the claims, all of which fall within the protection scope of the present invention.
Claims (9)
1. A restaurant star rating evaluation method based on linear regression, characterized in that it comprises the following steps:
S1: Crawl data from a restaurant website and analyze it, finally obtaining three related data tables: user, business, and review;
S2: Extract the relevant user review data from the review table and analyze the sentiment polarity and subjectivity of the review text, the sentiment polarity being positive, neutral, or negative;
S3: From the features provided by the website and the subjectivity and polarity obtained by semantic analysis, and taking into account the influence of both the user and the restaurant on rating prediction, select the required feature variables;
S4: Import the relevant data tables into a database, obtain the data set of the selected feature variables with SQL statements, and split the data set into several smaller data sets;
S5: Perform a Cronbach's alpha reliability analysis on the resulting data and take the more reliable data as the analysis sample: select the data sets whose alpha coefficient exceeds a preset threshold; if no such data set exists, return to S3;
S6: Construct the theoretical model: assume that the relationship between each independent variable and the dependent variable is linear, thereby establishing a multiple linear regression model, and run the multiple linear regression with a software tool to obtain the results;
S7: Test the model. The first indicator is the goodness of fit: a fit that reaches the set fit threshold is regarded as a very good fit. The second indicator is the DW (Durbin-Watson) test. The t-test significance of each variable is compared with the significance threshold, the variables are screened against the set indicators, and the regression equation is obtained; if the desired model cannot be obtained, return to S3;
S8: Run the model and perform collinearity diagnostics by checking the VIF (variance inflation factor). If the VIF is below the threshold, no collinearity is judged to exist between the independent variables; otherwise principal component analysis is needed to handle the collinearity problem. The residuals are then analyzed; if they do not meet the requirements, return to S3;
S9: If the requirements of the above steps are met, the linear regression model fits the data set. Using the resulting linear regression equation together with user and restaurant information, the star rating of a restaurant that does not yet have one is obtained.
2. The restaurant star rating evaluation method based on linear regression according to claim 1, characterized in that: in step S1, on UGC websites users rate and review merchants based on their own experience, and each user leaves a review after rating; three data tables, user, business, and review, are crawled from the restaurant website; the user table holds user information, including the user's number of fans, the user's average star rating, and the user's review count; the business table holds restaurant information, including the restaurant's review count and star rating; the review table holds review information, including the review's cool, funny, and useful counts, the review's star rating, and the review text.
3. The restaurant star rating evaluation method based on linear regression according to claim 1 or 2, characterized in that: in step S2, users use adjectives that describe sentiment in their reviews, along with adverbs and punctuation marks that express the intensity of the sentiment, and by capturing this set of key vocabulary the sentiment contained in a review can be quantified; because regression requires numeric data, nominal data must be converted to binary data.
4. The restaurant star rating evaluation method based on linear regression according to claim 1 or 2, characterized in that: in step S3, the indicators that reflect the user's own perception, the objective indicators the merchant already has, and the features obtained from semantic analysis are considered together, and 13 key features that influence restaurant ratings are selected: review cool count, review funny count, review useful count, polarity, subjectivity, number of letters in the review, number of distinct words in the review, restaurant review count, restaurant star rating, review star rating, user fans, user average star rating, and user review count.
5. The restaurant star rating evaluation method based on linear regression according to claim 4, characterized in that: in step S4, the data in the user, business, and review tables are imported into a database, and SQL statements are then used to obtain a summary table of the 13 desired indicators; the summary table is then exported and randomly split into 20 parts.
6. The restaurant star rating evaluation method based on linear regression according to claim 5, characterized in that: in step S5, a reliability analysis is performed on the review length and review sentiment values extracted into the 20 tables, and the reliability of the data is measured with Cronbach's reliability coefficient (Cronbach's alpha), whose formula, in its standard form, is:
$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K} \sigma_i^2}{\sigma_T^2}\right)$$
where $K$ is the number of items, $\sigma_i^2$ is the variance of item $i$, and $\sigma_T^2$ is the variance of the total score;
combined with the F test, the data are screened and the more reliable data are taken as the analysis sample; when the alpha coefficient is greater than 0.5, the data set is reliable and the method proceeds to the next step; otherwise, go to S3.
7. The restaurant star rating evaluation method based on linear regression according to claim 6, characterized in that: in step S6, the rating model takes the star rating as the dependent variable, and review cool count, review funny count, review useful count, polarity, subjectivity, number of letters in the review, number of distinct words in the review, restaurant review count, restaurant star rating, user fans, user average star rating, and user review count as the independent variables; the general multiple linear regression model is used:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon,$$
where $y$ is the dependent variable and $x_1, x_2, \ldots, x_p$ are $p$ independent variables that can be measured accurately and controlled; the dependent variable $y$ is determined by two parts: one part is the random error term $\varepsilon$, and the other is the linear function of the $p$ independent variables, $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$, where $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are $p + 1$ unknown parameters, $\beta_0$ is called the regression constant, and $\beta_1, \beta_2, \ldots, \beta_p$ are called the partial regression coefficients; together they determine the specific form of the linear relationship between the dependent variable $y$ and the independent variables $x_1, x_2, \ldots, x_p$, and $\varepsilon$ is a random variable.
8. The restaurant star evaluation method based on linear regression as claimed in claim 7, characterized in that: in step S7,
multiple linear regression is performed on the model, and a goodness of fit of 60% is regarded as a very high degree of fit; positive or negative correlation is judged using the DW (Durbin-Watson) statistic, whose formula
is:
$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2},$$
where $e_t$ is the residual of the $t$-th observation and $n$ is the number of observations; a DW value less than 2 indicates positive correlation, a value greater than 2 indicates negative correlation, and a DW statistic equal to 2 shows that the data have no serial correlation
and no masking effect;
using the significance of the t-test, independent variables whose significance value is greater than 0.05 are considered to have no significant effect on the model, while the other independent variables have
a significant effect on the model, and the regression equation is obtained.
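The fit and serial-correlation checks of step S7 can be sketched as below. The sketch assumes that the "60%" figure refers to the coefficient of determination R², which the claim does not state outright, and it uses the standard Durbin-Watson definition on the regression residuals. The function names and the `tol` parameter are hypothetical; the t-test significance check is not covered.

```python
def r_squared(y, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("dependent variable has zero variance")
    return 1 - ss_res / ss_tot


def durbin_watson(residuals):
    """Standard Durbin-Watson statistic:
    sum of squared successive residual differences over the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("all residuals are zero")
    return num / den


def interpret_dw(dw, tol=1e-9):
    """Apply the rule from the claim: below 2 positive, above 2 negative, equal to 2 none.

    tol is an illustrative assumption to make the equality test usable with floats.
    """
    if abs(dw - 2.0) <= tol:
        return "none"
    return "positive" if dw < 2.0 else "negative"


# Hypothetical residuals that alternate in sign
dw = durbin_watson([1, -1, 1, -1])
print(dw, interpret_dw(dw))
```

In practice a wider band around 2 is often treated as "no serial correlation"; the strict comparison here simply mirrors the wording of the claim.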
9. The restaurant star evaluation method based on linear regression as claimed in claim 8, characterized in that: in step S8,
the model is run to carry out collinearity diagnostics by checking the VIF (variance inflation factor); if the VIF is less than 5, it is judged that no collinearity exists
between the independent variables; if the VIF is greater than 5, the model has collinearity and the collinearity must be optimized, that is, the two variables are merged into one;
the method for detecting multicollinearity is to compute the correlation coefficients between the independent variables of the model and to carry out a significance test on each correlation coefficient;
the collinearity problem is handled using principal component analysis, in which components with an eigenvalue greater than 1 are selected as principal components; under the requirement that a component
accounting for more than 60% qualifies as a principal component, only one principal component is selected; multiple linear regression is then carried out again and the corresponding indicators are analyzed;
finally, the residuals are analyzed, and if the residuals do not meet the requirements, the method returns to step S3 and the data are rearranged.
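The VIF check in step S8 can be sketched as follows. For each independent variable, the variable is regressed on the remaining ones and the VIF is computed as 1 / (1 - R²); this is the standard definition, while the helper names `_ols_fitted`, `vif`, and `has_collinearity` and the sample data are hypothetical. The principal component analysis step is not covered by this sketch.

```python
def _ols_fitted(X, y):
    """Least-squares fitted values of y on the columns of X plus an intercept."""
    n = len(y)
    A = [[1.0] + [float(v) for v in row] for row in X]
    m = len(A[0])
    # Normal equations as an augmented matrix, solved by Gauss-Jordan elimination
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(m)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("perfectly collinear columns")
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[r][m] / M[r][r] for r in range(m)]
    return [sum(b * a for b, a in zip(beta, row)) for row in A]


def vif(X):
    """Return one variance inflation factor per column of X (a list of sample rows)."""
    p = len(X[0])
    if p < 2:
        raise ValueError("need at least two independent variables")
    factors = []
    for j in range(p):
        target = [float(row[j]) for row in X]
        others = [[v for k, v in enumerate(row) if k != j] for row in X]
        fitted = _ols_fitted(others, target)
        mean_t = sum(target) / len(target)
        ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
        ss_tot = sum((t - mean_t) ** 2 for t in target)
        r2 = 1 - ss_res / ss_tot
        factors.append(float("inf") if r2 >= 1 else 1 / (1 - r2))
    return factors


def has_collinearity(X, threshold=5.0):
    """Apply the threshold from the claim: any VIF above 5 signals collinearity."""
    return any(v > threshold for v in vif(X))


# Hypothetical data: the second column is roughly twice the first
correlated = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]]
print(vif(correlated), has_collinearity(correlated))
```

Recomputing the regression for every column is quadratic in effort but keeps the sketch close to the definition; a library routine would normally be used for the full set of independent variables.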
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611071151.1A CN106776868A (en) | 2016-11-29 | 2016-11-29 | A kind of restaurant score in predicting method based on multiple linear regression model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611071151.1A CN106776868A (en) | 2016-11-29 | 2016-11-29 | A kind of restaurant score in predicting method based on multiple linear regression model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106776868A true CN106776868A (en) | 2017-05-31 |
Family
ID=58905290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611071151.1A Pending CN106776868A (en) | 2016-11-29 | 2016-11-29 | A kind of restaurant score in predicting method based on multiple linear regression model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106776868A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019790A (en) * | 2017-10-09 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Text identification, text monitoring, data object identification, data processing method |
CN110019790B (en) * | 2017-10-09 | 2023-08-22 | 阿里巴巴集团控股有限公司 | Text recognition, text monitoring, data object recognition and data processing method |
CN108399158B (en) * | 2018-02-05 | 2021-05-14 | 华南理工大学 | Attribute emotion classification method based on dependency tree and attention mechanism |
CN108399158A (en) * | 2018-02-05 | 2018-08-14 | 华南理工大学 | Attribute sensibility classification method based on dependency tree and attention mechanism |
CN109408772A (en) * | 2018-10-11 | 2019-03-01 | 四川长虹电器股份有限公司 | To the restoration methods of the abnormal data in continuity data |
CN109857983A (en) * | 2018-12-29 | 2019-06-07 | 河南工程学院 | A kind of food and drink venue temperature analysis method towards super-large city |
CN109857983B (en) * | 2018-12-29 | 2022-09-30 | 河南工程学院 | Catering venue heat degree analysis method for super-large cities |
CN110189159A (en) * | 2019-04-19 | 2019-08-30 | 上海拉扎斯信息科技有限公司 | Data assessment method, apparatus, electronic equipment and storage medium |
CN110490650A (en) * | 2019-08-14 | 2019-11-22 | 浙江大搜车软件技术有限公司 | Merchant information processing method, device, computer equipment and storage medium |
CN110598179A (en) * | 2019-08-19 | 2019-12-20 | 国网新源控股有限公司 | Method for setting threshold value of pumping and storage unit sensor based on multiple regression analysis |
CN112418571A (en) * | 2019-08-20 | 2021-02-26 | 华为技术有限公司 | Method and device for enterprise environmental protection comprehensive evaluation |
CN112101770A (en) * | 2020-09-09 | 2020-12-18 | 中国联合网络通信集团有限公司 | Audit quality model generation method and device and audit quality prediction method |
CN112101770B (en) * | 2020-09-09 | 2023-07-18 | 中国联合网络通信集团有限公司 | Audit quality model generation method and device and audit quality prediction method |
CN113689144A (en) * | 2020-09-11 | 2021-11-23 | 北京沃东天骏信息技术有限公司 | Quality assessment system and method for product description |
CN112434262A (en) * | 2020-11-22 | 2021-03-02 | 同济大学 | Waterfront public space activity influence factor identification method and terminal |
CN113052440A (en) * | 2021-03-09 | 2021-06-29 | 北京光速斑马数据科技有限公司 | Method and device for evaluating business service based on customer evaluation |
CN113052440B (en) * | 2021-03-09 | 2024-04-26 | 北京光速斑马数据科技有限公司 | Method and device for evaluating business service based on customer evaluation |
CN113240425A (en) * | 2021-04-27 | 2021-08-10 | 湖南大学 | Financial anti-money laundering transaction method, device and storage medium based on deep learning |
CN117094856A (en) * | 2023-08-24 | 2023-11-21 | 哈尔滨工业大学 | Prediction method for user evaluation behavior after embedding OTA website based on panel logic model |
CN117094856B (en) * | 2023-08-24 | 2024-04-30 | 哈尔滨工业大学 | Prediction method for user evaluation behavior after embedding OTA website based on panel logic model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106776868A (en) | A kind of restaurant score in predicting method based on multiple linear regression model | |
US11164075B2 (en) | Evaluation method and apparatus based on text analysis, and storage medium | |
CN109783632B (en) | Customer service information pushing method and device, computer equipment and storage medium | |
WO2017143919A1 (en) | Method and apparatus for establishing data identification model | |
CN109460473B (en) | Electronic medical record multi-label classification method based on symptom extraction and feature representation | |
WO2018086470A1 (en) | Keyword extraction method and device, and server | |
Sehgal et al. | Sops: stock prediction using web sentiment | |
CN109189767B (en) | Data processing method and device, electronic equipment and storage medium | |
US20120221602A1 (en) | Method and apparatus for word quality mining and evaluating | |
CN112700325A (en) | Method for predicting online credit return customers based on Stacking ensemble learning | |
US10387805B2 (en) | System and method for ranking news feeds | |
CN109710766B (en) | Complaint tendency analysis early warning method and device for work order data | |
CN107357763B (en) | Crowdsourcing classification data quality control method based on self-walking learning | |
CN106776672A (en) | Technology development grain figure determines method | |
CN104866558A (en) | Training method of social networking account mapping model, mapping method and system | |
CN112732910B (en) | Cross-task text emotion state evaluation method, system, device and medium | |
CN109325125B (en) | Social network rumor detection method based on CNN optimization | |
CN116151485B (en) | Method and system for predicting inverse facts and evaluating effects | |
CN115359799A (en) | Speech recognition method, training method, device, electronic equipment and storage medium | |
CN116756688A (en) | Public opinion risk discovery method based on multi-mode fusion algorithm | |
Garlapati et al. | Classification of Toxicity in Comments using NLP and LSTM | |
Baumgärtner et al. | Whatever it takes to understand a central banker: Embedding their words using neural networks | |
CN107480126B (en) | Intelligent identification method for engineering material category | |
CN111598691B (en) | Method, system and device for evaluating default risk of credit/debt main body | |
CN112131354A (en) | Answer screening method and device, terminal equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |