CN106708868A - Method and system for analyzing internet data - Google Patents


Publication number
CN106708868A
Authority
CN
China
Prior art keywords
product
comment
attribute
weighted value
eigenvalue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510784361.4A
Other languages
Chinese (zh)
Other versions
CN106708868B (en)
Inventor
何子琳
刘彦
齐佳音
傅湘玲
张镇平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Beijing Co Ltd
Beijing University of Posts and Telecommunications
Original Assignee
China Mobile Group Beijing Co Ltd
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Beijing Co Ltd and Beijing University of Posts and Telecommunications
Priority to CN201510784361.4A
Publication of CN106708868A
Application granted
Publication of CN106708868B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for analyzing internet data. The method includes: acquiring the attributes of products on the internet and the comments corresponding to the products; for each product, determining a first weight value for each comment according to degree-of-attention information of the comments on that product; determining a second weight value for each attribute of the product according to the results of sentiment classification of the comments corresponding to that attribute; and combining the first weight value of each comment with the second weight value of each attribute to determine a data analysis result for the comments on the product. The method and system address the inaccurate analysis results that arise when existing star-rating methods summarize comment data only by its average value.

Description

An internet data analysis method and system
Technical Field
The present invention relates to the field of communication technology, and in particular to an internet data analysis method and system.
Background
With the rapid development of electronic information, e-commerce websites of all kinds provide platforms for posting online comments. These websites generally use star ratings to present the consumer's evaluation of a product as a whole, or of its individual attributes, as expressed in an online comment. They then compute the average star rating over all comments on the product and use that average as the product's online evaluation result.
Representing the online evaluation result by the average star rating ignores both the information contained in the comment text and the differences in usefulness between comments. An online comment often refers to several product attributes, so the overall star rating alone does not reveal how the consumer rates each attribute. Consumers are also heterogeneous in their attribute preferences, that is, they value different attributes. With only the average star rating available, consumers cannot quickly select products according to their preferences, and manufacturers cannot easily use the result to guide product improvement.
In summary, existing star-rating methods summarize comment data only by its average value, which makes the analysis results inaccurate.
Summary of the Invention
Embodiments of the present invention provide an internet data analysis method and device, to solve the problem that existing star-rating methods summarize comment data only by its average value and therefore produce inaccurate analysis results.
The method of the present invention is an internet data analysis method comprising: obtaining the attributes of a product on the internet and the comments corresponding to the product; for a product, determining a first weight value for each comment according to the degree-of-attention information of each comment on the product; determining a second weight value for each attribute of the product according to the result of sentiment classification of the comments corresponding to each attribute of the product; and combining the first weight value of each comment with the second weight value of each attribute of the product to determine a data analysis result for the comments on the product.
Based on the same inventive concept, embodiments of the present invention further provide an internet data analysis system comprising: an acquiring unit, configured to obtain the attributes of a product on the internet and the comments corresponding to the product; a first determining unit, configured to, for a product, determine a first weight value for each comment according to the degree-of-attention information of each comment on the product, and to determine a second weight value for each attribute of the product according to the result of sentiment classification of the comments corresponding to each attribute of the product; and a second determining unit, configured to combine the first weight value of each comment with the second weight value of each attribute of the product to determine a data analysis result for the comments on the product.
Embodiments of the present invention obtain the attributes of a product on the internet and the comments corresponding to the product. For a product, a first weight value is determined for each comment according to the degree-of-attention information of each comment, and a second weight value is determined for each attribute according to the result of sentiment classification of the comments corresponding to that attribute. Finally, the first weight value of each comment and the second weight value of each attribute are combined to determine the data analysis result for the comments on the product. The embodiments thus assign different weights to different comments, parse the comment text to derive a weight for each attribute of each product, and combine the two weights with the existing comment data. This makes the analysis of comment data more accurate, which helps guide consumers in choosing products and manufacturers in improving them.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an internet data analysis method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a vector machine model provided by an embodiment of the present invention;
Fig. 3 is a schematic architecture diagram of an internet data analysis system provided by an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, an embodiment of the present invention provides an internet data analysis method, which is implemented as follows:
Step S101: obtain the attributes of a product on the internet and the comments corresponding to the product.
Step S102: for a product, determine a first weight value for each comment according to the degree-of-attention information of each comment on the product; and determine a second weight value for each attribute of the product according to the result of sentiment classification of the comments corresponding to each attribute of the product.
Step S103: combine the first weight value of each comment with the second weight value of each attribute of the product to determine the data analysis result for the comments on the product.
For example, an online retailer sells goods through a network platform, and consumers can post comments on their own orders. A comment usually touches on several aspects of the product, such as quality, size, and logistics, and the retailer ultimately classifies the order as a positive or a negative review according to the consumer's star rating. In step S101, the embodiment actively obtains all product types in the data to be analyzed and the attributes of each product (for a mobile phone, for example, price, model, and battery standby capability), together with the comment data of all orders of the product.
Some order comments are more detailed and of great reference value to other users, so such comments should be given a certain weight. For example, consumer Lee posts a comment on a mobile phone order; the comment covers the trial experience, the cost performance, and the radiation power of the phone, and includes pictures. This comment is very valuable as a reference, and other consumers who find it helpful may give it a thumbs-up. So that degree-of-attention information such as the number of thumbs-ups and the total number of comments is also considered in the product's evaluation result, the embodiment introduces the first weight value. Specifically, for a product, the degree-of-attention information includes the total number of comments on the product and the support score of each comment; the first weight value satisfies the following formula:
(Formula 1; the equation is not reproduced in this text)
where the quantity defined by Formula 1 is the first weight value of the i-th comment, HVs(v_i) is the support score of the i-th comment, p is the total number of comments on the product, and λ is the weighting factor of the i-th comment, usually set to 1.
After a weight has been assigned to every comment using the above formula, a 1 × p row matrix of first weight values is obtained.
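The weighting step can be illustrated with a minimal Python sketch. Because Formula 1 is not legible in this text, the normalization below (support score plus λ, divided by the sum over all p comments) is purely an assumption chosen so that the weights form a 1 × p row that sums to 1; the function name `first_weights` is hypothetical.

```python
def first_weights(support_scores, lam=1.0):
    """Return one weight per comment as a list of length p.

    ASSUMPTION: weight_i = (HVs(v_i) + lam) / (sum_k HVs(v_k) + lam * p).
    This smoothed normalization is illustrative only and is not Formula 1.
    """
    p = len(support_scores)
    if p == 0:
        return []
    total = sum(support_scores) + lam * p
    return [(score + lam) / total for score in support_scores]


# Three comments with 8, 1, and 0 thumbs-ups respectively.
weights = first_weights([8, 1, 0])
```

With λ = 1, a comment that received no thumbs-up still gets a small non-zero weight, which is one possible reason for including a weighting factor at all.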
Alternatively, the first weight of each comment may be determined directly from a correspondence between the support score and the first weight for the product.
In addition, comments posted by users of different grades differ in usefulness, so a weight value may also be assigned according to the user's grade, which distinguishes the usefulness of comments posted by different users.
An online comment on a product usually involves several attributes of the product, and a comment may approve of one attribute while disapproving of another, so it is inappropriate to judge a comment as positive or negative from the star rating alone. The embodiment therefore also generates an evaluation matrix for the product from the product's attributes and the comments on each attribute.
Specifically, text mining is used to extract the product attributes mentioned in the comments. Let A = (A1, A2, …, An) be the product attribute set. Semantic analysis is then applied together with the comment's star level: three stars or more represents a positive comment, one star represents a negative comment, and the rest are treated as neutral. The sentiment orientation of a comment is thus divided into three levels, positive, neutral, and negative, represented by 1, 0, and -1 respectively. This yields the attribute evaluation matrix of a product, as shown in Table 1.
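As a rough illustration of the three-level encoding, the sketch below maps a star level to 1, 0, or -1 using the thresholds stated above and assembles an attribute evaluation matrix with one row per comment. The dictionary layout and the per-attribute labels under `attribute_sentiment` are hypothetical stand-ins for the output of the text-mining and semantic-analysis steps, which are not specified here.

```python
def star_to_sentiment(stars):
    """Three stars or more -> 1 (positive), one star -> -1 (negative), else 0."""
    if stars >= 3:
        return 1
    if stars == 1:
        return -1
    return 0


def build_evaluation_matrix(comments, attributes):
    """One row per comment, one column per attribute; None marks an
    attribute that the comment does not mention."""
    return [
        [comment["attribute_sentiment"].get(attr) for attr in attributes]
        for comment in comments
    ]


attributes = ["price", "battery", "model"]
comments = [
    # Hypothetical analyzer output: attribute -> sentiment in {1, 0, -1}.
    {"stars": 5, "attribute_sentiment": {"price": 1, "battery": -1}},
    {"stars": 1, "attribute_sentiment": {"battery": -1}},
]
matrix = build_evaluation_matrix(comments, attributes)
```

Unmentioned attributes are kept as `None` rather than 0 so that a missing opinion is not confused with a neutral one.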
Table 1:
As Table 1 shows, some comment texts do not mention certain attributes of the product, so those attributes have null values in the evaluation matrix. To make subsequent matrix operations possible, the embodiment fills the null values with default values. Specifically, if some attributes of the product are not commented on, a default evaluation value is derived from the comment on the product and used as the default value of the uncommented attributes, so that the evaluation matrix for the product is generated from the product's attributes and the comments on each attribute.
For example, null values are filled as follows: the star rating given by the comment (1 to 5 stars) is mapped into the range -1 to 1 by a mapping function, and the mapped value is written into every missing entry of the same row. The mapping function is:
(Formula 2; the equation is not reproduced in this text)
where Score is the mapped value of the star rating and Rating is the original star rating. After this processing, the product attribute evaluation matrix is complete.
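The null-filling step can be sketched as follows. The mapping function itself is not legible in this text, so the linear map Score = (Rating - 3) / 2, which sends 1 star to -1 and 5 stars to 1, is an assumption; the function names are hypothetical.

```python
def map_rating(rating):
    """ASSUMED linear map from a 1..5 star rating to the range -1..1."""
    return (rating - 3) / 2


def fill_missing(row, rating):
    """Replace every missing entry (None) in one comment's row with the
    mapped value of that comment's own star rating."""
    default = map_rating(rating)
    return [default if value is None else value for value in row]


# A 4-star comment that mentions only the first and third attributes.
filled = fill_missing([1, None, -1], rating=4)
```

Because the default is taken from the same comment's star rating, each row is filled independently of the others.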
To further highlight the differences among a product's attributes, the embodiment assigns different weight values to the attributes of each product on the basis of the evaluation matrix. Specifically, for each attribute, a first feature value of each attribute for every comment on the product is determined from the sentiment classification results of the comments corresponding to each attribute of the product;
from the first feature values, second feature values of each attribute are determined for every positively rated comment, and second feature values of each attribute are determined for every negatively rated comment;
and the second weight value of each attribute of the product is determined from the second feature values.
In practice, determining the second weight value involves the support vector machine model in Fig. 2 and a weight analysis model. The process is as follows:
Step 1: Text mining and semantic analysis yield the user's overall score for the product, y_i (the star level of the comment), and the user's score for each product attribute, x_ij. The relation between the two is given by Formula 3:
(Formula 3; the equation is not reproduced in this text)
where y_i is the user's overall score for the product, w_j is the weight of each product attribute, and x_ij is the user's score for each product attribute.
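The symbol definitions suggest, but this text does not confirm, that Formula 3 is a weighted sum of attribute scores, y_i = Σ_j w_j · x_ij. Under that assumption, the relation for a single comment can be sketched as below; the function name and the sample numbers are hypothetical.

```python
def overall_score(attribute_weights, attribute_scores):
    """ASSUMED reading of Formula 3: y_i = sum_j w_j * x_ij for one comment."""
    return sum(w * x for w, x in zip(attribute_weights, attribute_scores))


# Weights for (price, battery, model) and one comment's attribute scores.
y = overall_score([0.5, 0.3, 0.2], [1, 0, -1])
```

Read this way, solving for the weights w_j means finding the attribute weights that best explain the observed overall ratings.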
Step 2: The vector machine model in Fig. 2 is a two-class model (labels +1 and -1), whereas the available data may have multiple classes, for example five. To convert the five classes into two, the conversion rule in Formula 4 is applied.
if y_i = y_j, (x_i - x_j, -1) and y = -1 (Formula 4)
The evaluation matrix before conversion is shown in Table 2.
Table 2:
The evaluation matrix after conversion is shown in Table 3.
Table 3:
Step 3: Based on the above relation, w_j is solved for. To solve for the weight ω, a weight analysis model is built; the model is derived from Formula 3 above and refined with reference to the vector machine model in Fig. 2 and the support vector machine algorithm.
Each comment can be regarded as a sample, with y as the class label of the sample and x as the sample's value in each dimension. The weight analysis model is as follows:
$\omega \ge 0$
$\xi_i \ge 0$ (Formula 5; only these two constraints are reproduced in this text)
where 1/C is the penalty coefficient (equivalent to multiplying the slack variables by C), which keeps outliers from having excessive influence; ξ_i is the slack variable, which ensures that the problem has a feasible solution; ω is the column vector of weight values of the product features, with ω ≥ 0; x_i is the sample's value in each dimension; and y_i is the class label of the sample.
The support vector machine algorithm solves the classification problem for different samples by finding the maximum margin between samples of different classes, so that the classification of the samples is as accurate as possible; wx + b = 0 is the decision function. The w in this formula is the second weight value that the embodiment seeks. The support vector machine formula is as follows:
s.t. $y_i(\omega^{T} x_i + b) \ge 1,\ i = 1, \dots, n$ (Formula 6)
where ω is the column vector of weight values, with ω ≥ 0; ξ_i is the slack variable; b is the constant term; and y is the class label of the sample.
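To make the idea of a margin-based classifier with non-negative weights concrete, here is a small self-contained sketch. It is not the solver described in the source: the objective (one half of the squared norm of ω plus C times the hinge loss) is the standard soft-margin form and is assumed here, and the constraint ω ≥ 0 is enforced by clipping after each subgradient step. The function name, learning rate, and toy data are all hypothetical.

```python
def train_nonneg_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Projected subgradient descent on
        0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)),
    clipping w at zero after every step so that w >= 0.
    ASSUMPTION: a standard soft-margin objective, used for illustration only.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: hinge term is active
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: the first attribute determines the label, the second does not.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [1, 1, -1, -1]
w, b = train_nonneg_svm(X, y)
```

On this toy data the learned weight for the informative attribute ends up well above the weight for the uninformative one, which is the behaviour an attribute-weight estimate should have.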
In summary, once the first weight values, the second weight values, and the evaluation matrix have been determined by the above method, the overall evaluation result of the product can be determined from the product of the evaluation matrix, the first weight of each comment, and the second weight of each attribute of the product. From this overall result, a manufacturer can see how much each attribute influences the overall evaluation of the product, identify the attributes that most consumers care about, and put more resources into researching and improving those attributes to better meet consumer needs. The overall evaluation of each attribute can also be measured directly, which makes it easy to find the weak points that drag down the product's overall evaluation and then improve and manage them in a targeted way for a more significant effect.
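The combination step can be sketched as a chain of matrix products: the 1 × p row of first weights, times the p × n evaluation matrix, times the n × 1 column of second weights. The exact order and any normalization are assumptions, since the text only says that the result is determined from the product of the three; the function name and numbers are hypothetical.

```python
def overall_evaluation(first_w, matrix, second_w):
    """Return (overall, per_attribute).

    per_attribute[j] = sum_i first_w[i] * matrix[i][j]   (1 x n row)
    overall          = sum_j per_attribute[j] * second_w[j]
    ASSUMPTION: the matrix is already filled (no missing entries).
    """
    n = len(second_w)
    per_attribute = [
        sum(first_w[i] * matrix[i][j] for i in range(len(matrix)))
        for j in range(n)
    ]
    overall = sum(a * w for a, w in zip(per_attribute, second_w))
    return overall, per_attribute


first_w = [0.75, 0.25]        # one weight per comment
matrix = [[1, -1], [0, 1]]    # rows: comments, columns: attributes
second_w = [0.6, 0.4]         # one weight per attribute
overall, per_attribute = overall_evaluation(first_w, matrix, second_w)
```

The intermediate `per_attribute` row corresponds to the per-attribute evaluation mentioned above, which can be inspected on its own to locate weak attributes.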
Based on the same technical concept, an embodiment of the present invention also provides an internet data analysis system, which can carry out the method embodiment described above. As shown in Fig. 3, the internet data analysis system provided by the embodiment includes an acquiring unit 301, a first determining unit 302, and a second determining unit 303, where:
the acquiring unit 301 is configured to obtain the attributes of a product on the internet and the comments corresponding to the product;
the first determining unit 302 is configured to, for a product, determine a first weight value for each comment according to the degree-of-attention information of each comment on the product, and to determine a second weight value for each attribute of the product according to the result of sentiment classification of the comments corresponding to each attribute of the product;
the second determining unit 303 is configured to combine the first weight value of each comment with the second weight value of each attribute of the product to determine the data analysis result for the comments on the product.
For example, an online retailer sells goods through a network platform, and consumers can post comments on their own orders. A comment usually touches on several aspects of the product, such as quality, size, and logistics, and the retailer ultimately classifies the order as a positive or a negative review according to the consumer's star rating. In step S101, the embodiment actively obtains all product types in the data to be analyzed and the attributes of each product (for a mobile phone, for example, price, model, and battery standby capability), together with the comment data of all orders of the product.
Some order comments are more detailed and of great reference value to other users, so such comments should be given a certain weight. For example, consumer Lee posts a comment on a mobile phone order; the comment covers the trial experience, the cost performance, and the radiation power of the phone, and includes pictures. This comment is very valuable as a reference, and other consumers who find it helpful may give it a thumbs-up. So that degree-of-attention information such as the number of thumbs-ups and the total number of comments is also considered in the product's evaluation result, the embodiment introduces the first weight value. Specifically, for a product, the degree-of-attention information includes the total number of comments on the product and the support score of each comment; the first weight value satisfies Formula 1, which is described in the method above and is not repeated here.
Alternatively, the first weight of each comment may be determined directly from a correspondence between the support score and the first weight for the product.
In addition, comments posted by users of different grades differ in usefulness, so a weight value may also be assigned according to the user's grade, which distinguishes the usefulness of comments posted by different users.
An online comment on a product usually involves several attributes of the product, and a comment may approve of one attribute while disapproving of another, so it is inappropriate to judge a comment as positive or negative from the star rating alone. The embodiment therefore further uses an evaluation matrix generation unit to generate the evaluation matrix. The evaluation matrix generation unit 304 is configured to generate the evaluation matrix for the product from the product's attributes and the comments on each attribute of the product.
Specifically, text mining is used to extract the product attributes mentioned in the comments. Let A = (A1, A2, …, An) be the product attribute set. Semantic analysis is then applied together with the comment's star level: three stars or more represents a positive comment, one star represents a negative comment, and the rest are treated as neutral. The sentiment orientation of a comment is thus divided into three levels, positive, neutral, and negative, represented by 1, 0, and -1 respectively. This yields the attribute evaluation matrix of a product, as shown in Table 1.
As Table 1 shows, some comment texts do not mention certain attributes of the product, so those attributes have null values in the evaluation matrix. To make subsequent matrix operations possible, the embodiment fills the null values with default values. Specifically, the evaluation matrix generation unit 304 is configured to: if some attributes of the product are not commented on, derive a default evaluation value from the comment on the product and use it as the default value of the uncommented attributes, so that the evaluation matrix for the product is generated from the product's attributes and the comments on each attribute.
For example, null values are filled as follows: the star rating given by the comment (1 to 5 stars) is mapped into the range -1 to 1 by a mapping function, and the mapped value is written into every missing entry of the same row. The mapping function is given by Formula 2 and is not repeated here.
To further highlight the differences among a product's attributes, the embodiment assigns different weight values to the attributes of each product on the basis of the evaluation matrix. The first determining unit is specifically configured to: for each attribute, determine a first feature value of each attribute for every comment on the product from the sentiment classification results of the comments corresponding to each attribute of the product;
determine, from the first feature values, second feature values of each attribute for every positively rated comment, and second feature values of each attribute for every negatively rated comment;
and determine the second weight value of each attribute of the product from the second feature values.
In practice, determining the second weight value involves the support vector machine model in Fig. 2 and a weight analysis model. The process is as follows:
Step 1: Text mining and semantic analysis yield the user's overall score for the product, y_i (the star level of the comment), and the user's score for each product attribute, x_ij. The relation between the two is given by Formula 3.
Step 2: The vector machine model in Fig. 2 is a two-class model (labels +1 and -1), whereas the available data may have multiple classes, for example five. To convert the five classes into two, the conversion rule in Formula 4 is applied.
Step 3: Based on the above relation, w_j is solved for. To solve for the weight ω, a weight analysis model is built; the model is derived from Formula 3 above and refined with reference to the vector machine model in Fig. 2 and the support vector machine algorithm.
Each comment can be regarded as a sample, with y as the class label of the sample and x as the sample's value in each dimension. The weight analysis model is given by Formula 5. The support vector machine algorithm solves the classification problem for different samples by finding the maximum margin between samples of different classes, so that the classification of the samples is as accurate as possible; wx + b = 0 is the decision function. The w in this formula is the second weight value that the embodiment seeks. The support vector machine formula is given by Formula 6.
In summary, once the first weighted value, the second weighted value, and the evaluation matrix have been determined by the above method, the overall evaluation result for the product can be determined from the product of the evaluation matrix, the first weight corresponding to each comment, and the second weight of each attribute of the product. It can be seen that the embodiment of the present invention assigns weights of different levels to different comments and parses the comment text to derive the weight of each attribute of each product. Combining these two weights with the existing comment data makes the analysis result of the comment data more accurate and helps improve the precision of marketing. With the analysis result, a manufacturer can quickly locate the prominent attribute features of a product and then promote them in a targeted way when formulating a marketing strategy. This strengthens consumers' impression of these attribute features and turns them into the core competitiveness of the product, so that consumers who value these attributes pay more attention to the product, which increases its sales volume. The method of the embodiment of the present invention is thus people-oriented: it takes user needs fully into account and treats meeting them as an important goal. A relevant e-commerce website can use the embodiment of the present invention to add rankings by attribute to its existing rankings by popularity, sales volume, and price. Consumers with different preferences can then search and rank products by the overall evaluation of the attributes they value, instead of inferring the approximate evaluation of each attribute from the text of numerous online comments, which significantly reduces the time cost of searching and the risk of the transaction.
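The combination described above can be sketched in a few lines. The text only states that the overall result comes from the product of the evaluation matrix, the per-comment first weights, and the per-attribute second weights; the sketch below assumes this means r^T E a, where E has one row per comment and one column per attribute. The function name and all numbers are hypothetical.

```python
def overall_evaluation(matrix, comment_weights, attribute_weights):
    """Combine an evaluation matrix with comment and attribute weights.

    matrix:            rows are comments, columns are attributes
    comment_weights:   first weighted value of each comment (one per row)
    attribute_weights: second weighted value of each attribute (one per column)
    Returns (per-attribute weighted evaluations, overall evaluation).
    """
    per_attribute = [
        sum(r * row[j] for r, row in zip(comment_weights, matrix))
        for j in range(len(attribute_weights))
    ]
    overall = sum(a * s for a, s in zip(attribute_weights, per_attribute))
    return per_attribute, overall


# Hypothetical data: three comments, two attributes, sentiment in {-1, 0, +1}.
E = [[1, -1],
     [1, 1],
     [-1, 1]]
r = [0.5, 0.3, 0.2]   # assumed first weighted values (one per comment)
a = [0.6, 0.4]        # assumed second weighted values (one per attribute)

per_attribute, overall = overall_evaluation(E, r, a)
```

The intermediate per-attribute values are what an e-commerce site could use for attribute-level rankings, while the single overall value summarises the product.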
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. An internet data analysis method, characterized in that the method comprises:
acquiring attributes of a product on the internet and comments corresponding to the product;
for a product, determining a first weighted value corresponding to each comment according to attention-degree information of each comment corresponding to the product; and determining a second weighted value for each attribute of the product according to the result of sentiment classification performed on the comments corresponding to each attribute of the product;
determining a data analysis result for the comments on the product by combining the first weighted value corresponding to each comment with the second weighted value of each attribute of the product.
2. The method of claim 1, characterized in that, for a product, the attention-degree information comprises the total number of comments on the product and the support score of each comment;
the first weighted value satisfies the following formula:
Formula 1: $\omega_{R_i} = \dfrac{\lambda + HVs(v_i)}{p + \sum_{i=1}^{p} HVs(v_i)}$
where $\omega_{R_i}$ denotes the first weighted value of the i-th comment, $HVs(v_i)$ denotes the support score of the i-th comment, p denotes the total number of comments on the product, and λ denotes the weighting factor of the i-th comment.
3. The method of claim 1, characterized in that, after acquiring the attributes of the product on the internet and the comments corresponding to the product, and before determining the first weighted value corresponding to each comment, the method comprises:
generating an evaluation matrix for the product according to the attributes corresponding to the product and the comments on each attribute of the product.
4. The method of claim 3, characterized by further comprising:
if some attributes of the product have no comments, presetting an evaluation value according to the comments on the product and using the evaluation value as the default value of the attributes without comments, so that the evaluation matrix for the product is generated according to the attributes corresponding to the product and the comments on each attribute of the product.
5. The method of claim 1, characterized in that determining the second weighted value of each attribute of the product comprises:
for an attribute, determining a first eigenvalue of each attribute for each comment on the product according to the result of sentiment classification of the comments corresponding to each attribute of the product;
determining, according to the first eigenvalue, a second eigenvalue of each attribute for each comment that is a positive evaluation; and determining, according to the first eigenvalue, a second eigenvalue of each attribute for each comment that is a negative evaluation;
determining the second weighted value of each attribute of the product according to the second eigenvalue.
6. An internet data analysis system, characterized in that the system comprises:
an acquiring unit, configured to acquire attributes of a product on the internet and comments corresponding to the product;
a first determining unit, configured to, for a product, determine a first weighted value corresponding to each comment according to attention-degree information of each comment corresponding to the product, and to determine a second weighted value for each attribute of the product according to the result of sentiment classification performed on the comments corresponding to each attribute of the product;
a second determining unit, configured to determine a data analysis result for the comments on the product by combining the first weighted value corresponding to each comment with the second weighted value of each attribute of the product.
7. The system of claim 6, characterized in that, for a product, the attention-degree information comprises the total number of comments on the product and the support score of each comment;
the first weighted value satisfies the following formula:
Formula 1: $\omega_{R_i} = \dfrac{\lambda + HVs(v_i)}{p + \sum_{i=1}^{p} HVs(v_i)}$
where $\omega_{R_i}$ denotes the first weighted value of the i-th comment, $HVs(v_i)$ denotes the support score of the i-th comment, p denotes the total number of comments on the product, and λ denotes the weighting factor of the i-th comment.
8. The system of claim 6, characterized by further comprising:
an evaluation matrix generation unit, configured to generate an evaluation matrix for the product according to the attributes corresponding to the product and the comments on each attribute of the product.
9. The system of claim 8, characterized in that the evaluation matrix generation unit is specifically configured to:
if some attributes of the product have no comments, preset an evaluation value according to the comments on the product and use the evaluation value as the default value of the attributes without comments, so that the evaluation matrix for the product is generated according to the attributes corresponding to the product and the comments on each attribute of the product.
10. The system of claim 6, characterized in that the first determining unit is specifically configured to:
for an attribute, determine a first eigenvalue of each attribute for each comment on the product according to the result of sentiment classification of the comments corresponding to each attribute of the product;
determine, according to the first eigenvalue, a second eigenvalue of each attribute for each comment that is a positive evaluation; and determine, according to the first eigenvalue, a second eigenvalue of each attribute for each comment that is a negative evaluation;
determine the second weighted value of each attribute of the product according to the second eigenvalue.
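As an informal illustration of two computations named in the claims, the sketch below evaluates the first weighted value and builds an evaluation matrix with default values. It assumes Formula 1 is read as the ratio (λ + HVs(v_i)) / (p + Σ HVs(v_i)), which makes the weights sum to 1 when λ = 1. The claims do not say how the preset evaluation value is chosen, so here it is simply passed in by the caller. All function names, attribute names, and data are hypothetical.

```python
def first_weights(support_scores, lam=1.0):
    """First weighted value of each comment under the assumed reading of Formula 1.

    support_scores: support score HVs(v_i) of each comment (e.g. helpful votes)
    lam:            weighting factor lambda (assumed identical for all comments)
    """
    p = len(support_scores)               # total number of comments
    denominator = p + sum(support_scores)
    return [(lam + s) / denominator for s in support_scores]


def build_evaluation_matrix(comment_scores, attributes, default_value):
    """One row per comment, one column per attribute.

    comment_scores: list of dicts mapping attribute name -> evaluation value
    default_value:  preset evaluation value used when a comment does not
                    mention an attribute (how it is preset is an assumption)
    """
    return [
        [scores.get(attribute, default_value) for attribute in attributes]
        for scores in comment_scores
    ]


# Hypothetical support scores for three comments.
support = [3, 0, 7]
weights = first_weights(support)

# Hypothetical per-comment sentiment values; missing attributes get the default.
attributes = ["battery", "screen"]
comments = [{"battery": 1, "screen": -1}, {"battery": 1}, {"screen": 1}]
matrix = build_evaluation_matrix(comments, attributes, default_value=0)
```

Under this reading, a comment with no support votes still receives a small non-zero weight because of λ, so it is not discarded entirely.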
CN201510784361.4A 2015-11-16 2015-11-16 Internet data analysis method and system Active CN106708868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510784361.4A CN106708868B (en) 2015-11-16 2015-11-16 Internet data analysis method and system

Publications (2)

Publication Number Publication Date
CN106708868A true CN106708868A (en) 2017-05-24
CN106708868B CN106708868B (en) 2020-02-21

Family

ID=58931580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510784361.4A Active CN106708868B (en) 2015-11-16 2015-11-16 Internet data analysis method and system

Country Status (1)

Country Link
CN (1) CN106708868B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945268A (en) * 2012-10-25 2013-02-27 北京腾逸科技发展有限公司 Method and system for excavating comments on characteristics of product
CN103399916A (en) * 2013-07-31 2013-11-20 清华大学 Internet comment and opinion mining method and system on basis of product features
CN103914783A (en) * 2014-04-13 2014-07-09 北京工业大学 E-commerce website recommending method based on similarity of users
CN104156390A (en) * 2014-07-07 2014-11-19 乐视网信息技术(北京)股份有限公司 Comment recommendation method and system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909401A (en) * 2017-11-14 2018-04-13 阮敬 A kind of satisfaction measuring method based on big data technology
CN108595562A (en) * 2018-04-12 2018-09-28 西安邮电大学 User's evaluation data analysing method based on accurate sex determination
CN109284373A (en) * 2018-09-06 2019-01-29 合肥工业大学 The acquisition methods and device of product up-gradation strategy based on text mining driving
CN109376888A (en) * 2018-10-09 2019-02-22 长安大学 A kind of Forum on College Eating-room management system and management method based on cell phone application
CN110837739A (en) * 2019-10-24 2020-02-25 支付宝(杭州)信息技术有限公司 Service processing method and device and electronic equipment
CN111767725A (en) * 2020-06-24 2020-10-13 中国平安财产保险股份有限公司 Data processing method and device based on emotion polarity analysis model
CN111767725B (en) * 2020-06-24 2023-06-20 中国平安财产保险股份有限公司 Data processing method and device based on emotion polarity analysis model
CN112559685A (en) * 2020-12-11 2021-03-26 芜湖汽车前瞻技术研究院有限公司 Automobile forum spam comment identification method

Also Published As

Publication number Publication date
CN106708868B (en) 2020-02-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant