CN107273351A - A product feature extraction method based on big data opinion mining - Google Patents

A product feature extraction method based on big data opinion mining

Info

Publication number
CN107273351A
Authority
CN
China
Prior art keywords
product
layer
feature
word
extracting method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710395967.8A
Other languages
Chinese (zh)
Inventor
王振宇
周逸舒
王勇
陈珍珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou Lucheng District New Research Institute Of Advanced Technology
Original Assignee
Wenzhou Lucheng District New Research Institute Of Advanced Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou Lucheng District New Research Institute Of Advanced Technology
Priority to CN201710395967.8A
Publication of CN107273351A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention proposes a product feature extraction method based on big data opinion mining, characterized by comprising the following steps. Step 1: capture the product information and customer reviews of a product from websites using the page-parsing technique of a web crawler to obtain a product corpus, and build a three-layer model of product information - overall review - detailed review over the product corpus. Step 2: preprocess the product corpus in the three-layer model to obtain a valid data set. Step 3: extract product features from each layer of the preprocessed three-layer model to obtain the explicit features of each layer. Step 4: consolidate all explicit features into the explicit feature set of the product. The method can help manufacturers and service providers improve product performance in a targeted way and guide users toward a more comprehensive understanding of each aspect of the product's performance.

Description

A product feature extraction method based on big data opinion mining
Technical field
The present invention relates to a product feature extraction method based on big data opinion mining, intended to help manufacturers and service providers improve product performance in a targeted way and to guide users toward a more comprehensive understanding of each aspect of a product's performance.
Background
The boom in e-commerce has set off a wave of online shopping. Online reviews not only act as a feedback mechanism that helps producers and retailers improve product performance, but also help customers make sound decisions. As the volume of review information grows rapidly, however, technical means are urgently needed to make this process more accurate and convenient. The star ratings found on the web are no longer enough to help producers, sellers, and customers pinpoint where a product's strengths and weaknesses lie. Opinion mining based on product features has therefore attracted wide attention. Product features are objects such as the components, attributes, and performance of a product.
There are two ways to extract product features: manual definition and automatic extraction. Kobayashi N et al. manually defined the feature words of cars, and Zhuang L et al. manually defined the feature words of films. Liu B et al. used association rules, obtaining frequent itemsets with the Apriori algorithm to derive a candidate set of product features (nouns or noun phrases) automatically. Li et al. applied an unsupervised product feature mining algorithm based on the Apriori algorithm to a Chinese corpus to mine product feature information. Somprasertsri G et al. used syntactic analysis, obtaining product features from 6 different kinds of relations between feature words and sentiment words. Wei C P et al. used a semantic method that prunes feature words with the help of sentiment words. Manual definition is domain-bound: each domain needs its own experts to determine the feature words of that domain, so the approach ports poorly to new domains.
Summary of the invention
In view of the above problems, the object of the present invention is to provide a product feature extraction method based on big data opinion mining, to help manufacturers and service providers improve product performance in a targeted way and to guide users toward a more comprehensive understanding of each aspect of a product's performance.
To address the above problems, the following technical solution is provided: a product feature extraction method based on big data opinion mining, characterized by comprising the following steps:
Step 1: Capture the product information and customer reviews of the product from websites using the page-parsing technique of a web crawler to obtain a product corpus, and build a three-layer model of product information - overall review - detailed review over the product corpus;
Step 2: Preprocess the product corpus in the three-layer model to obtain a valid data set;
Step 3: Extract product features from each layer of the preprocessed three-layer model to obtain the explicit features of each layer;
Step 4: Consolidate all explicit features into the explicit feature set of the product.
In a further arrangement of the present invention, the first layer of the three-layer model is the product information layer, which generally refers to the product title and product attributes; the second layer is the overall review layer, which summarizes the advantages and disadvantages of the product; the third layer is the detailed review layer, which sets out specific opinions on the product in detail.
In a further arrangement of the present invention, the preprocessing in Step 2 comprises:
(1) Sentence splitting: split the text documents into sentences;
(2) Part-of-speech tagging: identify the part of speech of each word in a sentence, which narrows the range of the feature-item candidate set;
(3) Stop words: in the three-layer model, the feature words extracted from one layer are used as the stop words of the next layer;
(4) Stemming or affix trimming: reduce the different forms of the same word to a canonical form.
In a further arrangement of the present invention, the product feature extraction method for the product information layer in Step 3 is:
(1) The product title comprises the product name and the selling points. Nothing is extracted from the product name part; instead, the words of the product name are added to the stop-word list as stop words. From the selling-point part, the words whose part of speech is noun are extracted;
(2) From the product attribute phrases, the words whose part of speech is noun are extracted.
In a further arrangement of the present invention, the product features of the overall review layer in Step 3 are extracted by using the FP-growth algorithm to obtain frequent itemsets as the feature candidate set, and then pruning the frequent itemsets in the feature candidate set to obtain the explicit features of this layer.
In a further arrangement of the present invention, the frequent itemsets are pruned in two ways: compactness pruning and redundancy pruning. Compactness pruning removes meaningless frequent itemsets from the feature candidate set; redundancy pruning removes frequent itemsets in the feature candidate set that cannot completely express a product feature.
In a further arrangement of the present invention, the product features of the detailed review layer in Step 3 are extracted as follows: the product features of the first two layers serve as the stop words of this layer; syntactic analysis is used to automatically identify the syntactic structure of each sentence and the dependency relations between words; the stop words serve only as the screening condition for the words extracted by dependency analysis, so a word that is not a stop word is added to transaction database D; frequent itemsets are then extracted from transaction database D and pruned.
Beneficial effects of the present invention: the present invention uses automatic extraction. Based on the way products are presented on the web and on their characteristics, it builds a three-layer model of product information - overall review - detailed review to extract product features. The model applies a different extraction method to each layer according to that layer's characteristics. In addition, the model defines the feature words of an upper layer as the stop words of the lower layer and adds dependency relations in the third layer, which reduces the dimensionality of the feature items and thereby improves the efficiency of feature extraction.
Brief description of the drawings
Fig. 1 is a flow chart of the product feature extraction method based on big data opinion mining of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the drawing and examples. The following examples illustrate the present invention but do not limit its scope.
As shown in Fig. 1, a product feature extraction method based on big data opinion mining is characterized by comprising the following steps:
Step 1: Capture the product information and customer reviews of the product from websites using the page-parsing technique of a web crawler to obtain a product corpus, and build a three-layer model of product information - overall review - detailed review over the product corpus;
The first layer of the three-layer model is the product information layer, which generally refers to the product title and product attributes; the second layer is the overall review layer, which summarizes the advantages and disadvantages of the product; the third layer is the detailed review layer, which sets out specific opinions on the product in detail.
The three-layer model has the following benefits: 1) the product feature extraction methods of the layers are independent of one another, so each layer can choose an algorithm suited to its own characteristics; 2) the layers are nevertheless connected: an upper layer shares the product features it has already extracted with the lower layer, so the lower layer can avoid extracting the same features as the upper layer.
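The two benefits above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the invention: the `ProductCorpus` class, its field names, and the `run_layers` helper are hypothetical, and the three extractor functions are stubs that stand in for the per-layer methods described in Step 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProductCorpus:
    """Three-layer corpus for one product (all field names are hypothetical)."""
    title: str                                                 # layer 1: product title
    attributes: list[str] = field(default_factory=list)        # layer 1: attribute phrases
    overall_reviews: list[str] = field(default_factory=list)   # layer 2
    detail_reviews: list[str] = field(default_factory=list)    # layer 3


def run_layers(corpus, extractors):
    """Run one extractor per layer, top to bottom.

    Features found in upper layers are handed down as stop words, so a
    lower layer never reports a feature that an upper layer already found.
    """
    stop_words: set[str] = set()
    per_layer: list[set[str]] = []
    for extract in extractors:
        features = set(extract(corpus, frozenset(stop_words))) - stop_words
        per_layer.append(features)
        stop_words |= features
    return per_layer


# Stub extractors: each layer is free to use a different algorithm.
def info_layer(corpus, stop_words):
    return {"battery", "screen"}

def overall_layer(corpus, stop_words):
    return {"battery", "price"}

def detail_layer(corpus, stop_words):
    return {"camera", "screen"}


corpus = ProductCorpus(title="Acme X1 phone long battery")
layers = run_layers(corpus, [info_layer, overall_layer, detail_layer])
```

In this toy run, "battery" and "screen" are reported only by the first layer; the second and third layers contribute "price" and "camera" respectively.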
Step 2: Preprocess the product corpus in the three-layer model to obtain a valid data set;
The preprocessing comprises:
(1) Sentence splitting: split the text documents into sentences;
(2) Part-of-speech tagging: identify the part of speech of each word in a sentence, which narrows the range of the feature-item candidate set;
(3) Stop words: in the three-layer model, the feature words extracted from one layer are used as the stop words of the next layer;
(4) Stemming or affix trimming: reduce the different forms of the same word to a canonical form.
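The four preprocessing operations can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not part of the original text: the hypothetical `NOUNS` lexicon stands in for a real part-of-speech tagger, the `stem` function is a deliberately naive plural stripper rather than a real stemmer, and the tokenizer only handles lowercase Latin words (a Chinese corpus would need a word segmenter instead).

```python
import re

# Hypothetical toy noun lexicon standing in for a real part-of-speech tagger.
NOUNS = {"battery", "screen", "camera", "price", "phone"}


def split_sentences(text):
    """(1) Split on common Western and Chinese sentence terminators."""
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def stem(word):
    """(4) Very naive plural stripping; a stand-in for real stemming."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text, stop_words=frozenset()):
    """Return one list of candidate nouns per sentence.

    (2) Only words tagged as nouns are kept, and (3) words already
    extracted by an upper layer (the stop words) are dropped.
    """
    result = []
    for sentence in split_sentences(text):
        tokens = [stem(t) for t in re.findall(r"[a-z]+", sentence.lower())]
        nouns = [t for t in tokens if t in NOUNS and t not in stop_words]
        if nouns:
            result.append(nouns)
    return result


candidates = preprocess(
    "The batteries last long. Screens are sharp! Nice.",
    stop_words={"screen"},
)
```

Here "screen" is suppressed because it is passed in as a stop word, which is how an upper layer's features would be excluded from a lower layer.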
Step 3: Extract product features from each layer of the preprocessed three-layer model to obtain the explicit features of each layer;
The product feature extraction method for the product information layer is:
(1) The product title comprises the product name and the selling points. Nothing is extracted from the product name part; instead, the words of the product name are added to the stop-word list as stop words. From the selling-point part, the words whose part of speech is noun are extracted;
(2) From the product attribute phrases, the words whose part of speech is noun are extracted.
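The two rules for the product information layer can be sketched as follows. The sketch assumes that the title has already been separated into product-name words and selling-point words, and that a part-of-speech check is available; the function name `extract_info_layer` and the `is_noun` callback are hypothetical, since the text does not specify how either is obtained.

```python
def extract_info_layer(name_words, selling_point_words, attribute_phrases, is_noun):
    """Extract layer-1 features and the stop words passed to lower layers.

    name_words:          words of the product name (never extracted)
    selling_point_words: words of the selling-point part of the title
    attribute_phrases:   short attribute phrases, already tokenized by spaces
    is_noun:             hypothetical predicate standing in for a POS tagger
    """
    stop_words = set(name_words)           # rule (1): name words become stop words
    candidates = list(selling_point_words)
    for phrase in attribute_phrases:       # rule (2): nouns of attribute phrases
        candidates.extend(phrase.split())

    features = []
    for word in candidates:
        if is_noun(word) and word not in stop_words and word not in features:
            features.append(word)
    return features, stop_words


toy_nouns = {"battery", "phone", "screen", "size", "capacity"}
features, stop_words = extract_info_layer(
    name_words=["acme", "x1", "phone"],
    selling_point_words=["long", "battery", "phone", "hd", "screen"],
    attribute_phrases=["screen size", "battery capacity"],
    is_noun=lambda w: w in toy_nouns,
)
```

In the toy input, "phone" is a noun but is not extracted because it belongs to the product name.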
The product features of the overall review layer are extracted as follows: the FP-growth algorithm is used to obtain frequent itemsets as the feature candidate set, and the frequent itemsets in the feature candidate set are then pruned to obtain the explicit features of this layer. The frequent itemsets are pruned in two ways: compactness pruning and redundancy pruning. Compactness pruning removes meaningless frequent itemsets from the feature candidate set; redundancy pruning removes frequent itemsets in the feature candidate set that cannot completely express a product feature.
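The text does not define "meaningless" or "cannot completely express" precisely, so the Python sketch below adopts one common reading from the opinion-mining literature, and that reading is an assumption: a multi-word itemset is kept only if its words occur close together in enough sentences (compactness), and a single word is dropped if it rarely occurs without a longer feature that contains it (redundancy). The thresholds `max_gap`, `min_compact`, and `min_pure_support` are hypothetical.

```python
def is_compact(itemset, sentence, max_gap=3):
    """True if all words occur and neighbouring ones are at most max_gap apart.

    Only the first occurrence of each word is considered (a simplification).
    """
    if any(word not in sentence for word in itemset):
        return False
    positions = sorted(sentence.index(word) for word in itemset)
    return all(b - a <= max_gap for a, b in zip(positions, positions[1:]))


def compactness_prune(itemsets, sentences, min_compact=2):
    """Drop multi-word itemsets that are compact in too few sentences."""
    kept = []
    for itemset in itemsets:
        if len(itemset) == 1:
            kept.append(itemset)
        elif sum(is_compact(itemset, s) for s in sentences) >= min_compact:
            kept.append(itemset)
    return kept


def redundancy_prune(itemsets, sentences, min_pure_support=2):
    """Drop single words that mostly occur inside a longer kept feature."""
    kept = []
    for itemset in itemsets:
        if len(itemset) > 1:
            kept.append(itemset)
            continue
        supersets = [o for o in itemsets if len(o) > 1 and set(itemset) < set(o)]
        pure_support = sum(
            1
            for s in sentences
            if itemset[0] in s
            and not any(all(w in s for w in o) for o in supersets)
        )
        if pure_support >= min_pure_support:
            kept.append(itemset)
    return kept


sentences = [
    ["battery", "life", "is", "long"],
    ["battery", "life", "too", "short"],
    ["life", "of", "this", "old", "battery"],
    ["screen", "is", "sharp"],
    ["screen", "looks", "great"],
    ["battery", "died"],
]
candidates = [("battery", "life"), ("battery",), ("life",), ("screen",), ("great", "long")]
compact = compactness_prune(candidates, sentences)
explicit = redundancy_prune(compact, sentences)
```

In this toy data, ("great", "long") never co-occurs and is removed by compactness pruning, while "battery" and "life" on their own are removed by redundancy pruning because they almost always appear as part of "battery life".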
The FP-growth algorithm comprises two procedures: constructing the FP-tree, and mining the FP-tree with FP-growth(Tree, α);
The FP-tree construction algorithm is described as follows:
1) First scan the transaction database D once to obtain the set L of frequent 1-items;
2) Create the root node of the FP-tree and label it "null";
3) Sort the frequent items in each transaction Trans (by descending support, as is standard for FP-trees) to obtain [p | P], where p is the first element of Trans and P is the list of the remaining elements;
4) Call insert_tree([p | P], T): if T has a child N such that N.item-name = p.item-name, increase N's count by 1; otherwise create a new node N, set its count to 1, link it to its parent node T, and link it to the nodes with the same item-name through the node-link structure. If P is non-empty, recursively call insert_tree(P, N).
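The construction steps above can be sketched in Python as follows. The class and function names (`FPNode`, `build_fp_tree`, `support_from_links`) are illustrative, the alphabetical tie-break among equally frequent items is an added assumption to make the tree deterministic, and the header table is represented as a plain dictionary pointing at the head of each node-link chain.

```python
from collections import Counter


class FPNode:
    """One FP-tree node; `link` chains nodes that carry the same item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.link = None


def insert_tree(items, node, header):
    """Step 4: insert the sorted item list [p | P] below `node`."""
    if not items:
        return
    first, rest = items[0], items[1:]
    child = node.children.get(first)
    if child is None:
        child = FPNode(first, node)
        node.children[first] = child
        child.link = header.get(first)   # prepend to the node-link chain
        header[first] = child
    child.count += 1
    insert_tree(rest, child, header)     # recurse on the remaining list P


def build_fp_tree(transactions, min_support):
    # Step 1: one scan to find the frequent 1-items and their supports.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    ranking = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: position for position, item in enumerate(ranking)}

    root = FPNode(None, None)            # Step 2: the "null" root
    header = {}
    for t in transactions:               # Step 3: sort, then insert
        items = sorted((i for i in set(t) if i in frequent), key=rank.__getitem__)
        insert_tree(items, root, header)
    return root, header, frequent


def support_from_links(header, item):
    """Sum the counts along one node-link chain."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.link
    return total


transactions = [
    ["battery", "screen", "price"],
    ["battery", "screen"],
    ["battery", "camera"],
    ["screen", "price"],
]
root, header, frequent = build_fp_tree(transactions, min_support=2)
```

With a minimum support of 2, "camera" is discarded in the first scan, and the two transactions that start with "battery", "screen" share a single path prefix in the tree.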
The FP-tree mining algorithm FP-growth(Tree, α) is described as follows:
1) if Tree contains a single path P then
2) for each combination (denoted β) of the nodes in path P
3) generate the pattern β ∪ α, whose support count support_count equals the minimum support count of the nodes in β;
4) else for each a_i in the header table of Tree {
5) generate the pattern β = a_i ∪ α, whose support count support_count = a_i.support_count;
6) construct the conditional pattern base of β and the conditional FP-tree of β, i.e. Tree_β;
7) if Tree_β ≠ ∅ then
8) call FP-growth(Tree_β, β); }
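The mining procedure can be sketched in Python as below. The sketch is self-contained and therefore rebuilds a small tree from weighted transactions at each recursion level. It implements only the general branch (steps 4 to 8) and omits the single-path shortcut of steps 1 to 3, which is an optimization: the general branch produces the same patterns for a single-path tree. The names `Node`, `build_tree`, `fp_growth`, and `mine` are illustrative.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header and supports."""
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_transactions:
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for each frequent itemset extending `suffix`."""
    header, frequent = build_tree(weighted_transactions, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}              # step 5: beta = a_i union alpha
        yield pattern, frequent[item]
        base = []                              # step 6: conditional pattern base
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:                               # steps 7-8: recurse on Tree_beta
            yield from fp_growth(base, min_support, pattern)


def mine(transactions, min_support):
    return dict(fp_growth([(t, 1) for t in transactions], min_support))


transactions = [
    ["battery", "screen", "price"],
    ["battery", "screen"],
    ["battery", "camera"],
    ["screen", "price"],
]
itemsets = mine(transactions, min_support=2)
```

For a review corpus, each transaction would be the list of candidate nouns of one sentence, and the resulting itemsets would form the feature candidate set that is subsequently pruned.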
The product features of the detailed review layer are extracted as follows: the product features of the first two layers serve as the stop words of this layer; syntactic analysis is used to automatically identify the syntactic structure of each sentence and the dependency relations between words; the stop words serve only as the screening condition for the words extracted by dependency analysis, so a word that is not a stop word is added to transaction database D; frequent itemsets are then extracted from transaction database D and pruned.
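How transaction database D might be filled from dependency analysis is sketched below. The text does not say which dependency relations are used, so the choice here is an assumption: the sketch takes the output of some dependency parser as (head, relation, dependent) triples with Universal Dependencies style labels, and treats the dependent of `nsubj` and the head of `amod` as candidate feature words. The `CANDIDATE_SIDE` table and the `build_transactions` function are hypothetical.

```python
# Which side of a dependency triple holds the candidate feature word.
# Both the relation names and this mapping are assumptions for illustration.
CANDIDATE_SIDE = {
    "nsubj": "dependent",   # "battery" in "the battery is great"
    "amod": "head",         # "screen" in "a sharp screen"
}


def build_transactions(parsed_sentences, stop_words):
    """Build transaction database D, one transaction per parsed sentence.

    parsed_sentences: list of sentences, each a list of
                      (head, relation, dependent) triples from a parser
    stop_words:       features already extracted by the first two layers
    """
    database = []
    for triples in parsed_sentences:
        transaction = []
        for head, relation, dependent in triples:
            side = CANDIDATE_SIDE.get(relation)
            if side is None:
                continue
            candidate = dependent if side == "dependent" else head
            # Stop words act only as a filter on the extracted words.
            if candidate not in stop_words and candidate not in transaction:
                transaction.append(candidate)
        if transaction:
            database.append(transaction)
    return database


parsed = [
    [("great", "nsubj", "battery"), ("great", "cop", "is")],
    [("screen", "amod", "sharp"), ("has", "nsubj", "camera"), ("has", "obj", "screen")],
]
D = build_transactions(parsed, stop_words={"battery"})
```

The resulting database D is then mined for frequent itemsets and pruned in the same way as the overall review layer.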
Step 4: Consolidate all explicit features into the explicit feature set of the product.
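The consolidation in Step 4 is not specified further. A minimal sketch, assuming that it amounts to a union of the per-layer results in which each feature remembers the uppermost layer that produced it, could look like this (the function name and layer labels are hypothetical):

```python
def merge_explicit_features(layer_features):
    """Merge per-layer features into one explicit feature set.

    layer_features: dict mapping a layer label to its features, given in
                    layer order (dicts preserve insertion order).
    Returns a dict mapping each feature to the first layer that found it.
    """
    merged = {}
    for layer, features in layer_features.items():
        for feature in features:
            merged.setdefault(feature, layer)
    return merged


explicit_features = merge_explicit_features({
    "product information": ["battery", "screen"],
    "overall review": ["price"],
    "detailed review": ["camera", "screen"],
})
```

Because upper-layer features act as stop words for lower layers, duplicates such as the second "screen" should not normally occur; the `setdefault` call simply makes the merge robust if they do.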
The three-layer model built in the present invention extracts product features better than a traditional single-layer model. The three-layer model applies a different extraction method to each layer according to that layer's characteristics. In addition, the model defines the feature words of an upper layer as the stop words of the lower layer and adds dependency relations in the third layer, which reduces the dimensionality of the feature items and thereby improves the efficiency of feature extraction.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A product feature extraction method based on big data opinion mining, characterized by comprising the following steps:
Step 1: Capture the product information and customer reviews of the product from websites using the page-parsing technique of a web crawler to obtain a product corpus, and build a three-layer model of product information - overall review - detailed review over the product corpus;
Step 2: Preprocess the product corpus in the three-layer model to obtain a valid data set;
Step 3: Extract product features from each layer of the preprocessed three-layer model to obtain the explicit features of each layer;
Step 4: Consolidate all explicit features into the explicit feature set of the product.
2. The product feature extraction method based on big data opinion mining according to claim 1, characterized in that: the first layer of the three-layer model is the product information layer, which generally refers to the product title and product attributes; the second layer is the overall review layer, which summarizes the advantages and disadvantages of the product; the third layer is the detailed review layer, which sets out specific opinions on the product in detail.
3. The product feature extraction method based on big data opinion mining according to claim 1 or 2, characterized in that the preprocessing in Step 2 comprises:
Sentence splitting: splitting the text documents into sentences;
Part-of-speech tagging: identifying the part of speech of each word in a sentence, which narrows the range of the feature-item candidate set;
Stop words: in the three-layer model, the feature words extracted from one layer are used as the stop words of the next layer;
Stemming or affix trimming: reducing the different forms of the same word to a canonical form.
4. The product feature extraction method based on big data opinion mining according to claim 2, characterized in that the product feature extraction method for the product information layer in Step 3 is:
The product title comprises the product name and the selling points; nothing is extracted from the product name part, and the words of the product name are added to the stop-word list as stop words; from the selling-point part, the words whose part of speech is noun are extracted;
From the product attribute phrases, the words whose part of speech is noun are extracted.
5. The product feature extraction method based on big data opinion mining according to claim 2, characterized in that: the product features of the overall review layer in Step 3 are extracted by using the FP-growth algorithm to obtain frequent itemsets as the feature candidate set, and then pruning the frequent itemsets in the feature candidate set to obtain the explicit features of this layer.
6. The product feature extraction method based on big data opinion mining according to claim 5, characterized in that: the frequent itemsets are pruned in two ways, compactness pruning and redundancy pruning; compactness pruning removes meaningless frequent itemsets from the feature candidate set, and redundancy pruning removes frequent itemsets in the feature candidate set that cannot completely express a product feature.
7. The product feature extraction method based on big data opinion mining according to claim 2, characterized in that the product features of the detailed review layer in Step 3 are extracted as follows: the product features of the first two layers serve as the stop words of this layer; syntactic analysis is used to automatically identify the syntactic structure of each sentence and the dependency relations between words; the stop words serve only as the screening condition for the words extracted by dependency analysis, so a word that is not a stop word is added to transaction database D; frequent itemsets are then extracted from transaction database D and pruned.
CN201710395967.8A 2017-05-31 2017-05-31 A product feature extraction method based on big data opinion mining Pending CN107273351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710395967.8A CN107273351A (en) 2017-05-31 2017-05-31 A product feature extraction method based on big data opinion mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710395967.8A CN107273351A (en) 2017-05-31 2017-05-31 A product feature extraction method based on big data opinion mining

Publications (1)

Publication Number Publication Date
CN107273351A true CN107273351A (en) 2017-10-20

Family

ID=60064368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710395967.8A Pending CN107273351A (en) 2017-05-31 2017-05-31 A product feature extraction method based on big data opinion mining

Country Status (1)

Country Link
CN (1) CN107273351A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8787707B1 (en) * 2011-06-29 2014-07-22 Amazon Technologies, Inc. Identification of product attributes
CN102945268A (en) * 2012-10-25 2013-02-27 北京腾逸科技发展有限公司 Method and system for excavating comments on characteristics of product
CN103399916A (en) * 2013-07-31 2013-11-20 清华大学 Internet comment and opinion mining method and system on basis of product features
CN106257455A (en) * 2016-07-08 2016-12-28 闽江学院 A kind of Bootstrapping algorithm based on dependence template extraction viewpoint evaluation object
CN106384245A (en) * 2016-09-06 2017-02-08 合肥工业大学 Product feature analysis method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
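Liu Yu (刘羽) et al.: "Product Feature Extraction Based on Opinion Mining" (《基于观点挖掘的产品特征提取》), Computer Applications and Software (《计算机应用与软件》) *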

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171020