CN112560464A - Method and device for identifying implicit attribute of commodity, computer equipment and storage medium - Google Patents

Info

Publication number
CN112560464A
Authority
CN
China
Prior art keywords
implicit
explicit
commodity
sentence
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011487207.8A
Other languages
Chinese (zh)
Inventor
霍慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202011487207.8A priority Critical patent/CN112560464A/en
Publication of CN112560464A publication Critical patent/CN112560464A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a method, an apparatus, a computer device and a storage medium for identifying implicit attributes of commodities. The method comprises: acquiring an explicit sentence set and an implicit sentence set based on an original comment corpus; constructing a mapping relation between commodity explicit attribute clusters and domain-specific emotion words based on the explicit sentence set; for each implicit sentence in the implicit sentence set, judging whether the emotion word in the implicit sentence is a domain-specific emotion word; if it is a domain-specific emotion word, identifying the corresponding commodity implicit attribute based on the mapping relation between the commodity explicit attribute clusters and the domain-specific emotion words; and if it is a general emotion word, identifying the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment. The technical scheme thereby identifies implicit commodity attributes modified by either domain-specific or general emotion words, which improves the accuracy and breadth of commodity implicit attribute identification and makes fine-grained sentiment analysis of comments more complete.

Description

Method and device for identifying implicit attribute of commodity, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer information processing technologies, and in particular, to a method and an apparatus for identifying implicit attributes of commodities, a computer device, and a computer-readable storage medium.
Background
In commodity comments, different users often pay attention to different commodity characteristics, and users with the same overall emotional tendency may hold different emotional tendencies toward local details of the commodity. Deeply mining users' emotional tendencies toward each aspect of the evaluated object helps potential users understand how the target performs on each attribute dimension, including its advantages and disadvantages, and provides a reference for their purchasing decisions. It also helps merchants understand the strengths and weaknesses of their commodities, so that they can purposefully improve commodity design or service, improve commodity quality, or carry out precise marketing. The prerequisite for all of this is the extraction of commodity attributes.
Commodity attributes include explicit attributes and implicit attributes. An explicit attribute of a commodity is an attribute that appears literally in the commodity comment; for example, the comment "the mobile phone's appearance is very beautiful" contains the explicit attribute word "appearance". An implicit attribute of a commodity is an attribute that does not appear literally in the comment but can be inferred from certain words or from the semantics of the comment.
At present, attention is focused mainly on explicit attribute extraction, and implicit attribute extraction receives less attention. For example, in the commodity comment "beautiful, but rather expensive", the emotion word "expensive" implies the commodity attribute "price" and the emotion word "beautiful" implies the commodity attribute "appearance". If such implicit attributes are not extracted, part of the comment content is ignored and the fine-grained sentiment analysis of the comment is incomplete.
Disclosure of Invention
The present disclosure has been made to at least partially solve the technical problems occurring in the prior art.
According to an aspect of the embodiments of the present disclosure, a method for identifying implicit attributes of a commodity is provided, where the method includes:
acquiring an explicit sentence set and an implicit sentence set based on the original comment corpus;
constructing a mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words based on the explicit sentence set;
judging whether the emotion words in the implicit sentences are domain-specific emotion words or not for each implicit sentence in the implicit sentence set;
if the emotion word is a domain-specific emotion word, identifying the corresponding commodity implicit attribute based on the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words;
if the emotion word is not a domain-specific emotion word, identifying the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment.
According to another aspect of the disclosed embodiments, there is provided a commodity implicit attribute identification apparatus, the apparatus including:
an acquisition module configured to acquire an explicit sentence set and an implicit sentence set based on an original comment corpus;
a construction module configured to construct a mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words based on the explicit sentence set;
a first judgment module configured to judge, for each implicit sentence in the implicit sentence set, whether the emotion word in the implicit sentence is a domain-specific emotion word;
an identification module configured to identify the corresponding commodity implicit attribute based on the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words when the first judgment module judges that the emotion word in the implicit sentence is a domain-specific emotion word; and
to identify the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment when the first judgment module judges that the emotion word in the implicit sentence is not a domain-specific emotion word.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer device, including a memory and a processor, where the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes the foregoing product implicit attribute identification method.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the processor executes the foregoing implicit attribute identification method for an article.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In the commodity implicit attribute identification method and apparatus provided by the embodiments of the present disclosure, a mapping relation between commodity explicit attribute clusters and domain-specific emotion words is constructed in advance. For an implicit sentence containing a domain-specific emotion word, the corresponding commodity implicit attribute is identified based on this mapping relation, which realizes the identification of implicit commodity attributes modified by domain-specific emotion words. For an implicit sentence containing a general emotion word, the corresponding commodity implicit attribute is identified based on the position of the implicit sentence in the whole commodity comment, which realizes the identification of implicit commodity attributes modified by general emotion words. The precision and breadth of commodity implicit attribute identification are thereby improved, and fine-grained sentiment analysis of comments becomes more complete.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. The objectives and other advantages of the disclosure may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosed embodiments and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the example serve to explain the principles of the disclosure and not to limit the disclosure.
Fig. 1 is a schematic flow chart of a product implicit attribute identification method according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a product implicit attribute identification device according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, specific embodiments of the present disclosure are described below in detail with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order; also, the embodiments and features of the embodiments in the present disclosure may be arbitrarily combined with each other without conflict.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of explanation of the present disclosure, and have no specific meaning in themselves. Thus, "module", "component" or "unit" may be used mixedly.
Fig. 1 is a schematic flow chart of a product implicit attribute identification method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes the following steps S101 to S105.
S101, acquiring an explicit sentence set and an implicit sentence set based on an original comment corpus;
S102, constructing a mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words based on the explicit sentence set;
S103, for each implicit sentence in the implicit sentence set, judging whether the emotion word in the implicit sentence is a domain-specific emotion word; if it is, executing step S104; if it is not, that is, if it is a general emotion word, executing step S105;
S104, identifying the corresponding commodity implicit attribute based on the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words;
S105, identifying the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment.
The domain-specific emotion words can be set according to actual needs. For example, the domain-specific emotion words corresponding to the commodity explicit attribute "fabric" can be "soft", "comfortable", "breathable", etc.; those corresponding to the commodity explicit attribute "price" can be "expensive", "cheap", "cost-effective", etc. General emotion words are distinct from domain-specific emotion words; examples are "good" and "satisfied".
In this embodiment, the mapping relation between commodity explicit attribute clusters and domain-specific emotion words is constructed in advance. For an implicit sentence containing a domain-specific emotion word, the corresponding commodity implicit attribute is identified based on this mapping relation, which realizes the identification of implicit commodity attributes modified by domain-specific emotion words. For an implicit sentence containing a general emotion word, the corresponding commodity implicit attribute is identified based on the position of the implicit sentence in the whole commodity comment, which realizes the identification of implicit commodity attributes modified by general emotion words. The precision and breadth of commodity implicit attribute identification are thereby improved, and fine-grained sentiment analysis of comments becomes more complete. In addition, this embodiment is applicable not only to implicit attribute extraction for emotion words that are adjectives, but also to emotion words of other parts of speech or phrases, such as adverbs and verbs.
In a specific embodiment, before step S101, the following step S106 is further included:
S106, capturing comment data on an e-commerce platform as the original comment corpus.
In this embodiment, the Octopus collector can be used to capture comment data on the e-commerce platform as the original comment corpus.
In one embodiment, step S101 includes the following steps S1011 to S1014.
S1011, preprocessing the original comment corpus;
S1012, dividing the preprocessed comment corpus into clauses to obtain a short sentence set;
S1013, extracting sentences containing commodity explicit attribute clusters from the short sentence set to form the explicit sentence set;
S1014, extracting sentences that do not contain commodity explicit attribute clusters from the short sentence set to form the implicit sentence set.
In other words, the explicit sentence set is composed of sentences containing explicit property clusters of commodities; the set of implicit sentences consists of sentences other than the sentences containing the explicit property clusters of the items.
Chinese commodity comment texts are short, are broken into short clauses irregularly, and are semantically rich. In this embodiment, therefore, the preprocessed comment corpus is divided into clauses at punctuation marks such as commas, semicolons, periods and exclamation marks to obtain a short sentence set, and each short sentence is processed as a logical semantic unit.
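The clause division and the split into explicit and implicit sentence sets can be sketched in Python as follows. This is only an illustrative sketch: the function names `split_clauses` and `partition_clauses`, the exact delimiter set, the English sample comment and the assumption that the explicit attribute words are already known are all hypothetical and not taken from the disclosure.

```python
import re

# Clause delimiters named in the text (comma, semicolon, period, exclamation
# mark) in full-width and ASCII forms; the exact set is an assumption.
CLAUSE_DELIMITERS = r"[,，;；.。!！]"


def split_clauses(comment):
    """Split one comment into non-empty short clauses."""
    return [c.strip() for c in re.split(CLAUSE_DELIMITERS, comment) if c.strip()]


def partition_clauses(clauses, explicit_attributes):
    """Return (explicit_set, implicit_set).

    A clause counts as explicit when it contains any known explicit
    attribute word (simple substring matching, assumed for illustration).
    """
    explicit, implicit = [], []
    for clause in clauses:
        if any(attr in clause for attr in explicit_attributes):
            explicit.append(clause)
        else:
            implicit.append(clause)
    return explicit, implicit


clauses = split_clauses("The fabric is soft, really cost-effective! The price is fair.")
explicit, implicit = partition_clauses(clauses, {"fabric", "price"})
```

Splitting on every period would also break decimal numbers such as prices, so a real corpus would likely need a more careful delimiter rule.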
In a specific embodiment, step S1011 specifically comprises:
cleaning the original comment corpus to obtain an effective comment corpus; and
performing word segmentation and part-of-speech tagging on the effective comment corpus to obtain the preprocessed comment corpus.
In this embodiment, the original comment corpus is cleaned to filter out comments of little or no value. Chinese word segmentation and part-of-speech tagging of the effective comment corpus can be performed with the jieba word segmentation package for Python.
In one embodiment, step S102 includes steps S1021 through S1025 as follows.
S1021, extracting the commodity explicit attribute cluster from the explicit sentence set;
S1022, extracting the domain-specific emotion words from the explicit sentence set;
S1023, calculating the association degree between the domain-specific emotion words and the commodity explicit attribute cluster;
S1024, judging whether the association degree between the domain-specific emotion words and the commodity explicit attribute cluster is smaller than a preset threshold p; if it is not smaller than p, executing step S1025; if it is smaller than p, ending the current flow;
S1025, constructing the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words.
In this embodiment, step S1021 may combine the FP-tree association rule algorithm, filtering techniques, clustering techniques and the like to extract the commodity explicit attribute cluster from the explicit sentence set. A commodity explicit attribute cluster comprises a plurality of commodity explicit attributes, which can be nouns, noun phrases and verb-noun phrases.
In step S1022, adjectives are mainly extracted as emotion words. Emotion words of other parts of speech or phrases, such as adverbs and verbs, may also be extracted as needed.
In step S1024, an association degree that is not less than the preset threshold p indicates that the domain-specific emotion word and the commodity explicit attribute cluster are strongly associated, so the mapping between them is retained; in this way a strong-association mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words is constructed in step S1025.
An example of a strong association mapping relationship between an explicit property cluster of a commodity and a domain-specific emotion word is shown in table 1 below:
TABLE 1
Commodity explicit attribute cluster | Domain-specific emotion words
Fabric | soft, comfortable, hard, durable, breathable, …
Price | expensive, low, cheap, cost-effective, …
In one embodiment, step S1023 calculates the association between the domain-specific emotion words and the product explicit attribute cluster using the following formula:
$$\mathrm{PMI}(W, F) = \log \frac{P(W \text{ and } F)}{P(W)\,P(F)}$$
in the formula, PMI (W, F) is the association degree between the domain-specific emotion words and the commodity explicit attribute clusters, W is the domain-specific emotion words, and F is the commodity explicit attribute clusters; p (W) is the probability of the domain-specific emotional words W appearing in the explicit sentence set; p (F) is the probability of the commodity explicit attribute cluster F appearing in the explicit sentence set; p (W and F) is the probability that the domain-specific emotion words W and the commodity explicit attribute cluster F co-occur in the explicit sentence set.
In this embodiment, for the PMI value between the field-specific emotion word and the product explicit attribute cluster obtained by calculation, the larger the PMI value is, the higher the probability that the field-specific emotion word and the product explicit attribute cluster appear together is, and the higher the association degree is; the smaller the PMI value is, the lower the probability that the domain-specific emotion word and the product explicit attribute cluster appear together is, and the lower the degree of association is.
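A minimal Python sketch of steps S1023 to S1025 follows, assuming the standard pointwise mutual information definition PMI(W, F) = log2(P(W and F) / (P(W) P(F))). The base-2 logarithm, the function names `pmi` and `build_mapping`, the sentence-level substring counting, the rule that a cluster occurs in a sentence when any of its attribute words does, and the sample sentences are all assumptions made for illustration; they are not taken from the disclosure and may differ from its own probability formulas.

```python
import math
from collections import defaultdict


def pmi(p_joint, p_w, p_f):
    """Standard pointwise mutual information (log base 2 is an assumption)."""
    if p_joint == 0.0:
        return float("-inf")  # the word and the cluster never co-occur
    return math.log2(p_joint / (p_w * p_f))


def build_mapping(explicit_sentences, clusters, emotion_words, threshold):
    """Keep (cluster, word) pairs whose PMI is not smaller than the threshold."""
    n = len(explicit_sentences)
    mapping = defaultdict(set)
    for cluster_name, attributes in clusters.items():
        # Assumed rule: a cluster occurs if any of its attribute words occurs.
        has_f = [any(a in s for a in attributes) for s in explicit_sentences]
        p_f = sum(has_f) / n
        for word in emotion_words:
            has_w = [word in s for s in explicit_sentences]
            p_w = sum(has_w) / n
            if p_w == 0.0 or p_f == 0.0:
                continue  # PMI is undefined when either side never occurs
            p_joint = sum(f and w for f, w in zip(has_f, has_w)) / n
            if pmi(p_joint, p_w, p_f) >= threshold:
                mapping[cluster_name].add(word)
    return dict(mapping)


sentences = ["fabric is soft", "fabric feels soft", "price is cheap", "price is fair and cheap"]
clusters = {"fabric": {"fabric", "material"}, "price": {"price"}}
mapping = build_mapping(sentences, clusters, ["soft", "cheap"], threshold=0.5)
```

With these sample sentences, "soft" is retained only for the fabric cluster and "cheap" only for the price cluster, because the cross pairs never co-occur.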
In one embodiment, the probability p (F) that the commodity explicit attribute cluster F appears in the explicit sentence set is calculated by the following formula:
Figure BDA0002839661720000071
the probability P (W and F) that the domain-specific emotion words and the commodity explicit attribute clusters appear together in the explicit sentence set is calculated by adopting the following formula:
Figure BDA0002839661720000072
where n is the number of commodity explicit attributes in the commodity explicit attribute cluster F; F_i is the i-th commodity explicit attribute in the commodity explicit attribute cluster F; P(F_i) is the probability that the i-th commodity explicit attribute occurs in the explicit sentence set; Co-occur(F_i, W) is the probability that F_i and W co-occur in the explicit sentence set; and N is the number of explicit sentences in the explicit sentence set.
In a specific embodiment, before step S103, the following steps S107 and S108 are further included.
S107, for each implicit sentence in the implicit sentence set, judging whether an emotion word exists in the implicit sentence; if so, executing step S108; if not, discarding the implicit sentence;
S108, extracting the emotion word from the implicit sentence.
In the embodiment, whether the emotional words exist in the implicit sentences is judged firstly, and the implicit sentences without the emotional words are directly abandoned; for an implicit sentence with emotion words, firstly extracting the emotion words in the implicit sentence, then judging the types of the emotion words, if the emotion words are domain-specific emotion words, identifying corresponding commodity implicit attributes based on the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words, and if the emotion words are general emotion words, identifying the corresponding commodity implicit attributes by judging the positions of the implicit sentence in the whole commodity comment and the consistency with the context emotion tendency.
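The routing described above (discard, look up in the mapping, or fall back to the position rule) can be sketched as follows. This is a hedged illustration: the function name `route_clause`, the pre-tokenized English input, the contents of `GENERAL_WORDS` and the choice to use the first emotion word found in a clause are assumptions; the mapping reuses the words listed in Table 1.

```python
# Mapping from attribute cluster to domain-specific emotion words (from Table 1).
MAPPING = {
    "fabric": {"soft", "comfortable", "hard", "durable", "breathable"},
    "price": {"expensive", "low", "cheap", "cost-effective"},
}
# General emotion words; this small set is assumed for illustration.
GENERAL_WORDS = {"good", "satisfied", "like"}


def route_clause(tokens, mapping, general_words):
    """Decide how an implicit clause is handled.

    Returns (kind, emotion_word, attribute) where kind is:
      "domain"  -> attribute resolved through the mapping (step S104)
      "general" -> attribute must come from the position rule (step S105)
      "discard" -> no emotion word, the clause is dropped (step S107)
    """
    for token in tokens:
        for cluster, words in mapping.items():
            if token in words:
                return ("domain", token, cluster)
        if token in general_words:
            return ("general", token, None)
    return ("discard", None, None)


print(route_clause(["comfortable", "to", "wear"], MAPPING, GENERAL_WORDS))
print(route_clause(["super", "like"], MAPPING, GENERAL_WORDS))
print(route_clause(["kayi"], MAPPING, GENERAL_WORDS))
```

Tokenization and part-of-speech tagging are assumed to have been done already, so the sketch only shows the decision logic.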
In one embodiment, step S105 includes the following steps S1051 to S1052.
S1051, obtaining the position of the implicit sentence in the whole commodity comment; if the implicit sentence is at the beginning or the end of the comment, executing step S1052; if it is in the middle of the comment, executing step S1053;
S1052, identifying the implicit attribute of the implicit sentence as "whole";
S1053, analyzing the emotional tendencies of the implicit sentence and of its context; if they are consistent, identifying the implicit attribute of the context as the implicit attribute of the implicit sentence; otherwise, identifying the implicit attribute manually. Specifically, if the implicit sentence is consistent with the emotional tendency of the preceding text, the implicit attribute of the preceding text is identified as the implicit attribute of the implicit sentence; if the implicit sentence is consistent with the emotional tendency of the following text, the implicit attribute of the following text is identified as the implicit attribute of the implicit sentence; if the implicit sentence is consistent with the emotional tendencies of both the preceding and the following text, the implicit attributes of both are jointly identified as the implicit attributes of the implicit sentence.
In the embodiment, the implicit commodity attribute modified by the general emotional words is identified by judging the position of the implicit sentence in the whole commodity comment and the consistency of the implicit sentence and the context emotional tendency.
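The position rule of steps S1051 to S1053 can be sketched as a small Python function. The function name `attribute_for_general_word`, the string polarity labels and the representation of a neighbouring clause as a `(polarity, attribute)` pair are hypothetical; how the emotional tendency of a clause is obtained is outside the scope of this sketch.

```python
def attribute_for_general_word(index, total, polarity, prev=None, nxt=None):
    """Return the implicit attributes of a clause holding a general emotion word.

    index/total : position of the clause and number of clauses in the comment
    polarity    : emotional tendency of the clause, e.g. "pos" or "neg"
    prev, nxt   : (polarity, attribute) of the preceding / following clause,
                  or None when unknown
    Returns a list of attributes, or None when manual identification is needed.
    """
    # S1052: a clause at the beginning or end describes the commodity as a whole.
    if index == 0 or index == total - 1:
        return ["whole"]
    # S1053: inherit the attribute of every neighbour with the same tendency.
    matches = []
    for neighbour in (prev, nxt):
        if neighbour is not None and neighbour[0] == polarity:
            if neighbour[1] not in matches:
                matches.append(neighbour[1])
    return matches or None


print(attribute_for_general_word(0, 5, "pos"))
print(attribute_for_general_word(3, 5, "pos", ("pos", "fabric"), ("neg", "price")))
print(attribute_for_general_word(2, 5, "pos", ("neg", "fabric"), None))
```

Returning `None` for the inconsistent case leaves room for the manual identification mentioned in step S1053.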
How to identify the implicit attributes of the goods is described in detail below by way of a specific example in conjunction with table 1 above:
The whole commodity comment is: "Still good! Kayi, comfortable to wear, super like, cost-effective!"
The comment is divided into five short sentences, namely "still good", "kayi", "comfortable to wear", "super like" and "cost-effective", all of which are implicit sentences.
The short sentence "kayi" contains no emotion word and is therefore discarded directly.
From the short sentences "comfortable to wear" and "cost-effective", the emotion words "comfortable" and "cost-effective" are extracted. Both are domain-specific emotion words, and looking them up in Table 1 identifies the implicit attributes as "fabric" and "price" respectively.
From the short sentences "still good" and "super like", the emotion words "good" and "like" are extracted. Both are general emotion words that cannot be found in Table 1, so the corresponding commodity implicit attributes are identified from the positions of the short sentences in the whole commodity comment. Specifically, the short sentence "still good" is at the beginning of the whole commodity comment, so its implicit attribute is identified as "whole"; the short sentence "super like" is in the middle of the whole commodity comment and its emotional tendency is consistent with that of its context, so its implicit attribute is identified as "fabric".
In the commodity implicit attribute identification method provided by the embodiments of the present disclosure, a mapping relation between commodity explicit attribute clusters and domain-specific emotion words is constructed in advance. For an implicit sentence containing a domain-specific emotion word, the corresponding commodity implicit attribute is identified based on this mapping relation, which realizes the identification of implicit commodity attributes modified by domain-specific emotion words. For an implicit sentence containing a general emotion word, the corresponding commodity implicit attribute is identified based on the position of the implicit sentence in the whole commodity comment, which realizes the identification of implicit commodity attributes modified by general emotion words. The precision and breadth of commodity implicit attribute identification are thereby improved, and fine-grained sentiment analysis of comments becomes more complete. This helps potential users understand how the target performs on each attribute dimension, including its advantages and disadvantages, and provides a reference for their purchasing decisions; it also helps merchants understand the strengths and weaknesses of their commodities, so that they can improve commodity design or service in a targeted manner, improve commodity quality, or carry out precise marketing.
Fig. 2 is a schematic structural diagram of a product implicit attribute identification device according to an embodiment of the present disclosure. As shown in fig. 2, the apparatus 2 includes: the device comprises an acquisition module 21, a construction module 22, a first judgment module 23 and an identification module 24.
Wherein the obtaining module 21 is configured to obtain the explicit sentence set and the implicit sentence set based on the original comment corpus. The construction module 22 is configured to construct a mapping relationship between the commodity explicit property cluster and the domain-specific sentiment words based on the explicit sentence set. The first determining module 23 is configured to determine, for each implicit sentence in the set of implicit sentences, whether an emotion word in the implicit sentence is a domain-specific emotion word. The identifying module 24 is configured to identify a corresponding commodity implicit attribute based on a mapping relationship between the commodity explicit attribute cluster and the field-specific emotion word when the first judging module 23 judges that the emotion word in the implicit sentence is the field-specific emotion word; and when the first judging module 23 judges that the emotion word in the implicit sentence is not the domain-specific emotion word, identifying the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment.
In one embodiment, the apparatus 2 further comprises: a capture module 25.
The capture module 25 is configured to capture comment data on an e-commerce platform as the original comment corpus.
In this embodiment, the capture module 25 may use the Octopus collector to capture comment data on the e-commerce platform as the original comment corpus.
In one embodiment, the acquisition module 21 includes a preprocessing unit, a segmentation unit and a first extraction unit.
The preprocessing unit is configured to preprocess the original comment corpus. The segmentation unit is configured to split the preprocessed comment corpus into clauses to obtain a short sentence set. The first extraction unit is configured to extract the sentences that contain a commodity explicit attribute cluster from the short sentence set to form the explicit sentence set, and to extract the sentences that do not contain any commodity explicit attribute cluster from the short sentence set to form the implicit sentence set. In other words, the explicit sentence set consists of the sentences containing commodity explicit attribute clusters, and the implicit sentence set consists of all remaining sentences.
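The clause segmentation and the explicit/implicit partition described above can be sketched as follows. This is only an illustration under stated assumptions: the delimiter set, the function names, and the plain substring test against a given attribute vocabulary are hypothetical choices, whereas the disclosure works on word-segmented text and on attribute clusters.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed clause delimiters (ASCII and full-width punctuation plus line breaks).
CLAUSE_DELIMITERS = r"[,.;!?，。；！？\n]+"


def split_into_clauses(comment: str) -> List[str]:
    """Split one comment into short clauses and drop empty fragments."""
    parts = re.split(CLAUSE_DELIMITERS, comment)
    return [part.strip() for part in parts if part.strip()]


def partition_clauses(
    clauses: Iterable[str], explicit_attributes: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (explicit sentences, implicit sentences) for a set of attribute words."""
    explicit: List[str] = []
    implicit: List[str] = []
    for clause in clauses:
        lowered = clause.lower()
        # Assumption: a clause is explicit if any attribute word occurs in it.
        if any(attribute in lowered for attribute in explicit_attributes):
            explicit.append(clause)
        else:
            implicit.append(clause)
    return explicit, implicit


clauses = split_into_clauses("The screen is sharp, a bit expensive. Battery lasts long!")
explicit, implicit = partition_clauses(clauses, {"screen", "battery"})
print(explicit)
print(implicit)
```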
In a specific embodiment, the preprocessing unit is specifically configured to:
clean the original comment corpus to obtain an effective comment corpus; and
perform word segmentation and part-of-speech tagging on the effective comment corpus to obtain the preprocessed comment corpus.
In this embodiment, the original comment corpus is cleaned to filter out comments of little or no value. Chinese word segmentation and part-of-speech tagging may be performed on the effective comment corpus by using the jieba word segmentation package for Python.
In one embodiment, the construction module 22 includes a second extraction unit, a third extraction unit, a calculation unit, a judgment unit and a construction unit.
The second extraction unit is configured to extract the commodity explicit attribute clusters from the explicit sentence set. The third extraction unit is configured to extract the domain-specific emotion words from the explicit sentence set. The calculation unit is configured to calculate the association degree between a domain-specific emotion word and a commodity explicit attribute cluster. The judgment unit is configured to judge whether the association degree between the domain-specific emotion word and the commodity explicit attribute cluster is smaller than a preset threshold. The construction unit is configured to construct the mapping relationship between the commodity explicit attribute cluster and the domain-specific emotion word when the judgment unit judges that the association degree is not smaller than the preset threshold.
In this embodiment, the second extraction unit may extract the commodity explicit attribute clusters from the explicit sentence set by combining the FP-tree association rule algorithm, filtering techniques, clustering techniques and the like. A commodity explicit attribute cluster includes a plurality of commodity explicit attributes, each of which may be a noun, a noun phrase or a verbal-noun phrase. The third extraction unit mainly extracts adjectives as emotion words; other parts of speech or phrases, such as adverbs and verbs, may also be extracted as emotion words where necessary.
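The threshold test performed by the judgment unit and the construction unit can be sketched in Python as below. The function name, the example scores and the threshold value are hypothetical, and keeping only the highest-scoring cluster for each emotion word is a simplification of this sketch rather than something stated in the disclosure.

```python
from typing import Callable, Dict, Iterable


def build_mapping(
    emotion_words: Iterable[str],
    clusters: Iterable[str],
    association: Callable[[str, str], float],
    threshold: float,
) -> Dict[str, str]:
    """Map each emotion word to its best cluster whose association is not below the threshold."""
    cluster_list = list(clusters)  # materialise once so it can be scanned for every word
    mapping: Dict[str, str] = {}
    for word in emotion_words:
        scored = [(association(word, cluster), cluster) for cluster in cluster_list]
        qualifying = [pair for pair in scored if pair[0] >= threshold]
        if qualifying:
            # Simplification: keep only the cluster with the highest association degree.
            mapping[word] = max(qualifying, key=lambda pair: pair[0])[1]
    return mapping


# Hypothetical association degrees, standing in for the values the calculation unit produces.
example_scores = {
    ("blurry", "camera"): 2.1,
    ("blurry", "battery"): -0.4,
    ("durable", "battery"): 1.3,
    ("durable", "camera"): 0.2,
    ("nice", "camera"): 0.3,
    ("nice", "battery"): 0.1,
}

example_result = build_mapping(
    ["blurry", "durable", "nice"],
    ["camera", "battery"],
    lambda word, cluster: example_scores.get((word, cluster), 0.0),
    threshold=1.0,
)
print(example_result)
```

In this example the word "nice" never reaches the threshold, so no mapping entry is created for it.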
In a specific embodiment, the calculation unit calculates the association degree between the domain-specific emotion word and the commodity explicit attribute cluster by using the following formula:
Figure BDA0002839661720000101
In the formula, PMI(W, F) is the association degree between the domain-specific emotion word and the commodity explicit attribute cluster, W is the domain-specific emotion word, and F is the commodity explicit attribute cluster; P(W) is the probability that the domain-specific emotion word W appears in the explicit sentence set; P(F) is the probability that the commodity explicit attribute cluster F appears in the explicit sentence set; and P(W and F) is the probability that the domain-specific emotion word W and the commodity explicit attribute cluster F co-occur in the explicit sentence set.
In one embodiment, the probability P(F) that the commodity explicit attribute cluster F appears in the explicit sentence set is calculated by the following formula:
Figure BDA0002839661720000111
The probability P(W and F) that the domain-specific emotion word W and the commodity explicit attribute cluster F co-occur in the explicit sentence set is calculated by the following formula:
Figure BDA0002839661720000112
where n is the number of commodity explicit attributes in the commodity explicit attribute cluster F; F_i is the i-th commodity explicit attribute in the commodity explicit attribute cluster F; P(F_i) is the probability that the i-th commodity explicit attribute appears in the explicit sentence set; Co-occure(F_i, W) is the probability that F_i and W co-occur in the explicit sentence set; and N is the number of explicit sentences in the explicit sentence set.
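The formula images are not reproduced in this text, so the following Python sketch only illustrates how such an association degree could be computed. It assumes the standard pointwise mutual information, log2(P(W and F) / (P(W) · P(F))), estimates every probability as a relative frequency over the N explicit sentences, and treats the cluster F as occurring in a sentence when any of its attributes occurs there. These aggregation choices, the function name and the example data are assumptions and may differ from the formulas of the disclosure.

```python
import math
from typing import List, Optional, Set


def pmi(word: str, cluster: Set[str], sentences: List[Set[str]]) -> Optional[float]:
    """Pointwise mutual information between an emotion word and an attribute cluster.

    Each sentence is given as the set of its tokens. Returns None when the value
    is undefined because one of the probabilities is zero.
    """
    total = len(sentences)
    if total == 0:
        return None
    count_w = sum(1 for tokens in sentences if word in tokens)
    # Assumption: the cluster occurs in a sentence if any of its attributes does.
    count_f = sum(1 for tokens in sentences if tokens & cluster)
    count_wf = sum(1 for tokens in sentences if word in tokens and tokens & cluster)
    if count_w == 0 or count_f == 0 or count_wf == 0:
        return None
    p_w = count_w / total
    p_f = count_f / total
    p_wf = count_wf / total
    return math.log2(p_wf / (p_w * p_f))


# Hypothetical, already segmented explicit sentences.
example_sentences = [
    {"screen", "clear"},
    {"display", "clear"},
    {"battery", "durable"},
    {"battery", "clear"},
]

print(pmi("clear", {"screen", "display"}, example_sentences))
print(pmi("durable", {"screen", "display"}, example_sentences))
```

A positive value indicates that the word and the cluster co-occur more often than independence would predict, which is why the result can be compared with a preset threshold.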
In one embodiment, the identification module 24 includes an obtaining unit, an analysis unit and a recognition unit.
The obtaining unit is configured to obtain the position of the implicit sentence in the whole commodity comment. The recognition unit is configured to identify the implicit attribute of the implicit sentence as 'whole' when the obtaining unit finds that the implicit sentence is at the beginning or the end of the comment. The analysis unit is configured to analyze the emotional tendencies of the implicit sentence and of its context when the obtaining unit finds that the implicit sentence is in the middle of the comment. The recognition unit is further configured to identify the attribute of the context as the implicit attribute of the implicit sentence when the analysis unit finds that the emotional tendency of the implicit sentence is consistent with that of its context.
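The position rule can be sketched in Python as follows. Several points are assumptions of the sketch and not statements of the disclosure: the context is taken to be the immediately preceding clause and then the immediately following clause, emotional tendencies are supplied as precomputed integers (+1 positive, -1 negative), and None is returned when no neighbouring clause has a consistent tendency, a case the description does not address.

```python
from typing import List, Optional


def identify_by_position(
    index: int,
    attributes: List[Optional[str]],
    polarities: List[int],
) -> Optional[str]:
    """Identify the implicit attribute of the clause at `index` within one comment.

    attributes[i] is the known attribute of clause i (None if unknown);
    polarities[i] is the emotional tendency of clause i.
    """
    last = len(attributes) - 1
    # A clause at the beginning or end of the comment describes the commodity as a whole.
    if index == 0 or index == last:
        return "whole"
    # Assumption: check the preceding clause first, then the following clause.
    for neighbour in (index - 1, index + 1):
        same_tendency = polarities[neighbour] == polarities[index]
        if attributes[neighbour] is not None and same_tendency:
            return attributes[neighbour]
    return None


# Hypothetical comment of five clauses; clauses 0, 2 and 4 are implicit sentences.
example_attributes = [None, "screen", None, "battery", None]

print(identify_by_position(0, example_attributes, [1, 1, 1, -1, 1]))
print(identify_by_position(2, example_attributes, [1, 1, 1, -1, 1]))
print(identify_by_position(2, example_attributes, [1, 1, -1, -1, 1]))
```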
In one embodiment, the apparatus 2 further includes a second judgment module 26 and an extraction module 27.
The second judgment module 26 is configured to judge, for each implicit sentence in the implicit sentence set, whether the implicit sentence contains an emotion word. The extraction module 27 is configured to extract the emotion word from the implicit sentence when the second judgment module 26 judges that the implicit sentence contains an emotion word.
The commodity implicit attribute identification device provided by the embodiment of the disclosure constructs in advance a mapping relationship between commodity explicit attribute clusters and domain-specific emotion words. For an implicit sentence containing a domain-specific emotion word, the corresponding commodity implicit attribute is identified based on that mapping relationship, so that implicit commodity attributes modified by domain-specific emotion words are identified. For an implicit sentence containing a general emotion word, the corresponding commodity implicit attribute is identified based on the position of the implicit sentence in the whole commodity comment, so that implicit commodity attributes modified by general emotion words are identified. This improves the precision and breadth of commodity implicit attribute identification and makes fine-grained sentiment analysis of comments more comprehensive. Potential users can learn how a target performs, and what its advantages and disadvantages are, in each attribute dimension, which provides a reference for their purchasing decisions; merchants can learn the advantages and disadvantages of their commodities and improve commodity design or service in a targeted manner, thereby improving commodity quality or achieving precision marketing.
Based on the same technical concept, an embodiment of the present disclosure correspondingly provides a computer device. As shown in Fig. 3, the computer device 3 includes a memory 31 and a processor 32; the memory 31 stores a computer program, and when the processor 32 runs the computer program stored in the memory 31, the processor 32 executes the foregoing commodity implicit attribute identification method.
Based on the same technical concept, an embodiment of the present disclosure correspondingly provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the processor executes the foregoing commodity implicit attribute identification method.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims (12)

1. A commodity implicit attribute identification method, characterized by comprising the following steps:
acquiring an explicit sentence set and an implicit sentence set based on the original comment corpus;
constructing a mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words based on the explicit sentence set;
for each implicit sentence in the implicit sentence set, judging whether the emotion word in the implicit sentence is a domain-specific emotion word;
if it is a domain-specific emotion word, identifying the corresponding commodity implicit attribute based on the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words;
if it is not a domain-specific emotion word, identifying the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment.
2. The method of claim 1, further comprising, prior to obtaining the set of explicit sentences and the set of implicit sentences based on the original corpus of comments:
capturing comment data from an e-commerce platform as the original comment corpus.
3. The method of claim 1, wherein obtaining the set of explicit sentences and the set of implicit sentences based on the original comment corpus comprises:
preprocessing the original comment corpus;
performing clause segmentation on the preprocessed comment corpus to obtain a short sentence set;
extracting sentences containing the commodity explicit attribute clusters from the short sentence set to form the explicit sentence set; and
extracting sentences which do not contain the commodity explicit attribute clusters from the short sentence set to form the implicit sentence set.
4. The method of claim 3, wherein the preprocessing the raw comment corpus comprises:
cleaning the original comment corpus to obtain an effective comment corpus; and
performing word segmentation and part-of-speech tagging on the effective comment corpus to obtain the preprocessed comment corpus.
5. The method of claim 1, wherein the constructing a mapping relationship between the commodity explicit attribute cluster and the domain-specific sentiment words based on the explicit sentence set comprises:
extracting the commodity explicit attribute cluster from the explicit sentence set;
extracting domain-specific emotion words from the explicit sentence set;
calculating the association degree between the domain-specific emotion words and the commodity explicit attribute cluster;
judging whether the association degree between the domain-specific emotion words and the commodity explicit attribute cluster is smaller than a preset threshold;
and if the association degree is not smaller than the preset threshold, constructing the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words.
6. The method of claim 5, wherein the association degree between the domain-specific emotion words and the commodity explicit attribute cluster is calculated by using the following formula:
Figure FDA0002839661710000021
in the formula, PMI(W, F) is the association degree between the domain-specific emotion word and the commodity explicit attribute cluster, W is the domain-specific emotion word, and F is the commodity explicit attribute cluster; P(W) is the probability that the domain-specific emotion word W appears in the explicit sentence set; P(F) is the probability that the commodity explicit attribute cluster F appears in the explicit sentence set; and P(W and F) is the probability that the domain-specific emotion word W and the commodity explicit attribute cluster F co-occur in the explicit sentence set.
7. The method according to claim 6, wherein the probability P(F) that the commodity explicit attribute cluster F appears in the explicit sentence set is calculated by the following formula:
Figure FDA0002839661710000022
the probability P(W and F) that the domain-specific emotion word W and the commodity explicit attribute cluster F co-occur in the explicit sentence set is calculated by the following formula:
Figure FDA0002839661710000023
wherein n is the number of commodity explicit attributes in the commodity explicit attribute cluster F; F_i is the i-th commodity explicit attribute in the commodity explicit attribute cluster F; P(F_i) is the probability that the i-th commodity explicit attribute appears in the explicit sentence set; Co-occure(F_i, W) is the probability that F_i and W co-occur in the explicit sentence set; and N is the number of explicit sentences in the explicit sentence set.
8. The method of claim 1, further comprising, for each implicit sentence in the implicit sentence set and before judging whether the emotion word in the implicit sentence is a domain-specific emotion word:
judging whether the implicit sentence contains an emotion word;
and if it does, extracting the emotion word from the implicit sentence.
9. The method according to claim 1, wherein identifying the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment specifically comprises:
acquiring the position of the implicit sentence in the whole commodity comment;
if the implicit sentence is at the beginning or the end of the comment, identifying the implicit attribute of the implicit sentence as 'whole';
if the implicit sentence is in the middle of the comment, analyzing the emotional tendencies of the implicit sentence and its context, and if the emotional tendencies are consistent, identifying the attribute of the context as the implicit attribute of the implicit sentence.
10. A commodity implicit attribute identification device, comprising:
an acquisition module configured to acquire an explicit sentence set and an implicit sentence set based on an original comment corpus;
a construction module configured to construct a mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words based on the explicit sentence set;
a first judgment module configured to judge, for each implicit sentence in the implicit sentence set, whether the emotion word in the implicit sentence is a domain-specific emotion word; and
an identification module configured to identify the corresponding commodity implicit attribute based on the mapping relation between the commodity explicit attribute cluster and the domain-specific emotion words when the first judgment module judges that the emotion word in the implicit sentence is a domain-specific emotion word,
and to identify the corresponding commodity implicit attribute based on the position of the implicit sentence in the whole commodity comment when the first judgment module judges that the emotion word in the implicit sentence is not a domain-specific emotion word.
11. A computer device, characterized by comprising a memory in which a computer program is stored and a processor, wherein when the processor runs the computer program stored in the memory, the processor executes the commodity implicit attribute identification method according to any one of claims 1 to 9.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, performs the commodity implicit attribute identification method according to any one of claims 1 to 9.
CN202011487207.8A 2020-12-16 2020-12-16 Method and device for identifying implicit attribute of commodity, computer equipment and storage medium Pending CN112560464A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011487207.8A CN112560464A (en) 2020-12-16 2020-12-16 Method and device for identifying implicit attribute of commodity, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011487207.8A CN112560464A (en) 2020-12-16 2020-12-16 Method and device for identifying implicit attribute of commodity, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112560464A true CN112560464A (en) 2021-03-26

Family

ID=75064063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011487207.8A Pending CN112560464A (en) 2020-12-16 2020-12-16 Method and device for identifying implicit attribute of commodity, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112560464A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298366A (en) * 2021-05-12 2021-08-24 北京信息科技大学 Tourism performance service value evaluation method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682074A (en) * 2012-03-09 2012-09-19 浙江大学 Product implicit attribute recognition method based on manifold learning
CN110334350A (en) * 2019-07-02 2019-10-15 中国联合网络通信集团有限公司 A kind of implicit attribute abstracting method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王扶东 等 (Wang Fudong et al.): "在线评论中隐式商品特征识别方法" (Method for identifying implicit commodity features in online reviews) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298366A (en) * 2021-05-12 2021-08-24 北京信息科技大学 Tourism performance service value evaluation method
CN113298366B (en) * 2021-05-12 2023-12-12 北京信息科技大学 Travel performance service value assessment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210326