CN111814025A - Viewpoint extraction method and device - Google Patents


Publication number
CN111814025A
Authority
CN
China
Prior art keywords
processed
participle
information
viewpoint
dependency relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010426854.1A
Other languages
Chinese (zh)
Inventor
杨春阳
李健
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd filed Critical Beijing Sinovoice Technology Co Ltd
Priority to CN202010426854.1A priority Critical patent/CN111814025A/en
Publication of CN111814025A publication Critical patent/CN111814025A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention provides a viewpoint extraction method and device, belonging to the technical field of data processing. The method comprises: performing word segmentation on comment information to be processed to obtain participles to be processed; acquiring the part of speech and the dependency relationship of the participles to be processed; and performing viewpoint extraction on the participles to be processed according to the part of speech and/or the dependency relationship and a preset template, to obtain viewpoint information of the comment information to be processed. Because the viewpoint information is extracted according to the parts of speech of the participles and the dependency relationships between them, and must conform to the preset template, the acquired viewpoint information is more accurate and expresses the user's viewpoint effectively.

Description

Viewpoint extraction method and device
Technical Field
The invention belongs to the field of data processing, and particularly relates to a viewpoint extraction method and device.
Background
With the development of the network environment, the products and services that users can obtain through the network are increasingly abundant, and users can post comments on these services and products on various service platforms. By analyzing user comments, a platform operator can learn the trend of user demand, improve its own services and products, and promote its own development.
The platform operator usually browses users' comment information manually to learn their viewpoints. This is time-consuming and inefficient, and as the volume of comment information grows, manual browsing becomes increasingly impractical. Alternatively, a pointwise mutual information algorithm can be used to calculate the degree of association between candidate feature words and candidate viewpoint words: the probability of each word combination occurring jointly in the corpus is calculated, and specific combinations that have a high probability and include viewpoint feature words are taken as viewpoint information, so that viewpoint information is extracted automatically from the comment information.
However, this approach considers only the degree of association between participles and not the grammatical relationships between words. As a result, not all words expressing a viewpoint in the comment information can be extracted effectively, so the accuracy of the obtained viewpoint information is low; moreover, many highly associated words cannot form a valid viewpoint and cannot effectively express the user's viewpoint. How to accurately and effectively extract the user's viewpoint from comment information has therefore become an urgent problem in the art.
Disclosure of Invention
In view of the above, the present invention provides a viewpoint extraction method and device to solve the prior-art problem that user viewpoints cannot be accurately and effectively extracted from comment information.
According to a first aspect of the present invention, there is provided a viewpoint extracting method including:
performing word segmentation on the comment information to be processed to obtain word segments to be processed;
acquiring the part of speech and the dependency relationship of the participle to be processed;
and performing viewpoint extraction on the participles to be processed according to the parts of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed.
Optionally, the performing viewpoint extraction on the participles to be processed according to the part of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed includes:
extracting, from the participles to be processed, a plurality of first participles whose dependency relationship is a target dependency relationship;
combining the plurality of first participles according to the target dependency relationship to obtain viewpoint information of the comment information to be processed;
the target dependency relationship comprises any one of a subject-predicate relationship, an attributive relationship and an adverbial relationship.
Optionally, the combining the plurality of first participles according to the target dependency relationship to obtain the viewpoint information of the comment information to be processed includes:
in the case that the same first participle belongs to a plurality of target dependency relationships, combining the first participles according to each of the target dependency relationships to obtain a plurality of pieces of viewpoint information of the comment information to be processed.
Optionally, the performing viewpoint extraction on the participles to be processed according to the part of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed includes:
acquiring a central participle from the participles to be processed according to the dependency relationship;
acquiring a first noun participle with the part of speech as a noun from the participle to be processed;
and combining the central participle and the first noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
Optionally, the performing viewpoint extraction on the participles to be processed according to the part of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed includes:
acquiring a second noun participle with the part of speech as a noun from the participle to be processed;
determining target participles adjacent to the second noun participle in the participles to be processed, wherein the part of speech of the target participle at least comprises: any one of nouns, adjectives, verbs, adverbs and auxiliary words;
and combining the target participle and the second noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
Optionally, after the obtaining of the viewpoint information of the comment information to be processed, the method further includes:
acquiring a central participle from the participles to be processed according to the dependency relationship;
and in the case that the participles to be processed include a negative participle that has an adverbial relationship with the central participle, and the negative participle exists in a negative word dictionary, adding the negative participle to the viewpoint information according to the dependency relationship.
Optionally, the adding the negative participle to the viewpoint information according to the dependency relationship includes:
in the case that there are a plurality of negative participles, adding the negative participle with the smallest participle interval from the central participle to the viewpoint information according to the dependency relationship.
Optionally, the performing word segmentation processing on the comment information to be processed to obtain word segments to be processed includes:
performing word segmentation processing on the comment information to be processed according to a preset dictionary to obtain participles to be processed, wherein the preset dictionary at least comprises: a standard dictionary and a negative word dictionary.
Optionally, after obtaining the viewpoint information of the comment information to be processed, the method further includes:
and optimizing the viewpoint information according to a preset algorithm, wherein the preset algorithm at least comprises any one of a distance principle and a word co-occurrence algorithm.
According to a second aspect of the present invention, there is provided a viewpoint extracting apparatus including:
the word segmentation module is used for carrying out word segmentation on the comment information to be processed to obtain word segments to be processed;
the acquisition module is used for acquiring the part of speech and the dependency relationship of the participle to be processed;
and the extraction module is used for carrying out viewpoint extraction on the participle to be processed according to the part of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed.
Optionally, the extracting module is further configured to:
extracting, from the participles to be processed, a plurality of first participles whose dependency relationship is a target dependency relationship;
combining the plurality of first participles according to the target dependency relationship to obtain viewpoint information of the comment information to be processed;
the target dependency relationship comprises any one of a subject-predicate relationship, an attributive relationship and an adverbial relationship.
Optionally, the extracting module is further configured to:
in the case that the same first participle belongs to a plurality of target dependency relationships, combining the first participles according to each of the target dependency relationships to obtain a plurality of pieces of viewpoint information of the comment information to be processed.
Optionally, the extracting module is further configured to:
acquiring a central participle from the participles to be processed according to the dependency relationship;
acquiring a first noun participle with the part of speech as a noun from the participle to be processed;
and combining the central participle and the first noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
Optionally, the extracting module is further configured to:
acquiring a second noun participle with the part of speech as a noun from the participle to be processed;
determining target participles adjacent to the second noun participle in the participles to be processed, wherein the part of speech of the target participle at least comprises: any one of nouns, adjectives, verbs, adverbs and auxiliary words;
and combining the target participle and the second noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
Optionally, the apparatus further includes:
the processing module is used for acquiring the central participle from the participles to be processed according to the dependency relationship;
and the adding module is used for adding a negative participle to the viewpoint information according to the dependency relationship in the case that the participles to be processed include a negative participle that has an adverbial relationship with the central participle and the negative participle exists in a negative word dictionary.
Optionally, the adding module is further configured to:
in the case that there are a plurality of negative participles, adding the negative participle with the smallest participle interval from the central participle to the viewpoint information according to the dependency relationship.
Optionally, the word segmentation module is further configured to:
performing word segmentation processing on the comment information to be processed according to a preset dictionary to obtain participles to be processed, wherein the preset dictionary at least comprises: a standard dictionary and a negative word dictionary.
Optionally, the apparatus further includes:
and the optimization module is used for optimizing the viewpoint information according to a preset algorithm, wherein the preset algorithm at least comprises any one of a distance principle and a word co-occurrence algorithm.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the viewpoint extraction method according to any implementation of the first aspect when executing the computer program.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the viewpoint extraction method according to any implementation of the first aspect.
Compared with the prior art, the invention has the following advantages:
The invention provides a viewpoint extraction method and device, belonging to the technical field of data processing. The method comprises: performing word segmentation on comment information to be processed to obtain participles to be processed; acquiring the part of speech and the dependency relationship of the participles to be processed; and performing viewpoint extraction on the participles to be processed according to the part of speech and/or the dependency relationship and a preset template, to obtain viewpoint information of the comment information to be processed. Because the viewpoint information is extracted according to the parts of speech of the participles and the dependency relationships between them, and must conform to the preset template, the acquired viewpoint information is more accurate and expresses the user's viewpoint effectively.
The foregoing description is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more readily understandable, specific embodiments of the present invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like parts are denoted by like reference numerals throughout the drawings. In the drawings:
FIG. 1 is a flow chart of steps of a method for extracting viewpoints according to an embodiment of the present invention;
FIG. 2 is a flow chart of steps of another method for extracting viewpoints provided in an embodiment of the present invention;
FIG. 3 is a block diagram of a viewpoint extracting apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a flow chart of the steps of a viewpoint extraction method provided in an embodiment of the present invention, where the method includes:
Step 101, performing word segmentation processing on the information to be processed to obtain participles to be processed.
In the embodiment of the present invention, the information to be processed refers to text information containing user comment content, for example: buyer comments in the e-commerce store, reader comments posted in a forum, reader comments in a blog article, and other information indicating a user's opinion about a certain product or service.
Since the information to be processed is text composed of phrases, the part that expresses the user's viewpoint cannot be identified from it directly. The information to be processed must therefore first undergo sentence segmentation and word segmentation, and each resulting participle is taken as a participle to be processed. The word segmentation may use a conventional prior-art word segmentation technique such as jieba word segmentation. Before word segmentation, the information to be processed may also be denoised, for example by removing useless components such as tags and special symbols, to improve the validity of the obtained participles to be processed. Of course, the information to be processed may also be segmented directly; the specific word segmentation mode can be determined according to actual needs and is not specifically limited here.
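The denoising and sentence segmentation described above might be sketched as follows. This is only an illustrative sketch: the function names `denoise` and `split_sentences`, the regular expressions, and the sample comment are assumptions, not part of the original disclosure. Word segmentation itself is left to a dedicated tool (the text names jieba as one option) and is not reproduced here.

```python
import re

def denoise(text: str) -> str:
    """Remove tag-like markup and special symbols (assumed noise definition)."""
    text = re.sub(r"<[^>]+>", " ", text)                  # drop tags such as <p>
    text = re.sub(r"[^\w\s.,!?;。，！？；]", " ", text)      # drop other special symbols
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace

def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation (Western and Chinese)."""
    parts = re.split(r"[.!?;。！？；]+", text)
    return [p.strip() for p in parts if p.strip()]

# Hypothetical comment used only for illustration.
comment = "<p>Traffic is convenient!!! ★★ The room is a little small.</p>"
sentences = split_sentences(denoise(comment))
print(sentences)
```

Each resulting sentence would then be passed to the word segmenter, so that participles from different sentences are not mixed when templates are applied later.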
Step 102, acquiring the part of speech and the dependency relationship of the participle to be processed.
In the embodiment of the invention, the part of speech refers to the grammatical category of the participle to be processed, such as adjective, verb, noun, adverb, pronoun and the like. The dependency relationship refers to the governing-and-governed relationship between the participles to be processed, and may include: a subject-predicate relationship, a verb-object relationship, an indirect-object relationship, an attributive relationship, an adverbial relationship, a coordinate relationship, etc. The dependency relationship is directional: the governing component is called the governor, and the governed component is called the dependent. For example: in "red apple", "red" is subordinate to "apple", forming an attributive relationship; in "very beautiful", "very" is subordinate to "beautiful", forming an adverbial relationship.
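One possible way to hold the part of speech and the directed dependency relationship of each participle is sketched below. The `Token` class, the short part-of-speech tags ("a", "d", "n") and the relation labels ("ATT" for attributive, "ADV" for adverbial, "HED" for the sentence head) are assumptions chosen for illustration; the source does not prescribe a representation. The sketch follows the usual dependency-grammar convention that a modifier is the dependent of the word it modifies.

```python
from dataclasses import dataclass

@dataclass
class Token:
    index: int   # 1-based position in the sentence
    text: str    # the participle itself
    pos: str     # part-of-speech tag (assumed tag set)
    head: int    # index of the governing participle, 0 if none
    rel: str     # label of the relation to the governor (assumed label set)

# Hand-annotated examples, not parser output.
red_apple = [Token(1, "red", "a", 2, "ATT"), Token(2, "apple", "n", 0, "HED")]
very_beautiful = [Token(1, "very", "d", 2, "ADV"), Token(2, "beautiful", "a", 0, "HED")]

def arcs(tokens):
    """Return (dependent, relation, governor) triples for all governed tokens."""
    by_index = {t.index: t for t in tokens}
    return [(t.text, t.rel, by_index[t.head].text) for t in tokens if t.head != 0]

print(arcs(red_apple))
print(arcs(very_beautiful))
```

Storing the governor index on each participle keeps the direction of every arc explicit, which the later extraction steps rely on.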
The part of speech of the participle to be processed may be labeled manually or by a machine learning model, for example a hidden Markov model, a conditional random field model, a recurrent neural network model and the like. The specific part-of-speech tagging mode can be determined according to actual requirements, provided the scheme remains implementable, and is not specifically limited here.
The dependency relationship may be obtained using a dependency parsing algorithm. Dependency parsing is one of the key technologies of natural language processing and mainly comprises two aspects: on the one hand, determining the grammar system of the language, that is, giving a formal definition of the syntactic structure of legal sentences in the information to be processed; on the other hand, automatically deriving the syntactic structure of a sentence according to the given grammar system, and analyzing the syntactic units contained in the sentence and the relationships among them. The dependency parsing may be implemented in a conventional manner in the art, as long as the dependency relationship of the participles to be processed can be obtained, and is not described in detail here.
Step 103, performing viewpoint extraction on the participles to be processed according to the part of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed.
In the embodiment of the invention, users express viewpoints in language in various ways, not necessarily through a simple adjective or a direct statement of liking or disliking. Therefore, when the viewpoint participles are obtained from the participles to be processed, not only the participles that directly express a viewpoint are obtained; the participles that modify the user's viewpoint are also combined with them, so that the resulting viewpoint participles express the user's viewpoint completely. For example, in "the apple's price is cheap", "apple" is an entity noun and "cheap" is the user's actual viewpoint word, but "price" also describes the user's viewpoint on the entity noun, so "price" and "cheap" are combined as the viewpoint participle rather than "cheap" alone, and the obtained viewpoint participle expresses the user's complete viewpoint. Specifically, among the participles to be processed that belong to the same sentence, adjectives, verbs and the like adjacent to an entity noun can be combined according to a fixed part-of-speech collocation template to serve as viewpoint participles; the specific collocation manner can be determined according to actual requirements and is not limited here. Further, the viewpoint participle is combined with the participles that have a dependency relationship with it to serve as the viewpoint information of the information to be processed. For example, in "the space of this house is large", the viewpoint participle is "large space", and "large space" and "house" have a dependency relationship, so the obtained viewpoint information is "large house space".
It can be seen that combining the part of speech and the dependency relationship allows the viewpoint information in the information to be processed to be obtained more comprehensively from the participles to be processed. Of course, the participles having a specific dependency relationship among the participles to be processed can also be combined and used directly as the viewpoint information. Alternatively, the viewpoint participles obtained above using part-of-speech analysis alone can be used directly as viewpoint information, as long as the viewpoint information expresses the user's attitude; this can be determined according to actual needs and is not specifically limited here.
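The part-of-speech collocation step can be illustrated with a minimal sketch. The two templates in `TEMPLATES`, the tag names ("n" noun, "d" adverb, "a" adjective) and the function name `match_templates` are hypothetical; the source does not list its actual templates at this point, so the code only shows the general idea of matching a window of adjacent tags against fixed patterns.

```python
# Hypothetical templates, longest first so that the longer pattern wins.
TEMPLATES = [("n", "d", "a"), ("n", "a")]

def match_templates(tagged, templates=TEMPLATES):
    """Scan (word, pos) pairs left to right and emit windows matching a template."""
    views, i = [], 0
    while i < len(tagged):
        for tpl in templates:
            window = tagged[i:i + len(tpl)]
            if tuple(pos for _, pos in window) == tpl:
                views.append(" ".join(word for word, _ in window))
                i += len(tpl)          # skip past the matched window
                break
        else:
            i += 1                     # no template starts here
    return views

# Hand-tagged participles for the apple example, in sentence order.
tagged = [("apple", "n"), ("price", "n"), ("cheap", "a")]
print(match_templates(tagged))
```

With these assumed templates, "price" and "cheap" are combined into one viewpoint participle, while "apple" alone matches nothing and is left for the dependency-based combination.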
According to the first viewpoint extraction method provided by the invention, viewpoint information whose part of speech and/or dependency relationship conforms to the preset template is extracted from the information to be processed according to the parts of speech of the participles and the dependency relationships between them. Because both are taken into account, the acquired viewpoint information is more accurate and expresses the user's viewpoint effectively.
Fig. 2 is a flow chart of the steps of another viewpoint extraction method provided in an embodiment of the present invention, where the method includes:
Step 201, performing word segmentation processing on the information to be processed according to a preset dictionary to obtain participles to be processed, where the preset dictionary at least includes: a standard dictionary and a negative word dictionary.
In the embodiment of the present invention, the standard dictionary may be a corpus containing a plurality of participles, for example a conventional dictionary such as a Chinese dictionary. The negative word dictionary is a preset corpus containing participles that express negation. User comment information is not necessarily phrased affirmatively; it may also be phrased negatively, such as "I do not like this car" or "the house is not beautiful". The prior art cannot effectively identify and distinguish negative comment information, yet for the same sentence, whether it is negative decisively affects its meaning. If the negative form is not identified and the negative word and the viewpoint word are simply separated, the viewpoint information finally obtained may run counter to the user's actual viewpoint. Therefore, when word segmentation is performed on the information to be processed, not only a common dictionary such as the standard dictionary but also a negative word dictionary is introduced, and the weight of the negative words can be increased, so that negative modifiers are segmented out whole and separated from the central words in the information to be processed.
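A minimal sketch of dictionary-based segmentation with a separate negative word dictionary is given below. It uses forward maximum matching, which is a simple stand-in technique and not necessarily what the source uses; the dictionary contents, the Chinese example strings, and the rule of trying the negative word dictionary first (as one way to give negative words a higher weight) are all assumptions.

```python
# Tiny illustrative dictionaries (assumed contents).
STANDARD = {"房子", "漂亮", "喜欢", "这个", "车"}   # house, beautiful, like, this, car
NEGATION = {"不", "没有", "没"}                     # not, without, no

def segment(sentence, standard=STANDARD, negation=NEGATION):
    """Forward maximum matching; the negative word dictionary is tried first."""
    max_len = max(len(w) for w in standard | negation)
    out, i = [], 0
    while i < len(sentence):
        match = None
        for dictionary in (negation, standard):        # negation has priority
            for size in range(min(max_len, len(sentence) - i), 0, -1):
                candidate = sentence[i:i + size]
                if candidate in dictionary:
                    match = candidate
                    break
            if match:
                break
        if match is None:
            match = sentence[i]                        # unknown single character
        out.append(match)
        i += len(match)
    return out

print(segment("房子不漂亮"))      # "the house is not beautiful"
print(segment("这个车不喜欢"))    # "this car, (I) do not like"
```

Because the negation word comes out as its own participle, a later step can check it against the negative word dictionary and attach it to the viewpoint it modifies.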
Step 202, obtaining the part of speech and the dependency relationship of the participle to be processed.
This step can refer to the detailed description of step 102, which is not repeated here.
Step 203, extracting, from the participles to be processed, a plurality of first participles whose dependency relationship is a target dependency relationship.
The target dependency relationship comprises any one of a subject-predicate relationship, a subject-object relationship, an attributive relationship and an adverbial relationship.
In the embodiment of the present invention, the subject-predicate, subject-object, attributive and adverbial relationships generally express the important viewpoints in a sentence. For example, in "traffic is relatively convenient and the room is a little small", "traffic" and "convenient", and "room" and "small", each form a subject-predicate relationship.
Further, care must be taken to avoid the extracted first participles failing to form a readable phrase; the parts of speech of the first participles are therefore restricted, for example the subject is limited to non-pronouns and non-adjectives, the predicate to non-verbs, and the like.
Step 204, combining the plurality of first participles according to the target dependency relationship to obtain viewpoint information of the information to be processed.
In the embodiment of the present invention, the extracted first participles are combined according to the sentence components of the corresponding target dependency relationship. For example, if "room" and "small", and "traffic" and "convenient", are obtained as above as subject-predicate pairs, the obtained viewpoint information is "room small" and "traffic convenient".
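Steps 203 and 204 might be sketched as follows for the subject-predicate case only. The parse is hand-annotated rather than produced by a parser, and the tuple layout, the relation labels ("SBV" subject-predicate, "ADV" adverbial, "COO" coordinate, "HED" head) and the tag letters ("n", "d", "a", "r" for pronoun, "v" for verb) are assumptions made for illustration.

```python
# (index, text, pos, head, rel) for
# "traffic is relatively convenient and the room is a little small"
parse = [
    (1, "traffic", "n", 3, "SBV"),
    (2, "relatively", "d", 3, "ADV"),
    (3, "convenient", "a", 0, "HED"),
    (4, "room", "n", 6, "SBV"),
    (5, "a-little", "d", 6, "ADV"),
    (6, "small", "a", 3, "COO"),
]

def subject_predicate_viewpoints(parse):
    """Combine each subject with its predicate, applying the part-of-speech limits."""
    text_of = {idx: text for idx, text, _, _, _ in parse}
    pos_of = {idx: pos for idx, _, pos, _, _ in parse}
    views = []
    for idx, text, pos, head, rel in parse:
        if rel != "SBV":
            continue
        if pos in {"r", "a"}:          # subject must not be a pronoun or adjective
            continue
        if pos_of[head] == "v":        # predicate must not be a verb
            continue
        views.append(f"{text} {text_of[head]}")
    return views

print(subject_predicate_viewpoints(parse))
```

The same loop could be extended with further relation labels; only the subject-predicate relationship is shown to keep the sketch short.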
Furthermore, the restrictive attributive of the subject needs to be supplemented. For example, in "the ascending space is large", the core subject-predicate component is "the space is large", which literally means that the area or volume is large, whereas what is actually large is the "ascending" space; omitting the attributive causes ambiguity, so the noun's restrictive attributive is supplemented here. Alternatively, viewpoint information whose structure is incomplete can be supplemented by searching for participles in an attributive or adverbial relationship: for example, "very" and "close" form an adverbial relationship, and the attributive relationship in "good cell" can be split and recombined into "cell good".
Optionally, step 204 includes: in the case that the same first participle belongs to a plurality of target dependency relationships, combining the first participles according to each target dependency relationship to obtain a plurality of pieces of viewpoint information of the information to be processed.
In the embodiment of the present invention, when complementing structures in which coordinate components share a subject, an object, or even an attributive, that is, when the same first participle belongs to several target dependency relationships at once, a separate piece of viewpoint information must be generated for each target dependency relationship. For example: the subject is "service" and "good" modifies that subject; the subject also has a coordinate component "environment", so "good" must be attached to both subjects, forming "environment good" and "service good".
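The shared-component case can be sketched as below: a predicate attached to one subject is also paired with every coordinate subject. The hand-annotated parse, the tuple layout, and the relation labels ("SBV", "COO", "LAD", "HED") are assumptions used only for illustration.

```python
# (index, text, pos, head, rel) for "service and environment are good"
parse = [
    (1, "service", "n", 4, "SBV"),
    (2, "and", "c", 3, "LAD"),
    (3, "environment", "n", 1, "COO"),
    (4, "good", "a", 0, "HED"),
]

def expand_shared_predicate(parse):
    """Emit one viewpoint per subject, including subjects coordinated with it."""
    text_of = {idx: text for idx, text, _, _, _ in parse}
    views = []
    for idx, text, _, head, rel in parse:
        if rel != "SBV":
            continue
        # The subject itself plus every participle coordinated with it.
        subjects = [text] + [t for _, t, _, h, r in parse if r == "COO" and h == idx]
        views.extend(f"{subject} {text_of[head]}" for subject in subjects)
    return views

print(expand_shared_predicate(parse))
```

Generating one entry per subject avoids losing "environment good" when only "service" carries the explicit subject-predicate arc.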
Step 205, acquiring a central participle from the to-be-processed participles according to the dependency relationship.
In the embodiment of the present invention, the central participle is the participle that is not governed by any other participle among the to-be-processed participles constituting a sentence, and there is no dependency relationship between the participles on both sides of the central participle. When viewpoint information is extracted from the to-be-processed participles through the dependency relationship alone, the central participle of the sentence can be determined first; it is usually the predicate verb of the sentence.
Step 206, acquiring a first noun participle whose part of speech is a noun from the to-be-processed participles.
In the embodiment of the present invention, a noun among the to-be-processed participles is usually the entity about which the user expresses a viewpoint, and is an indispensable sentence component for expressing the viewpoint; without the first noun participle, ambiguity easily arises. For example, in "the house looks good", "house" is an entity noun and can serve as the first noun participle.
Step 207, combining the central participle and the first noun participle according to the dependency relationship to obtain viewpoint information of the information to be processed.
In the embodiment of the present invention, the central participle, the first noun participle, and other to-be-processed participles are searched and combined according to common collocations to obtain the viewpoint information. For example, in "the space of the house looks large", "space" is the first noun participle, "looks" is the central participle, and "large" is another to-be-processed participle, so the viewpoint information obtained is "the space looks large".
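Steps 205 to 207 can be sketched roughly as follows. The token layout (dictionaries with `word`, `pos`, `head`), the tags `n`/`v`/`a`, and the rule "keep adjectives that depend directly on the central participle" are assumptions made for illustration; the patent only says that participles are combined according to common collocations.

```python
def extract_by_center(tokens):
    """Combine the root, a noun it governs, and adjectives it governs."""
    # Step 205: the central participle is the one with no governor.
    root = next((i for i, t in enumerate(tokens) if t["head"] == -1), None)
    if root is None:
        return None
    # Step 206: a noun that depends directly on the central participle.
    nouns = [i for i, t in enumerate(tokens)
             if t["pos"] == "n" and t["head"] == root]
    if not nouns:
        return None
    # Assumed collocation rule: also keep adjectives governed by the root.
    extras = [i for i, t in enumerate(tokens)
              if t["pos"] == "a" and t["head"] == root]
    # Step 207: join the selected participles in surface order.
    order = sorted([nouns[0], root] + extras)
    return " ".join(tokens[i]["word"] for i in order)


# Hand-annotated parse of "the space of the house looks large".
sentence = [
    {"word": "house", "pos": "n", "head": 1},
    {"word": "space", "pos": "n", "head": 2},
    {"word": "looks", "pos": "v", "head": -1},
    {"word": "large", "pos": "a", "head": 2},
]
print(extract_by_center(sentence))  # space looks large
```

Here "house" is skipped because it depends on "space" rather than on the central participle.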
Step 208, acquiring a second noun participle whose part of speech is a noun from the to-be-processed participles.
In the embodiment of the present invention, the second noun word segmentation in this step is similar to the first noun word segmentation in step 206, and is not described herein again.
Step 209, determining a target participle adjacent to the second noun participle among the to-be-processed participles, where the part of speech of the target participle includes at least any one of: noun, adjective, verb, adverb, and auxiliary word.
In the embodiment of the present invention, viewpoint information can be extracted by matching the second noun participle, whose part of speech is a noun, against an adjacent participle of a specified part of speech according to fixed templates. Specifically, research and analysis of the seed corpus yields the fixed part-of-speech templates containing viewpoint participles shown in Table 1 below.
TABLE 1 (fixed part-of-speech templates containing viewpoint participles; reproduced as images in the original publication)
Referring to Table 1 above, the main matching modes include noun + adjective, retest + verb, verb + noun, verb + adjective, adjective + adjective, adverb + adjective, adjective + noun, and adjective + adjective. These matching modes are only exemplary and are not limited to the content of Table 1; they may be expanded and modified according to actual needs.
Step 210, combining the target participle and the second noun participle according to the dependency relationship to obtain the viewpoint information of the information to be processed.
In the embodiment of the present invention, referring to the example in table 1, after the target participle that can meet the matching mode is obtained, the target participle and the second noun participle may be combined according to the dependency relationship therebetween, so as to obtain the viewpoint information of the information to be processed.
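Steps 208 to 210 can be illustrated with a small template matcher. This is only a sketch: the tag abbreviations (`n`, `a`, `v`, `d`) and the names `TEMPLATES` and `match_templates` are hypothetical, the template set covers only the clearly legible pairs from the list above, and the matched pair is joined in surface order rather than by a real dependency parse.

```python
# Assumed tag set: n = noun, a = adjective, v = verb, d = adverb.
TEMPLATES = {
    ("n", "a"), ("v", "n"), ("v", "a"),
    ("a", "a"), ("d", "a"), ("a", "n"),
}


def match_templates(tagged):
    """Return adjacent word pairs that contain a noun and fit a template."""
    found = []
    seen = set()
    for i, (_, pos) in enumerate(tagged):
        if pos != "n":
            continue
        # Check the pair ending at the noun and the pair starting at it.
        for left in (i - 1, i):
            right = left + 1
            if left < 0 or right >= len(tagged) or (left, right) in seen:
                continue
            if (tagged[left][1], tagged[right][1]) in TEMPLATES:
                seen.add((left, right))
                found.append(f"{tagged[left][0]} {tagged[right][0]}")
    return found


tagged = [("location", "n"), ("good", "a"), ("like", "v"), ("garden", "n")]
print(match_templates(tagged))  # ['location good', 'like garden']
```

Because every candidate pair is anchored on a noun, templates without a noun slot never fire in this sketch; they are kept in the set only to mirror the listed matching modes.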
Step 211, in the case that the to-be-processed participles include a negative participle that forms an adverbial-head relationship with the central participle and the negative participle exists in a negative word dictionary, adding the negative participle to the viewpoint information according to the dependency relationship.
In the embodiment of the present invention, after the viewpoint information is acquired, negation needs to be checked for the participles that modify the predicate or the object. Because a negative word changes the meaning that the viewpoint information expresses, such a negative participle must be retained. For example, for "this community is good and the house price is not low", the viewpoint information to be extracted is "community good" and "house price not low"; if the adverbial negative word "not" is ignored, the viewpoint information obtained is "community good" and "house price low". It is therefore necessary to add the negative participle to the viewpoint information.
Further, negation can be determined for a participle that forms an adverbial-head relationship with the predicate center or the object in the information to be processed by checking whether the participle exists in the negative word dictionary; if the participle has an adverbial-head relationship with the predicate center and exists in the negative word dictionary, the negative participle is combined into the viewpoint information. For example, for "this piece of clothing is not cheap", the viewpoint information initially extracted is "clothes cheap". However, "not" modifies the core word "cheap" as an adverbial, so "not" is added to "clothes cheap" according to the dependency relationship, and the viewpoint information finally returned is "clothes not cheap", which expresses a negative viewpoint.
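The negation check of step 211 can be sketched as below. The dictionary contents, the `ADV` label for the adverbial relationship, and the function name `add_negation` are assumptions for illustration; a real negative word dictionary would be far larger.

```python
# Hypothetical negative word dictionary.
NEGATIVE_WORDS = {"not", "no", "never"}


def add_negation(tokens, viewpoint_idx, center):
    """Add negative adverbials of the central participle to the viewpoint.

    tokens: list of dicts with 'word', 'head', 'rel'.
    viewpoint_idx: indices of tokens already in the viewpoint.
    center: index of the central participle.
    """
    negations = [
        i for i, t in enumerate(tokens)
        if t["head"] == center
        and t["rel"] == "ADV"            # assumed adverbial label
        and t["word"] in NEGATIVE_WORDS  # dictionary lookup
    ]
    # Restore surface order so the negation lands before the center.
    return sorted(set(viewpoint_idx) | set(negations))


# Hand-annotated parse of "clothes not cheap".
clothes = [
    {"word": "clothes", "head": 2, "rel": "SBV"},
    {"word": "not", "head": 2, "rel": "ADV"},
    {"word": "cheap", "head": -1, "rel": "HED"},
]
kept = add_negation(clothes, [0, 2], center=2)
print(" ".join(clothes[i]["word"] for i in kept))  # clothes not cheap
```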
Optionally, step 211 includes: in the case that there are a plurality of negative participles, adding the negative participle with the smallest participle interval from the central participle to the viewpoint information according to the dependency relationship.
In the embodiment of the present invention, when there are a plurality of negative participles, whether the viewpoint information expresses negation or affirmation can be determined from the number of negative participles. When the number is even, the negations cancel out and the viewpoint information stays affirmative; when the number is odd, the viewpoint information needs to express negation, and the negative participle with the smallest participle interval from the central participle in the information to be processed is added to the viewpoint information. For example, "this school is not good either" contains two negative participles "not" (a double negative), so the viewpoint information obtained remains "school is good". In another case, the viewpoint information obtained is "the house price is reasonable" but three negative participles "not" exist; the negative participle "not" closest to the central participle "reasonable" is retained and added in front of "reasonable" according to the dependency relationship, giving the final viewpoint information "the house price is unreasonable".
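The parity rule for multiple negative participles can be written as a short sketch. Positions are token indices in the original sentence; the function name and the data layout are hypothetical.

```python
def apply_negation_parity(viewpoint, center, negations):
    """Resolve multiple negations by their count.

    viewpoint: list of (position, word) pairs already extracted.
    center: position of the central participle.
    negations: list of (position, word) pairs for negative participles.
    """
    ordered = sorted(viewpoint)
    # An even number of negations cancels out: keep the viewpoint as is.
    if len(negations) % 2 == 0:
        return [word for _, word in ordered]
    # Odd count: keep only the negation closest to the central participle.
    nearest = min(negations, key=lambda n: abs(n[0] - center))
    words = []
    for pos, word in ordered:
        if pos == center:
            words.append(nearest[1])  # placed directly before the center
        words.append(word)
    return words


three = [(1, "not"), (2, "not"), (3, "not")]
print(apply_negation_parity([(0, "house price"), (4, "reasonable")], 4, three))
# ['house price', 'not', 'reasonable']

two = [(1, "not"), (2, "not")]
print(apply_negation_parity([(0, "school"), (3, "good")], 3, two))
# ['school', 'good']
```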
Step 212, optimizing the viewpoint information according to a preset algorithm, wherein the preset algorithm at least comprises any one of a distance principle and a word co-occurrence algorithm.
In the embodiment of the present invention, in order to further enhance the accuracy of the obtained viewpoint information, after the viewpoint information is obtained, the obtained viewpoint information may be further optimized according to a viewpoint optimization algorithm such as a distance principle and/or a word co-occurrence algorithm. Specifically, the word co-occurrence algorithm calculates the probability of joint occurrence of each participle in the corpus by calculating the degree of association between candidate feature words and candidate viewpoint words included in the sentence. The distance principle is a principle of judging the relationship between a plurality of syntax components which depend on each other in a sentence according to the distance between the syntax components, and the viewpoint information in the sentence can be extracted according to whether the obtained relationship is a direct relationship or an indirect relationship. The specific optimization algorithm may be determined according to actual requirements, and is not specifically limited herein.
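As a rough illustration of the two optimization ideas, the sketch below estimates a sentence-level co-occurrence probability and picks the candidate viewpoint word closest to a feature word. Both functions are simplified readings under stated assumptions (co-occurrence counted per sentence, distance measured in tokens); the patent does not fix either formula, and the function names are hypothetical.

```python
def cooccurrence_prob(corpus, feature, opinion):
    """Fraction of sentences in which both words appear together."""
    if not corpus:
        return 0.0
    both = sum(1 for sent in corpus if feature in sent and opinion in sent)
    return both / len(corpus)


def nearest_opinion(sentence, feature, candidates):
    """Distance principle (simplified): pick the closest candidate word."""
    pos = sentence.index(feature)
    present = [c for c in candidates if c in sentence]
    if not present:
        return None
    return min(present, key=lambda c: abs(sentence.index(c) - pos))


corpus = [
    ["service", "good"],
    ["service", "good", "environment"],
    ["environment", "noisy"],
    ["service", "slow"],
]
print(cooccurrence_prob(corpus, "service", "good"))  # 0.5

sent = ["service", "good", "but", "environment", "noisy"]
print(nearest_opinion(sent, "environment", ["good", "noisy"]))  # noisy
```

A candidate pairing with a low co-occurrence score, or one that loses to a closer candidate, could then be dropped or re-ranked during optimization.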
The invention provides another viewpoint extraction method and device. Viewpoint information whose part of speech and/or dependency relationship conforms to a preset template is extracted from the information to be processed according to the parts of speech of the participles in that information and the dependency relationships among them. Because both the parts of speech and the dependency relationships are taken into account, the acquired viewpoint information is more accurate and effectively expresses the user's viewpoint. In addition, a negative-word discrimination mechanism introduced in the word segmentation stage and the extraction stage allows the viewpoint information to distinguish negative words, which further improves its accuracy.
Fig. 3 shows a viewpoint extraction apparatus 30 according to an embodiment of the present invention, the apparatus including:
a word segmentation module 301, configured to perform word segmentation on the comment information to be processed to obtain to-be-processed participles;
an obtaining module 302, configured to obtain the part of speech and the dependency relationship of the to-be-processed participles;
an extraction module 303, configured to perform viewpoint extraction on the to-be-processed participles according to the part of speech and/or the dependency relationship and a preset template, so as to obtain viewpoint information of the comment information to be processed.
Optionally, the extracting module 303 is further configured to:
extracting a plurality of first participles whose dependency relationship is a target dependency relationship from the to-be-processed participles;
combining the plurality of first participles according to the target dependency relationship to obtain viewpoint information of the comment information to be processed;
wherein the target dependency relationship comprises any one of a subject-predicate relationship, an attributive-head (centering) relationship and an adverbial-head (state dependency) relationship.
Optionally, the extracting module 303 is further configured to:
in the case that the same first participle is subordinate to a plurality of target dependency relationships, combining the first participles according to the target dependency relationships to obtain a plurality of pieces of viewpoint information of the comment information to be processed.
Optionally, the extracting module 303 is further configured to:
acquiring a central participle from the to-be-processed participles according to the dependency relationship;
acquiring a first noun participle with the part of speech as a noun from the participle to be processed;
and combining the central participle and the first noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
Optionally, the extracting module 303 is further configured to:
acquiring a second noun participle with the part of speech as a noun from the participle to be processed;
determining target participles adjacent to the second noun participle in the participles to be processed, wherein the part of speech of the target participle at least comprises: any one of nouns, adjectives, verbs, adverbs and auxiliary words;
and combining the target participle and the second noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
Optionally, the apparatus further includes:
a processing module 304, configured to obtain a central participle from the to-be-processed participles according to the dependency relationship;
an adding module 305, configured to add a negative participle to the viewpoint information according to the dependency relationship in the case that the to-be-processed participles include a negative participle that forms an adverbial-head relationship with the central participle and the negative participle exists in a negative word dictionary.
Optionally, the adding module 305 is further configured to:
in the case that there are a plurality of negative participles, adding the negative participle with the smallest participle interval from the central participle to the viewpoint information according to the dependency relationship.
Optionally, the word segmentation module 301 is further configured to:
performing word segmentation processing on the comment information to be processed according to a preset dictionary to obtain the to-be-processed participles, wherein the preset dictionary at least comprises: a standard dictionary and a negative word dictionary.
Optionally, the apparatus further includes:
an optimizing module 306, configured to optimize the viewpoint information according to a preset algorithm, where the preset algorithm at least includes any one of a distance principle and a word co-occurrence algorithm.
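One way to picture how modules 301 to 306 fit together is as a pipeline of pluggable callables. The class below is only a structural sketch: the class name, field names, and call signatures are hypothetical, and the stub functions in the usage lines stand in for a real segmenter, parser, and extractor.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ViewpointExtractionApparatus:
    segment: Callable[[str], List[str]]         # word segmentation module 301
    analyze: Callable[[List[str]], List[dict]]  # obtaining module 302
    extract: Callable[[List[dict]], List[str]]  # extraction module 303
    # Optional modules 304/305 (negation) and 306 (optimization).
    add_negation: Optional[Callable[[List[dict], List[str]], List[str]]] = None
    optimize: Optional[Callable[[List[str]], List[str]]] = None

    def run(self, comment: str) -> List[str]:
        tokens = self.analyze(self.segment(comment))
        viewpoints = self.extract(tokens)
        if self.add_negation is not None:
            viewpoints = self.add_negation(tokens, viewpoints)
        if self.optimize is not None:
            viewpoints = self.optimize(viewpoints)
        return viewpoints


# Stub modules, for illustration only.
apparatus = ViewpointExtractionApparatus(
    segment=str.split,
    analyze=lambda words: [{"word": w} for w in words],
    extract=lambda toks: [" ".join(t["word"] for t in toks)],
)
print(apparatus.run("service good"))  # ['service good']
```

Keeping the optional modules as nullable fields mirrors the "optionally, the apparatus further includes" wording: the pipeline still runs when they are absent.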
The invention provides a viewpoint extraction device, which extracts viewpoint information of which the part of speech and/or the dependency relationship conform to a preset template from information to be processed according to the part of speech and the dependency relationship among the parts of speech of the words in the information to be processed.
Since the above apparatus embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, reference may be made to the corresponding description of the method embodiment.
In addition, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the computer program, when executed by the processor, implements each process of the above-mentioned viewpoint extraction method embodiment, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned viewpoint extracting method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the description of the process is not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As is readily imaginable to the person skilled in the art: any combination of the above embodiments is possible, and thus any combination between the above embodiments is an embodiment of the present invention, but the present disclosure is not necessarily detailed herein for reasons of space.
The viewpoint extraction method provided herein is not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The structure required to construct a system incorporating aspects of the present invention will be apparent from the description above. In addition, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any description of a specific language above is provided to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of a viewpoint extraction apparatus according to embodiments of the present invention. The present invention may also be embodied as device or apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In an apparatus claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A viewpoint extraction method, characterized by comprising:
performing word segmentation on the comment information to be processed to obtain word segments to be processed;
acquiring the part of speech and the dependency relationship of the participle to be processed;
and performing viewpoint extraction on the participles to be processed according to the parts of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed.
2. The method according to claim 1, wherein the performing viewpoint extraction on the to-be-processed participles according to the part of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the to-be-processed comment information includes:
extracting a plurality of first participles whose dependency relationship is a target dependency relationship from the to-be-processed participles;
combining the plurality of first participles according to the target dependency relationship to obtain viewpoint information of the comment information to be processed;
wherein the target dependency relationship comprises any one of a subject-predicate relationship, an attributive-head (centering) relationship and an adverbial-head (state dependency) relationship.
3. The method according to claim 2, wherein the combining the plurality of first participles according to the target dependency relationship to obtain the viewpoint information of the comment information to be processed includes:
in the case that the same first participle is subordinate to a plurality of target dependency relationships, combining the first participles according to the target dependency relationships to obtain a plurality of pieces of viewpoint information of the comment information to be processed.
4. The method according to claim 1, wherein the performing viewpoint extraction on the to-be-processed participles according to the part of speech and/or the dependency relationship according to a preset template to obtain viewpoint information of the to-be-processed comment information includes:
acquiring a central participle from the to-be-processed participles according to the dependency relationship;
acquiring a first noun participle with the part of speech as a noun from the participle to be processed;
and combining the central participle and the first noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
5. The method according to claim 1, wherein the performing viewpoint extraction on the to-be-processed participles according to the part of speech and/or the dependency relationship according to a preset template to obtain viewpoint information of the to-be-processed comment information includes:
acquiring a second noun participle with the part of speech as a noun from the participle to be processed;
determining target participles adjacent to the second noun participle in the participles to be processed, wherein the part of speech of the target participle at least comprises: any one of nouns, adjectives, verbs, adverbs and auxiliary words;
and combining the target participle and the second noun participle according to the dependency relationship to obtain the viewpoint information of the comment information to be processed.
6. The method according to claim 1, further comprising, after obtaining the viewpoint information of the comment information to be processed:
acquiring a central participle from the to-be-processed participles according to the dependency relationship;
and in the case that the to-be-processed participles include a negative participle that forms an adverbial-head relationship with the central participle, and the negative participle exists in a negative word dictionary, adding the negative participle to the viewpoint information according to the dependency relationship.
7. The method of claim 6, wherein the adding the negative participle to the opinion information in accordance with the dependency relationship comprises:
and in the case that the negative participle exists in plurality, adding the negative participle with the minimum participle interval with the center participle into the viewpoint information according to the dependency relationship.
8. The method of claim 1, wherein performing word segmentation on the comment information to be processed to obtain a word to be processed comprises:
performing word segmentation processing on the comment information to be processed according to a preset dictionary to obtain the to-be-processed participles, wherein the preset dictionary at least comprises: a standard dictionary and a negative word dictionary.
9. The method according to claim 1, wherein after obtaining the opinion information of the comment information to be processed, further comprising:
and optimizing the viewpoint information according to a preset algorithm, wherein the preset algorithm at least comprises any one of a distance principle and a word co-occurrence algorithm.
10. A viewpoint extraction apparatus, characterized in that the apparatus comprises:
the word segmentation module is used for carrying out word segmentation on the comment information to be processed to obtain word segments to be processed;
the acquisition module is used for acquiring the part of speech and the dependency relationship of the participle to be processed;
and the extraction module is used for carrying out viewpoint extraction on the participle to be processed according to the part of speech and/or the dependency relationship and a preset template to obtain viewpoint information of the comment information to be processed.
CN202010426854.1A 2020-05-19 2020-05-19 Viewpoint extraction method and device Pending CN111814025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010426854.1A CN111814025A (en) 2020-05-19 2020-05-19 Viewpoint extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010426854.1A CN111814025A (en) 2020-05-19 2020-05-19 Viewpoint extraction method and device

Publications (1)

Publication Number Publication Date
CN111814025A true CN111814025A (en) 2020-10-23

Family

ID=72848405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010426854.1A Pending CN111814025A (en) 2020-05-19 2020-05-19 Viewpoint extraction method and device

Country Status (1)

Country Link
CN (1) CN111814025A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186552A (en) * 2021-12-13 2022-03-15 北京百度网讯科技有限公司 Text analysis method, device and equipment and computer storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070255553A1 (en) * 2004-03-31 2007-11-01 Matsushita Electric Industrial Co., Ltd. Information Extraction System
CN105224640A (en) * 2015-09-25 2016-01-06 杭州朗和科技有限公司 A kind of method and apparatus extracting viewpoint
CN110781369A (en) * 2018-07-11 2020-02-11 天津大学 Emotional cause mining method based on dependency syntax and generalized causal network
CN110825948A (en) * 2019-11-05 2020-02-21 重庆邮电大学 Rumor propagation control method based on rumor-splitting message and representation learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination