WO2016197577A1 - Method, apparatus, and computer device for marking comment information
- Publication number
- WO2016197577A1 (application PCT/CN2015/097774)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- emotional
- word
- words
- commentary
- sentiment
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Definitions
- the present invention relates to the field of network information processing technologies, and in particular, to a method, an apparatus, and a computer device for marking comment information.
- Existing comment clustering methods mostly target product reviews, such as user ratings on websites like Tmall and Amazon. They mainly cluster comments around different attributes of the product. In general, the attribute categories are built first, the attributes contained in each comment are then mined, and each comment is finally assigned to the category of the attribute it contains. Mining attributes from comments often relies on dictionary-based methods, machine learning methods, and the like.
- Event comments are fundamentally different from product reviews, and most event comments have no attributes. Therefore, the product review clustering method cannot simply be copied to event comments.
- The content of event comments is also more wide-ranging, so the commonly used approach of pre-building categories from a dictionary does not apply to event comments.
- An object of the embodiments of the present invention is to provide a method, an apparatus, and a computer device for marking comment information, so as to automatically perform emotional clustering on event comments, and label the emotional properties thereof to enhance the user network experience.
- An embodiment of the present invention provides a method for marking comment information, including: acquiring data of a plurality of event comments; dividing each of the plurality of event comments into sentences, and taking the divided sentences as comment viewpoints; extracting emotional words from each of the comment viewpoints; associating the emotional words that co-occur in any of the comment viewpoints to construct an emotional word community network; and labeling, with data of an emotional nature, the comment viewpoints to which the emotional words in any emotional word community in the emotional word community network belong, where the emotional nature is positive, negative, or neutral, and an emotional word community includes a set of emotional words that are directly or indirectly associated.
- The processing of extracting the emotional words from each comment viewpoint comprises: segmenting the sentence of the comment viewpoint into words, and selecting the emotional words by matching the resulting word segments against a pre-built emotional word dictionary, where the emotional word dictionary includes a plurality of emotional words and data of their emotional nature.
- The processing of associating the emotional words that co-occur in any of the comment viewpoints further comprises: for any two associated emotional words in the emotional word community network, calculating the co-occurrence frequency with which the two emotional words appear together in the same comment viewpoint, and, if the calculated co-occurrence frequency is lower than a predetermined co-occurrence frequency threshold, removing the association between the two emotional words.
- The processing of labeling, with data of an emotional nature, the comment viewpoints to which the emotional words in any emotional word community belong comprises: labeling the comment viewpoint to which an emotional word in the emotional word community belongs with data of the emotional nature according to the emotional nature of that emotional word.
- The emotional word dictionary further includes data of the emotional intensity of the plurality of emotional words, and the labeling processing further includes: if any comment viewpoint includes emotional words of different emotional natures, labeling that comment viewpoint with the data of the emotional nature of the emotional word with the strongest emotional intensity.
- The processing of dividing each of the plurality of event comments into sentences further comprises: removing sentences whose word count exceeds a predetermined sentence length, and/or removing sentences of an advertising nature.
- An embodiment of the present invention further provides an apparatus for marking comment information, including: a comment acquisition module, configured to acquire data of a plurality of event comments; a comment clause module, configured to divide each of the plurality of event comments into sentences and take the divided sentences as comment viewpoints; an emotional word extraction module, configured to extract emotional words from each of the comment viewpoints; an emotional network building module, configured to associate the emotional words that co-occur in any of the comment viewpoints and construct an emotional word community network; and an emotion labeling module, configured to label, with data of an emotional nature, the comment viewpoints to which the emotional words in any emotional word community in the emotional word community network belong, where the emotional nature is positive, negative, or neutral, and an emotional word community includes a set of emotional words that are directly or indirectly associated.
- The emotional word extraction module is configured to segment the sentence of each comment viewpoint into words, and to select the emotional words by matching the resulting word segments against a pre-built emotional word dictionary, where the emotional word dictionary includes a plurality of emotional words and data of their emotional nature.
- The emotional network building module is configured to calculate, for any two associated emotional words in the emotional word community network, the co-occurrence frequency with which the two emotional words appear together in the same comment viewpoint, and, if the calculated co-occurrence frequency is lower than a predetermined co-occurrence frequency threshold, to remove the association between the two emotional words.
- The emotion labeling module is configured to label the comment viewpoint to which an emotional word in the emotional word community belongs with data of the emotional nature according to the emotional nature of that emotional word.
- The emotional word dictionary further includes data of the emotional intensity of the plurality of emotional words, and the emotion labeling module is further configured to: if any comment viewpoint includes emotional words of different emotional natures, label that comment viewpoint with the data of the emotional nature of the emotional word with the strongest emotional intensity.
- The comment clause module is further configured to remove sentences whose word count exceeds a predetermined sentence length, and/or to remove sentences of an advertising nature.
- Embodiments of the present invention also provide a computer device comprising: one or more processors; a memory; and one or more programs, the one or more programs being stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the following method for marking comment information: acquiring data of a plurality of event comments; dividing each of the plurality of event comments into sentences, and taking the divided sentences as comment viewpoints; extracting emotional words from each of the comment viewpoints; associating the emotional words that co-occur in any of the comment viewpoints to construct an emotional word community network; and labeling, with data of an emotional nature, the comment viewpoints to which the emotional words in any emotional word community in the network belong, where the emotional nature is positive, negative, or neutral, and an emotional word community includes a set of emotional words that are directly or indirectly associated.
- The method, apparatus, and computer device for marking comment information provided by the embodiments of the present invention obtain a plurality of comment viewpoints by dividing the acquired event comments into sentences, and then use the emotional words contained in the comment viewpoints as the clustering basis. Event comments are thereby clustered by emotion automatically and labeled with their emotional nature, so that the user can quickly grasp the general state of public opinion and read the comments more easily, which greatly enriches the user experience.
- FIG. 1 is a flow chart showing a method of marking comment information according to Embodiment 1 of the present invention.
- FIG. 2 is a view showing an example of an emotional word dictionary showing a method of marking comment information according to the first embodiment of the present invention
- FIG. 3 is a diagram showing an example of an emotional word community network of a method for marking comment information according to Embodiment 1 of the present invention
- FIG. 4 is a diagram showing an example of an emotion clustering result of a method of labeling comment information according to Embodiment 1 of the present invention.
- Figure 5 is a logic block diagram showing an annotation device for comment information according to a second embodiment of the present invention.
- Figure 6 is a logic block diagram showing a computer device in accordance with a third embodiment of the present invention.
- The basic idea of the present invention is to divide the event comments into sentences after the data of the plurality of event comments is obtained, to take the divided sentences as comment viewpoints, and to use "emotional words" as the key for clustering the comment viewpoints, so that event comments are clustered by emotion automatically and labeled with their emotional nature.
- The emotional nature may be, but is not limited to, positive, negative, or neutral, ultimately producing groups such as positive speech, negative speech, and neutral speech, so that the user can easily and quickly see the various aspects of an event that other users are paying attention to, which enhances the user's network experience.
- The present invention has a wide range of applications. It is particularly suited to user comments on news-like information, and can also provide a large data source for public opinion monitoring.
- FIG. 1 is a flow chart showing a method of labeling comment information according to a first embodiment of the present invention. The method may be performed, for example, on a Weibo server.
- In step S110, data of a plurality of event comments is acquired.
- The data of the event comments may be, for example, but is not limited to, comment text published by users on sources such as Weibo, Post Bar, news sites, forums, and the like.
- In step S120, each of the plurality of event comments is divided into sentences, and the divided sentences are taken as comment viewpoints.
- Step S120 further includes removing sentences whose word count exceeds the predetermined sentence length, and/or removing sentences of an advertising nature.
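Step S120 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the delimiter set, the length limit of 30, the advertisement markers, and the function name `split_into_viewpoints` are all assumptions, and character count stands in for word count.

```python
import re

# Assumed sentence-ending punctuation (Chinese and Western); not specified in the source.
SENTENCE_END = re.compile(r"[。！？!?；;\n]+")


def split_into_viewpoints(comment, max_len=30,
                          ad_markers=("http://", "https://", "加微信")):
    """Split one event comment into sentences and keep those usable as comment viewpoints."""
    viewpoints = []
    for sentence in SENTENCE_END.split(comment):
        sentence = sentence.strip()
        if not sentence:
            continue  # empty fragment left by trailing punctuation
        if len(sentence) > max_len:
            continue  # over-long sentence (character count as a stand-in for word count)
        if any(marker in sentence for marker in ad_markers):
            continue  # crude keyword test standing in for advertisement detection
        viewpoints.append(sentence)
    return viewpoints


print(split_into_viewpoints("房价还会涨！我不信。加微信领优惠"))
```

A real system would likely replace the keyword test with a dedicated advertisement filter; the source does not say how advertising sentences are recognised.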
- In step S130, emotional words are extracted from each of the comment viewpoints.
- Step S130 includes: segmenting the sentence of each comment viewpoint into words, and selecting the emotional words by matching the resulting word segments against a pre-built emotional word dictionary.
- The prior art generally extracts emotional words by direct matching, but direct matching cannot guarantee the extraction quality, and some emotional words may be missed.
- This embodiment instead segments first and then matches. That is, a Chinese character sequence is divided into individual words, the word segments are matched against the pre-built emotional word dictionary, and the matched words are selected as emotional words.
- the sentiment word dictionary includes a plurality of emotional words and data of their emotional nature.
- the emotional nature is, but is not limited to, positive, negative or neutral.
- the emotional word dictionary may include tens of thousands or even more emotional words and data of their emotional nature.
- For example, the emotional word “experienced and versatile” expresses a positive emotion, so its emotional-nature data is “1”, meaning positive. Likewise, the emotional word “drug” expresses a negative emotion, so its emotional-nature data is “-1”, meaning negative, while the emotional word “sudden” has no obvious emotional tendency, so its emotional-nature data is “0”, meaning neutral.
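The segment-then-match extraction of step S130 can be sketched as a dictionary lookup over word segments. The three dictionary entries below come from the examples in the text; the function name, the constant names, and the assumption that the sentence has already been segmented into tokens (by some word segmenter not shown here) are illustrative only.

```python
POSITIVE, NEGATIVE, NEUTRAL = 1, -1, 0

# Toy emotional word dictionary: word -> data of emotional nature.
SENTIMENT_NATURE = {
    "experienced and versatile": POSITIVE,
    "drug": NEGATIVE,
    "sudden": NEUTRAL,
}


def extract_sentiment_words(tokens, dictionary=SENTIMENT_NATURE):
    """Return the word segments found in the dictionary, in order, without repeats."""
    seen = set()
    words = []
    for token in tokens:
        if token in dictionary and token not in seen:
            seen.add(token)
            words.append(token)
    return words


# Tokens as they might come out of a word segmenter (hypothetical input).
tokens = ["the", "expert", "is", "experienced and versatile", "but", "sudden"]
print(extract_sentiment_words(tokens))
```

Because matching happens per segment rather than per raw substring, the result depends on the quality of the segmenter, which is outside the scope of this sketch.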
- In step S140, the emotional words that appear together in any of the comment viewpoints are associated, and an emotional word community network is constructed.
- the emotional word community includes a set of emotional words that are directly or indirectly associated.
- FIG. 3 is a diagram showing an example of an emotional word community network for the method of marking comment information in the first embodiment of the present invention, in which each small circle represents an emotional word and a line between two small circles represents an association between the two emotional words. The shorter the line, the stronger the association between the emotional words; for example, the emotional word a and the emotional word b are associated. The three large dotted circles represent three emotional word communities.
- Emotional word community 1: reform, crisis, reality, simplicity, care, no, you think yourself, distant, strange, bad, what, trouble, problem, happiness, don't know, health, should, economy, development, load, Niubi, thinking, understanding, garbage, unable, in fact;
- Emotional word community 2: [illegible], waiting, innocent, loyalty, respect, idiot, tough guy, two ribs, clarification, honesty, willingness, frustration;
- Emotional word community 3: natural, fresh, standard, highest, hardcover.
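One way to read "a set of emotional words that are directly or indirectly associated" is as a connected component of the co-occurrence graph. The sketch below takes that reading as an assumption; the source does not state which community-finding procedure is used, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_network(viewpoint_words):
    """Return an adjacency map linking emotional words that share a comment viewpoint."""
    graph = defaultdict(set)
    for words in viewpoint_words:
        unique = set(words)
        for word in unique:
            graph[word]  # touch the key so isolated words still become nodes
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def communities(graph):
    """Connected components: words associated directly or indirectly."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result


# Emotional words per comment viewpoint (words borrowed from the community lists above).
network = build_network([["reform", "crisis"], ["crisis", "economy"], ["natural", "fresh"]])
print(communities(network))
```

Here "reform" and "economy" never share a viewpoint, yet they land in the same community through "crisis", which is the indirect association the text describes.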
- Step S140 further includes: for any two associated emotional words in the emotional word community network, calculating the co-occurrence frequency with which the two emotional words appear together in the same comment viewpoint, and, if the calculated co-occurrence frequency is lower than the predetermined co-occurrence frequency threshold, removing the association between the two emotional words.
- the co-occurrence frequency of the two sentiment words may be calculated according to the number of occurrences of each of the two emotional words and the number of co-occurrences thereof.
- The number of co-occurrences usually refers to the number of times two emotional words appear in the same comment viewpoint. For example, if “experienced and versatile” and “foreign glory” appear together in a certain comment viewpoint, their co-occurrence number is 1. If “experienced and versatile” and “foreign glory” appear together again in another comment viewpoint, their co-occurrence number is 2.
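The counting described above can be sketched with two counters. The sketch assumes that a word is counted once per comment viewpoint even if it is repeated inside that viewpoint; the source does not say how repeats are treated, and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def count_occurrences(viewpoint_words):
    """Count per-word occurrences and per-pair co-occurrences across comment viewpoints."""
    occurrences = Counter()
    cooccurrences = Counter()
    for words in viewpoint_words:
        unique = sorted(set(words))  # sorted so each pair has one canonical order
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return occurrences, cooccurrences


viewpoints = [
    ["experienced and versatile", "foreign glory"],
    ["foreign glory", "experienced and versatile"],
    ["foreign glory"],
]
occ, co = count_occurrences(viewpoints)
print(occ["foreign glory"], co[("experienced and versatile", "foreign glory")])
```

The pair appears together in two viewpoints, so its co-occurrence number is 2, matching the worked example in the text.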
- The calculation of the aforementioned co-occurrence frequency can be performed by a formula [formula not reproduced in the extracted text] defined over the following quantities: word1 is the emotional word a; word2 is the emotional word b; the number of occurrences of word1 is the number of occurrences of the emotional word a; the number of occurrences of word2 is the number of occurrences of the emotional word b; and the co-occurrence number of word1 and word2 is the co-occurrence number of the emotional word a and the emotional word b.
- Pairs of emotional words whose co-occurrence frequency is lower than the predetermined co-occurrence frequency threshold are then selected, and the association between the emotional words of each such pair is removed.
- In addition, the associations between other emotional words and any emotional word whose number of occurrences is lower than a set occurrence-count threshold can be removed.
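The pruning step can be sketched as below. The co-occurrence frequency formula is not recoverable from this text, so a Jaccard-style ratio (co-occurrences divided by the number of viewpoints containing either word) is substituted purely as an assumption; the thresholds 0.5 and 2 and all function names are likewise illustrative.

```python
def cooccurrence_frequency(n_a, n_b, n_ab):
    """ASSUMED Jaccard-style ratio; a stand-in, not the patent's own formula."""
    return n_ab / (n_a + n_b - n_ab)


def prune_edges(occurrences, cooccurrences, min_frequency=0.5, min_occurrences=2):
    """Keep only associations that pass both thresholds; return pair -> frequency."""
    kept = {}
    for (a, b), n_ab in cooccurrences.items():
        if occurrences[a] < min_occurrences or occurrences[b] < min_occurrences:
            continue  # drop associations of rarely occurring words
        frequency = cooccurrence_frequency(occurrences[a], occurrences[b], n_ab)
        if frequency >= min_frequency:
            kept[(a, b)] = frequency  # association survives pruning
    return kept


# Hypothetical counts for illustration only.
occurrences = {"reform": 4, "crisis": 4, "economy": 5, "garbage": 1}
cooccurrences = {
    ("crisis", "reform"): 3,
    ("crisis", "economy"): 1,
    ("economy", "garbage"): 1,
}
print(prune_edges(occurrences, cooccurrences))
```

Any other ratio built from the same three counts (occurrences of each word and their co-occurrence number) could be dropped into `cooccurrence_frequency` without changing the rest of the sketch.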
- In step S150, the comment viewpoints to which the emotional words in any emotional word community in the emotional word community network belong are labeled with data of the emotional nature.
- Specifically, each comment viewpoint is labeled with the data of the emotional nature of the emotional word it contains. That is to say, if the emotional nature of an emotional word is positive, then the comment viewpoint to which that emotional word belongs is also labeled as positive.
- the sentiment word dictionary may further include data of sentiment intensity of the plurality of sentiment words, the data of the sentiment intensity being a quantification of the emotional nature.
- The data in the last column represents the emotional intensity of each emotional word; the larger the value, the stronger the emotion the word expresses. For example, the emotional intensity data of the emotional word “experienced and versatile” is “7”, indicating that the positive emotion embodied in “experienced and versatile” is relatively strong.
- The emotional intensity data of the emotional word “foreign glory” is “9”, which is larger than the “7” of “experienced and versatile”, meaning that the positive emotion of “foreign glory” is stronger than that of “experienced and versatile”.
- Step S150 further includes: if any comment viewpoint includes emotional words of different emotional natures, labeling that comment viewpoint with the data of the emotional nature of the emotional word with the strongest emotional intensity.
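The strongest-intensity rule of step S150 can be sketched as follows. The intensities 7 and 9 come from the text; the intensities given to "drug" and "sudden", the tie-breaking behaviour (the first word listed wins), and the function name are assumptions made for illustration.

```python
# word -> (emotional nature, emotional intensity)
SENTIMENT_DICT = {
    "experienced and versatile": (1, 7),
    "foreign glory": (1, 9),
    "drug": (-1, 5),   # hypothetical intensity
    "sudden": (0, 1),  # hypothetical intensity
}


def label_viewpoint(words, dictionary=SENTIMENT_DICT):
    """Return 1, -1, or 0 for a comment viewpoint, or None if it has no emotional word."""
    known = [w for w in words if w in dictionary]
    if not known:
        return None
    # With mixed natures, the word with the strongest intensity decides the label.
    strongest = max(known, key=lambda w: dictionary[w][1])
    return dictionary[strongest][0]


print(label_viewpoint(["drug", "experienced and versatile"]))
```

In the call above the positive word (intensity 7) outweighs the negative one (assumed intensity 5), so the viewpoint is labeled positive.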
- FIG. 4 is a diagram showing an example of sentiment clustering results of the method for labeling comment information according to the first embodiment of the present invention. Referring to FIG. 4, for the same event, "Asia Pacific Urban Real Estate Research Institute Dean: The property market still has a golden period of 20 years", the comment viewpoints whose emotional nature is positive form one class, namely the positive speech shown in the figure. Likewise, the comment viewpoints whose emotional nature is negative form one class, namely the negative speech shown in the figure, and the comment viewpoints whose emotional nature is neutral form one class, namely the neutral speech shown in the figure.
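Grouping labeled comment viewpoints into the three classes can be sketched as a simple bucket operation. The example viewpoints below are invented for illustration and are not taken from FIG. 4; the function and constant names are also assumptions.

```python
from collections import defaultdict

NATURE_NAMES = {1: "positive speech", -1: "negative speech", 0: "neutral speech"}


def cluster_by_nature(labelled_viewpoints):
    """Group (viewpoint text, nature) pairs into positive, negative, and neutral speech."""
    clusters = defaultdict(list)
    for text, nature in labelled_viewpoints:
        clusters[NATURE_NAMES[nature]].append(text)
    return dict(clusters)


# Invented comment viewpoints with already-assigned natures.
labelled = [
    ("prices will keep rising", 1),
    ("this forecast is nonsense", -1),
    ("twenty years is a long time", 0),
    ("a sound analysis", 1),
]
print(cluster_by_nature(labelled))
```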
- The method for labeling comment information provided by the embodiment of the present invention first divides the acquired event comments into sentences to obtain a plurality of comment viewpoints, and then uses the emotional words contained in the comment viewpoints, which reflect the user's opinion tendency more accurately, as the clustering basis, so that the event comments are clustered by emotion automatically and labeled with their emotional nature.
- Comment viewpoints labeled with an emotional nature let the user understand, more intuitively and conveniently, other users' views on trending topics, which improves the user experience.
- The embodiments of the present invention have a wide scope of application and are suitable for classifying any user comments, especially user comments on event information.
- Fig. 5 is a logic block diagram showing an apparatus for marking comment information according to a second embodiment of the present invention. It can be used to perform the method steps of the embodiment shown in FIG.
- The annotation device of the comment information includes a comment acquisition module 510, a comment clause module 520, a sentiment word extraction module 530, an emotion network construction module 540, and an emotion annotation module 550.
- The comment acquisition module 510 is configured to acquire data of a plurality of event comments.
- The comment clause module 520 is configured to divide each of the plurality of event comments into sentences, and to take the divided sentences as comment viewpoints.
- The comment clause module 520 is further configured to remove sentences whose word count exceeds a predetermined sentence length, and/or to remove sentences of an advertising nature.
- The sentiment word extraction module 530 is configured to extract the sentiment words from each of the comment viewpoints.
- The sentiment word extraction module 530 is configured to segment the sentence of each comment viewpoint into words, and to select the sentiment words by matching the resulting word segments against the pre-built sentiment word dictionary, where the sentiment word dictionary includes a plurality of sentiment words and data of their emotional nature.
- the sentiment network building module 540 is configured to associate the emotional words co-occurring in any of the comment views to construct an emotional word community network.
- The emotion network construction module 540 is configured to calculate, for any two associated emotion words in the sentiment word community network, the co-occurrence frequency with which the two emotion words appear together in the same comment viewpoint, and, if the calculated co-occurrence frequency is lower than a predetermined co-occurrence frequency threshold, to remove the association between the two emotion words.
- the emotion labeling module 550 is configured to label data of an emotional nature for a commentary point to which the sentiment word in any emotional word community in the emotional word community network belongs, the emotional nature is positive, negative or neutral, the emotional word community Includes a set of emotional words that are directly or indirectly associated.
- the sentiment labeling module 550 is configured to label the data of the emotional nature for the comment viewpoint to which the sentiment word in the sentiment word community belongs.
- The sentiment word dictionary further includes data of the sentiment intensity of the plurality of sentiment words, and the sentiment labeling module 550 is further configured to: if any comment viewpoint includes emotional words of different emotional natures, label that comment viewpoint with the data of the emotional nature of the emotional word with the strongest emotional intensity.
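The division of the device into modules 510 to 550 can be sketched as a small pipeline of pluggable callables. This is only an illustration of the module boundaries: the class name, field names, and the stub implementations in the usage example are assumptions, and for brevity the labeling callable here receives only the words of one viewpoint rather than the community network.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CommentAnnotationDevice:
    acquire: Callable[[], List[str]]                    # comment acquisition module 510
    split: Callable[[str], List[str]]                   # comment clause module 520
    extract: Callable[[str], List[str]]                 # sentiment word extraction module 530
    build_network: Callable[[List[List[str]]], object]  # emotion network construction module 540
    label: Callable[[List[str]], Optional[int]]         # emotion annotation module 550

    def run(self):
        """Run the modules in order; return the network and a viewpoint -> nature map."""
        viewpoints = [v for comment in self.acquire() for v in self.split(comment)]
        words = [self.extract(v) for v in viewpoints]
        network = self.build_network(words)
        labels = {v: self.label(w) for v, w in zip(viewpoints, words)}
        return network, labels


# Stub modules, for illustration only.
device = CommentAnnotationDevice(
    acquire=lambda: ["good move. bad idea"],
    split=lambda c: [s.strip() for s in c.split(".") if s.strip()],
    extract=lambda v: [w for w in v.split() if w in {"good", "bad"}],
    build_network=lambda words: {w for ws in words for w in ws},
    label=lambda ws: {"good": 1, "bad": -1}.get(ws[0]) if ws else None,
)
network, labels = device.run()
print(labels)
```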
- The annotation device of the comment information provided by the embodiment of the present invention first divides the acquired event comments into sentences to obtain a plurality of comment viewpoints. The emotional words contained in the comment viewpoints are then used as the clustering basis, so that the event comments are clustered by emotion automatically and labeled with their emotional nature. The user can thus quickly grasp the general state of public opinion and read the comments more easily, which greatly enriches the user experience.
- Figure 6 is a logic block diagram showing a computer device in accordance with a third embodiment of the present invention.
- the computer device can be used to implement the annotation method of the comment information provided in the above embodiment. Specifically:
- The computer device can include an input unit 610, a memory 620 including one or more computer-readable storage media, a processor 630 including one or more processing cores, a display unit 640, a communication unit 650, and a power source 660.
- The computer device architecture illustrated in the figure does not constitute a limitation on the computer device, which may include more or fewer components than those illustrated, a combination of certain components, or a different arrangement of components. Specifically:
- the input unit 610 can be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
- input unit 610 can include touch-sensitive surface 611 as well as other input devices 612.
- The touch-sensitive surface 611, also referred to as a touch display or trackpad, can collect touch operations performed by the user on or near it (such as operations performed by the user with a finger, a stylus, or any other suitable object on or near the touch-sensitive surface 611) and drive the corresponding connected device according to a preset program.
- The touch-sensitive surface 611 can include two parts: a touch detection device and a touch controller.
- The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 630, and can receive and execute commands sent by the processor 630.
- input unit 610 can also include other input devices 612.
- other input devices 612 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
- the memory 620 can be used to store software programs and data, and the processor 630 executes various functional applications and data processing by running software programs and data stored in the memory 620.
- The memory 620 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created through the use of the computer device (such as audio data, a phone book, etc.).
- memory 620 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, memory 620 can also include a memory controller to provide access to memory 620 by processor 630 and input unit 610.
- The processor 630 is the control center of the computer device. It connects the various parts of the computer device through various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 620 and calling the data stored in the memory 620, thereby monitoring the computer device as a whole.
- Display unit 640 can be used to display information entered by the user or information provided to the user as well as various graphical user interfaces of the computer device, which can be composed of graphics, text, icons, video, and any combination thereof.
- the display unit 640 may include a display panel.
- the display panel may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
- Although the input unit 610 and the display unit 640 may be implemented as two separate components to provide input and output functions, in some embodiments the input unit 610 can be integrated with the display unit 640 to provide both input and output.
- the communication unit 650 can be used for transmitting and receiving information or receiving and transmitting signals during a call.
- the communication unit 650 can be a network communication device such as an RF (Radio Frequency) circuit, a router, a modem, or the like.
- communication unit 650 can also communicate with the network and other devices via wireless communication.
- Wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
- the computer device may also include a power source 660 (such as a battery) that supplies power to the various components.
- the power source may be logically coupled to the processor 630 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
- the power supply 660 can also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
- The computer device may also include a camera, a Bluetooth module, sensors (such as light sensors, motion sensors, and other sensors), audio circuits, wireless communication units, and the like, which are not described in detail herein.
- The computer device includes one or more processors 630, a memory 620, and one or more programs, where the one or more programs are stored in the memory 620 and configured to be executed by the one or more processors 630, and include instructions for performing the following method for marking comment information: acquiring data of a plurality of event comments; dividing each of the plurality of event comments into sentences, and taking the divided sentences as comment viewpoints; extracting emotional words from each of the comment viewpoints; associating the emotional words that co-occur in any of the comment viewpoints to construct an emotional word community network; and labeling, with data of an emotional nature, the comment viewpoints to which the emotional words in any emotional word community in the emotional word community network belong, where the emotional nature is positive, negative, or neutral, and an emotional word community includes a set of emotional words that are directly or indirectly associated.
- the processing of extracting sentiment words from each comment opinion includes: segmenting the sentence of each comment opinion into words, and selecting the sentiment words by matching the segmented words against a pre-built sentiment word dictionary.
- the sentiment word dictionary includes a plurality of sentiment words and their sentiment polarity data.
- the process of constructing the sentiment word community network further includes: for any two associated sentiment words in the network, calculating the co-occurrence frequency with which the two words appear together in the same comment opinion, and removing the association between the two words if the calculated co-occurrence frequency is lower than a predetermined co-occurrence frequency threshold.
- the processing of annotating sentiment polarity data for the comment opinions to which the sentiment words in any sentiment word community belong includes: annotating, according to the sentiment polarity of a sentiment word in the community, the comment opinion to which that word belongs.
- the sentiment word dictionary further includes sentiment intensity data for the plurality of sentiment words, and the processing of annotating sentiment polarity data further includes: if any comment opinion includes sentiment words of different sentiment polarities, annotating that comment opinion with the sentiment polarity of the sentiment word with the strongest sentiment intensity.
- the processing of dividing the plurality of event comments into sentences further includes: removing sentences whose character count exceeds a predetermined sentence length, and/or removing sentences of an advertising nature.
- the computer device provided by the embodiment of the present invention first splits the acquired event comments into sentences to obtain a plurality of comment opinions, and then uses the sentiment words contained in the comment opinions as the basis for clustering, so that event comments are automatically clustered by sentiment and annotated with sentiment polarity; users can thus quickly grasp the overall state of public opinion and read comments more easily, which greatly enriches the user experience.
- each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
- the above integrated modules can be implemented in the form of hardware or in the form of hardware plus software function modules.
- the above-described integrated modules implemented in the form of software function modules can be stored in a computer readable storage medium.
- the software function modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the various embodiments of the present invention.
- the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
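A method, an apparatus, and a computer device for annotating comment information. The method comprises: acquiring data of a plurality of event comments (S110); dividing each of the event comments into sentences and taking each resulting sentence as a comment opinion (S120); extracting sentiment words from each comment opinion (S130); associating sentiment words that co-occur in any comment opinion to construct a sentiment word community network (S140); and annotating with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong (S150), where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words. The method, apparatus, and computer device can automatically cluster event comments by sentiment and annotate them with sentiment polarity.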
Description
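This application claims priority to Chinese Patent Application No. 201510325108.2, filed with the Chinese Patent Office on June 12, 2015 and entitled "Method and Apparatus for Annotating Comment Information", the entire contents of which are incorporated herein by reference.
The present invention relates to the field of network information processing technologies, and in particular to a method, an apparatus, and a computer device for annotating comment information.
Existing comment clustering methods mostly target product reviews, such as user reviews on websites like Tmall and Amazon. They cluster reviews around the different attributes of a product. Typically, attribute categories are built first, the attributes contained in each review are then mined, and each review is finally assigned to the categories of the attributes it contains. Attribute mining from reviews commonly uses dictionary-based or machine-learning-based methods.
However, event comments differ fundamentally from product reviews: most event comments have no attributes, so product clustering methods cannot simply be carried over to event comments. In addition, the content of event comments is broad, so the common approach of pre-building categories from a dictionary does not apply to them.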
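Summary of the Invention
An object of the embodiments of the present invention is to provide a method, an apparatus, and a computer device for annotating comment information, so as to automatically cluster event comments by sentiment, annotate them with sentiment polarity, and improve the user's online experience.
To achieve the above object, an embodiment of the present invention provides a method for annotating comment information, comprising: acquiring data of a plurality of event comments; dividing each of the event comments into sentences and taking each resulting sentence as a comment opinion; extracting sentiment words from each comment opinion; associating sentiment words that co-occur in any comment opinion to construct a sentiment word community network; and annotating with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.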
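Preferably, extracting sentiment words from each comment opinion comprises: segmenting the sentence of each comment opinion into words, and selecting the sentiment words by matching the segmented words against a pre-built sentiment word dictionary, where the dictionary comprises a plurality of sentiment words and their sentiment polarity data.
Preferably, associating co-occurring sentiment words to construct the sentiment word community network further comprises: for any two associated sentiment words in the network, computing the co-occurrence frequency with which the two words appear together in the same comment opinion, and removing the association between the two words if the computed co-occurrence frequency is below a predetermined co-occurrence frequency threshold.
Preferably, annotating the comment opinions with sentiment polarity data comprises: annotating, according to the sentiment polarity of a sentiment word in the sentiment word community, the comment opinion to which that word belongs with sentiment polarity data.
Preferably, the sentiment word dictionary further comprises sentiment intensity data for the plurality of sentiment words, and annotating the comment opinions with sentiment polarity data further comprises: if any comment opinion contains sentiment words of different sentiment polarities, annotating that comment opinion with the sentiment polarity of the sentiment word having the greatest sentiment intensity.
Preferably, dividing the event comments into sentences further comprises: removing sentences whose character count exceeds a predetermined sentence length, and/or removing sentences of an advertising nature.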
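An embodiment of the present invention further provides an apparatus for annotating comment information, comprising: a comment acquisition module, configured to acquire data of a plurality of event comments; a comment sentence-splitting module, configured to divide each of the event comments into sentences and take each resulting sentence as a comment opinion; a sentiment word extraction module, configured to extract sentiment words from each comment opinion; a sentiment network construction module, configured to associate sentiment words that co-occur in any comment opinion to construct a sentiment word community network; and a sentiment annotation module, configured to annotate with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.
Preferably, the sentiment word extraction module is configured to segment the sentence of each comment opinion into words and to select the sentiment words by matching the segmented words against a pre-built sentiment word dictionary, where the dictionary comprises a plurality of sentiment words and their sentiment polarity data.
Preferably, the sentiment network construction module is configured to compute, for any two associated sentiment words in the sentiment word community network, the co-occurrence frequency with which the two words appear together in the same comment opinion, and to remove the association between the two words if the computed co-occurrence frequency is below a predetermined co-occurrence frequency threshold.
Preferably, the sentiment annotation module is configured to annotate, according to the sentiment polarity of a sentiment word in the sentiment word community, the comment opinion to which that word belongs with sentiment polarity data.
Preferably, the sentiment word dictionary further comprises sentiment intensity data for the plurality of sentiment words, and the sentiment annotation module is further configured to, if any comment opinion contains sentiment words of different sentiment polarities, annotate that comment opinion with the sentiment polarity of the sentiment word having the greatest sentiment intensity.
Preferably, the comment sentence-splitting module is further configured to remove sentences whose character count exceeds a predetermined sentence length, and/or to remove sentences of an advertising nature.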
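An embodiment of the present invention further provides a computer device, comprising: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and contain instructions for performing the method for annotating comment information: acquiring data of a plurality of event comments; dividing each of the event comments into sentences and taking each resulting sentence as a comment opinion; extracting sentiment words from each comment opinion; associating sentiment words that co-occur in any comment opinion to construct a sentiment word community network; and annotating with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.
With the method, apparatus, and computer device provided by the embodiments of the present invention, the acquired event comments are split into sentences to obtain a plurality of comment opinions, and the sentiment words contained in the comment opinions are then used as the basis for clustering. Event comments are thereby automatically clustered by sentiment and annotated with sentiment polarity, so that users can quickly grasp the overall state of public opinion and read comments more easily, which greatly enriches the user experience.
In addition, comment opinions annotated with sentiment polarity data let users conveniently learn how other users view the hot topics of an event.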
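FIG. 1 is a flowchart of the method for annotating comment information according to Embodiment 1 of the present invention;
FIG. 2 is an example diagram of the sentiment word dictionary used in the method for annotating comment information according to Embodiment 1 of the present invention;
FIG. 3 is an example diagram of the sentiment word community network in the method for annotating comment information according to Embodiment 1 of the present invention;
FIG. 4 is an example diagram of the sentiment clustering result of the method for annotating comment information according to Embodiment 1 of the present invention;
FIG. 5 is a logical block diagram of the apparatus for annotating comment information according to Embodiment 2 of the present invention;
FIG. 6 is a logical block diagram of the computer device according to Embodiment 3 of the present invention.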
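The basic idea of the present invention is that, after data of a plurality of event comments is acquired, the event comments are further divided into sentences, each resulting sentence is taken as a comment opinion, and "sentiment words" are used as the key to clustering the comment opinions, so that event comments are automatically clustered by sentiment and annotated with sentiment polarity. The sentiment polarity may be, but is not limited to, positive, negative, or neutral, ultimately producing groupings such as positive remarks, negative remarks, and neutral remarks, so that users can conveniently and quickly learn the various aspects of an event that other users care about, improving the user's online experience.
In addition, the present invention is widely applicable; it is particularly suited to user comments on news and information content, and can also provide a large source of data for public opinion monitoring.
The method, apparatus, and computer device for annotating comment information according to embodiments of the present invention are described in detail below with reference to the accompanying drawings.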
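Embodiment 1
FIG. 1 is a flowchart of the method for annotating comment information according to Embodiment 1 of the present invention. The method may be performed on, for example, a microblog server.
Referring to FIG. 1, in step S110, data of a plurality of event comments is acquired.
Here, the data of the event comments may be, for example but not limited to, comment text posted by users on sources such as microblogs, Tieba, news sites, and forums.
In step S120, each of the event comments is divided into sentences, and each resulting sentence is taken as a comment opinion.
That is, each event comment is split into sentences, and each resulting sentence is treated as an independent comment opinion. To obtain comment opinions more accurately, according to an optional embodiment of the present invention, step S120 includes: removing sentences whose character count exceeds a predetermined sentence length, and/or removing sentences of an advertising nature.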
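Step S120 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `split_into_opinions`, the delimiter set, the length limit of 50 characters, and the advertisement keywords are all hypothetical choices, since the source specifies only that over-long and advertising sentences are removed.

```python
import re

# Assumed set of sentence-ending punctuation (Chinese and ASCII).
SENTENCE_END = re.compile(r"[。！？!?；;\n]+")


def split_into_opinions(comment, max_len=50, ad_keywords=("加微信", "代购")):
    """Split one event comment into comment opinions (step S120).

    max_len and ad_keywords are hypothetical values: the source only says
    that sentences longer than a predetermined length and sentences of an
    advertising nature are removed.
    """
    opinions = []
    for sentence in SENTENCE_END.split(comment):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty fragments left by the split
        if len(sentence) > max_len:
            continue  # drop sentences longer than the predetermined length
        if any(keyword in sentence for keyword in ad_keywords):
            continue  # drop sentences that look like advertisements
        opinions.append(sentence)
    return opinions


print(split_into_opinions("房价太高了！加微信买房。还会涨吗？"))
# ['房价太高了', '还会涨吗']
```

A keyword list is only one possible way to detect advertising sentences; a trained classifier could fill the same role.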
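In step S130, sentiment words are extracted from each comment opinion.
According to an exemplary embodiment of the present invention, step S130 includes: segmenting the sentence of each comment opinion into words, and selecting the sentiment words by matching the segmented words against a pre-built sentiment word dictionary.
It should be noted that the prior art generally extracts sentiment words by direct matching, which cannot guarantee extraction quality and may miss some sentiment words. This embodiment instead segments first and then matches: a sequence of Chinese characters is segmented into individual words, the segmented words are matched against the pre-built sentiment word dictionary, and the words that match are selected as sentiment words.
Here, the sentiment word dictionary comprises a plurality of sentiment words and their sentiment polarity data. The sentiment polarity is, but is not limited to, positive, negative, or neutral.
FIG. 2 is an example diagram of the sentiment word dictionary used in the method for annotating comment information according to Embodiment 1 of the present invention. It lists several sample sentiment words, such as 博学多才 (erudite and talented), 毒谋 (vicious scheme), and 猛然 (suddenly). Only some sentiment words are listed for illustration; in practice, the dictionary may contain tens of thousands of sentiment words or more, together with their sentiment polarity data. For example, 博学多才 expresses a positive sentiment, and its sentiment polarity data is "1", representing positive; likewise, 毒谋 expresses a negative sentiment, and its sentiment polarity data is "-1", representing negative; 猛然 has no clear sentiment tendency, and its sentiment polarity data is "0", representing neutral.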
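The segment-then-match extraction of step S130 can be sketched as below. The source does not name a segmentation algorithm, so the forward-maximum-matching segmenter here is an assumption chosen to keep the example self-contained, and the `VOCABULARY` of ordinary words is hypothetical. The three `LEXICON` entries and their polarity values are the ones described for FIG. 2.

```python
# Polarity data for the three sample words described for FIG. 2.
LEXICON = {"博学多才": 1, "毒谋": -1, "猛然": 0}

# Hypothetical general vocabulary for the segmenter: the sentiment words
# plus a few ordinary words.
VOCABULARY = set(LEXICON) | {"他", "确实"}


def segment(sentence, vocabulary, max_word_len=4):
    """Forward maximum matching: an assumed, simplistic word segmenter."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_sentiment_words(sentence, vocabulary, lexicon):
    """Segment first, then keep only the tokens found in the dictionary."""
    return [token for token in segment(sentence, vocabulary) if token in lexicon]


print(extract_sentiment_words("他确实博学多才", VOCABULARY, LEXICON))
# ['博学多才']
```

In practice a dedicated Chinese word segmenter would replace `segment`; the matching step against the dictionary stays the same.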
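In step S140, sentiment words that co-occur in any comment opinion are associated to construct a sentiment word community network.
Specifically, co-occurrence is used as the criterion for associating the sentiment words in any comment opinion, so that all sentiment words in the event comment data together form a sentiment word community network. Here, a sentiment word community comprises a group of directly or indirectly associated sentiment words.
FIG. 3 is an example diagram of the sentiment word community network in the method for annotating comment information according to Embodiment 1 of the present invention. Each small circle represents a sentiment word, and a line between two small circles represents the association between the two words; the shorter the line, the stronger the association. For example, sentiment word a and sentiment word b are associated. The three large dashed circles represent three sentiment word communities.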
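Taking the event comments on the article 《亚太城市房地产研究院院长:楼市仍有20年黄金期》 ("Dean of the Asia-Pacific Urban Real Estate Research Institute: The Property Market Still Has a 20-Year Golden Period") as an example, the processing of steps S110 to S140 yields the following four sentiment word communities:
Sentiment word community 1: 改革 (reform), 危机 (crisis), 确实 (indeed), 单纯 (simple), 计较 (fuss over), 不行 (no good), 你以为自己 (you think you), 遥远 (distant), 奇怪 (strange), 差的 (poor), 什么东西 (what kind of thing), 烦恼 (worry), 问题 (problem), 快乐 (happy), 不知道 (don't know), 健康 (healthy), 应该 (should), 经济 (economy), 发展 (development), 负荷 (burden), 牛逼 (awesome), 想 (think), 理解 (understand), 垃圾 (rubbish), 无法 (unable), 其实 (actually);
Sentiment word community 2: 皑皑 (snow-white), 守候 (keep watch), 天真 (innocent), 忠义 (loyal and righteous), 敬 (respect), 痴 (infatuated), 硬汉 (tough guy), 两肋插刀 (stand by a friend at any cost), 解语花 (an understanding beauty), 诚 (sincere), 愿 (wish), 坎坷 (rough going);
Sentiment word community 3: 自然 (natural), 清新 (fresh), 标准 (standard), 最高 (highest), 精装 (finely furnished);
Sentiment word community 4: 太棒了 (great), 最爱 (favorite), 感谢 (thanks).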
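Building the network in step S140 and reading off its communities can be sketched as follows. The source defines a community only as a group of directly or indirectly associated sentiment words and names no community-detection algorithm, so treating communities as connected components of the co-occurrence graph is an interpretation. The function names and the sample opinions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(opinion_words):
    """Link every pair of sentiment words that share a comment opinion.

    opinion_words holds one list of sentiment words per comment opinion.
    """
    graph = defaultdict(set)
    for words in opinion_words:
        unique = set(words)
        for word in unique:
            graph[word]  # register the node even if it has no neighbours
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def communities(graph):
    """Return connected components: words linked directly or indirectly."""
    seen = set()
    result = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        result.append(component)
    return result


# Hypothetical sentiment words extracted from three comment opinions.
sample = [["太棒了", "最爱"], ["最爱", "感谢"], ["自然", "清新"]]
for community in communities(build_network(sample)):
    print(sorted(community))
```

Here 太棒了 and 感谢 never share an opinion but land in the same community through 最爱, which is the "indirectly associated" case.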
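To construct the sentiment word community network more accurately, according to an exemplary embodiment of the present invention, step S140 further includes: for any two associated sentiment words in the sentiment word community network, computing the co-occurrence frequency with which the two words appear together in the same comment opinion, and removing the association between the two words if the computed co-occurrence frequency is below a predetermined co-occurrence frequency threshold.
In a specific implementation, the co-occurrence frequency of the two sentiment words may be computed from their individual occurrence counts and their co-occurrence count. Here, the co-occurrence count is the number of comment opinions in which both sentiment words appear. For example, if 博学多才 (erudite and talented) and 前程似锦 (a bright future) appear together in one comment opinion, their co-occurrence count is 1; if they appear together again in another comment opinion, their co-occurrence count becomes 2. The co-occurrence frequency may be computed by the following formula:
where word1 is sentiment word a, word2 is sentiment word b, (word1,word2) is the co-occurrence count of sentiment words a and b, (word1) is the occurrence count of sentiment word a, (word2) is the occurrence count of sentiment word b, and N(word1,word2) is the co-occurrence frequency of sentiment words a and b.
After the co-occurrence frequencies between the sentiment words are obtained, pairs of sentiment words whose co-occurrence frequency is below the predetermined co-occurrence frequency threshold are selected, and the associations between them are removed. In addition, to prevent an individual low-frequency sentiment word from having an excessively high co-occurrence frequency with other sentiment words, the associations between other sentiment words and any sentiment word whose occurrence count is below a set occurrence-count threshold may also be removed.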
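The pruning described above can be sketched as follows. The formula itself is not reproduced in this text, so the Jaccard-style ratio used here, the co-occurrence count divided by the number of opinions containing either word, is an assumption; it is merely one normalisation that depends on the same three counts. The thresholds `min_freq` and `min_count` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_occurrences(opinion_words):
    """Count per-word occurrences and per-pair co-occurrences over opinions."""
    word_counts = Counter()
    pair_counts = Counter()
    for words in opinion_words:
        unique = sorted(set(words))  # count each word once per opinion
        word_counts.update(unique)
        pair_counts.update(frozenset(pair) for pair in combinations(unique, 2))
    return word_counts, pair_counts


def prune_edges(word_counts, pair_counts, min_freq=0.5, min_count=2):
    """Keep only the associations that survive both thresholds."""
    kept = set()
    for pair, co_count in pair_counts.items():
        a, b = tuple(pair)
        # Drop links of words that occur too rarely overall.
        if word_counts[a] < min_count or word_counts[b] < min_count:
            continue
        # ASSUMED normalisation (Jaccard-style), not the source's formula.
        frequency = co_count / (word_counts[a] + word_counts[b] - co_count)
        if frequency < min_freq:
            continue
        kept.add(pair)
    return kept


# Hypothetical opinions: the first two words co-occur twice, as in the text.
sample = [["博学多才", "前程似锦"], ["博学多才", "前程似锦"], ["博学多才", "怨气"]]
word_counts, pair_counts = count_occurrences(sample)
print(pair_counts[frozenset({"博学多才", "前程似锦"})])  # 2
print(prune_edges(word_counts, pair_counts))
```

With these inputs the link between 博学多才 and 怨气 is removed because 怨气 occurs only once, which is the low-frequency case the paragraph above guards against.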
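In step S150, the comment opinions to which the sentiment words in any sentiment word community of the sentiment word community network belong are annotated with sentiment polarity data.
Specifically, the comment opinion to which a sentiment word belongs is annotated with sentiment polarity data according to the sentiment polarity of that sentiment word in the sentiment word community. That is, if the sentiment polarity of a sentiment word is positive, the comment opinion to which the sentiment word belongs is also annotated as positive.
According to a preferred embodiment of the present invention, the sentiment word dictionary may further comprise sentiment intensity data for the plurality of sentiment words; the sentiment intensity data quantifies the sentiment polarity.
Referring again to FIG. 2, the last column in the figure is the sentiment intensity data of each sentiment word; the larger the value, the stronger the sentiment of the word. For example, the sentiment intensity of 博学多才 (erudite and talented) is "7", indicating that the positive sentiment it expresses is fairly strong. As another example, the sentiment intensity of 前程似锦 (a bright future) is "9", which is greater than the "7" of 博学多才, indicating that 前程似锦 expresses a stronger positive sentiment than 博学多才.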
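In addition, some special cases arise, for example how to annotate a comment opinion in which sentiment words of different sentiment polarities appear. Accordingly, according to an exemplary embodiment of the present invention, step S150 further includes: if any comment opinion contains sentiment words of different sentiment polarities, annotating that comment opinion with the sentiment polarity of the sentiment word having the greatest sentiment intensity.
For example, suppose a comment opinion contains the two sentiment words 前程似锦 (a bright future) and 怨气 (resentment), where the sentiment polarity of 前程似锦 is positive and that of 怨气 is negative. Their sentiment intensities are then compared. Referring again to FIG. 2, the sentiment intensity of 前程似锦 is "9" and that of 怨气 is "7". Since the sentiment intensity of 前程似锦 is greater than that of 怨气, the positive tendency expressed by the comment opinion is the strongest, so the comment opinion is also annotated as positive, with sentiment polarity data "1".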
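The tie-breaking rule of step S150 can be sketched as below, using the polarity and intensity values the text gives for FIG. 2. The function name `label_opinion`, the neutral fallback for opinions without known sentiment words, and the behaviour on equal intensities (the first word wins) are assumptions that the source does not specify.

```python
# (polarity, intensity) pairs as described for FIG. 2.
LEXICON = {
    "前程似锦": (1, 9),
    "博学多才": (1, 7),
    "怨气": (-1, 7),
}


def label_opinion(sentiment_words, lexicon):
    """Return the polarity of the strongest sentiment word in an opinion."""
    entries = [lexicon[word] for word in sentiment_words if word in lexicon]
    if not entries:
        return 0  # assumption: no known sentiment word means neutral
    # max() returns the first entry on ties, so equal intensities resolve
    # to the word that appears first (an assumed tie-breaking rule).
    polarity, _intensity = max(entries, key=lambda entry: entry[1])
    return polarity


print(label_opinion(["前程似锦", "怨气"], LEXICON))  # 1
```

The printed value matches the worked example above: intensity 9 outweighs intensity 7, so the opinion is annotated as positive.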
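After the processing of steps S110 to S150, comment opinions annotated with sentiment polarity data are finally obtained. FIG. 4 is an example diagram of the sentiment clustering result of the method for annotating comment information according to Embodiment 1 of the present invention. Referring to FIG. 4, again taking the event comments on 《亚太城市房地产研究院院长:楼市仍有20年黄金期》 as an example, comment opinions with positive sentiment polarity form one class, shown in the figure as positive remarks; likewise, comment opinions with negative sentiment polarity form one class, shown as negative remarks; and comment opinions with neutral sentiment polarity form one class, shown as neutral remarks. This produces an effect similar to the two sides of a debate, letting users conveniently and quickly grasp the overall state of public opinion and greatly improving the user's online experience.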
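The final grouping into positive, negative, and neutral remarks can be sketched as follows. The polarity codes 1, -1, and 0 follow the dictionary description; the function name and the sample opinions are hypothetical.

```python
POLARITY_NAMES = {1: "positive", -1: "negative", 0: "neutral"}


def group_by_polarity(labelled_opinions):
    """Group (opinion text, polarity) pairs into the three remark classes."""
    groups = {name: [] for name in POLARITY_NAMES.values()}
    for text, polarity in labelled_opinions:
        groups[POLARITY_NAMES[polarity]].append(text)
    return groups


# Hypothetical annotated comment opinions.
labelled = [("楼市前程似锦", 1), ("满是怨气", -1), ("猛然想起", 0)]
print(group_by_polarity(labelled))
```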
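In the method for annotating comment information provided by this embodiment of the present invention, the acquired event comments are first split into sentences to obtain a plurality of comment opinions; the sentiment words contained in the comment opinions, which more accurately reflect users' leanings, are then used as the basis for clustering. Event comments are thereby automatically clustered by sentiment and annotated with sentiment polarity, and the annotated comment opinions let users understand more intuitively and conveniently how other users view the hot topics of an event, improving the user experience. The embodiments of the present invention are widely applicable to classifying any user comments, especially user comments on event news.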
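Embodiment 2
FIG. 5 is a logical block diagram of the apparatus for annotating comment information according to Embodiment 2 of the present invention. The apparatus may be used to perform the method steps of the embodiment shown in FIG. 1.
Referring to FIG. 5, the apparatus for annotating comment information includes a comment acquisition module 510, a comment sentence-splitting module 520, a sentiment word extraction module 530, a sentiment network construction module 540, and a sentiment annotation module 550.
The comment acquisition module 510 is configured to acquire data of a plurality of event comments.
The comment sentence-splitting module 520 is configured to divide each of the event comments into sentences and take each resulting sentence as a comment opinion.
Optionally, the comment sentence-splitting module 520 is further configured to remove sentences whose character count exceeds a predetermined sentence length, and/or to remove sentences of an advertising nature.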
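The sentiment word extraction module 530 is configured to extract sentiment words from each comment opinion.
Specifically, the sentiment word extraction module 530 is configured to segment the sentence of each comment opinion into words and to select the sentiment words by matching the segmented words against a pre-built sentiment word dictionary, where the dictionary comprises a plurality of sentiment words and their sentiment polarity data.
The sentiment network construction module 540 is configured to associate sentiment words that co-occur in any comment opinion to construct a sentiment word community network.
Preferably, the sentiment network construction module 540 is configured to compute, for any two associated sentiment words in the sentiment word community network, the co-occurrence frequency with which the two words appear together in the same comment opinion, and to remove the association between the two words if the computed co-occurrence frequency is below a predetermined co-occurrence frequency threshold.
The sentiment annotation module 550 is configured to annotate with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.
Preferably, the sentiment annotation module 550 is configured to annotate, according to the sentiment polarity of a sentiment word in the sentiment word community, the comment opinion to which that word belongs with sentiment polarity data.
Further, the sentiment word dictionary also comprises sentiment intensity data for the plurality of sentiment words, and the sentiment annotation module 550 is further configured to, if any comment opinion contains sentiment words of different sentiment polarities, annotate that comment opinion with the sentiment polarity of the sentiment word having the greatest sentiment intensity.
The apparatus for annotating comment information provided by this embodiment of the present invention first splits the acquired event comments into sentences to obtain a plurality of comment opinions. It then uses the sentiment words contained in the comment opinions as the basis for clustering, so that event comments are automatically clustered by sentiment and annotated with sentiment polarity; users can thus quickly grasp the overall state of public opinion and read comments more easily, which greatly enriches the user experience.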
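Embodiment 3
FIG. 6 is a logical block diagram of the computer device according to Embodiment 3 of the present invention.
Referring to FIG. 6, the computer device may be used to implement the method for annotating comment information provided in the above embodiments. Specifically:
The computer device may include components such as an input unit 610, a memory 620 including one or more computer-readable storage media, a processor 630 including one or more processing cores, a display unit 640, a communication unit 650, and a power supply 660. Those skilled in the art will understand that the structure shown in the figure does not limit the computer device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. In particular: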
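The input unit 610 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Preferably, the input unit 610 may include a touch-sensitive surface 611 and other input devices 612. The touch-sensitive surface 611, also called a touch display screen or touchpad, can collect touch operations by the user on or near it (for example, operations performed on or near the touch-sensitive surface 611 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch-sensitive surface 611 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 630, and can receive and execute commands sent by the processor 630. Besides the touch-sensitive surface 611, the input unit 610 may also include other input devices 612. Preferably, the other input devices 612 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and an on/off key), a trackball, a mouse, and a joystick.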
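The memory 620 may be used to store software programs and data; the processor 630 performs various functional applications and data processing by running the software programs and data stored in the memory 620. The memory 620 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the computer device (such as audio data or a phone book). In addition, the memory 620 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 620 may also include a memory controller to provide the processor 630 and the input unit 610 with access to the memory 620.
The processor 630 is the control center of the computer device. It connects the parts of the entire computer device through various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 620 and invoking the data stored in the memory 620, thereby monitoring the computer device as a whole.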
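The display unit 640 may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the computer device; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 640 may include a display panel, which may optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Although the input unit 610 and the display unit 640 are shown in the figure as two separate components implementing the input and output functions, in some embodiments the input unit 610 and the display unit 640 may be integrated to implement the input and output functions.
The communication unit 650 may be used to receive and send signals while sending and receiving information or during a call. The communication unit 650 may be a network communication device such as an RF (Radio Frequency) circuit, a router, or a modem. In addition, the communication unit 650 may communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, and SMS (Short Messaging Service).
The computer device may further include a power supply 660 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 630 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system. The power supply 660 may also include any of the following components: one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the computer device may also include a camera, a Bluetooth module, sensors (such as a light sensor, a motion sensor, and other sensors), an audio circuit, a wireless communication unit, and so on, which are not described further here.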
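In this embodiment, the computer device includes one or more processors 630, a memory 620, and one or more programs, where the one or more programs are stored in the memory 620 and configured to be executed by the one or more processors 630, and contain instructions for performing the method for annotating comment information: acquiring data of a plurality of event comments; dividing each of the event comments into sentences and taking each resulting sentence as a comment opinion; extracting sentiment words from each comment opinion; associating sentiment words that co-occur in any comment opinion to construct a sentiment word community network; and annotating with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.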
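In addition, extracting sentiment words from each comment opinion includes: segmenting the sentence of each comment opinion into words, and selecting the sentiment words by matching the segmented words against a pre-built sentiment word dictionary, where the dictionary comprises a plurality of sentiment words and their sentiment polarity data.
Furthermore, associating co-occurring sentiment words to construct the sentiment word community network further includes: for any two associated sentiment words in the network, computing the co-occurrence frequency with which the two words appear together in the same comment opinion, and removing the association between the two words if the computed co-occurrence frequency is below a predetermined co-occurrence frequency threshold.
In addition, annotating the comment opinions with sentiment polarity data includes: annotating, according to the sentiment polarity of a sentiment word in the sentiment word community, the comment opinion to which that word belongs with sentiment polarity data.
Further, the sentiment word dictionary also comprises sentiment intensity data for the plurality of sentiment words, and annotating the comment opinions with sentiment polarity data further includes: if any comment opinion contains sentiment words of different sentiment polarities, annotating that comment opinion with the sentiment polarity of the sentiment word having the greatest sentiment intensity.
Furthermore, dividing the event comments into sentences further includes: removing sentences whose character count exceeds a predetermined sentence length, and/or removing sentences of an advertising nature.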
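The computer device provided by this embodiment of the present invention first splits the acquired event comments into sentences to obtain a plurality of comment opinions, and then uses the sentiment words contained in the comment opinions as the basis for clustering, so that event comments are automatically clustered by sentiment and annotated with sentiment polarity; users can thus quickly grasp the overall state of public opinion and read comments more easily, which greatly enriches the user experience.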
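In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software function modules.
An integrated module implemented in the form of software function modules may be stored in a computer-readable storage medium. The software function modules are stored in a storage medium and include a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be defined by the claims.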
Claims (13)
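- A method for annotating comment information, characterized in that the method comprises: acquiring data of a plurality of event comments; dividing each of the event comments into sentences and taking each resulting sentence as a comment opinion; extracting sentiment words from each comment opinion; associating sentiment words that co-occur in any comment opinion to construct a sentiment word community network; and annotating with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.
- The method according to claim 1, characterized in that extracting sentiment words from each comment opinion comprises: segmenting the sentence of each comment opinion into words, and selecting the sentiment words by matching the segmented words against a pre-built sentiment word dictionary, where the dictionary comprises a plurality of sentiment words and their sentiment polarity data.
- The method according to claim 1 or 2, characterized in that associating co-occurring sentiment words to construct the sentiment word community network further comprises: for any two associated sentiment words in the network, computing the co-occurrence frequency with which the two words appear together in the same comment opinion, and removing the association between the two words if the computed co-occurrence frequency is below a predetermined co-occurrence frequency threshold.
- The method according to any one of claims 1 to 3, characterized in that annotating the comment opinions with sentiment polarity data comprises: annotating, according to the sentiment polarity of a sentiment word in the sentiment word community, the comment opinion to which that word belongs with sentiment polarity data.
- The method according to any one of claims 2 to 4, characterized in that the sentiment word dictionary further comprises sentiment intensity data for the plurality of sentiment words, and annotating the comment opinions with sentiment polarity data further comprises: if any comment opinion contains sentiment words of different sentiment polarities, annotating that comment opinion with the sentiment polarity of the sentiment word having the greatest sentiment intensity.
- The method according to any one of claims 1 to 5, characterized in that dividing the event comments into sentences further comprises: removing sentences whose character count exceeds a predetermined sentence length, and/or removing sentences of an advertising nature.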
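- An apparatus for annotating comment information, characterized in that the apparatus comprises: a comment acquisition module, configured to acquire data of a plurality of event comments; a comment sentence-splitting module, configured to divide each of the event comments into sentences and take each resulting sentence as a comment opinion; a sentiment word extraction module, configured to extract sentiment words from each comment opinion; a sentiment network construction module, configured to associate sentiment words that co-occur in any comment opinion to construct a sentiment word community network; and a sentiment annotation module, configured to annotate with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.
- The apparatus according to claim 7, characterized in that the sentiment word extraction module is configured to segment the sentence of each comment opinion into words and to select the sentiment words by matching the segmented words against a pre-built sentiment word dictionary, where the dictionary comprises a plurality of sentiment words and their sentiment polarity data.
- The apparatus according to claim 7 or 8, characterized in that the sentiment network construction module is further configured to compute, for any two associated sentiment words in the sentiment word community network, the co-occurrence frequency with which the two words appear together in the same comment opinion, and to remove the association between the two words if the computed co-occurrence frequency is below a predetermined co-occurrence frequency threshold.
- The apparatus according to any one of claims 7 to 9, characterized in that the sentiment annotation module is configured to annotate, according to the sentiment polarity of a sentiment word in the sentiment word community, the comment opinion to which that word belongs with sentiment polarity data.
- The apparatus according to any one of claims 8 to 10, characterized in that the sentiment word dictionary further comprises sentiment intensity data for the plurality of sentiment words, and the sentiment annotation module is further configured to, if any comment opinion contains sentiment words of different sentiment polarities, annotate that comment opinion with the sentiment polarity of the sentiment word having the greatest sentiment intensity.
- The apparatus according to any one of claims 7 to 11, characterized in that the comment sentence-splitting module is further configured to remove sentences whose character count exceeds a predetermined sentence length, and/or to remove sentences of an advertising nature.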
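- A computer device, characterized in that the computer device comprises: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and contain instructions for performing the method for annotating comment information: acquiring data of a plurality of event comments; dividing each of the event comments into sentences and taking each resulting sentence as a comment opinion; extracting sentiment words from each comment opinion; associating sentiment words that co-occur in any comment opinion to construct a sentiment word community network; and annotating with sentiment polarity data the comment opinions to which the sentiment words in any sentiment word community of the network belong, where the sentiment polarity is positive, negative, or neutral, and a sentiment word community comprises a group of directly or indirectly associated sentiment words.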
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510325108.2A CN104933130A (zh) | 2015-06-12 | 2015-06-12 | 评论信息的标注方法及装置 |
CN201510325108.2 | 2015-06-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016197577A1 true WO2016197577A1 (zh) | 2016-12-15 |
Family
ID=54120297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/097774 WO2016197577A1 (zh) | 2015-06-12 | 2015-12-17 | 评论信息的标注方法、装置和计算机设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104933130A (zh) |
WO (1) | WO2016197577A1 (zh) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684481A (zh) * | 2019-01-04 | 2019-04-26 | 深圳壹账通智能科技有限公司 | 舆情分析方法、装置、计算机设备和存储介质 |
CN110175851A (zh) * | 2019-02-28 | 2019-08-27 | 腾讯科技(深圳)有限公司 | 一种作弊行为检测方法及装置 |
CN111126035A (zh) * | 2019-12-24 | 2020-05-08 | 深圳视界信息技术有限公司 | 一种电商评论分析场景下细粒度属性分析方法 |
CN111209371A (zh) * | 2019-12-31 | 2020-05-29 | 新华网股份有限公司 | 评论数据处理方法、装置、计算机设备和存储介质 |
CN111260437A (zh) * | 2020-01-14 | 2020-06-09 | 北京邮电大学 | 一种基于商品方面级情感挖掘和模糊决策的产品推荐方法 |
CN112148878A (zh) * | 2020-09-23 | 2020-12-29 | 网易(杭州)网络有限公司 | 情感数据处理方法及装置 |
CN112417858A (zh) * | 2020-11-23 | 2021-02-26 | 北京明略昭辉科技有限公司 | 一种实体权重评分方法、系统、电子设备及存储介质 |
CN112528133A (zh) * | 2019-09-18 | 2021-03-19 | 北京国双科技有限公司 | 一种网络数据标注方法、装置、设备和存储介质 |
CN112565824A (zh) * | 2020-12-03 | 2021-03-26 | 腾讯科技(深圳)有限公司 | 一种生成弹幕的方法、装置、终端及存储介质 |
KR102244699B1 (ko) * | 2020-06-15 | 2021-04-27 | 주식회사 크라우드웍스 | 인공지능 학습데이터 생성을 위한 크라우드소싱 기반 프로젝트의 문장 유사도를 이용한 감정 라벨링 방법 |
CN113157899A (zh) * | 2021-05-27 | 2021-07-23 | 东莞心启航联贸网络科技有限公司 | 一种大数据画像分析方法、服务器及可读存储介质 |
CN115209210A (zh) * | 2022-07-19 | 2022-10-18 | 抖音视界有限公司 | 基于弹幕生成信息的方法和装置 |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933130A (zh) * | 2015-06-12 | 2015-09-23 | 百度在线网络技术(北京)有限公司 | 评论信息的标注方法及装置 |
CN105824898A (zh) * | 2016-03-14 | 2016-08-03 | 苏州大学 | 一种网络评论的标签提取方法和装置 |
CN106874417A (zh) * | 2017-01-22 | 2017-06-20 | 努比亚技术有限公司 | 一种搜索方法及设备 |
CN107220352B (zh) * | 2017-05-31 | 2020-12-08 | 北京百度网讯科技有限公司 | 基于人工智能构建评论图谱的方法和装置 |
CN107704558A (zh) * | 2017-09-28 | 2018-02-16 | 北京车慧互动广告有限公司 | 一种用户意见抽取方法及系统 |
CN110134938A (zh) * | 2018-02-09 | 2019-08-16 | 优酷网络技术(北京)有限公司 | 评论分析方法及装置 |
CN109271512B (zh) * | 2018-08-29 | 2023-11-24 | 中国平安保险(集团)股份有限公司 | 舆情评论信息的情感分析方法、装置及存储介质 |
CN109739947A (zh) * | 2018-12-26 | 2019-05-10 | 广东工业大学 | 一种数据处理装置、方法、电子设备和存储介质 |
CN111027328B (zh) * | 2019-11-08 | 2024-03-26 | 广州坚和网络科技有限公司 | 通过语料训练判断评论情绪正负及感情色彩的方法 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070294230A1 (en) * | 2006-05-31 | 2007-12-20 | Joshua Sinel | Dynamic content analysis of collected online discussions |
CN101408883A (zh) * | 2008-11-24 | 2009-04-15 | 电子科技大学 | 一种网络舆情观点收集方法 |
CN101609459A (zh) * | 2009-07-21 | 2009-12-23 | 北京大学 | 一种情感特征词提取系统 |
US20110131485A1 (en) * | 2009-11-30 | 2011-06-02 | International Business Machines Corporation | Publishing specified content on a webpage |
CN102375838A (zh) * | 2010-08-17 | 2012-03-14 | 富士通株式会社 | 用于构建极性词素数据库以及确定词的极性的方法和装置 |
CN102999485A (zh) * | 2012-11-02 | 2013-03-27 | 北京邮电大学 | 一种基于公众汉语网络文本的现实情感分析方法 |
CN103699626A (zh) * | 2013-12-20 | 2014-04-02 | 华南理工大学 | 一种微博用户个性化情感倾向分析方法及系统 |
CN104933130A (zh) * | 2015-06-12 | 2015-09-23 | 百度在线网络技术(北京)有限公司 | 评论信息的标注方法及装置 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102236650B (zh) * | 2010-04-20 | 2014-06-04 | 日电(中国)有限公司 | 用于修正和/或扩展情感词典的方法和装置 |
US9292589B2 (en) * | 2012-09-04 | 2016-03-22 | Salesforce.Com, Inc. | Identifying a topic for text using a database system |
CN103150367B (zh) * | 2013-03-07 | 2016-01-20 | 宁波成电泰克电子信息技术发展有限公司 | 一种中文微博的情感倾向分析方法 |
CN104484437B (zh) * | 2014-12-24 | 2018-07-20 | 福建师范大学 | 一种网络短评情感挖掘方法 |
-
2015
- 2015-06-12 CN CN201510325108.2A patent/CN104933130A/zh active Pending
- 2015-12-17 WO PCT/CN2015/097774 patent/WO2016197577A1/zh active Application Filing
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684481A (zh) * | 2019-01-04 | 2019-04-26 | 深圳壹账通智能科技有限公司 | 舆情分析方法、装置、计算机设备和存储介质 |
CN110175851A (zh) * | 2019-02-28 | 2019-08-27 | 腾讯科技(深圳)有限公司 | 一种作弊行为检测方法及装置 |
CN110175851B (zh) * | 2019-02-28 | 2023-09-12 | 腾讯科技(深圳)有限公司 | 一种作弊行为检测方法及装置 |
CN112528133A (zh) * | 2019-09-18 | 2021-03-19 | 北京国双科技有限公司 | 一种网络数据标注方法、装置、设备和存储介质 |
CN111126035A (zh) * | 2019-12-24 | 2020-05-08 | 深圳视界信息技术有限公司 | 一种电商评论分析场景下细粒度属性分析方法 |
CN111209371A (zh) * | 2019-12-31 | 2020-05-29 | 新华网股份有限公司 | 评论数据处理方法、装置、计算机设备和存储介质 |
CN111209371B (zh) * | 2019-12-31 | 2024-06-07 | 新华网股份有限公司 | 评论数据处理方法、装置、计算机设备和存储介质 |
CN111260437B (zh) * | 2020-01-14 | 2023-07-11 | 北京邮电大学 | 一种基于商品方面级情感挖掘和模糊决策的产品推荐方法 |
CN111260437A (zh) * | 2020-01-14 | 2020-06-09 | 北京邮电大学 | 一种基于商品方面级情感挖掘和模糊决策的产品推荐方法 |
KR102244699B1 (ko) * | 2020-06-15 | 2021-04-27 | 주식회사 크라우드웍스 | 인공지능 학습데이터 생성을 위한 크라우드소싱 기반 프로젝트의 문장 유사도를 이용한 감정 라벨링 방법 |
CN112148878A (zh) * | 2020-09-23 | 2020-12-29 | 网易(杭州)网络有限公司 | 情感数据处理方法及装置 |
CN112417858A (zh) * | 2020-11-23 | 2021-02-26 | 北京明略昭辉科技有限公司 | 一种实体权重评分方法、系统、电子设备及存储介质 |
CN112565824A (zh) * | 2020-12-03 | 2021-03-26 | 腾讯科技(深圳)有限公司 | 一种生成弹幕的方法、装置、终端及存储介质 |
CN113157899B (zh) * | 2021-05-27 | 2022-01-14 | 叉烧(上海)新材料科技有限公司 | 一种大数据画像分析方法、服务器及可读存储介质 |
CN113157899A (zh) * | 2021-05-27 | 2021-07-23 | 东莞心启航联贸网络科技有限公司 | 一种大数据画像分析方法、服务器及可读存储介质 |
CN115209210A (zh) * | 2022-07-19 | 2022-10-18 | 抖音视界有限公司 | 基于弹幕生成信息的方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
CN104933130A (zh) | 2015-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016197577A1 (zh) | 评论信息的标注方法、装置和计算机设备 | |
WO2021139701A1 (zh) | 一种应用推荐方法、装置、存储介质及电子设备 | |
CN104584003B (zh) | 词检测和域字典推荐 | |
TWI729472B (zh) | 特徵詞的確定方法、裝置和伺服器 | |
EP3183728B1 (en) | Orphaned utterance detection system and method | |
WO2020125445A1 (zh) | 分类模型训练方法、分类方法、设备及介质 | |
CN110162644B (zh) | 一种图像集建立方法、装置和存储介质 | |
CN111783468B (zh) | 文本处理方法、装置、设备和介质 | |
WO2015185019A1 (zh) | 一种基于语义理解的表情输入方法和装置 | |
Guo et al. | LD-MAN: Layout-driven multimodal attention network for online news sentiment recognition | |
KR101911999B1 (ko) | 피처 기반 후보 선택 기법 | |
CN104978332B (zh) | 用户生成内容标签数据生成方法、装置及相关方法和装置 | |
US20140067818A1 (en) | Pushing specific content to a predetermined webpage | |
JP2021131528A (ja) | ユーザ意図認識方法、装置、電子機器、コンピュータ可読記憶媒体及びコンピュータプログラム | |
JP2021101361A (ja) | イベントトピックの生成方法、装置、機器及び記憶媒体 | |
CN108509569A (zh) | 企业画像的生成方法、装置、电子设备以及存储介质 | |
CN102779149A (zh) | 信息处理装置,信息处理方法,程序和信息处理系统 | |
CN110413787A (zh) | 文本聚类方法、装置、终端和存储介质 | |
CN113076735B (zh) | 目标信息的获取方法、装置和服务器 | |
CN111177462B (zh) | 视频分发时效的确定方法和装置 | |
CN111753089A (zh) | 话题聚类方法、装置、电子设备及存储介质 | |
CN106663123B (zh) | 以评论为中心的新闻阅读器 | |
CN112270173B (zh) | 文本中的人物挖掘方法、装置、电子设备及存储介质 | |
CN111385188A (zh) | 对话元素的推荐方法、装置、电子设备和介质 | |
EP4080381A1 (en) | Method and apparatus for generating patent summary information, and electronic device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15894831 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15894831 Country of ref document: EP Kind code of ref document: A1 |