TW201812615A - Sentiment orientation recognition method, object classification method and data processing system - Google Patents

Sentiment orientation recognition method, object classification method and data processing system

Info

Publication number
TW201812615A
TW201812615A TW106123845A TW106123845A TW201812615A TW 201812615 A TW201812615 A TW 201812615A TW 106123845 A TW106123845 A TW 106123845A TW 106123845 A TW106123845 A TW 106123845A TW 201812615 A TW201812615 A TW 201812615A
Authority
TW
Taiwan
Prior art keywords
processed
short text
category
sentiment
text
Prior art date
Application number
TW106123845A
Other languages
Chinese (zh)
Inventor
潘林林
趙爭超
林君
肖謙
張一昌
Original Assignee
阿里巴巴集團服務有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集團服務有限公司
Publication of TW201812615A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present application provides a sentiment orientation recognition method, an object classification method and a data processing system. The sentiment-degree estimation model constructed in the sentiment orientation recognition method fully considers the category to which a short text belongs, so sentiment orientation is determined more accurately on the basis of that model. In addition, the object classification method uses the text feature information, image feature information and other feature information of an object as the basis for classification. Because it takes all of these kinds of feature information into account at the same time, it can improve classification accuracy.

Description

Sentiment orientation recognition method, object classification method and data processing system

The present invention relates to the field of data processing technology, and in particular to a sentiment orientation recognition method, an object classification method and a data processing system.

At present, many technical fields involve the problem of classifying objects. Generally, an object is classified according to its text into one of two classes: a first class or a second class. The text of an object can be split at punctuation marks into multiple short texts.

Because Chinese words are rich in meaning, the same short text may correspond to different classes in different contexts. Take user review texts for clothing as the objects: the first user review is "the clothes' color is dull, just right", and the second is "the clothes' color is dull, not bright". Both objects contain the same short text, "the clothes' color is dull". If classification relied on the text alone, the two short texts would be placed in the same class, although they ought to belong to different classes.

Taken in context, "the clothes' color is dull" in the first review expresses positive sentiment and should fall into the first class, while the same short text in the second review expresses negative sentiment and should fall into the second class. For this reason, the sentiment orientation of short texts is now commonly used to determine the class of an object.

Traditionally, the sentiment orientation of a short text is determined by manual inspection. Manual labeling is relatively accurate, but it is inefficient and unsuitable for processing short texts in bulk.

In the course of research, the applicant found that a processor can be used to identify the sentiment orientation of a short text automatically. One implementation builds a sentiment lexicon before the processor runs. The lexicon contains many positive entries, for example "clothes", "large screen", "pretty", "fast", "suitable" and "beautiful", and many negative entries, for example "clothes", "ugly", "slow" and "small screen".

To process an object, the object's text is first split at punctuation marks. The text between two adjacent punctuation marks is one short text, so the object is divided into several short texts to be processed. For example, "the clothes fit well, Mom likes them a lot" is split into two short texts: "the clothes fit well" and "Mom likes them a lot". Every short text of the object to be processed is a short text to be processed.
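The splitting step above can be sketched in a few lines of Python. This is only an illustration: the function name `split_short_texts` and the exact set of punctuation marks are assumptions, since the description says only that the text is cut at punctuation marks.

```python
import re

# Punctuation treated as a short-text boundary. The exact set is an assumption;
# the description only states that the text is cut at punctuation marks.
BOUNDARY = re.compile(r"[,。!?;:、,.!?;:]+")


def split_short_texts(text):
    """Return the non-empty pieces of `text` that lie between punctuation marks."""
    return [piece.strip() for piece in BOUNDARY.split(text) if piece.strip()]


# The example from the description: one review, two short texts.
print(split_short_texts("衣服很合適,老媽很喜歡"))  # ['衣服很合適', '老媽很喜歡']
```

Splitting on every ASCII period would also cut decimal numbers such as "3.5", so a production splitter would likely need a more careful boundary rule.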

FIG. 1 is a flowchart of how the processor determines the sentiment orientation of a short text to be processed. The procedure includes the following steps:

Step 1: The processor performs word segmentation on the short text to be processed and obtains the segmentation result.

According to preset segmentation rules, the short text to be processed is divided into several words, which together form the segmentation result.

For example, the short text "衣服很合適" ("the clothes fit well") is segmented into "衣服" (clothes), "很" (very) and "合適" (suitable). The short text "手機屏幕很大" ("the phone screen is very large") is segmented into "手機" (phone), "屏幕" (screen), "很" (very) and "大" (large).

Because word segmentation of the short text is not the focus of the present invention, the specific implementation of the preset segmentation rules is not described in detail here.

Step 2: Match the segmentation result against the sentiment lexicon according to the sentiment matching rules.

Step 3: Determine the sentiment orientation corresponding to the short text to be processed.

The segmentation result is matched against the sentiment lexicon and the sentiment rules. If the sentiment words in the segmentation result all correspond to positive sentiment and the result contains no negation word, the short text is determined to be positive. If the sentiment words in the segmentation result all correspond to negative sentiment and the result contains no negation word, the short text is determined to be negative.
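A minimal sketch of this lexicon-based prior-art procedure follows. The tiny lexicons, the negation list and the name `lexicon_polarity` are assumptions made for illustration. Returning `None` for the mixed, negated and no-match cases is also an assumption, because the description does not say what happens there. Word segmentation is assumed to have been done already.

```python
# Toy lexicons; the real sentiment lexicon is far larger (assumed contents).
POSITIVE = {"合適", "漂亮", "美麗", "快速", "大"}
NEGATIVE = {"難看", "慢速", "小"}
NEGATIONS = {"不", "沒", "沒有"}  # assumed negation words


def lexicon_polarity(tokens):
    """Apply the matching rule of steps 2 and 3 to an already segmented short text."""
    if any(tok in NEGATIONS for tok in tokens):
        return None  # negation present: the rule as described does not decide
    hits = [tok for tok in tokens if tok in POSITIVE or tok in NEGATIVE]
    if hits and all(tok in POSITIVE for tok in hits):
        return "positive"
    if hits and all(tok in NEGATIVE for tok in hits):
        return "negative"
    return None  # no sentiment word, or mixed polarity


print(lexicon_polarity(["屏幕", "很", "大"]))  # positive
print(lexicon_polarity(["衣服", "很", "大"]))  # positive, although a clothing review likely means it negatively
```

The second call illustrates the weakness the description goes on to discuss: a lexicon that ignores the category cannot tell "large screen" from "clothes too large".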

The processor can carry out the procedure of FIG. 1 automatically and thus determine the sentiment orientation of a short text without manual work. However, the applicant found in the course of research that, although this automatic procedure can identify sentiment orientation to some extent, the orientation it produces may be inaccurate.

Take user reviews on Taobao as the objects. Taobao has many categories (for example clothing, electronic devices, and mother-and-baby products), and the items in every category have corresponding user reviews. The applicant found that short texts containing the same sentiment word may have different sentiment orientations under different categories.

For example, under the electronic devices category, the short text "the screen is very large" has a positive sentiment orientation. Under the clothing category, the short text "the clothes are very large" has a negative sentiment orientation. Both short texts contain "very large", so they contain the same sentiment word, yet their sentiment orientations differ.

In the procedure of FIG. 1, the processor handles all objects in the same way. That is, the existing procedure does not treat the sentiment orientation of short texts separately from the perspective of the object's category, so the prior art determines the sentiment orientation of short texts inaccurately.

The present invention therefore provides a sentiment orientation recognition method so that the sentiment orientation of a short text to be processed can be determined accurately.

To achieve the above object, the present invention provides the following technical features: a sentiment orientation recognition method, comprising: determining the category identifier corresponding to a short text to be processed, where the text between two adjacent punctuation marks of a text is called a short text; and determining how the sentiment-degree estimation model corresponding to the category identifier is implemented.

If the implementation is one sentiment-degree estimation model shared by all categories: determining the feature set corresponding to the short text to be processed, where each feature in the feature set includes a segmented word of the short text and the category identifier to which the short text belongs; estimating the sentiment degree of the short text according to the pre-trained sentiment-degree estimation model and the feature set, where the model is obtained by training on a number of short-text samples that carry sentiment orientations and come from at least two categories, and outputs a positive sentiment degree and a negative sentiment degree; and determining the sentiment orientation of the short text based on its positive sentiment degree and negative sentiment degree.

If the implementation is one sentiment-degree estimation model per category: determining the feature set corresponding to the short text to be processed, where each feature in the feature set includes a segmented word of the short text; estimating the sentiment degree of the short text according to the sentiment-degree estimation model corresponding to the category identifier and the feature set, where the model is obtained by training on a number of short-text samples that carry sentiment orientations and correspond to the category identifier, and outputs a positive sentiment degree and a negative sentiment degree; and determining the sentiment orientation of the short text based on its positive sentiment degree and negative sentiment degree.

Preferably, after the sentiment orientation corresponding to the short text to be processed is determined, the method further includes: outputting the sentiment orientation corresponding to the short text to be processed.

A sentiment orientation recognition method, comprising: determining the feature set corresponding to a short text to be processed, where the text between two adjacent punctuation marks of a text is called a short text, and each feature in the feature set includes a segmented word of the short text and the category identifier to which the short text belongs; estimating the sentiment degree of the short text according to a pre-trained sentiment-degree estimation model and the feature set, where the model is obtained by training on a number of short-text samples that carry sentiment orientations and come from at least two categories, and outputs a positive sentiment degree and a negative sentiment degree; and determining the sentiment orientation of the short text based on its positive sentiment degree and negative sentiment degree.

Preferably, determining the feature set corresponding to the short text to be processed includes: obtaining the category identifier corresponding to the short text and the segmentation result obtained by performing word segmentation on the short text; combining each segmented word in the segmentation result with the category identifier to obtain the individual features; and determining the set of these features as the feature set of the short text.
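As a sketch of this feature construction, each segmented word can be paired with the category identifier. The `word|category` string encoding, the function name `build_features` and the category identifiers used below are assumptions; the description only requires that each feature include both the word and the category identifier.

```python
def build_features(tokens, category_id):
    """Pair every segmented word with the category identifier of its short text."""
    # The "word|category" string form is an illustrative encoding, not the patent's.
    return [f"{tok}|{category_id}" for tok in tokens]


# The same word yields different features under different categories.
print(build_features(["屏幕", "很", "大"], "electronics"))
print(build_features(["衣服", "很", "大"], "clothing"))
```

Because "大|electronics" and "大|clothing" are distinct features, a single shared model can learn a different weight for the same word in each category.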

Preferably, determining the feature set corresponding to the short text to be processed includes: obtaining the category identifier corresponding to the short text and the segmentation result obtained by performing word segmentation on the short text; combining each segmented word in the segmentation result with the category identifier to obtain the individual features; combining the individual features with an n-gram language model to obtain several combined features; and determining the set of the individual features and the combined features as the feature set of the short text.

Preferably, combining the individual features with an n-gram language model to obtain several combined features includes: combining the individual features with a bigram language model to obtain several combined features.
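One plausible reading of the bigram combination is to join each pair of adjacent features into an extra combined feature. The sketch below follows that reading; the name `add_bigram_features` and the `+` joiner are assumptions.

```python
def add_bigram_features(features):
    """Return the individual features followed by their adjacent-pair combinations."""
    bigrams = [f"{left}+{right}" for left, right in zip(features, features[1:])]
    return list(features) + bigrams


unigrams = ["衣服|clothing", "很|clothing", "大|clothing"]
for feature in add_bigram_features(unigrams):
    print(feature)
```

A short text with n individual features gains n - 1 combined features this way, which lets a model weigh word pairs such as "很" followed by "大" separately from the single words.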

Preferably, estimating the sentiment degree of the short text to be processed according to the pre-trained sentiment-degree estimation model and the feature set includes: inputting the feature set into the sentiment-degree estimation model; and, after estimation by the model, outputting the positive sentiment degree and the negative sentiment degree corresponding to the short text.

Preferably, determining the sentiment orientation of the short text to be processed based on its positive sentiment degree and negative sentiment degree includes: determining the larger of the positive sentiment degree and the negative sentiment degree; judging whether the larger sentiment degree exceeds a preset confidence; and, if it does, determining that the sentiment orientation of the short text is the orientation of the larger sentiment degree.
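The decision rule can be sketched as follows. The threshold value 0.8, the name `decide_orientation`, the tie-breaking toward positive and the `None` result when the threshold is not exceeded are all assumptions; the description specifies only the case in which the larger degree exceeds the preset confidence.

```python
def decide_orientation(positive_degree, negative_degree, preset_confidence=0.8):
    """Pick the orientation of the larger sentiment degree if it is confident enough."""
    if positive_degree >= negative_degree:  # tie goes to positive (assumption)
        label, larger = "positive", positive_degree
    else:
        label, larger = "negative", negative_degree
    # Below the preset confidence the description gives no outcome; None marks "undecided".
    return label if larger > preset_confidence else None


print(decide_orientation(0.9, 0.1))    # positive
print(decide_orientation(0.2, 0.85))   # negative
print(decide_orientation(0.55, 0.45))  # None
```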

Preferably, the sentiment-degree estimation model includes: a model that is obtained by training a maximum entropy model on the feature sets of a number of short texts corresponding to at least two category identifiers, and that outputs a positive sentiment degree and a negative sentiment degree.
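For two output classes, a maximum entropy model has the same form as logistic regression, so a toy version can be written in plain Python. The sketch below is an illustration under that equivalence, not the applicant's training procedure. The function names, the stochastic gradient updates, the hyperparameters and the four training samples are all assumptions, and there is no regularization.

```python
import math
from collections import defaultdict


def train_maxent(samples, epochs=200, learning_rate=0.5):
    """Fit one weight per feature; `samples` is a list of (features, label), label 1 = positive."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in samples:
            score = sum(weights[f] for f in features)
            p_positive = 1.0 / (1.0 + math.exp(-score))
            # Gradient of the log-likelihood for binary features.
            for f in features:
                weights[f] += learning_rate * (label - p_positive)
    return weights


def sentiment_degrees(weights, features):
    """Return (positive degree, negative degree); unseen features contribute nothing."""
    score = sum(weights.get(f, 0.0) for f in features)
    positive = 1.0 / (1.0 + math.exp(-score))
    return positive, 1.0 - positive


# Hypothetical labeled short texts from two categories, already turned into features.
samples = [
    (["屏幕|electronics", "大|electronics"], 1),
    (["屏幕|electronics", "小|electronics"], 0),
    (["衣服|clothing", "合適|clothing"], 1),
    (["衣服|clothing", "大|clothing"], 0),
]
weights = train_maxent(samples)
print(sentiment_degrees(weights, ["大|electronics"]))  # positive degree above 0.5
print(sentiment_degrees(weights, ["大|clothing"]))     # positive degree below 0.5
```

Because the category identifier is part of every feature, the one shared model learns opposite weights for "大" in the two categories, which is the effect the description attributes to category-aware features.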

Preferably, after the sentiment orientation corresponding to the short text to be processed is determined, the method further includes: outputting the sentiment orientation corresponding to the short text to be processed.

A sentiment orientation recognition method, comprising: determining the feature set and the category identifier corresponding to a short text to be processed, where the text between two adjacent punctuation marks of a text is called a short text, and each feature in the feature set includes a segmented word of the short text; estimating the sentiment degree of the short text according to the sentiment-degree estimation model corresponding to the category identifier and the feature set, where the model is obtained by training on a number of short-text samples that carry sentiment orientations and correspond to the category identifier, and outputs a positive sentiment degree and a negative sentiment degree; and determining the sentiment orientation of the short text based on its positive sentiment degree and negative sentiment degree.
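In this per-category variant the category identifier selects the model instead of being part of each feature. A minimal sketch of that lookup is shown below. The names `estimate_by_category`, `electronics_model` and `clothing_model` are hypothetical, and the two scoring functions are hard-coded stand-ins for trained models.

```python
def estimate_by_category(models, category_id, tokens):
    """Look up the model trained for `category_id` and score the segmented short text."""
    scorer = models[category_id]  # raises KeyError if no model exists for the category
    return scorer(tokens)


# Stand-in scorers returning (positive degree, negative degree); real ones would be trained.
def electronics_model(tokens):
    return (0.9, 0.1) if "大" in tokens else (0.5, 0.5)


def clothing_model(tokens):
    return (0.1, 0.9) if "大" in tokens else (0.5, 0.5)


models = {"electronics": electronics_model, "clothing": clothing_model}
print(estimate_by_category(models, "electronics", ["屏幕", "很", "大"]))  # (0.9, 0.1)
print(estimate_by_category(models, "clothing", ["衣服", "很", "大"]))     # (0.1, 0.9)
```

The trade-off against a single shared model is that each category needs enough labeled short texts of its own.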

Preferably, determining the feature set corresponding to the short text to be processed includes: obtaining the segmentation result obtained by performing word segmentation on the short text; combining the segmented words with an n-gram language model to obtain several combined words; and determining the set of the segmented words and the combined words as the feature set of the short text, one word corresponding to one feature.

Preferably, determining the feature set corresponding to the short text to be processed includes: obtaining the segmentation result obtained by performing word segmentation on the short text; and determining the segmentation result as the feature set of the short text, one segmented word corresponding to one feature.

Preferably, after the sentiment orientation corresponding to the short text to be processed is determined, the method further includes: outputting the sentiment orientation corresponding to the short text to be processed.

A sentiment orientation recognition system, comprising: a data providing device configured to send a number of objects; and a processor configured to receive the objects sent by the data providing device, build a sentiment-degree estimation model from the short texts of the objects, and use the model to determine the sentiment orientation of a short text to be processed.

Preferably, the processor is further configured to build a correspondence between sentiment-degree estimation models and the category identifiers to which objects belong.

Preferably, the system further includes a receiving device; the processor is further configured to output the sentiment orientation of the text to be processed; and the receiving device is configured to receive the sentiment orientation of the text to be processed.

A sentiment orientation recognition system, comprising: a data providing device configured to send a number of objects; a model building device configured to receive the objects sent by the data providing device, build a sentiment-degree estimation model from the short texts of the objects, and send the model; and a processor configured to receive the sentiment-degree estimation model and use it to determine the sentiment orientation of a short text to be processed.

Preferably, the model building device is further configured to build a correspondence between sentiment-degree estimation models and the category identifiers to which objects belong, and to send the correspondence to the processor.

Preferably, the system further includes a receiving device; the processor is further configured to output the sentiment orientation of the text to be processed; and the receiving device is configured to receive the sentiment orientation of the text to be processed.

An object classification method, comprising: determining feature information of an object to be processed, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of short texts; and performing class recognition on the feature information of the object according to a pre-trained class recognition model, where the class recognition model is a classifier for the first class and the second class obtained by training on the feature information of a number of object samples.

Preferably, the feature information further includes: feature information of a first subject that created the object; and/or feature information of a second subject to which the object is attached.

Preferably, performing class recognition on the feature information according to the pre-trained class recognition model includes: inputting the feature information into the class recognition model; determining the first-class matching degree and the second-class matching degree of the object to be processed; comparing the two matching degrees; if the first-class matching degree is greater than the second-class matching degree, determining that the object belongs to the first class; and if the second-class matching degree is greater than the first-class matching degree, determining that the object belongs to the second class.
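The comparison step reduces to a few lines. In the sketch below the name `classify_object` and the `None` result for equal matching degrees are assumptions, since the description does not cover a tie. The two matching degrees are taken as given, standing in for the output of a trained class recognition model.

```python
def classify_object(first_class_match, second_class_match):
    """Assign the class whose matching degree is strictly greater."""
    if first_class_match > second_class_match:
        return "first"
    if second_class_match > first_class_match:
        return "second"
    return None  # equal degrees: not specified in the description


print(classify_object(0.7, 0.3))  # first
print(classify_object(0.2, 0.8))  # second
```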

Preferably, the method further includes: after the object to be processed is determined to belong to the first class, adding the object to an object set; and sending the objects in the object set.

Preferably, the method further includes: receiving multiple object samples, which come from the object set and satisfy a preset rule; adding the object samples to the existing object samples used to train the class recognition model; and retraining the class recognition model on the updated object samples.

A method for classifying user reviews, comprising: determining feature information of a user review to be processed, where the feature information includes text feature information of the review, image feature information of the review, feature information of the seller and feature information of the buyer, and the text feature information includes the sentiment orientation of short texts; and performing class recognition on the feature information of the review according to a pre-trained gradient boosting decision tree model, where the class recognition model is a classifier for first-class user reviews and second-class user reviews obtained by training on the feature information of a number of user review samples.

Preferably, the method further includes: after the user review to be processed is determined to be a first-class user review, adding it to a first-class user review set; and sending the first-class user review set.

Preferably, the method further includes: receiving multiple first-class user reviews that come from the first-class user review set; adding them to the existing user review samples of the class recognition model; and retraining the class recognition model on the updated user review samples.

An object classification system, comprising: a data providing device configured to send a number of objects; a processor configured to receive the objects sent by the data providing device and to obtain, by training on the feature information of the objects, a class recognition model that outputs the first class and the second class; to determine feature information of an object to be processed, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of short texts; to perform class recognition on the feature information of the object according to the class recognition model; and to output objects of the first class; and a data receiving device configured to receive and use the objects of the first class.

An object classification system, comprising: a data providing device configured to send a number of objects; a model building device configured to receive the objects sent by the data providing device, to obtain, by training on the feature information of the objects, a class recognition model that outputs the first class and the second class, and to send the class recognition model; a processor configured to receive the class recognition model; to determine feature information of an object to be processed, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of short texts; to perform class recognition on the feature information of the object according to the class recognition model; and to output objects of the first class; and a data receiving device configured to receive and use the objects of the first class.

The above technical means achieve the following beneficial effects. The present invention provides a sentiment orientation recognition method that uses a number of short texts carrying sentiment orientations and corresponding to categories as training samples, obtains the feature sets of these short texts for training, and thereby obtains a sentiment-degree estimation model. Because each feature contains a segmented word of the short text and the category identifier, the sentiment-degree estimation model built in this application fully takes into account the category to which a short text belongs. The sentiment orientation determined for a short text to be processed on the basis of this model is therefore more accurate.

100‧‧‧data providing device

200‧‧‧processor

300‧‧‧model building device

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of determining the sentiment orientation of a short text to be processed in the prior art;
FIGS. 2a-2b are schematic structural diagrams of the sentiment orientation recognition system provided by an embodiment of the present invention;
FIGS. 3a-3c are schematic diagrams of the correspondence between sentiment estimation models and categories provided by an embodiment of the present invention;
FIGS. 4a-4c are flowcharts of constructing a sentiment estimation model provided by an embodiment of the present invention;
FIG. 5 is a flowchart of another way of constructing a sentiment estimation model provided by an embodiment of the present invention;
FIGS. 6a-6b are flowcharts of yet another way of constructing a sentiment estimation model provided by an embodiment of the present invention;
FIG. 7 is a flowchart of a sentiment orientation recognition method provided by an embodiment of the present invention;
FIGS. 8a-8b are flowcharts of a sentiment orientation recognition method provided by an embodiment of the present invention;
FIG. 9 is a flowchart of a sentiment orientation recognition method provided by an embodiment of the present invention;
FIG. 10 is a flowchart of a sentiment orientation recognition method provided by an embodiment of the present invention;
FIGS. 11a-11b are flowcharts of a sentiment orientation recognition method provided by an embodiment of the present invention;
FIG. 12 is a flowchart of an object classification method provided by an embodiment of the present invention;
FIG. 13 is a flowchart of another object classification method provided by an embodiment of the present invention;
FIG. 14 is a flowchart of another object classification method provided by an embodiment of the present invention;
FIG. 15 is a flowchart of another object classification method provided by an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an object classification system provided by an embodiment of the present invention;
FIG. 17 is a schematic structural diagram of another object classification system provided by an embodiment of the present invention;
FIG. 18 is a flowchart of a scenario embodiment of the object classification method provided by an embodiment of the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art obtains on the basis of these embodiments without inventive effort fall within the scope of protection of the present invention.

To determine the sentiment orientation of a short text to be processed accurately, the present invention proposes constructing a sentiment estimation model and using it to estimate the positive sentiment degree and the negative sentiment degree of the short text to be processed. The positive sentiment degree indicates the extent to which the short text expresses positive sentiment; likewise, the negative sentiment degree indicates the extent to which it expresses negative sentiment. Once both degrees are known, the sentiment orientation of the short text to be processed can be determined.

To help those skilled in the art understand the application scenario of the present invention more clearly, the present invention provides a sentiment orientation recognition system; see FIG. 2a or FIG. 2b.

The sentiment orientation recognition system of FIG. 2a includes a data providing device 100 and a processor 200 connected to the data providing device 100.

The data providing device 100 is configured to send a plurality of objects to the processor 200. The processor 200 is configured to construct a sentiment estimation model from the short texts of the objects, and to use the sentiment estimation model to determine the sentiment orientation of a short text to be processed.

The present invention also provides another sentiment orientation recognition system (see FIG. 2b).

The sentiment orientation recognition system of FIG. 2b includes a data providing device 100, a model building device 300 connected to the data providing device 100, and a processor 200 connected to the model building device 300. The model building device 300 may be any processing device with processing capability.

The data providing device 100 is configured to send a plurality of objects to the model building device 300. The model building device 300 is configured to construct a sentiment estimation model from the short texts of the objects and to send the sentiment estimation model to the processor 200. The processor 200 is configured to use the sentiment estimation model to determine the sentiment orientation of a short text to be processed.

In the sentiment orientation recognition systems of FIG. 2a and FIG. 2b, both the processor 200 and the model building device 300 can carry out the process of constructing a sentiment estimation model, and the process is the same for both. The processor 200 and the model building device 300 are therefore referred to collectively as the processing device, and the description of model construction below uses "processing device" to denote either of them.

The systems shown in FIG. 2a and FIG. 2b may further include a receiving device (not shown) connected to the processor. After determining the sentiment orientation of the short text to be processed, the processor is further configured to output that sentiment orientation; the receiving device is configured to receive it, so that the receiving device can use it in other processing.

The process of constructing the sentiment estimation model is described below. The prior art does not consider the category of a short text when determining its sentiment orientation, so the sentiment orientation it determines is inaccurate. In the present invention, the processing device therefore takes the category of the short text into account when constructing the sentiment estimation model, so that the constructed model can accurately determine the positive sentiment degree and the negative sentiment degree of a short text to be processed.

The present invention proposes three implementations by which the processing device constructs the sentiment estimation model. FIGS. 3a-3c are schematic diagrams of the relationship between categories and sentiment estimation models in the three implementations.

First implementation: all categories correspond to one sentiment estimation model (see FIG. 3a). Second implementation: each category corresponds to its own sentiment estimation model (see FIG. 3b). Third implementation: an intermediate between the first and the second (see FIG. 3c); given N categories, the third implementation constructs M sentiment estimation models, where M is a non-zero natural number and 1 < M < N.

The three implementations are described in detail below.

First implementation: all categories correspond to one sentiment estimation model.

To determine accurately the sentiment orientation of short texts in every category, this implementation constructs a single sentiment estimation model that corresponds to all categories.

FIG. 4a shows the process of constructing the sentiment estimation model corresponding to all categories, which includes the following steps:

Step S401: Determine the short text samples used to construct the sentiment estimation model.

a) Obtain the objects in each category sent by the data providing device, and split each object to obtain a set of short texts for each object.

The data providing device can send objects in each category to the processing device, so the processing device can obtain multiple objects per category. To simplify subsequent processing, the processing device can split each object at punctuation marks, dividing it into multiple short texts.

For example, where the objects are Taobao user reviews, a review in the clothing category that reads "The clothes fit well, Mom loves them" is split at the punctuation mark into two short texts: "The clothes fit well" and "Mom loves them". Likewise, a review in the electronic devices category that reads "The phone screen is large, the design looks great" is split into the two short texts "The phone screen is large" and "the design looks great".
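The punctuation-based splitting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_short_texts` and the exact set of delimiter characters are assumptions, since the source only says that objects are split "at punctuation marks".

```python
import re
from typing import List

# Assumed delimiter set (full-width and ASCII punctuation); the source does not list one.
_DELIMITERS = re.compile(r"[,。!?;、,.!?;]+")

def split_into_short_texts(object_text: str) -> List[str]:
    """Split one object (e.g. a user review) into short texts at punctuation marks."""
    parts = _DELIMITERS.split(object_text)
    # Drop empty fragments produced by leading, trailing or repeated punctuation.
    return [part.strip() for part in parts if part.strip()]

if __name__ == "__main__":
    print(split_into_short_texts("The clothes fit well, Mom loves them"))
```

A production splitter would likely need to protect decimals and abbreviations from the ASCII period; that is left out of this sketch.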

b) From all the short texts, select the short text samples used to construct the sentiment estimation model.

Experiments show that the process of FIG. 1 identifies positive sentiment with relatively high accuracy but identifies negative sentiment with relatively low accuracy.

In this step, the processing device therefore runs each short text through the process of FIG. 1. If that process determines that a short text corresponds to positive sentiment, the short text is accepted as a sample for constructing the sentiment estimation model and is labeled as positive.

If the process of FIG. 1 determines that a short text corresponds to negative sentiment, the result is further confirmed manually. If manual confirmation finds the short text to be negative, the short text is accepted as a sample for constructing the sentiment estimation model and is labeled as negative.

If manual confirmation instead finds the short text to be positive, the short text has no clear characteristics and is unsuitable as a sample for constructing the sentiment estimation model, so it is discarded.
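The sample selection rule above can be sketched as a small filter. The two labeling callables are hypothetical stand-ins: `rule_based_label` represents the prior-art process of FIG. 1 and `manual_label` represents human confirmation; neither signature comes from the source.

```python
from typing import Callable, List, Tuple

POSITIVE = "positive"
NEGATIVE = "negative"

def select_training_samples(
    short_texts: List[str],
    rule_based_label: Callable[[str], str],  # hypothetical stand-in for the FIG. 1 process
    manual_label: Callable[[str], str],      # hypothetical stand-in for manual confirmation
) -> List[Tuple[str, str]]:
    """Keep (short_text, label) pairs that are usable as training samples."""
    samples = []
    for text in short_texts:
        label = rule_based_label(text)
        if label == POSITIVE:
            # Positive results of the FIG. 1 process are trusted as-is.
            samples.append((text, POSITIVE))
        elif label == NEGATIVE and manual_label(text) == NEGATIVE:
            # Negative results are kept only when a human confirms them.
            samples.append((text, NEGATIVE))
        # Anything else (e.g. negative by rule but positive by hand) is discarded.
    return samples
```

Manual review is applied only to the negative branch because, per the text, that is where the FIG. 1 process is less reliable.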

Step S402: Determine the feature set corresponding to each short text.

When the process of FIG. 1 is applied in step S401, the word segmentation result of each short text is obtained (see step 1 in FIG. 1; details are not repeated here). The feature set corresponding to each short text is then determined.

This step can be performed in two ways. They differ in that the feature set determined in the first way contains combined features, whereas the feature set determined in the second way does not.

Because the feature set is determined in the same way for every short text, the process is described in detail below using one target short text as an example.

FIG. 4b shows the first way of determining the feature set of the target short text:

Step 411: Obtain the category identifier corresponding to the target short text, and the word segmentation result obtained by performing word segmentation on the target short text.

The processing device has already obtained the word segmentation result of the target short text in step S401. Because the target short text has the same category as the object being processed, the processing device can take the category identifier of that object as the category identifier of the target short text.

Take a target short text in the clothing category, "衣服很大" ("the clothes are very big"), as an example. Its word segmentation result is "clothes" (衣服), "very" (很) and "big" (大). Assuming the identifier of the clothing category is "16", the category identifier of the target short text is "16".

Take a target short text in the electronic devices category, "屏幕很大" ("the screen is very big"), as an example. Its word segmentation result is "screen" (屏幕), "very" (很) and "big" (大). Assuming the identifier of the electronic devices category is "10", the category identifier of the target short text is "10".

Step 412: Combine each word segment with the category identifier to obtain the individual features.

Short texts in different categories may yield the same word segments. To take full account of the influence of the category on the short text, the present invention therefore combines each word segment with the category to obtain the features.

Because each feature contains a category identifier, and different categories have different identifiers, features distinguish the word segments of different categories unambiguously. The trained sentiment estimation model can thus tell apart identical word segments that occur in different categories.

Continuing the example, for the target short text "the clothes are very big" the features can be "clothes16", "very16" and "big16". For the target short text "the screen is very big" the features can be "screen10", "very10" and "big10". At the feature level, the processing device can tell that "big16" and "big10" are two different features that belong to different categories.
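Step 412 can be illustrated with a one-line helper. The name `make_features` is hypothetical; the token-then-identifier order follows the "clothes16" example above, which the source presents as only one of several possible combination schemes.

```python
from typing import List

def make_features(word_segments: List[str], category_id: str) -> List[str]:
    """Attach the category identifier to every word segment (segment first, identifier second)."""
    return [f"{segment}{category_id}" for segment in word_segments]

if __name__ == "__main__":
    clothing = make_features(["clothes", "very", "big"], "16")
    devices = make_features(["screen", "very", "big"], "10")
    print(clothing)  # ['clothes16', 'very16', 'big16']
    print(devices)   # ['screen10', 'very10', 'big10']
```

The same word segment "big" therefore yields two distinct features, "big16" and "big10", which lets a single model learn a different sentiment for each category.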

In this example, the word segment comes first and the category identifier second; the category identifier may equally come first and the word segment second. Other ways of combining the two are also possible and are not limited here.

Step 413: Perform n-gram combination on the features to obtain a number of combined features.

Research showed that some features form fixed collocations, such as "no color difference", "no fading" and "no pilling". In such a collocation each word taken alone is a negative-sentiment word, yet together the two words express positive sentiment, so treating the words separately causes a certain amount of misjudgment. This embodiment therefore can perform feature combination.

Specifically, an n-gram language model is used to combine the features of each short text, where n is a non-zero natural number and each gram in the n-gram language model corresponds to one word segment of the short text. Feature combination with an n-gram language model means merging every n adjacent features, then every n-1 adjacent features, and so on down to every 2 adjacent features.

Taking n = 2 as an example, if the features of the target short text are "clothes16", "very16" and "big16", feature combination with a bigram language model yields the combined features "clothes16very16" and "very16big16".

Taking n = 3 as an example, if the features of the target short text are "clothes16", "very16" and "big16", feature combination with a trigram language model yields the combined features "clothes16very16big16", "clothes16very16" and "very16big16".

Step 414: Determine the set consisting of the individual features and the combined features as the feature set of the target short text.

Continuing the example with bigram feature combination, the feature set finally obtained for the target short text includes "clothes16", "very16", "big16", "clothes16very16" and "very16big16".
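Steps 413 and 414 can be sketched as below. The helper names `combine_features` and `build_feature_set` are hypothetical, and joining adjacent features by plain concatenation is an assumption taken from the way the examples above are written.

```python
from typing import List

def combine_features(features: List[str], n: int) -> List[str]:
    """Merge every n adjacent features, then every n-1, ..., down to every 2."""
    combined = []
    for size in range(n, 1, -1):
        for start in range(len(features) - size + 1):
            combined.append("".join(features[start:start + size]))
    return combined

def build_feature_set(features: List[str], n: int) -> List[str]:
    """Feature set = individual features followed by their n-gram combinations."""
    return list(features) + combine_features(features, n)

if __name__ == "__main__":
    feats = ["clothes16", "very16", "big16"]
    print(combine_features(feats, 2))   # ['clothes16very16', 'very16big16']
    print(build_feature_set(feats, 2))
```

With n = 1 the loop body never runs, so `build_feature_set` degenerates to the plain feature list, which matches the second way of building the feature set (no combined features).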

FIG. 4c shows the second way of determining the feature set of the target short text:

Step 421: Obtain the category identifier corresponding to the target short text, and the word segmentation result obtained by performing word segmentation on the target short text.

Step 422: Combine each word segment with the category identifier to obtain the individual features.

Steps 421 and 422 in FIG. 4c are performed in the same way as steps 411 and 412 in FIG. 4b, so details are not repeated here.

Step 423: Determine the set of individual features as the feature set of the target short text.

The process of FIG. 4c has no feature combination step, so the set of features determined in step 422 is used directly as the feature set of the target short text.

Taking the target short text "the clothes are very big" as an example, the feature set finally obtained by the process of FIG. 4c includes "clothes16", "very16" and "big16".

Returning to FIG. 4a, the process proceeds to step S403: Determine the sentiment orientation of each feature in the feature set of each short text, as well as the positive sentiment degree and the negative sentiment degree of each feature, and take the features together with their sentiment orientations, positive sentiment degrees and negative sentiment degrees as the input parameters of the sentiment estimation model.

When the process of FIG. 1 is carried out in step S401, the sentiment orientation of each short text has already been determined. The sentiment orientation of each feature is the same as that of its short text. Therefore, when a short text corresponds to positive sentiment, every feature in its feature set is determined to correspond to positive sentiment; when a short text corresponds to negative sentiment, every feature in its feature set is determined to correspond to negative sentiment.

The determination of the positive and negative sentiment degrees is described in detail below using one feature as an example. The processing device may obtain many occurrences of the same feature, and the sentiment orientations associated with those occurrences may or may not be the same.

The processing device can therefore count the total number of occurrences of the feature, the first number of occurrences that are positive, and the second number of occurrences that are negative. The positive sentiment degree of the feature is determined from the ratio of the first number to the total number; the negative sentiment degree of the feature is determined from the ratio of the second number to the total number.
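The counting in step S403 can be sketched as follows. The source only says the degrees are determined "from the ratio" of the counts; using the plain ratio itself is an assumption made here for illustration, and the name `sentiment_degrees` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def sentiment_degrees(
    labeled_features: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[float, float]]:
    """Map each feature to (positive degree, negative degree) from labeled occurrences.

    Each input item is (feature, label) with label "positive" or "negative",
    where the label is inherited from the short text the feature came from.
    """
    total, positive, negative = Counter(), Counter(), Counter()
    for feature, label in labeled_features:
        total[feature] += 1
        if label == "positive":
            positive[feature] += 1   # first number
        elif label == "negative":
            negative[feature] += 1   # second number
    return {
        feature: (positive[feature] / total[feature], negative[feature] / total[feature])
        for feature in total
    }

if __name__ == "__main__":
    occurrences = [("big16", "negative"), ("big16", "negative"),
                   ("big16", "positive"), ("big10", "positive")]
    print(sentiment_degrees(occurrences))
```

In this toy data "big16" is mostly negative (clothes that are too big) while "big10" is positive (a large screen), which is the category effect the method is designed to capture.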

Step S404: Perform training with a preset classifier model to obtain the trained sentiment estimation model.

The preset classifier model may be a maximum entropy model, a support vector machine, a neural network algorithm, or the like. Techniques for the training process are well established and are not described here.

The second implementation by which the processing device constructs the sentiment estimation model is described below. In the second implementation, one sentiment estimation model is constructed for each category. Because each sentiment estimation model covers only one category, a word segment is itself equivalent to a feature, so the second implementation does not need to combine word segments with category identifiers.

The sentiment estimation model is constructed in the same way for every category. The process is therefore described in detail below using one target category and the construction of its target sentiment estimation model as an example.

As shown in FIG. 5, the process of constructing the target sentiment estimation model includes the following steps:

Step S501: Determine the short text samples used to construct the target sentiment estimation model.

a) Obtain the objects in the target category sent by the data providing device, and split each object to obtain a set of short texts for each object.

b) From all the short texts, select the short texts used to construct the sentiment estimation model.

Step S501 is performed in a manner similar to step S401, so details are not repeated here.

Step S502: Determine the feature set corresponding to each short text.

When the process of FIG. 1 is applied in step S501, the word segmentation result of each short text is obtained (see step 1 in FIG. 1; details are not repeated here). The feature set corresponding to each short text is then determined. This step can be performed in two ways. They differ in that the feature set determined in the first way contains combined features, whereas the feature set determined in the second way does not.

Because the feature set is determined in the same way for every short text, the process is described in detail below using one target short text as an example.

FIG. 6a shows the first way of determining the feature set of the target short text:

Step 601: Obtain the word segmentation result corresponding to the target short text, where each word segment corresponds to one feature.

Step 602: Perform n-gram combination on the features to obtain a number of combined features.

Step 603: Determine the set consisting of the individual features and the combined features as the feature set of the target short text.

Taking the target short text "the clothes are very big" and bigram feature combination as an example, the feature set finally obtained in this embodiment includes "clothes", "very", "big", "clothes very" and "very big".

FIG. 6b shows the second way of determining the feature set of the target short text:

Step 611: Obtain the word segmentation result corresponding to the target short text, where each word segment corresponds to one feature.

Step 612: Determine the word segmentation result as the feature set of the target short text.

The process of FIG. 6b has no feature combination step, so the set of features determined in step 611 is used directly as the feature set of the target short text.

Taking the target short text "the clothes are very big" as an example, the feature set finally obtained by the process of FIG. 6b includes "clothes", "very" and "big".

Returning to FIG. 5, the process proceeds to step S503: Determine the sentiment orientation of each feature in the feature set of each short text in the target category, as well as the positive sentiment degree and the negative sentiment degree of each feature, and take the features in the target category together with their sentiment orientations, positive sentiment degrees and negative sentiment degrees as the input parameters of the target sentiment estimation model.

When the process of FIG. 1 is carried out in step S501, the sentiment orientation of each short text has already been determined. The sentiment orientation of each feature is the same as that of its short text. Therefore, when a short text corresponds to positive sentiment, every feature in its feature set is determined to correspond to positive sentiment; when a short text corresponds to negative sentiment, every feature in its feature set is determined to correspond to negative sentiment.

Step S504: Perform training with a preset classifier model to obtain the trained target sentiment estimation model.

The preset classifier model may be a maximum entropy model, a support vector machine, a neural network algorithm, or the like. Techniques for the training process are well established and are not described here.

FIG. 5 shows the process of constructing a sentiment degree estimation model for one category, and FIG. 3 shows the process of constructing a sentiment degree estimation model for all categories. The processing steps of the two are very similar, so for the execution of the embodiment of FIG. 5, reference may be made to the specific execution process of FIG. 4, which is not repeated here.

In the second implementation manner, each category corresponds to one sentiment degree estimation model. To avoid confusion, after a sentiment degree estimation model has been constructed, the processing device also constructs a mapping between that model and the category identifier, so that the processor can later accurately determine the sentiment degree estimation model corresponding to each category.
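The mapping between category identifiers and models can be pictured as a small registry. The sketch below is only an assumption about how such a mapping might be kept; `ModelRegistry`, `register` and `lookup` are hypothetical names, and a model is represented as any callable that returns a (positive degree, negative degree) pair.

```python
from typing import Callable, Dict, List, Tuple

# A model maps a feature set to (positive sentiment degree, negative sentiment degree).
Model = Callable[[List[str]], Tuple[float, float]]


class ModelRegistry:
    """Keeps one sentiment degree estimation model per category identifier."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, category_id: str, model: Model) -> None:
        # Refuse silent overwrites so two categories never share an entry by mistake.
        if category_id in self._models:
            raise ValueError(f"a model is already registered for {category_id!r}")
        self._models[category_id] = model

    def lookup(self, category_id: str) -> Model:
        try:
            return self._models[category_id]
        except KeyError:
            raise KeyError(f"no model registered for category {category_id!r}") from None
```

Registering each model as soon as its training finishes keeps the lookup at prediction time a single dictionary access.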

The third implementation manner in which the processing device constructs the sentiment degree estimation model is described below.

The third implementation manner may include a sentiment degree estimation model corresponding to two or more categories, and/or a sentiment degree estimation model corresponding to one category. For the construction of a model corresponding to two or more categories, refer to the embodiment shown in FIG. 4; for a model corresponding to one category, refer to the embodiment shown in FIG. 5. Details are not repeated here.

With reference to FIG. 2a and FIG. 2b, if the processing device that constructs the sentiment degree estimation model is the processor 200 itself, the processor 200 can use the model directly once it has been constructed, in order to determine the sentiment orientation of the short text to be processed.

If the processing device is the model construction device 300, the model construction device 300 sends the sentiment degree estimation model to the processor 200, so that the processor 200 can use it to determine the sentiment orientation of the short text to be processed.

The process by which the processor 200 determines the sentiment orientation of the short text to be processed according to the sentiment degree estimation model is described below. The sentiment degree estimation model has three different implementation manners, and the processor 200 executes differently under each of them, so the execution process of the processor is described separately for each implementation manner.

First:

When the sentiment degree estimation model is implemented in the first implementation manner (all categories correspond to one sentiment degree estimation model), the processor 200 determines the sentiment orientation of the short text to be processed as follows.

Referring to FIG. 7, a sentiment orientation recognition method of the present invention specifically includes the following steps:

Step S701: determine the feature set corresponding to the short text to be processed, where each feature in the feature set includes a segmented word of the short text to be processed and the category identifier of the category to which the short text to be processed belongs.

If, under the first implementation manner, the first execution manner was used to determine the feature sets of the short texts when the sentiment degree estimation model was built, the first execution manner is also used in this step to determine the feature set of the short text to be processed.

Referring to FIG. 8a, the first execution manner of determining the feature set corresponding to the short text to be processed specifically includes the following steps:

Step S801: obtain the category identifier corresponding to the short text to be processed, and the word segmentation result obtained by performing word segmentation on the short text to be processed.

Step S802: combine each segmented word in the word segmentation result with the category identifier to obtain the individual features.

Step S803: perform n-gram combination on the individual features to obtain several combined features.

Step S804: determine the set of the individual features and the combined features as the feature set of the short text to be processed.

For the execution process of FIG. 8a, refer to the execution process of FIG. 4a; details are not repeated here.

If, under the first implementation manner, the second execution manner was used to determine the feature sets of the short texts when the sentiment degree estimation model was built, the second execution manner is also used in this step to determine the feature set of the short text to be processed.

Referring to FIG. 8b, the second execution manner of determining the feature set corresponding to the short text to be processed specifically includes the following steps:

Step S811: obtain the category identifier corresponding to the short text to be processed, and the word segmentation result obtained by performing word segmentation on the short text to be processed.

Step S812: combine each segmented word in the word segmentation result with the category identifier to obtain the individual features.

Step S813: determine the set of the individual features as the feature set of the short text to be processed.

For the execution process of FIG. 8b, refer to the execution process of FIG. 4b; details are not repeated here.
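The two execution manners for building the feature set (FIG. 8a with combined features, FIG. 8b without) can be sketched in one function. Several details here are assumptions for illustration only: the `category|word` encoding of a feature, the `+` separator, adjacent n-grams as the form of combination, and the default n = 2. The function names are hypothetical.

```python
from typing import List


def base_features(tokens: List[str], category_id: str) -> List[str]:
    """Steps S802 / S812: pair every segmented word with the category identifier."""
    return [f"{category_id}|{token}" for token in tokens]


def ngram_features(features: List[str], n: int = 2) -> List[str]:
    """Step S803: join each run of n adjacent features into one combined feature."""
    if n < 2:
        raise ValueError("n must be at least 2 for combined features")
    return ["+".join(features[i:i + n]) for i in range(len(features) - n + 1)]


def build_feature_set(tokens: List[str], category_id: str,
                      n: int = 2, use_ngrams: bool = True) -> List[str]:
    """use_ngrams=True follows FIG. 8a; use_ngrams=False follows FIG. 8b."""
    features = base_features(tokens, category_id)
    combined = ngram_features(features, n) if use_ngrams else []
    return features + combined
```

Whichever variant was used when the model was trained has to be used again at prediction time, otherwise the model would see features it never encountered during training.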

Returning to FIG. 7, step S702: perform sentiment degree estimation on the short text to be processed according to the pre-trained sentiment degree estimation model, in combination with the feature set of the short text to be processed, where the sentiment degree estimation model is a model that is obtained by training on several short text samples that belong to at least two categories and carry sentiment orientations, and that outputs a positive sentiment degree and a negative sentiment degree.

The processor inputs the feature set into the sentiment degree estimation model, which estimates and outputs the positive sentiment degree and negative sentiment degree corresponding to the feature set.

Step S703: determine the sentiment orientation corresponding to the short text to be processed based on the positive sentiment degree and negative sentiment degree corresponding to the short text to be processed.

After the sentiment orientation corresponding to the short text to be processed has been determined, it may also be output for other uses.

After step S702 has estimated the positive sentiment degree (the degree to which the short text to be processed is positive) and the negative sentiment degree (the degree to which it is negative), the two can be compared to determine the sentiment orientation of the short text to be processed. If the positive sentiment degree is greater than the negative sentiment degree, the short text to be processed is determined to correspond to positive sentiment; if the negative sentiment degree is greater than the positive sentiment degree, it is determined to correspond to negative sentiment.

In some cases the positive and negative sentiment degrees differ only slightly. Taking sentiment degrees expressed as probabilities as an example, the positive sentiment degree might be 0.51 and the negative sentiment degree 0.49. Because the two values are so close, the sentiment orientation of the short text to be processed cannot, in principle, be determined accurately. If it is nevertheless determined in the manner of the preceding paragraph, errors will occur.

Therefore, referring to FIG. 9, the present invention provides the following way of determining the sentiment orientation of the short text to be processed.

Step S901: determine the larger of the positive sentiment degree and the negative sentiment degree.

The positive sentiment degree and the negative sentiment degree are compared to find the larger one. If the positive sentiment degree is greater than the negative sentiment degree, the positive sentiment degree is the larger sentiment degree; if the negative sentiment degree is greater than the positive sentiment degree, the negative sentiment degree is the larger sentiment degree.

Step S902: judge whether the larger sentiment degree is greater than a preset confidence level.

To decide whether the larger sentiment degree can be trusted, the present invention sets a preset confidence level in advance. The preset confidence level is the threshold above which the larger sentiment degree is considered credible. The larger sentiment degree is then compared with the preset confidence level.

Step S903: if the larger sentiment degree is greater than the preset confidence level, determine that the sentiment orientation corresponding to the short text to be processed is the same as the sentiment orientation of the larger sentiment degree.

If the larger sentiment degree is greater than the preset confidence level, the larger sentiment degree is considered highly credible, so the sentiment orientation of the short text to be processed can be determined accurately. In this case, the sentiment orientation of the short text to be processed is the same as that of the larger sentiment degree.

That is, if the larger sentiment degree is the positive sentiment degree, the short text to be processed is determined to correspond to positive sentiment; if the larger sentiment degree is the negative sentiment degree, the short text to be processed is determined to correspond to negative sentiment.

For example, if the larger sentiment degree is 0.8 and the preset confidence level is 0.7, the sentiment orientation of the short text to be processed can be determined accurately.

Step S904: if the larger sentiment degree is not greater than the preset confidence level, perform other processing to determine the sentiment orientation of the short text to be processed.

If the larger sentiment degree is not greater than the preset confidence level, the larger sentiment degree has low credibility, so the sentiment orientation of the short text to be processed cannot be determined accurately. For example, if the larger sentiment degree is 0.55 and the preset confidence level is 0.7, the sentiment orientation of the short text to be processed cannot be determined accurately.

In this case, some other processing may be performed to further determine the sentiment orientation of the short text to be processed. That processing is not the focus of the present invention and is not described here.
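Steps S901 to S904 amount to a thresholded comparison, sketched below. The function name `decide_sentiment`, the return value `None` for the undecided case, and the explicit handling of a tie are assumptions; the default threshold of 0.7 is taken from the worked example above.

```python
from typing import Optional


def decide_sentiment(positive: float, negative: float,
                     confidence: float = 0.7) -> Optional[str]:
    """Return "positive" or "negative", or None when other processing is needed."""
    if positive == negative:
        # No larger sentiment degree exists, so nothing can be trusted (assumed handling).
        return None
    # Step S901: pick the larger sentiment degree.
    if positive > negative:
        label, larger = "positive", positive
    else:
        label, larger = "negative", negative
    # Steps S902 and S903: accept it only if it exceeds the preset confidence level.
    if larger > confidence:
        return label
    # Step S904: not credible enough, defer to other processing.
    return None
```

With the values used in the text, `decide_sentiment(0.8, 0.2)` accepts the positive orientation, while a larger sentiment degree of 0.55 falls through to the deferred case.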

The system shown in FIG. 2a and FIG. 2b may further include a receiving device (not shown in the figures) connected to the processor. After determining the sentiment orientation of the short text to be processed, the processor is further configured to output that sentiment orientation, and the receiving device is configured to receive it so that it can make use of it.

Second:

When the sentiment degree estimation model is implemented in the second implementation manner, the processor 200 determines the sentiment orientation of the short text to be processed as follows. Referring to FIG. 10, a sentiment orientation recognition method of the present invention specifically includes the following steps:

Step S1001: determine the feature set and the category identifier corresponding to the short text to be processed.

If, under the second implementation manner, the first execution manner was used to determine the feature sets of the short texts when the sentiment degree estimation model was built, the first execution manner is also used in this step to determine the feature set of the short text to be processed.

Referring to FIG. 11a, the first execution manner of determining the feature set of the short text to be processed proceeds as follows:

Step 1101: obtain the word segmentation result obtained by performing word segmentation on the short text to be processed.

Step 1102: use an n-gram language model to combine the segmented words, obtaining several combined segmented words.

Step 1103: determine the set of the segmented words and the combined segmented words as the feature set of the short text to be processed, where each segmented word corresponds to one feature.

The execution process of FIG. 11a is similar to that of FIG. 6a; for details, refer to the execution process of FIG. 6a, which is not repeated here.

If, under the second implementation manner, the second execution manner was used to determine the feature sets of the short texts when the sentiment degree estimation model was built, the second execution manner is also used in this step to determine the feature set of the short text to be processed.

Referring to FIG. 11b, the second execution manner of determining the feature set of the short text to be processed proceeds as follows:

Step 1111: obtain the word segmentation result obtained by performing word segmentation on the short text to be processed.

Step 1112: determine the word segmentation result as the feature set of the short text to be processed, where each segmented word corresponds to one feature.

The execution process of FIG. 11b is similar to that of FIG. 6b; for details, refer to the execution process of FIG. 6b, which is not repeated here.

Returning to FIG. 10, the process proceeds to step S1002: perform sentiment degree estimation on the short text to be processed according to the sentiment degree estimation model corresponding to the category identifier, in combination with the feature set of the short text to be processed, where the sentiment degree estimation model is a model that is obtained by training on the feature sets of several short text samples that correspond to the category identifier and carry sentiment orientations, and that outputs a positive sentiment degree and a negative sentiment degree.

In the second implementation manner there are multiple sentiment degree estimation models. To obtain the one that applies to the short text to be processed, the models are searched by category identifier to find the sentiment degree estimation model corresponding to that category identifier.

The processor inputs the feature set into the sentiment degree estimation model, which estimates and outputs the positive sentiment degree and negative sentiment degree corresponding to the feature set.

Step S1003: determine the sentiment orientation corresponding to the short text to be processed based on the positive sentiment degree and negative sentiment degree corresponding to the short text to be processed. This step is executed in the same way as step S703 of FIG. 7 and is not described again here.

The system shown in FIG. 2a and FIG. 2b may further include a receiving device (not shown in the figures) connected to the processor. After determining the sentiment orientation corresponding to the short text to be processed, the processor is further configured to output that sentiment orientation, and the receiving device is configured to receive it.

Third:

When the sentiment degree estimation model is implemented in the third implementation manner, the processor 200 stores in advance the correspondence between category identifiers and sentiment degree estimation models, and also builds in advance the correspondence between each category identifier and the manner in which its sentiment degree estimation model was constructed.

When the processor 200 receives a category identifier, it first determines the manner in which the sentiment degree estimation model corresponding to that category identifier was constructed. If the model was constructed in the first implementation manner, the sentiment orientation of the short text to be processed is determined, adapted accordingly, by the process shown in FIG. 4. That is: determine the feature set corresponding to the short text to be processed, where each feature in the feature set includes a segmented word of the short text to be processed and the category identifier of the category to which the short text to be processed belongs; perform sentiment degree estimation on the short text to be processed according to the pre-trained sentiment degree estimation model, in combination with the feature set of the short text to be processed, where the sentiment degree estimation model is a model that is obtained by training on several short text samples that belong to at least two categories and carry sentiment orientations, and that outputs a positive sentiment degree and a negative sentiment degree; and determine the sentiment orientation corresponding to the short text to be processed based on its positive sentiment degree and negative sentiment degree.

If the sentiment degree estimation model was constructed in the second implementation manner, the sentiment orientation of the short text to be processed is determined, adapted accordingly, by the process shown in FIG. 5. That is: determine the feature set corresponding to the short text to be processed, where each feature in the feature set includes a segmented word of the short text to be processed; perform sentiment degree estimation on the short text to be processed according to the sentiment degree estimation model corresponding to the category identifier, in combination with the feature set of the short text to be processed, where the sentiment degree estimation model is a model that is obtained by training on several short text samples that correspond to the category identifier and carry sentiment orientations, and that outputs a positive sentiment degree and a negative sentiment degree; and determine the sentiment orientation corresponding to the short text to be processed based on its positive sentiment degree and negative sentiment degree.

The embodiments shown in FIG. 7 and FIG. 10 show that the present invention has the following beneficial effects. The present invention provides a sentiment orientation recognition method that trains on several short texts carrying sentiment orientations to obtain a sentiment degree estimation model. Because each feature set contains both the segmented words and the category identifier of the short text, the sentiment degree estimation model constructed by the present application takes full account of the category to which the short text belongs. The positive and negative sentiment degrees determined for the short text to be processed on the basis of this model are therefore more accurate than in the prior art, and so is the sentiment orientation determined from them.
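For the third implementation manner described above, the processor has to choose the feature construction that matches how the model for the given category was built. The sketch below illustrates that dispatch under stated assumptions: the `routes` table, the mode constants and the `category|word` feature encoding are hypothetical, and each model is just a callable returning a (positive degree, negative degree) pair.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[List[str]], Tuple[float, float]]

SHARED = "shared"              # model built for two or more categories
PER_CATEGORY = "per_category"  # model built for exactly one category


def estimate(tokens: List[str], category_id: str,
             routes: Dict[str, Tuple[str, Model]]) -> Tuple[float, float]:
    """Look up the model and its construction manner, then build matching features."""
    mode, model = routes[category_id]
    if mode == SHARED:
        # Features carry the category identifier, as in the first implementation manner.
        features = [f"{category_id}|{token}" for token in tokens]
    elif mode == PER_CATEGORY:
        # Features are the segmented words alone, as in the second implementation manner.
        features = list(tokens)
    else:
        raise ValueError(f"unknown construction manner: {mode!r}")
    return model(features)
```

Keeping the construction manner next to the model in one table means the two stored correspondences cannot drift apart.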

Taking the maximum entropy model as an example, the training process by which the present invention constructs the sentiment degree estimation model is described in detail below. First, two matrices are constructed: matrix A and matrix B. Matrix A contains the features together with the positive sentiment degree and negative sentiment degree corresponding to each feature. Matrix B contains the two classification results: positive sentiment and negative sentiment. For any feature $a$ in matrix A, $b$ denotes its sentiment orientation, and $f_i(a, b)$ indicates the co-occurrence of $(a, b)$.

First, the expectation of $f_i(a, b)$ over the training samples is calculated. Because it involves no variables, this expectation is a constant once it has been calculated. The specific formula is as follows:

where $E_{\tilde{p}}(f_i)$ denotes the expectation of $f_i(a, b)$ over the training samples, and $\tilde{p}(a, b)$ denotes the empirical probability distribution of $(a, b)$ in the training samples.

The formula for the probability distribution of $f_i(a, b)$ in the model is as follows:

where $\tilde{p}(b)$ denotes the probability that the sentiment orientation of a short text in the training samples is $b$, and $p(a \mid b)$ denotes the conditional probability of feature $a$ given that the sentiment orientation of the short text is $b$.

$f_i(a, b)$ is then calculated in the maximum entropy model as follows:

In the maximum entropy model, the expectation of $f_i(a, b)$ over the training samples should equal the expectation of $f_i(a, b)$ in the model. That is:

Using the method of Lagrange multipliers, the optimal solution of objective equation (2) is found subject to the constraint of formula (4). The optimal solution is as follows:

where the normalization factor ensures that the conditional probabilities sum to 1, and $w_i$ is the weight of feature $f_i$.

Substituting formula (5) into formula (1) yields the training result of the maximum entropy model, that is, the sentiment degree estimation model.
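The formulas referred to in the derivation above do not appear in this text. As a reading aid, the block below writes out the standard conditional maximum entropy formulation using the symbols defined above. It is a sketch under assumptions rather than the patent's own formulas: the assignment of the numbers (1) to (5), the use of the conditional entropy $H(p)$ as objective (2), the exponential form of the solution, and the symbol $Z(b)$ for the normalization factor are all taken from the textbook derivation.

```latex
\begin{align}
E_{\tilde{p}}(f_i) &= \sum_{a,b} \tilde{p}(a,b)\, f_i(a,b) \tag{1} \\
H(p) &= -\sum_{a,b} \tilde{p}(b)\, p(a \mid b)\, \log p(a \mid b) \tag{2} \\
E_{p}(f_i) &= \sum_{a,b} \tilde{p}(b)\, p(a \mid b)\, f_i(a,b) \tag{3} \\
E_{p}(f_i) &= E_{\tilde{p}}(f_i) \tag{4} \\
p^{*}(a \mid b) &= \frac{1}{Z(b)} \exp\!\Big(\sum_i w_i\, f_i(a,b)\Big),
\qquad Z(b) = \sum_{a} \exp\!\Big(\sum_i w_i\, f_i(a,b)\Big) \tag{5}
\end{align}
```

Under this reading, maximizing (2) subject to (4) with Lagrange multipliers $w_i$ gives (5), and $Z(b)$ is exactly the factor that makes $\sum_a p^{*}(a \mid b) = 1$.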

As shown in FIG. 12, the present invention provides an object classification method applied to a processor. In this embodiment, the sentiment orientations of the short texts of the object to be processed can be used directly to classify the object. The method specifically includes the following steps:

Step S1201: determine the short text information of the object to be processed, where the short text information includes the sentiment orientations of the short texts.

The processor can use punctuation marks to split the object to be processed into several short texts, and the sentiment orientation of each short text can be determined by the process of FIG. 7 or FIG. 10 of the present invention, so that the sentiment orientation of every short text in the object to be processed is known. In addition, the short text information may include the number of short texts with positive sentiment in the object to be processed, the number of short texts with negative sentiment, the proportion of positive short texts, the proportion of negative short texts, and so on.
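The splitting and counting described above can be sketched as follows. The set of punctuation marks, the dictionary keys and the function names are assumptions for illustration; `classify` stands in for whichever recognition process (FIG. 7 or FIG. 10) is used and is passed in as a plain callable.

```python
import re
from typing import Callable, Dict, List

# Assumed delimiter set: common ASCII and full-width punctuation marks.
_SPLIT = re.compile(r"[,.;!?,。;!?、]+")


def split_short_texts(text: str) -> List[str]:
    """Split an object's text into short texts at punctuation marks."""
    return [part.strip() for part in _SPLIT.split(text) if part.strip()]


def short_text_info(text: str, classify: Callable[[str], str]) -> Dict[str, float]:
    """Count positive and negative short texts and compute their proportions."""
    sentiments = [classify(short_text) for short_text in split_short_texts(text)]
    total = len(sentiments)
    positive = sentiments.count("positive")
    negative = sentiments.count("negative")
    return {
        "positive_count": positive,
        "negative_count": negative,
        "positive_ratio": positive / total if total else 0.0,
        "negative_ratio": negative / total if total else 0.0,
    }
```

A short text for which `classify` returns neither label (for example because the confidence check deferred it) still counts toward the total, so the two proportions need not sum to 1.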

Step S1202: perform category recognition on the short text information according to a pre-trained category recognition model, where the category recognition model is a classifier for a first category and a second category obtained by training on the short text information of several objects.

The category recognition model is a classifier that is trained in advance on the short text information of several objects and that outputs a first category or a second category. Specifically, a classification model such as a maximum entropy model, a neural network algorithm or a support vector machine can be trained on the short text information of several objects to obtain the category recognition model. Training methods from the prior art may be used and are not described again here.

After the short text information of the object to be processed has been obtained, it is input into the category recognition model, which processes it and determines the category of the object to be processed.

In practice, an object may contain images as well as text. Taking user reviews in an e-commerce system as the objects, a user review may contain images of the product in addition to text (the written review).

An object category determined from the short text information alone is not accurate, because the image feature information of the object is not taken into account; likewise, an object category determined from the image feature information alone is not accurate, because the short text information of the object is not taken into account. This embodiment therefore merges the short text information and the image feature information and uses both together to determine the object category, which improves the accuracy of the object category.

The present invention further provides an object classification method in which, in this embodiment, multiple kinds of features of the object to be processed are used to classify the object. As shown in FIG. 13, the method specifically includes the following steps:

Step S1301: determine the feature information corresponding to the object to be processed, where the feature information includes short text information and image feature information, and the short text information includes the sentiment orientations of the short texts.

The processor can use punctuation marks to split the object to be processed into several short texts, and the sentiment orientation of each short text can be determined by the process of FIG. 7 or FIG. 10 of the present invention, so that the sentiment orientation of every short text in the object to be processed is known. In addition, the short text information may include the number of short texts with positive sentiment in the object to be processed, the number of short texts with negative sentiment, the proportion of positive short texts, the proportion of negative short texts, and so on.

The processor can process the images to obtain the image feature information. The image feature information may include one or more of the following image features: image width, image height, number of faces in the image, number of sub-images contained in the image, whether the image background is a solid color, proportion of the image occupied by text regions, number of dominant colors in the salient region of the image, number of dominant colors in the image, image "psoriasis" score (clutter from overlaid promotional text or stickers), quality score of the main subject of the image, probability score that the image shows a mannequin, probability score that the image shows a live model, probability score that the image shows product details, and so on.
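Merging the short text information and the image feature information into one input for a classifier could look like the sketch below. The key names, their order and the choice to fill missing values with 0.0 are all assumptions made for illustration; only a handful of the image features listed above are included.

```python
from typing import Dict, List

# Hypothetical, fixed feature order so every object yields a vector of the same layout.
TEXT_KEYS = ("positive_count", "negative_count", "positive_ratio", "negative_ratio")
IMAGE_KEYS = ("width", "height", "face_count", "text_area_ratio")


def to_feature_vector(text_info: Dict[str, float],
                      image_info: Dict[str, float]) -> List[float]:
    """Concatenate text and image features; missing entries default to 0.0."""
    text_part = [float(text_info.get(key, 0.0)) for key in TEXT_KEYS]
    image_part = [float(image_info.get(key, 0.0)) for key in IMAGE_KEYS]
    return text_part + image_part
```

A fixed key order matters because a classifier trained on such vectors interprets each position, not each name.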

Step S1302: Perform category recognition on the feature information according to a pre-trained category recognition model, where the category recognition model is a classifier for a first category and a second category obtained by training on the feature information of several objects.

The category recognition model is a classifier that outputs the first category or the second category, obtained by training in advance on the short text information and image feature information of several objects. Specifically, a classification model such as a maximum entropy model, a neural network algorithm, or a support vector machine can be trained on the short text information of several objects to obtain the category recognition model. The training can follow existing techniques and is not described further here.

After the short text information of the object to be processed is obtained, it is sent to the category recognition model, which determines the category of the object to be processed.

It will be appreciated that the more kinds of features the feature information of the object to be processed contains, the more accurate the final result. Therefore, to further improve the accuracy of the determined category, the feature information may also include feature information of a first subject to which the object to be processed is attached and/or feature information of a second subject to which the object to be processed is attached. Other feature information may also be included and is not enumerated here.

For example, taking user reviews as the objects, the feature information of the first subject is the feature information of the seller (the first subject) to which the product belongs, such as the seller's credit rating and the seller's sales volume. The feature information of the second subject is the feature information of the buyer (the second subject), such as the buyer's credit rating, the number of non-default user reviews the buyer has posted, the number of user reviews with images the buyer has posted, and the proportion of the buyer's user reviews that include images.
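
One way to feed these heterogeneous features to a classifier is to flatten them into a fixed-order numeric vector. The sketch below assumes hypothetical field names (the source lists kinds of features but no identifiers) and assumes that missing values default to 0.0.

```python
from typing import Dict, List

# Hypothetical feature names, chosen only for illustration.
FEATURE_ORDER: List[str] = [
    "num_positive", "num_negative", "positive_ratio", "negative_ratio",  # short text
    "image_width", "image_height", "num_faces", "solid_background",      # image
    "seller_credit", "seller_sales",                                      # first subject (seller)
    "buyer_credit", "buyer_reviews_with_images",                          # second subject (buyer)
]


def build_feature_vector(*groups: Dict[str, float]) -> List[float]:
    """Merge feature groups and lay them out in FEATURE_ORDER.

    Features that are absent from every group default to 0.0 (an assumption).
    """
    merged: Dict[str, float] = {}
    for group in groups:
        merged.update(group)
    return [float(merged.get(name, 0.0)) for name in FEATURE_ORDER]


vector = build_feature_vector(
    {"num_positive": 2, "num_negative": 1, "positive_ratio": 2 / 3, "negative_ratio": 1 / 3},
    {"image_width": 800, "image_height": 600, "num_faces": 1, "solid_background": 0},
    {"seller_credit": 4, "seller_sales": 1200},
    {"buyer_credit": 3},
)
```

Keeping the order fixed matters because tree-based models address features by position.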

Once short text information, image feature information, and other feature information are included, the feature information of an object comprises multiple features. To take these features into account jointly, this embodiment trains a gradient boosting decision tree model on several training samples to obtain the category recognition model.

The gradient boosting decision tree model is a boosting method that uses decision trees as base functions. The model contains multiple decision trees. Multiple trees are used because a single decision tree that splits too much overfits and loses its ability to generalize, while one that splits too little does not learn enough.

The training process of the gradient boosting decision tree model is as follows:

First, estimate the initial value $F_0$.

The initial value $F_0$ may be a random value or may be 0; the specific value depends on the actual situation and is not limited here.

Second, iterate M times as follows to obtain M decision trees:

A) Use the gradient boosting decision tree from the previous iteration to update the estimated values of the multiple features for all training samples.

B) Randomly select a subset of all training samples as the training samples for building the decision tree in this iteration.

C) Based on the features contained in the samples, compute the information gain of each feature and select the feature with the largest information gain for the first split, where the left side represents the first category and the right side represents the second category. Compute the gradient for this iteration and use it to re-estimate the feature values of the samples' feature information.

Repeat the step in the preceding paragraph J times to obtain a decision tree with J levels of leaf nodes.

D) For the decision tree obtained in this iteration, compute the accuracy of the training samples on that tree and use the accuracy as the weight of the tree.

Third, linearly combine the M decision trees to obtain the final gradient boosting decision tree model.
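
The feature selection in step C can be illustrated with a small information gain computation. This is a sketch under assumptions: labels are integers, each feature is split at a given threshold, and the function names are hypothetical. It covers only the choice of the split feature, not the gradient update or the tree weighting.

```python
import math
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / total
        result -= p * math.log2(p)
    return result


def information_gain(values: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Entropy reduction from splitting the samples at `values <= threshold`."""
    total = len(labels)
    if total == 0:
        return 0.0
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(labels) - weighted


def best_feature(samples: List[List[float]], labels: List[int], thresholds: List[float]) -> int:
    """Return the index of the feature with the largest information gain."""
    gains = [
        information_gain([row[j] for row in samples], labels, thresholds[j])
        for j in range(len(thresholds))
    ]
    return max(range(len(gains)), key=gains.__getitem__)


# Feature 0 separates the two labels perfectly; feature 1 carries no information.
samples = [[0.9, 1.0], [0.8, 0.0], [0.2, 1.0], [0.1, 0.0]]
labels = [1, 1, 0, 0]
chosen = best_feature(samples, labels, thresholds=[0.5, 0.5])
```

In this toy data set the first feature is chosen, because splitting on it leaves both sides pure.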

The gradient boosting decision tree model contains multiple decision trees and can be expressed as an additive model of those trees:

$$F(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \dots + \beta_i T_i(X) + \dots + \beta_M T_M(X) \tag{6}$$

where $F_0$ is an initial value, $T_i(X)$ is the degree of match between the feature information of the object to be processed and the $i$-th decision tree, $\beta_i$ is the weight of the $i$-th decision tree, and $M$ is the total number of decision trees.
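
Formula (6) can be evaluated directly once the trees and weights are available. In the sketch below each tree is modeled as a plain Python function that returns a matching degree; that representation, and the name `additive_score`, are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# A tree is modeled as a function from a feature vector to a matching degree.
Tree = Callable[[Sequence[float]], float]


def additive_score(x: Sequence[float], f0: float, weights: List[float], trees: List[Tree]) -> float:
    """Evaluate F(X) = F_0 + sum_i beta_i * T_i(X), as in formula (6)."""
    if len(weights) != len(trees):
        raise ValueError("each tree needs exactly one weight")
    return f0 + sum(beta * tree(x) for beta, tree in zip(weights, trees))


# Two toy one-split trees standing in for trained decision trees.
trees: List[Tree] = [
    lambda x: 1.0 if x[0] > 0.5 else 0.0,
    lambda x: 1.0 if x[1] > 2.0 else 0.0,
]
score = additive_score([0.7, 1.0], f0=0.0, weights=[0.6, 0.4], trees=trees)
```

Here only the first tree matches the input, so the score equals that tree's weight.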

The gradient boosting decision tree model uses multiple decision trees precisely so that it can achieve good results in both training accuracy and generalization. As a boosting algorithm, it embodies the idea of boosting: combine a series of weak classifiers into a strong classifier. No single decision tree is required to learn very much; each tree learns a little, and what the trees have learned is added together to form a powerful model.

The present invention further provides an object classification method. As shown in FIG. 14, the method includes the following steps:

Step S1401: Determine feature information corresponding to the object to be processed.

The feature information includes short text information, image feature information, feature information of the first subject to which the object to be processed is attached, and feature information of the second subject to which the object to be processed is attached. The short text information includes the sentiment orientation of each short text.

Taking user reviews as the objects, this step may be: determine the feature information of the user review to be processed, where the feature information includes text feature information of the user review, image feature information of the user review, feature information of the seller, and feature information of the buyer, and the text feature information includes the sentiment orientation of each short text.

Step S1402: Perform recognition on the feature information using the pre-trained gradient boosting decision tree model.

Continuing with user reviews as the objects, this step performs category recognition on the feature information of the user review to be processed according to the pre-trained gradient boosting decision tree model, where the category recognition model is a classifier for first-category user reviews and second-category user reviews obtained by training on the feature information of several user review samples.

As shown in FIG. 15, this step specifically includes the following steps:

Step S1501: Input the feature information into the category recognition model, that is, the gradient boosting decision tree model.

The gradient boosting decision tree model has M trees. The feature information is matched against each of the M trees to obtain the category determined by each tree.

Step S1502: Determine the first-category matching degree and the second-category matching degree of the object to be processed.

The first-category matching degree and the second-category matching degree are determined according to formula (6) above.

The first-category matching degree is

$$F_1(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \dots + \beta_i T_i(X) + \dots + \beta_M T_M(X)$$

where $T_i(X)$ is the degree of match between the feature information and the $i$-th tree and $\beta_i$ is the weight of that tree. If a tree determines that the feature information corresponds to the first category, its weight is $\beta_i$; if it determines that the feature information corresponds to the second category, its weight is 0.

The second-category matching degree is

$$F_2(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \dots + \beta_i T_i(X) + \dots + \beta_M T_M(X)$$

where $T_i(X)$ is the degree of match between the feature information and the $i$-th tree and $\beta_i$ is the weight of that tree. If a tree determines that the feature information corresponds to the second category, its weight is $\beta_i$; if it determines that the feature information corresponds to the first category, its weight is 0.

Step S1503: Compare the first-category matching degree with the second-category matching degree. If the first-category matching degree is greater than the second-category matching degree, go to step S1504; if the second-category matching degree is greater than the first-category matching degree, go to step S1505.

Step S1504: Determine that the category of the object to be processed is the first category.

Continuing with user reviews as the objects, this step determines that the category of the user review to be processed is the first category. If the first category is high-quality user reviews, this step determines that the user review to be processed is a high-quality user review.

Step S1505: Determine that the category of the object to be processed is the second category.

Continuing with user reviews as the objects, this step determines that the category of the user review to be processed is the second category. If the second category is low-quality user reviews, this step determines that the user review to be processed is a low-quality user review.
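
Steps S1501 to S1505 amount to a weighted vote among the trees. The sketch below makes two simplifying assumptions that the text does not state: each tree returns only the category it decides on (so its matching degree is treated as 1), and a tie is assigned to the second category. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

FIRST, SECOND = 1, 2

# A tree is modeled as a function that returns FIRST or SECOND for a feature vector.
VotingTree = Callable[[Sequence[float]], int]


def category_scores(
    x: Sequence[float], f0: float, weights: List[float], trees: List[VotingTree]
) -> Tuple[float, float]:
    """Return (F1, F2): each tree adds its weight to the category it decides on."""
    f1 = f2 = f0
    for beta, tree in zip(weights, trees):
        if tree(x) == FIRST:
            f1 += beta  # this tree counts with weight beta for F1 and 0 for F2
        else:
            f2 += beta  # this tree counts with weight beta for F2 and 0 for F1
    return f1, f2


def classify(x: Sequence[float], f0: float, weights: List[float], trees: List[VotingTree]) -> int:
    """Steps S1503 to S1505: pick the category with the larger matching degree."""
    f1, f2 = category_scores(x, f0, weights, trees)
    # Assumption: ties fall to the second category; the source leaves this open.
    return FIRST if f1 > f2 else SECOND


trees: List[VotingTree] = [
    lambda x: FIRST if x[0] > 0.5 else SECOND,
    lambda x: FIRST if x[1] > 0.5 else SECOND,
    lambda x: FIRST if x[2] > 0.5 else SECOND,
]
weights = [0.5, 0.3, 0.4]
result = classify([0.8, 0.1, 0.9], f0=0.0, weights=weights, trees=trees)
```

With these toy trees, the first and third trees decide on the first category, so the first-category matching degree is the larger one.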

After the object to be processed is determined to belong to the first category, it is added to an object set, and the objects in the object set are sent out. The object set can be used by other devices. During use, the objects can be screened again to identify several better object samples, which are then sent back to the processor so that the processor can retrain the category recognition model on them and make it more accurate. That is, the processor may receive multiple object samples originating from the object set, add them to the existing object samples used to train the category recognition model, and retrain the category recognition model on the updated set of object samples.

Continuing with user reviews as the objects, the process is: after the user review to be processed is determined to be a first-category user review, add it to the first-category user review set and send that set. The first-category user review set can be made available to users, and during use, better user reviews can be identified within it. These better user reviews can then be sent to the processing device so that it retrains the category recognition model. The system can thus form a closed loop.

That is, the processor receives multiple first-category user reviews originating from the first-category user review set, adds them to the existing user review samples of the category recognition model, and retrains the category recognition model on the updated user review samples.
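
The closed loop can be sketched as a function that merges the curated samples into the existing training set and retrains. The sample layout, the duplicate check, and the `train` callback are assumptions introduced for illustration; the source does not say how duplicates are handled.

```python
from typing import Callable, Dict, List, Tuple

# A sample is a (feature vector, label) pair; this layout is an assumption.
Sample = Tuple[List[float], int]


def retrain_with_feedback(
    existing: List[Sample],
    feedback: List[Sample],
    train: Callable[[List[Sample]], Dict[str, int]],
) -> Tuple[List[Sample], Dict[str, int]]:
    """Add curated samples to the existing ones, then retrain on the updated set.

    Duplicates are skipped, which is an assumption rather than a stated requirement.
    """
    updated = list(existing)  # leave the caller's list untouched
    for sample in feedback:
        if sample not in updated:
            updated.append(sample)
    return updated, train(updated)


def toy_train(samples: List[Sample]) -> Dict[str, int]:
    # Placeholder trainer that only records how many samples it saw.
    return {"num_samples": len(samples)}


existing = [([1.0], 1), ([0.0], 2)]
feedback = [([0.9], 1), ([1.0], 1)]
updated, model = retrain_with_feedback(existing, feedback, toy_train)
```

In a real system `toy_train` would be replaced by the gradient boosting training procedure.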

Referring to FIG. 16, the present invention provides an object classification system, which includes:

A data providing device 100, configured to send several objects.

A processor 200, configured to: receive the objects sent by the data providing device and train, on the feature information of those objects, a category recognition model that outputs a first category or a second category; determine the feature information of an object to be processed, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of each short text; perform category recognition on the feature information of the object to be processed according to the category recognition model; and output objects of the first category.

A data receiving device 400, configured to receive and use the objects of the first category.

While using the object set, the data receiving device 400 can screen the objects again to identify several better object samples and send them back to the processor 200, so that the processor can retrain the category recognition model on them and make it more accurate.

Referring to FIG. 17, the present invention further provides an object classification system, which includes:

A data providing device 100, configured to send several objects.

A model building device 300, configured to receive the objects sent by the data providing device, train on the feature information of those objects a category recognition model that outputs a first category or a second category, and send the category recognition model.

A processor 200, configured to: receive the category recognition model; determine the feature information of an object to be processed, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of each short text; perform category recognition on the feature information of the object to be processed according to the category recognition model; and output objects of the first category.

A data receiving device 400, configured to receive and use the objects of the first category.

While using the object set, the data receiving device 400 can screen the objects again to identify several better object samples and send them back to the processor 200, so that the processor can retrain the category recognition model on them and make it more accurate.

The object classification method is described in detail below using a specific scenario.

An e-commerce system contains a large number of user reviews, and the problem addressed by this embodiment is how to pick out the high-quality ones. Because user reviews in an e-commerce system are so numerous and varied, merchants must spend a great deal of time finding the high-quality user reviews in their stores, which incurs substantial labor costs. In the field of high-quality user review recognition, two techniques are in common industrial use: recognition based on short text, and recognition based on image features.

Recognition based on short text is relatively easy to implement but has a limitation: it ignores the images that buyers post in their user reviews. In practice, for example in the apparel category, users care not only about the text of a user review but also about what the product actually looks like, that is, the image feature information.

Recognition based on image features is effective but also has limitations. It uses only the images in a user review and ignores what the buyer has to say about the purchase, that is, the short text information. The short text information and the image feature information in a user review are therefore equally important.

In addition, the applicant has found that some other features can assist in identifying high-quality user reviews, for example seller features and buyer features. This embodiment therefore uses all of the above features as the basis for determining whether a user review is high quality or low quality. To this end, this embodiment proposes a machine learning method based on the fusion of multiple features, namely the gradient boosting decision tree model, which is trained on several training samples to obtain the category recognition model.

FIG. 18 is a flowchart of determining high-quality user reviews according to the present invention. The figure shows the entire process of determining high-quality user reviews, which mainly consists of the following parts:

(1) Build the user review library.

A large number of user reviews are obtained from the user review server, and preprocessing rules are first used to filter out some of the low-quality ones. The preprocessing rules are requirements that the images and text of a high-quality user review must satisfy; that is, a small number of short text and image feature dimensions are used to filter the large volume of user reviews.

Specifically, the short texts in a high-quality user review cannot all carry negative sentiment, so a user review whose short texts all correspond to negative sentiment is judged not to be high quality. The images in a high-quality user review must also meet basic requirements: the image resolution reaches a preset resolution, the image is not a screenshot of a conversation, the proportion of the image occupied by conspicuous advertising slogans and watermarks is below a preset value, and so on.

User reviews in the user review server that satisfy the above short text and image feature requirements are placed in the user review library. User reviews that do not satisfy these requirements are judged not to be high quality and are not placed in the user review library.

Filtering with the preprocessing rules removes some non-high-quality user reviews. This reduces the number of times the high-quality user review recognition model has to be invoked and, by effectively removing non-high-quality user reviews, improves the accuracy of the model's predictions.
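
The preprocessing rules above can be sketched as a simple predicate. The `Review` fields and the threshold values are hypothetical; the source speaks only of a "preset resolution" and a "preset value". Treating a review with no short texts as passing the sentiment rule is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    """Hypothetical record holding the few features used by the prefilter."""
    short_text_sentiments: List[str]  # "positive" or "negative" per short text
    image_width: int
    image_height: int
    is_chat_screenshot: bool
    ad_watermark_ratio: float  # share of the image covered by slogans and watermarks


# Assumed thresholds; the source does not give concrete numbers.
MIN_WIDTH, MIN_HEIGHT = 400, 400
MAX_AD_WATERMARK_RATIO = 0.1


def passes_prefilter(review: Review) -> bool:
    """Return True if the review meets the basic text and image requirements."""
    sentiments = review.short_text_sentiments
    if sentiments and all(s == "negative" for s in sentiments):
        return False  # every short text is negative
    if review.image_width < MIN_WIDTH or review.image_height < MIN_HEIGHT:
        return False  # resolution below the preset resolution
    if review.is_chat_screenshot:
        return False  # conversation screenshots are rejected
    return review.ad_watermark_ratio < MAX_AD_WATERMARK_RATIO


def build_review_library(reviews: List[Review]) -> List[Review]:
    """Keep only the reviews that pass the prefilter."""
    return [r for r in reviews if passes_prefilter(r)]


good = Review(["positive", "negative"], 800, 600, False, 0.02)
all_negative = Review(["negative", "negative"], 800, 600, False, 0.02)
screenshot = Review(["positive"], 800, 600, True, 0.0)
library = build_review_library([good, all_negative, screenshot])
```

Only the reviews kept in `library` would then be passed to the recognition model.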

(2) Determine the high-quality user review set.

The high-quality user review recognition model is applied to the user reviews in the user review library. A user review recognized as high quality is placed in the high-quality user review set.

(3) Use the high-quality user review set.

The data receiving device can obtain high-quality user reviews from the high-quality user review set and use them in actual applications. While doing so, it screens the high-quality user reviews in the set again according to preset criteria to select those that meet the criteria. It then sends the high-quality user reviews that meet the preset criteria to the processor or the model building device, so that the processor or the model building device can iteratively update the high-quality user review recognition model.

(4) Iteratively update the high-quality user review recognition model.

The high-quality user review recognition model is retrained on the high-quality user reviews that meet the preset criteria, so that it outputs, as far as possible, high-quality user reviews that satisfy user needs.

Because the high-quality user reviews selected from the high-quality user review set all satisfy the preset rules of the seller or the operations staff, they are added back to the user review library and used to update and optimize the high-quality user review recognition model, so that the model better identifies high-quality user reviews that meet user expectations.

From the above process it can be seen that, in this embodiment, users no longer need to screen the original user review library one review at a time. They only need to choose from the high-quality user review set to quickly find the high-quality user reviews they want, which effectively reduces labor costs. At the same time, the high-quality user review recognition model can be iteratively updated with the high-quality user reviews supplied by merchants, so that it further identifies the high-quality user reviews that merchants expect.

If the functions described in the method of this embodiment are implemented as software functional units and sold or used as an independent product, they can be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present invention that contributes over the prior art, or a part of the technical solution, can be embodied as a software product. The software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and for parts that are the same or similar the embodiments may be read with reference to one another.

The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (30)

一種情感傾向的識別方法,包括:確定待處理短文本對應類目標識;其中,一個文本相鄰兩個標點符號之間文字稱為短文本;確定與所述類目標識對應的情感度估測模型的實現方式;若所述情感度估測模型的實現方式為所有類目對應一個情感度估測模型,則確定待處理短文本對應的特徵集合;其中,所述特徵集合中每個特徵包括:所述待處理短文本的分詞和所述待處理短文本所屬的類目標識;依據預先訓練的情感度估測模型,結合待處理短文本的特徵集合,對待處理短文本進行情感度估測;其中,所述情感度估測模型包括:依據至少兩種類目的、帶有情感傾向的若干個短文本樣本訓練後得到的、輸出正面情感度和負面情感度的模型;基於所述待處理短文本對應的正面情感度和負面情感度,確定所述待處理短文本對應的情感傾向;若所述情感度估測模型的實現方式為一個類目對應一個情感度估測模型,確定待處理短文本對應的特徵集合;其中,所述特徵集合中每個特徵包括:所述待處理短文本的分詞;依據與所述類目標識對應的情感度估測模型,結合待處理短文本的特徵集合,對待處理短文本進行情感度估測;其中,所述情感度估測模型為:依據所述類目標識對應的、帶有情感傾向的若干個短文本樣本訓練後得到 的、輸出正面情感度和負面情感度的模型;基於所述待處理短文本對應的正面情感度和負面情感度,確定所述待處理短文本對應的情感傾向。     A method for identifying sentiment tendency includes determining a category identifier corresponding to short text to be processed; wherein a text between two punctuation marks adjacent to a text is referred to as a short text; determining an emotion degree estimation corresponding to the category identifier Implementation of the model; if the implementation of the sentiment estimation model is that all categories correspond to one sentiment estimation model, then a feature set corresponding to the short text to be processed is determined; wherein each feature in the feature set includes : The word segmentation of the short text to be processed and the category identifier to which the short text to be processed belong; according to a pre-trained sentiment estimation model, combined with the feature set of the short text to be processed, the sentiment estimation of the short text to be processed ; Wherein the sentiment estimation model includes: a model that outputs positive sentiment and negative sentiment obtained after training according to at least two categories and several short text samples with sentiment tendency; based on the pending essay The corresponding positive sentiment and negative sentiment, determine the sentiment corresponding to the short text to be processed; if the sentiment is estimated The implementation of the type is that one category corresponds to one sentiment estimation model and 
determines the feature set corresponding to the short text to be processed; wherein each feature in the feature set includes: the word segmentation of the short text to be processed; The sentiment degree estimation model corresponding to the category identifier is combined with the feature set of the short text to be processed, and the sentiment degree estimation is performed on the short text to be processed. The emotion degree estimation model is: A model that outputs positive sentiment and negative sentiment obtained after training several short text samples with an emotional tendency; based on the positive sentiment and negative sentiment corresponding to the short text to be processed, determines the short processed text The corresponding emotional tendencies.     如請求項1所述的方法,其中,在確定所述待處理短文本對應的情感傾向後,還包括:輸出所述待處理短文本對應的情感傾向。     The method according to claim 1, wherein after determining an emotional tendency corresponding to the short text to be processed, the method further comprises: outputting an emotional tendency corresponding to the short text to be processed.     
3. A method for recognizing sentiment orientation, comprising: determining a feature set corresponding to a short text to be processed, wherein the text between two adjacent punctuation marks in a text is referred to as a short text, and each feature in the feature set comprises a word segment of the short text to be processed and the category identifier to which the short text to be processed belongs; performing sentiment degree estimation on the short text to be processed according to a pre-trained sentiment degree estimation model in combination with the feature set of the short text to be processed, wherein the sentiment degree estimation model comprises a model that is obtained by training on a plurality of short text samples of at least two categories having sentiment orientations and that outputs a positive sentiment degree and a negative sentiment degree; and determining the sentiment orientation corresponding to the short text to be processed based on the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed.
4. The method according to claim 3, wherein determining the feature set corresponding to the short text to be processed comprises: obtaining the category identifier corresponding to the short text to be processed and a word segmentation result obtained by performing a word segmentation operation on the short text to be processed; combining each word segment in the word segmentation result with the category identifier to obtain respective features; and determining the set of the features as the feature set of the short text to be processed.
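As an illustrative sketch only, the definition of a short text as the text between two adjacent punctuation marks could be realized by splitting on punctuation. The punctuation set and the function name `split_short_texts` are assumptions of this sketch; the claims do not enumerate the marks.

```python
import re
from typing import List

# Assumed punctuation set: common fullwidth (Chinese) and ASCII marks.
_PUNCT = re.compile(r"[,。!?;:、,.!?;:]+")


def split_short_texts(text: str) -> List[str]:
    """Split a text into short texts at punctuation, dropping empty pieces."""
    return [piece.strip() for piece in _PUNCT.split(text) if piece.strip()]
```

Because the ASCII period is treated as a delimiter, this naive splitter would also break decimal numbers such as "3.5"; a production splitter would need to handle such cases.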
5. The method according to claim 3, wherein determining the feature set corresponding to the short text to be processed comprises: obtaining the category identifier corresponding to the short text to be processed and a word segmentation result obtained by performing a word segmentation operation on the short text to be processed; combining each word segment in the word segmentation result with the category identifier to obtain respective features; combining the features using an n-gram language model to obtain a plurality of combined features; and determining the set of the features and the plurality of combined features as the feature set of the short text to be processed.
6. The method according to claim 5, wherein combining the features using an n-gram language model to obtain a plurality of combined features comprises: combining the features using a bigram language model to obtain a plurality of combined features.
7. The method according to claim 3, wherein performing sentiment degree estimation on the short text to be processed according to the pre-trained sentiment degree estimation model in combination with the feature set of the short text to be processed comprises: inputting the feature set into the sentiment degree estimation model; and outputting, after estimation by the sentiment degree estimation model, the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed.
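The feature construction of claims 4 to 6 might, for example, be sketched as below. The `_` and `|` separators and the function names are hypothetical choices made for illustration; the claims only require that each word segment be combined with the category identifier and that adjacent features be combined under a bigram model.

```python
from typing import List


def category_features(tokens: List[str], category_id: str) -> List[str]:
    """Combine each word segment with the category identifier (claim 4)."""
    return [f"{tok}_{category_id}" for tok in tokens]


def bigram_features(features: List[str]) -> List[str]:
    """Combine each pair of adjacent features (claim 6, bigram case)."""
    return [f"{a}|{b}" for a, b in zip(features, features[1:])]


def feature_set(tokens: List[str], category_id: str,
                with_bigrams: bool = True) -> List[str]:
    """Return the single features, optionally followed by bigram features."""
    base = category_features(tokens, category_id)
    return base + bigram_features(base) if with_bigrams else base
```

For the segmented short text `["battery", "lasts", "long"]` in a hypothetical category `phone`, this yields three single features and two combined features.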
8. The method according to claim 3, wherein determining the sentiment orientation corresponding to the short text to be processed based on the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed comprises: determining the larger of the positive sentiment degree and the negative sentiment degree; judging whether the larger sentiment degree is greater than a preset confidence; and if the larger sentiment degree is greater than the preset confidence, determining that the sentiment orientation corresponding to the short text to be processed is consistent with the sentiment orientation of the larger sentiment degree.
9. The method according to claim 3, wherein the sentiment degree estimation model comprises: a model that is obtained by training, using a maximum entropy model, on feature sets of a plurality of short texts corresponding to at least two category identifiers, and that outputs a positive sentiment degree and a negative sentiment degree.
10. The method according to claim 3, further comprising, after determining the sentiment orientation corresponding to the short text to be processed: outputting the sentiment orientation corresponding to the short text to be processed.
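A minimal sketch of the decision rule of claim 8 follows. The threshold value 0.6, the label strings, the tie-breaking toward "positive", and the `None` result for an undetermined orientation are assumptions; the claim does not state what happens when the larger degree does not exceed the preset confidence.

```python
from typing import Optional


def decide_orientation(positive: float, negative: float,
                       preset_confidence: float = 0.6) -> Optional[str]:
    """Return the orientation of the larger degree if it clears the threshold."""
    # Pick the larger of the two degrees (ties go to positive by assumption).
    if positive >= negative:
        label, larger = "positive", positive
    else:
        label, larger = "negative", negative
    # Only a degree strictly greater than the preset confidence counts.
    return label if larger > preset_confidence else None
```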
11. A method for recognizing sentiment orientation, comprising: determining a feature set and a category identifier corresponding to a short text to be processed, wherein the text between two adjacent punctuation marks in a text is referred to as a short text, and each feature in the feature set comprises a word segment of the short text to be processed; performing sentiment degree estimation on the short text to be processed according to the sentiment degree estimation model corresponding to the category identifier in combination with the feature set of the short text to be processed, wherein the sentiment degree estimation model is a model that is obtained by training on a plurality of short text samples that correspond to the category identifier and have sentiment orientations, and that outputs a positive sentiment degree and a negative sentiment degree; and determining the sentiment orientation corresponding to the short text to be processed based on the positive sentiment degree and the negative sentiment degree corresponding to the short text to be processed.
12. The method according to claim 11, wherein determining the feature set corresponding to the short text to be processed comprises: obtaining a word segmentation result obtained by performing a word segmentation operation on the short text to be processed; combining the word segments using an n-gram language model to obtain a plurality of combined word segments; and determining the set of the word segments and the plurality of combined word segments as the feature set of the short text to be processed, one word segment corresponding to one feature.
13. The method according to claim 11, wherein determining the feature set corresponding to the short text to be processed comprises: obtaining a word segmentation result obtained by performing a word segmentation operation on the short text to be processed; and determining the word segmentation result as the feature set of the short text to be processed, one word segment corresponding to one feature.
14. The method according to claim 11, further comprising, after determining the sentiment orientation corresponding to the short text to be processed: outputting the sentiment orientation corresponding to the short text to be processed.
15. A system for recognizing sentiment orientation, comprising: a data providing device configured to send a plurality of objects; and a processor configured to receive the plurality of objects sent by the data providing device, construct a sentiment degree estimation model based on short texts of the plurality of objects, and determine the sentiment orientation of a short text to be processed using the sentiment degree estimation model.
16. The system according to claim 15, wherein the processor is further configured to construct a correspondence between the sentiment degree estimation model and the category identifier to which an object belongs.
17. The system according to claim 15, further comprising a receiving device, wherein the processor is further configured to output the sentiment orientation of the text to be processed, and the receiving device is configured to receive the sentiment orientation of the text to be processed.
18. A system for recognizing sentiment orientation, comprising: a data providing device configured to send a plurality of objects; a model construction device configured to receive the plurality of objects sent by the data providing device, construct a sentiment degree estimation model based on short texts of the plurality of objects, and send the sentiment degree estimation model; and a processor configured to receive the sentiment degree estimation model and determine the sentiment orientation of a short text to be processed using the sentiment degree estimation model.
19. The system according to claim 18, wherein the model construction device is further configured to construct a correspondence between the sentiment degree estimation model and the category identifier to which an object belongs, and to send the correspondence to the processor.
20. The system according to claim 18, further comprising a receiving device, wherein the processor is further configured to output the sentiment orientation of the text to be processed, and the receiving device is configured to receive the sentiment orientation of the text to be processed.
21. An object classification method, comprising: determining feature information of an object to be processed, wherein the feature information comprises text feature information and image feature information, and the text feature information comprises the sentiment orientation of a short text; and performing category recognition on the feature information of the object to be processed according to a pre-trained category recognition model, wherein the category recognition model is a classifier for a first category and a second category that is obtained by training on feature information of a plurality of object samples.
22. The method according to claim 21, wherein the feature information further comprises: feature information of a first subject that constructs the object; and/or feature information of a second subject to which the object is attached.
23. The method according to claim 21, wherein performing category recognition on the feature information according to the pre-trained category recognition model comprises: inputting the feature information into the category recognition model; determining a first category matching degree and a second category matching degree corresponding to the object to be processed; comparing the first category matching degree with the second category matching degree; if the first category matching degree is greater than the second category matching degree, determining that the category of the object to be processed is the first category; and if the second category matching degree is greater than the first category matching degree, determining that the category of the object to be processed is the second category.
24. The method according to claim 23, further comprising: after determining that the object to be processed is of the first category, adding the object to be processed to an object set; and sending the objects in the object set.
25. The method according to claim 24, further comprising: receiving a plurality of object samples, wherein the object samples originate from the object set and satisfy a preset rule; adding the plurality of object samples to the existing object samples used to train the category recognition model; and retraining the category recognition model based on the updated existing object samples.
26. A method for classifying user reviews, comprising: determining feature information of a user review to be processed, wherein the feature information comprises text feature information of the user review, image feature information of the user review, feature information of the seller, and feature information of the buyer, and the text feature information comprises the sentiment orientation of a short text; and performing category recognition on the feature information of the user review to be processed according to a pre-trained gradient boosting decision tree model, wherein the category recognition model is a classifier for a first type of user review and a second type of user review that is obtained by training on feature information of a plurality of user review samples.
27. The method according to claim 26, further comprising: after determining that the user review to be processed is a first-type user review, adding the user review to be processed to a first-type user review set; and sending the first-type user review set.
28. The method according to claim 26, further comprising: receiving a plurality of first-type user reviews, wherein the first-type user reviews originate from the first-type user review set; adding the plurality of first-type user reviews to the existing user review samples of the category recognition model; and retraining the category recognition model based on the updated existing user review samples.
29. An object classification system, comprising: a data providing device configured to send a plurality of objects; a processor configured to receive the plurality of objects sent by the data providing device and to obtain, by training on feature information of the plurality of objects, a category recognition model that outputs a first category and a second category; to determine feature information of an object to be processed, wherein the feature information comprises text feature information and image feature information, and the text feature information comprises the sentiment orientation of a short text; to perform category recognition on the feature information of the object to be processed according to the category recognition model; and to output objects of the first category; and a data receiving device configured to receive and use the objects of the first category.
30. An object classification system, comprising: a data providing device configured to send a plurality of objects; a model construction device configured to receive the plurality of objects sent by the data providing device, to obtain, by training on feature information of the plurality of objects, a category recognition model that outputs a first category and a second category, and to send the category recognition model; a processor configured to receive the category recognition model and determine feature information of an object to be processed, wherein the feature information comprises text feature information and image feature information, and the text feature information comprises the sentiment orientation of a short text; to perform category recognition on the feature information of the object to be processed according to the category recognition model; and to output objects of the first category; and a data receiving device configured to receive and use the objects of the first category.
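For illustration only, the comparison of matching degrees in claim 23 and the collection of first-category objects in claim 24 might be sketched as follows. The function names, the label strings, and the `None` result on a tie are assumptions of this sketch; the claims do not address equal matching degrees, and the scoring function stands in for whatever trained category recognition model is used.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def classify(first_degree: float, second_degree: float) -> Optional[str]:
    """Compare the two matching degrees and return the winning category."""
    if first_degree > second_degree:
        return "first"
    if second_degree > first_degree:
        return "second"
    return None  # equal degrees: left undetermined in this sketch


def collect_first_category(
    objects: List[T],
    score: Callable[[T], Tuple[float, float]],  # hypothetical model wrapper
) -> List[T]:
    """Build the object set holding every object classified as first category."""
    return [obj for obj in objects if classify(*score(obj)) == "first"]
```

The resulting object set is what claim 24 sends onward, and samples drawn from it under a preset rule are what claim 25 adds back to the training data before retraining.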
TW106123845A 2016-09-09 2017-07-17 Sentiment orientation recognition method, object classification method and data processing system TW201812615A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610812853.4 2016-09-09
CN201610812853.4A CN107807914A (en) 2016-09-09 2016-09-09 Recognition methods, object classification method and the data handling system of Sentiment orientation

Publications (1)

Publication Number Publication Date
TW201812615A (en) 2018-04-01

Family

ID=61562512

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106123845A TW201812615A (en) 2016-09-09 2017-07-17 Sentiment orientation recognition method, object classification method and data processing system

Country Status (3)

Country Link
CN (1) CN107807914A (en)
TW (1) TW201812615A (en)
WO (1) WO2018045910A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032645A (en) * 2019-04-17 2019-07-19 携程旅游信息技术(上海)有限公司 Text emotion recognition methods, system, equipment and medium

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036570B (en) * 2018-05-31 2021-08-31 云知声智能科技股份有限公司 Method and system for filtering non-medical record content of ultrasound department
CN109299782B (en) * 2018-08-02 2021-11-12 奇安信科技集团股份有限公司 Data processing method and device based on deep learning model
CN109271627B (en) * 2018-09-03 2023-09-05 深圳市腾讯网络信息技术有限公司 Text analysis method, apparatus, computer device and storage medium
CN110929026B (en) * 2018-09-19 2023-04-25 阿里巴巴集团控股有限公司 Abnormal text recognition method, device, computing equipment and medium
CN109344257B (en) * 2018-10-24 2024-05-24 平安科技(深圳)有限公司 Text emotion recognition method and device, electronic equipment and storage medium
CN109492226B (en) * 2018-11-10 2023-03-24 上海五节数据科技有限公司 Method for improving low text pre-segmentation accuracy rate of emotional tendency proportion
CN109684627A (en) * 2018-11-16 2019-04-26 北京奇虎科技有限公司 A kind of file classification method and device
CN109871807B (en) * 2019-02-21 2023-02-10 百度在线网络技术(北京)有限公司 Face image processing method and device
CN110427519A (en) * 2019-07-31 2019-11-08 腾讯科技(深圳)有限公司 The processing method and processing device of video
CN110516416B (en) * 2019-08-06 2021-08-06 咪咕文化科技有限公司 Identity authentication method, authentication end and client
CN111506733B (en) * 2020-05-29 2022-06-28 广东太平洋互联网信息服务有限公司 Object portrait generation method and device, computer equipment and storage medium
CN112069311B (en) * 2020-08-04 2024-06-11 北京声智科技有限公司 Text extraction method, device, equipment and medium
CN113450010A (en) * 2021-07-07 2021-09-28 中国工商银行股份有限公司 Method and device for determining evaluation result of data object and server
CN114443849B (en) 2022-02-09 2023-10-27 北京百度网讯科技有限公司 Labeling sample selection method and device, electronic equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510254A (en) * 2009-03-25 2009-08-19 北京中星微电子有限公司 Method for updating gender classifier in image analysis and the gender classifier
US20110251973A1 (en) * 2010-04-08 2011-10-13 Microsoft Corporation Deriving statement from product or service reviews
CN103365867B (en) * 2012-03-29 2017-07-21 腾讯科技(深圳)有限公司 It is a kind of that the method and apparatus for carrying out sentiment analysis are evaluated to user
CN102682124B (en) * 2012-05-16 2014-07-09 苏州大学 Emotion classifying method and device for text
CN102968408A (en) * 2012-11-23 2013-03-13 西安电子科技大学 Method for identifying substance features of customer reviews
CN103455562A (en) * 2013-08-13 2013-12-18 西安建筑科技大学 Text orientation analysis method and product review orientation discriminator on basis of same
CN105095181B (en) * 2014-05-19 2017-12-29 株式会社理光 Review spam detection method and equipment
CN105069072B (en) * 2015-07-30 2018-08-21 天津大学 Hybrid subscriber score information based on sentiment analysis recommends method and its recommendation apparatus
CN105005560A (en) * 2015-08-26 2015-10-28 苏州大学张家港工业技术研究院 Maximum entropy model-based evaluation type emotion sorting method and system
CN105550269A (en) * 2015-12-10 2016-05-04 复旦大学 Product comment analyzing method and system with learning supervising function

Also Published As

Publication number Publication date
WO2018045910A1 (en) 2018-03-15
CN107807914A (en) 2018-03-16
