CN104572613A - Data processing device, data processing method and program - Google Patents


Info

Publication number
CN104572613A
CN104572613A (application CN201310495278.6A)
Authority
CN
China
Prior art keywords
text, probability, theme, emotion, data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310495278.6A
Other languages
Chinese (zh)
Inventor
孙健
夏迎炬
王云芝
李中华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Fujitsu Ltd
Priority to CN201310495278.6A
Publication of CN104572613A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing device for judging whether a text posted by a user in a social service network is a question. The data processing device comprises a topic feature acquisition unit, an emotion feature acquisition unit, a question mark feature extraction unit and a classifier. The topic feature acquisition unit is configured to acquire topic features of the text according to a pre-trained topic model; the emotion feature acquisition unit is configured to acquire emotion features of the text according to a pre-trained emotion model; the question mark feature extraction unit is configured to acquire question mark features of the text; and the classifier is configured to classify the text according to the topic features, the emotion features and the question mark features.

Description

Data processing device, data processing method and program
Technical field
The present disclosure relates to the field of data processing, and in particular to a data processing device, a data processing method and a program for judging whether a text posted by a user in a social service network is a question. The disclosure further relates to a method of training the topic model used in the above data processing device, data processing method or program, and to a method of training the emotion model used therein.
Background art
In social service networks, such as microblogs, Facebook and the like, users often post viewpoints, comments, evaluations and so on concerning certain topics. For example, a user may post views about physical health or express emotions. A method of identifying which of these posts are questions is therefore needed.
Summary of the invention
A brief summary of the invention is given below in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to delimit its scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
In view of the need described in the Background section, the present invention is concerned with apparatus and methods for identifying whether a text posted by a user in a social service network is a question. In particular, the invention proposes a data processing device and method that use pre-trained models to obtain relevant features of a text and judge, based on those features, whether the text is a question.
According to one aspect of the invention, there is provided a data processing device for judging whether a text posted by a user in a social service network is a question, comprising: a topic feature acquisition unit configured to obtain topic features of the text using a pre-trained topic model; an emotion feature acquisition unit configured to obtain emotion features of the text using a pre-trained emotion model; a question mark feature extraction unit configured to obtain question mark features of the text; and a classifier configured to classify the text using the topic features, emotion features and question mark features.
According to another aspect of the invention, there is provided a data processing method for judging whether a text posted by a user in a social service network is a question, comprising: obtaining topic features of the text using a pre-trained topic model; obtaining emotion features of the text using a pre-trained emotion model; obtaining question mark features of the text; and classifying the text with a classifier using the topic features, emotion features and question mark features.
According to a further aspect of the invention, there is provided a method of training a topic model for judging whether a text in a social service network is a question, comprising: preparing an expert knowledge corpus; performing word segmentation on each text in the corpus; extracting one or more content words of the text as keywords reflecting its topics; and computing, as the topic model, at least part of the following: the probabilities of texts, keywords and topics, and the probabilities, joint probabilities or conditional probabilities of combinations thereof.
According to yet another aspect of the invention, there is provided a method of training an emotion model for judging whether a text in a social service network is a question, comprising: preparing a data set annotated as to whether each text is a question; performing word segmentation on each text in the data set; extracting one or more non-nouns and/or symbols of the text as emotion words and/or symbols reflecting its sentiment orientation; and computing, as the emotion model, at least part of the following: the probabilities of texts, emotion words and/or symbols and sentiment orientations, and the probabilities, joint probabilities or conditional probabilities of combinations thereof.
According to other aspects of the invention, corresponding computer program code, computer-readable storage media and computer program products are also provided.
These and other advantages of the invention will become more apparent from the following detailed description of embodiments of the invention taken in conjunction with the accompanying drawings.
Accompanying drawing explanation
To further illustrate the above and other advantages and features of the application, embodiments of the application are described in more detail below with reference to the accompanying drawings, which are incorporated in this specification together with the following detailed description and form a part thereof. Elements having the same function and structure are denoted by the same reference numerals. It should be understood that these drawings merely depict typical examples of the application and should not be regarded as limiting its scope. In the drawings:
Fig. 1 shows a structural block diagram of a data processing device according to an embodiment of the application;
Fig. 2 shows a structural block diagram of the topic feature acquisition unit in the data processing device according to an embodiment of the application;
Fig. 3 shows a schematic diagram of the generative process of the topic model according to an embodiment of the application;
Fig. 4 shows a structural block diagram of the emotion feature acquisition unit in the data processing device according to an embodiment of the application;
Fig. 5 shows a schematic diagram of the generative process of the emotion model according to an embodiment of the application;
Fig. 6 shows a flowchart of the data processing method according to an embodiment of the application;
Fig. 7 shows a flowchart of the topic feature obtaining step in the data processing method according to an embodiment of the application;
Fig. 8 shows a flowchart of the emotion feature obtaining step in the data processing method according to an embodiment of the application;
Fig. 9 shows a flowchart of the topic model training method according to an embodiment of the application;
Fig. 10 shows a flowchart of the emotion model training method according to an embodiment of the application; and
Fig. 11 is a block diagram of an exemplary configuration of a general-purpose personal computer in which the methods and/or devices according to embodiments of the invention may be implemented.
Embodiment
Exemplary embodiments of the invention are described below with reference to the accompanying drawings. For the sake of clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. Moreover, it should be appreciated that, although such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should further be noted that, in order not to obscure the invention with unnecessary detail, the drawings show only those device structures and/or processing steps closely related to the solution of the invention, and other details of little relevance to the invention are omitted.
The description below proceeds in the following order:
1. Data processing device
2. Data processing method
3. Topic model training method
4. Emotion model training method
5. Computing device for implementing the apparatus and methods of the application
[1. Data processing device]
First, the structure of a data processing device 100 according to an embodiment of the application is described with reference to Fig. 1. As shown in Fig. 1, the data processing device 100 comprises: a topic feature acquisition unit 101 configured to obtain topic features of a text using a pre-trained topic model; an emotion feature acquisition unit 102 configured to obtain emotion features of the text using a pre-trained emotion model; a question mark feature extraction unit 103 configured to obtain question mark features of the text; and a classifier 104 configured to classify the text using the topic features, emotion features and question mark features.
Specifically, when the data processing device 100 is used to judge whether a text posted by a user is a question, the topic feature acquisition unit 101, the emotion feature acquisition unit 102 and the question mark feature extraction unit 103 obtain the topic features, emotion features and question mark features of the text, respectively, and the classifier then uses the obtained features to classify the text, that is, to judge whether the text is a question.
Here, the topic features represent one or more topics involved in the text, the emotion features represent the sentiment orientation of the poster as reflected by the text, and the question mark features refer to words or symbols in the text that indicate a question. How these features are obtained is described in detail in the following.
The structure and function of the topic feature acquisition unit 101 and of the emotion feature acquisition unit 102 are first described in detail with reference to Figs. 2 to 5.
< Topic feature acquisition unit >
As shown in Fig. 2, the topic feature acquisition unit 101 comprises: a word segmentation module 1001 configured to perform word segmentation on the text; a keyword extraction module 1002 configured to extract one or more content words of the text as keywords reflecting its topics; and a topic feature computation module 1003 configured to compute the topic features of the text based on the keywords using the topic model, wherein the topic model comprises at least part of the following: the probabilities of texts, keywords and topics, and the probabilities, joint probabilities or conditional probabilities of combinations thereof.
The word segmentation module 1001 may use any of various existing techniques to segment the text to be processed. From the segmented text, the keyword extraction module 1002 extracts content words such as nouns, verbs and adjectives as keywords reflecting the topics of the text; these keywords are then used to compute the topic features of the text.
Some texts contain portions called topic markers; for example, content placed between two '#' signs is treated as topic-marked. Accordingly, the keyword extraction module 1002 is also configured to extract at least part of the topic-marked content as keywords reflecting the topics of the text.
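As an illustrative sketch (not part of the patent text itself), extracting the content between two '#' signs might look like the following in Python; the function name and regular expression are assumptions made here for illustration.

```python
import re

def extract_topic_markers(text):
    """Return the content enclosed between pairs of '#' signs.

    Microblog posts often mark a topic as #topic#; each such span is
    treated as keyword material reflecting the topic of the text.
    """
    # [^#]+ keeps each match inside one pair of '#' signs, so
    # "#a# ... #b#" yields two topics rather than one long span.
    return re.findall(r"#([^#]+)#", text)
```

The extracted spans would then be segmented like the rest of the text before being used as keywords.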
As mentioned above, the topic feature computation module 1003 computes the topic features of the text based on the extracted keywords using a pre-trained topic model. The topic model is obtained by training on an expert knowledge corpus, that is, a collection of materials that carry a certain amount of knowledge and can therefore help users who have questions to solve their problems. Such materials may, for example, be knowledge microblogs posted by experts, these experts including knowledgeable persons in a certain field or the customer service representatives of certain brands or companies.
Each material (text) in the expert knowledge corpus is first segmented into words, and content words such as nouns, verbs and adjectives are then extracted as keywords. Since every text is written to express one or more topics, to solve a class of problems, or to provide some kind of technical support, these keywords reflect the topics of the text. In other words, a topic layer exists between the text layer and the keyword layer. This topic layer is not stated explicitly but is implicit, so a topic may be treated as a hidden variable.
In one embodiment, a generative model, for example a PLSA or LDA model, may be built over texts, topics and keywords. In a PLSA model, for instance, a text is selected with probability p(d), each text belongs to a topic t with probability p(t|d), and, given a topic t, each keyword w is generated with probability p(w|t), as shown in Fig. 3. By training this generative model on the corpus, at least part of the following can be obtained as the topic model: the probabilities of texts (d), keywords (w) and topics (t), and the probabilities, joint probabilities or conditional probabilities of combinations thereof.
How a topic model is obtained is illustrated below taking the PLSA model as an example. As stated above, each text in the corpus expresses one or more topics, and each topic is filled with keywords. Following the generative process shown in Fig. 3, the joint probability expression below is obtained:
p(w, t, d) = p(d) \sum_{k=1}^{N} p(w \mid t_k) \, p(t_k \mid d)    (1)

where N is the number of topics, t_k denotes the k-th topic, and the remaining variables have the meanings defined above. p(t|d) and p(w|t) are solved for by maximum likelihood using the EM (expectation maximization) algorithm. The objective function of the maximum likelihood estimation is given by formula (2):

L = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log p(d_i, w_j)    (2)

where n(d_i, w_j) denotes the number of occurrences of the j-th keyword w_j in the i-th text d_i, and N and M here denote the numbers of texts and keywords, respectively.
The E (expectation) step first computes:

p(t_k \mid d_i, w_j) = \frac{p(w_j \mid t_k) \, p(t_k) \, p(d_i \mid t_k)}{\sum_{k'} p(w_j \mid t_{k'}) \, p(t_{k'}) \, p(d_i \mid t_{k'})}    (3)

Then the M (maximization) step is performed:

p(w_j \mid t_k) \propto \sum_{i} n(d_i, w_j) \, p(t_k \mid d_i, w_j)
p(d_i \mid t_k) \propto \sum_{j} n(d_i, w_j) \, p(t_k \mid d_i, w_j)    (4)
p(t_k) \propto \sum_{i} \sum_{j} n(d_i, w_j) \, p(t_k \mid d_i, w_j)

By the above algorithm, the probabilities p(d), p(w|t_k), p(t_k|d) and so on can be obtained.
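The E and M steps above can be sketched in code. The following is a minimal, illustrative PLSA-style EM under the symmetric parameterization of formulas (3) and (4), operating on a text-by-keyword count matrix with random initialization; the function name, matrix representation and smoothing constant are assumptions made for illustration, not prescribed by the patent.

```python
import random

def train_plsa(counts, num_topics, iters=50, seed=0):
    """Fit p(t), p(d|t), p(w|t) by EM on a text-by-keyword count matrix.

    counts[i][j] = n(d_i, w_j).  Returns (p_t, p_d_t, p_w_t) where
    p_d_t[k][i] = p(d_i | t_k) and p_w_t[k][j] = p(w_j | t_k).
    """
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Random positive initialization of all three distributions.
    p_t = normalize([rng.random() + 0.1 for _ in range(num_topics)])
    p_d_t = [normalize([rng.random() + 0.1 for _ in range(D)]) for _ in range(num_topics)]
    p_w_t = [normalize([rng.random() + 0.1 for _ in range(W)]) for _ in range(num_topics)]

    for _ in range(iters):
        # E step: posterior p(t_k | d_i, w_j), formula (3).
        post = [[None] * W for _ in range(D)]
        for i in range(D):
            for j in range(W):
                probs = [p_w_t[k][j] * p_t[k] * p_d_t[k][i] for k in range(num_topics)]
                z = sum(probs) or 1.0
                post[i][j] = [p / z for p in probs]
        # M step: re-estimate the parameters, formula (4).
        for k in range(num_topics):
            p_w_t[k] = normalize([sum(counts[i][j] * post[i][j][k] for i in range(D)) + 1e-12
                                  for j in range(W)])
            p_d_t[k] = normalize([sum(counts[i][j] * post[i][j][k] for j in range(W)) + 1e-12
                                  for i in range(D)])
        p_t = normalize([sum(counts[i][j] * post[i][j][k]
                             for i in range(D) for j in range(W)) + 1e-12
                         for k in range(num_topics)])
    return p_t, p_d_t, p_w_t
```

A production system would add a convergence check on the likelihood of formula (2) rather than running a fixed number of iterations.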
Therefore, as an example, the topic model may comprise: the probability of each text (p(d)), the conditional probability of a topic given each text (p(t|d)), the probability of a keyword in a text given the topic of that text (p(w|t)), and the joint probability of a text, its topic and its keywords (p(d, t, w)).
On this basis, in one embodiment, the topic feature computation module 1003 is configured to compute the conditional probability of each topic given the text.
As an example, this conditional probability p(t|d) is proportional to the product of the conditional probabilities of the keywords given the topic and the prior probability of that topic, as shown in the following formula:

p(t \mid d) = \frac{p(d \mid t) \, p(t)}{p(d)} \propto p(d \mid t) \, p(t) = \prod_{w \in T} p(w \mid t)^{n(w, T)} \times p(t)    (5)

where p(d|t) is the probability of the text given topic t, p(w|t) is the probability of keyword w given topic t, the exponent n(w, T) denotes the number of occurrences of keyword w in text T, and p(t) is the prior probability of topic t. This prior probability is the distribution of the topics obtained after training the generative model on the expert knowledge corpus. The resulting conditional probabilities p(t|d) serve as the topic features of the text for classification.
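Formula (5) can be evaluated in log space to avoid numerical underflow when a text contains many keywords. The sketch below is one possible rendering under assumed data structures (dictionaries mapping each topic to its prior and its per-keyword probabilities); these names and the unseen-word floor are illustrative assumptions.

```python
import math

def topic_features(keywords, topic_priors, word_given_topic, floor=1e-9):
    """Score p(t|d) ∝ p(t) · Π_w p(w|t)^n(w,T) for each topic (formula (5)).

    keywords: keywords extracted from the text (repeats encode counts);
    topic_priors: {topic: p(t)}; word_given_topic: {topic: {word: p(w|t)}}.
    Returns the scores normalized to sum to 1 over the topics.
    """
    log_scores = {}
    for t, prior in topic_priors.items():
        s = math.log(prior)
        for w in keywords:
            # Floor unseen keywords so the log stays finite.
            s += math.log(word_given_topic[t].get(w, floor))
        log_scores[t] = s
    # Normalize via a softmax over the log scores.
    m = max(log_scores.values())
    exp = {t: math.exp(s - m) for t, s in log_scores.items()}
    z = sum(exp.values())
    return {t: v / z for t, v in exp.items()}
```

The returned distribution over topics is what would be handed to the classifier as the topic feature vector.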
It should be noted that the way in which the topic model is produced is not limited to the above PLSA or LDA; any approach capable of yielding the above probabilities may be used.
< Emotion feature acquisition unit >
As shown in Fig. 4, the emotion feature acquisition unit 102 comprises: a word segmentation module 2001 configured to perform word segmentation on the text; an emotion word and/or symbol extraction module 2002 configured to extract one or more non-nouns and/or symbols of the text as emotion words and/or symbols reflecting the sentiment orientation of the text; and an emotion feature computation module 2003 configured to compute the emotion features of the text based on the emotion words and/or symbols using the emotion model, wherein the emotion model comprises at least part of the following: the probabilities of texts, emotion words and/or symbols and sentiment orientations, and the probabilities, joint probabilities or conditional probabilities of combinations thereof.
The word segmentation module 2001 has a function and structure similar to the aforementioned word segmentation module 1001 and may use any of various existing techniques to segment the text to be processed. In a concrete implementation, the word segmentation modules 2001 and 1001 may be the same module or element, or different modules or elements with the same function.
From the segmented text, the emotion word and/or symbol extraction module 2002 extracts non-nouns together with symbols in the text, such as emotion animations or emoticons, as emotion words and/or symbols reflecting the sentiment orientation of the text. Accordingly, in one embodiment, for a text containing symbolic expressions, these symbols are also converted after segmentation into corresponding words, such as "going mad", "about to cry", "laughing" and so on. In the following description, for simplicity, emotion words and/or symbols are collectively referred to as emotion words.
In addition, in one embodiment, the emotion word and/or symbol extraction module 2002 is also configured to convert an emotion word and/or symbol immediately adjacent to a negation word into its antonym. Negation words include, but are not limited to: avoid, not, no, cannot, hardly, less, never, no longer, fail to, seldom and the like. For example, in the text "Went to see 'Avatar' today, it really did not disappoint me", the word "disappoint" is a pessimistic word, but the sentence does not express a pessimistic mood, so "disappoint" is converted into an antonym such as "hope" and used as the emotion word.
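A minimal sketch of the negation handling just described: when an emotion word directly follows a negation word, it is replaced by its antonym. The negation list and antonym dictionary below are tiny illustrative stand-ins, not the ones a real system would use.

```python
# Illustrative stand-ins; a production system would use full lexicons.
NEGATION_WORDS = {"not", "no", "never", "hardly", "cannot"}
ANTONYMS = {"disappointed": "hopeful", "sad": "happy", "bad": "good"}

def flip_negated_emotions(tokens):
    """Replace an emotion word that immediately follows a negation word
    with its antonym, dropping the negation word itself."""
    out, i = [], 0
    while i < len(tokens):
        if (tokens[i] in NEGATION_WORDS and i + 1 < len(tokens)
                and tokens[i + 1] in ANTONYMS):
            out.append(ANTONYMS[tokens[i + 1]])
            i += 2  # consume the negation word and the emotion word
        else:
            out.append(tokens[i])
            i += 1
    return out
```

On the "Avatar" example above, "not disappointed" would thus be rewritten to the single optimistic emotion word "hopeful".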
Generally speaking, there are three kinds of sentiment orientation: positive emotion (e.g. active, optimistic), negative emotion (e.g. passive, pessimistic) and neutral emotion. A text posted by a user embodies one or more of these emotions through its emotion words.
As mentioned above, the emotion feature computation module 2003 computes the emotion features of the text based on the extracted emotion words using a pre-trained emotion model. The emotion model is obtained by training on a manually annotated question data set. Specifically, user-posted texts such as microblog texts are captured and collected in advance and annotated manually; the annotation labels may, for example, be {0, 1}, where 0 indicates that the text is not a question and 1 indicates that it is. The emotion model is trained using the texts labelled 1, that is, the question texts.
Each question text is first segmented into words; then the non-nouns, together with the words converted from symbols such as emotion animations, are extracted as emotion words, and in this process an emotion word immediately adjacent to a negation word may also be converted into its antonym. After the emotion words of the texts are obtained, a model relating these emotion words to the various sentiment orientations (positive, negative and neutral) needs to be computed.
In one embodiment, similarly to the case of the topic model, a generative model, for example a PLSA or LDA model, may be built over texts, sentiment orientations and emotion words. By training this generative model on the question texts, at least part of the following can be obtained as the emotion model: the probabilities of texts (d), emotion words (w) and sentiment orientations (s), and the probabilities, joint probabilities or conditional probabilities of combinations thereof.
How an emotion model is obtained is illustrated below taking the PLSA model as an example. Within the question data set, a text is selected with probability p(d), each text belongs to a class of emotion s with probability p(s|d), and, given a class of emotion s, each emotion word w is generated with probability p(w|s), as shown in Fig. 5. This process is represented by the joint probability expression (6):

p(w, s, d) = p(d) \sum_{k=1}^{3} p(w \mid s_k) \, p(s_k \mid d)    (6)

where there are the three sentiment orientations described above, s_k denotes the k-th sentiment orientation, and the remaining variables have the meanings defined above. p(s|d) and p(w|s) are solved for by maximum likelihood using the EM (expectation maximization) algorithm. The objective function of the maximum likelihood estimation is given by formula (7):

L = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log p(d_i, w_j)    (7)

where n(d_i, w_j) denotes the number of occurrences of the j-th emotion word w_j in the i-th text d_i, and N and M denote the numbers of texts and emotion words, respectively.
The E (expectation) step first computes:

p(s_k \mid d_i, w_j) = \frac{p(w_j \mid s_k) \, p(s_k) \, p(d_i \mid s_k)}{\sum_{k'} p(w_j \mid s_{k'}) \, p(s_{k'}) \, p(d_i \mid s_{k'})}    (8)

Then the M (maximization) step is performed:

p(w_j \mid s_k) \propto \sum_{i} n(d_i, w_j) \, p(s_k \mid d_i, w_j)
p(d_i \mid s_k) \propto \sum_{j} n(d_i, w_j) \, p(s_k \mid d_i, w_j)    (9)
p(s_k) \propto \sum_{i} \sum_{j} n(d_i, w_j) \, p(s_k \mid d_i, w_j)

By the above algorithm, the probabilities p(d), p(w|s_k), p(s_k|d) and so on can be obtained.
As an example, the emotion model may comprise: the probability of each text (p(d)), the conditional probability of each sentiment orientation given each text (p(s|d)), the probability of an emotion word in a text given the sentiment orientation of that text (p(w|s)), and the joint probability of a text, its sentiment orientation and its emotion words (p(d, s, w)).
On this basis, in one embodiment, the emotion feature computation module 2003 is configured to compute the conditional probability of each sentiment orientation given the text.
As an example, this conditional probability is proportional to the product of the conditional probabilities of the emotion words given the sentiment orientation and the prior probability of that sentiment orientation, as shown in the following formula:

p(s \mid d) = \frac{p(d \mid s) \, p(s)}{p(d)} \propto p(d \mid s) \, p(s) = \prod_{w \in T} p(w \mid s)^{n(w, T)} \times p(s)    (10)

where p(d|s) is the probability of the text given sentiment orientation s, p(w|s) is the probability of emotion word w given sentiment orientation s, the exponent n(w, T) denotes the number of occurrences of emotion word w in text T, and p(s) is the prior probability of sentiment orientation s. This prior probability is the distribution of the sentiment orientations obtained after training the generative model on the annotated question data set. The resulting conditional probabilities p(s|d) serve as the emotion features of the text for classification.
It should similarly be noted that the way in which the emotion model is produced is not limited to the above PLSA or LDA; any approach capable of yielding the above probabilities may be used.
< Question mark feature extraction unit >
The question mark feature extraction unit 103 extracts the question mark features of the text. Any existing extraction method may be used for this. A question mark feature may be an interrogative word, such as "what" or "how", or a question symbol such as a question mark. Such a feature may, for example, be defined as a Boolean value {0, 1}, indicating its presence or absence.
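As an illustrative sketch of such Boolean features, one might check for an interrogative word among the segmented tokens and for a question mark (ASCII or full-width, the latter common in Chinese microblog text) in the raw string; the interrogative list and function name are assumptions for illustration.

```python
# Tiny illustrative interrogative list; a real system would use a
# language-appropriate lexicon.
INTERROGATIVES = {"what", "how", "why", "when", "where", "who", "which"}

def question_mark_features(tokens, raw_text):
    """Return two Boolean {0, 1} features: presence of an interrogative
    word and presence of a question mark (ASCII '?' or full-width '？')."""
    has_interrogative = int(any(t.lower() in INTERROGATIVES for t in tokens))
    has_question_mark = int("?" in raw_text or "？" in raw_text)
    return [has_interrogative, has_question_mark]
```

Either feature alone is a weak signal (rhetorical questions, omitted punctuation), which is why the classifier combines them with the topic and emotion features.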
< Classifier >
After the topic features, emotion features and question mark features of the text are obtained as above, the classifier 104 classifies the text using these features, that is, predicts whether or not the text is a question.
A classifier built with any of various existing classification techniques may be used to classify the text, including but not limited to: support vector machines, random forests, decision trees, K nearest neighbours, maximum entropy and the like.
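To illustrate the final step, the three feature groups can be concatenated into a single input vector for any of the classifiers listed. The toy 1-nearest-neighbour classifier below stands in for those techniques purely as a sketch; the vectors in the usage example are fabricated for illustration and are not data from the patent.

```python
def combine_features(topic_feats, emotion_feats, qmark_feats):
    """Concatenate the p(t|d) values, the p(s|d) values and the Boolean
    question mark features into one classifier input vector."""
    return list(topic_feats) + list(emotion_feats) + list(qmark_feats)

def nearest_neighbour_classify(vector, labelled_vectors):
    """Toy 1-NN classifier: return the label of the closest training
    vector under squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled_vectors, key=lambda lv: dist(vector, lv[0]))[1]
```

In practice any trained classifier from the list above would replace the 1-NN stand-in; the point is only that all three feature groups enter the decision jointly.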
Since the topic features, emotion features and question mark features are all taken into account in the classification process, the data processing device 100 of the application can obtain more accurate classification results than conventional devices; that is, it can judge with higher accuracy whether a text posted by a user is a question.
[2. Data processing method]
The embodiments of the data processing device of the invention described above with reference to the drawings in fact also disclose a data processing method. This method is briefly described below with reference to Figs. 6 to 8; for details, refer to the foregoing description of the data processing device.
The data processing method is for judging whether a text posted by a user in a social service network is a question. As shown in Fig. 6, the method comprises the steps of: obtaining topic features of the text using a pre-trained topic model (S11); obtaining emotion features of the text using a pre-trained emotion model (S12); obtaining question mark features of the text (S13); and classifying the text with a classifier using the topic features, emotion features and question mark features (S14).
It should be noted that although steps S11 to S13 are shown in Fig. 6 as being performed in sequence, this is not necessarily the case; steps S11 to S13 may be performed in various other orders, or partly or entirely in parallel.
In one embodiment, the topic model is pre-trained on an expert knowledge corpus using a generative model, and the emotion model is pre-trained on a manually annotated question data set using a generative model.
Fig. 7 shows an embodiment of the topic feature obtaining step S11. As shown in Fig. 7, step S11 comprises: performing word segmentation on the text (S101); extracting one or more content words of the text as keywords reflecting its topics (S102); and computing the topic features of the text based on the keywords using the topic model (S103). The topic model comprises at least part of the following: the probabilities of texts, keywords and topics, and the probabilities, joint probabilities or conditional probabilities of combinations thereof. As mentioned above, the topic layer is a hidden layer between the text layer and the keyword layer.
In one embodiment, step S102 further comprises extracting at least part of the topic-marked content as keywords. As an example, computing the topic features (S103) comprises computing the conditional probability of each topic given the text. In one embodiment, this conditional probability is proportional to the product of the conditional probabilities of the keywords given the corresponding topic and the prior probability of that topic.
Fig. 8 shows an embodiment of the emotion feature obtaining step S12. As shown in Fig. 8, step S12 comprises: performing word segmentation on the text (S201); extracting one or more non-nouns and/or symbols of the text as emotion words and/or symbols reflecting its sentiment orientation (S202); and computing the emotion features of the text based on the emotion words and/or symbols using the emotion model (S203). The emotion model comprises at least part of the following: the probabilities of texts, emotion words and/or symbols and sentiment orientations, and the probabilities, joint probabilities or conditional probabilities of combinations thereof. The sentiment orientations may comprise three classes: positive emotion, negative emotion and neutral emotion.
In one embodiment, step S202 further comprises converting an emotion word and/or symbol immediately adjacent to a negation word into its antonym. As an example, computing the emotion features (S203) comprises computing the conditional probability of each sentiment orientation given the text. In one embodiment, this conditional probability is proportional to the product of the conditional probabilities of the emotion words and/or symbols given the corresponding sentiment orientation and the prior probability of that sentiment orientation.
Details of the above embodiments have been given in the description of the data processing device and are not repeated here.
[3. Topic model training method]
The foregoing description of the data processing device 100 in fact also discloses a method of training a topic model for judging whether a text in a social networking service is a question. As shown in Fig. 9, the method comprises: preparing an expert corpus (S21); performing word segmentation on each text in the expert corpus (S22); extracting one or more content words from each text as keywords reflecting the topic of the text (S23); and calculating at least some of the following probabilities as the topic model: the probabilities of texts, keywords and topics, and the joint or conditional probabilities of combinations thereof (S24).
In one embodiment, the topic model includes: the probability of each text, the conditional probability of the topic of each text given that text, the conditional probability of each keyword in the text given the topic of the text, and the joint probability of each text, its topic and its keywords.
A topic model trained by the above method reflects the distribution relations among texts, topics and keywords, and can therefore be used to calculate the topic feature of a text to be predicted. Details have been given in the description of the data processing device and are not repeated here.
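The probabilities listed in step S24 can be estimated by relative-frequency counting when each training text carries a topic annotation. That annotation is an assumption made here for illustration; the patent leaves the estimation procedure open, and with a genuinely hidden topic layer one would instead fit the model with, e.g., expectation-maximization.

```python
from collections import Counter, defaultdict

def train_topic_model(corpus):
    """Relative-frequency estimates for the probabilities named in S24.

    `corpus` is a hypothetical list of (topic, keywords) pairs, one per
    segmented text in the expert corpus. Returns the topic priors P(t)
    and the keyword conditionals P(w | t).
    """
    topic_count = Counter()
    kw_count = defaultdict(Counter)
    for topic, keywords in corpus:
        topic_count[topic] += 1
        for w in keywords:
            kw_count[topic][w] += 1
    n = sum(topic_count.values())
    topic_prior = {t: c / n for t, c in topic_count.items()}
    kw_given_topic = {t: {w: c / sum(cnt.values()) for w, c in cnt.items()}
                      for t, cnt in kw_count.items()}
    return topic_prior, kw_given_topic
```

The two returned tables are exactly the inputs that the topic-feature computation of step S103 consumes at prediction time.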
[4. Sentiment model training method]
The foregoing description of the data processing device 100 in fact also discloses a method of training a sentiment model for judging whether a text in a social networking service is a question. As shown in Fig. 10, the method comprises: preparing a question dataset annotated as to whether each text is a question (S31); performing word segmentation on each text in the question dataset (S32); extracting one or more non-nouns and/or symbols from each text as sentiment words and/or symbols reflecting the sentiment orientation of the text (S33); and calculating at least some of the following probabilities as the sentiment model: the probabilities of texts, sentiment words and/or symbols, and sentiment orientations, and the joint or conditional probabilities of combinations thereof (S34).
In one embodiment, the sentiment model includes: the probability of each text, the conditional probability of the sentiment orientation of each text given that text, the conditional probability of each sentiment word and/or symbol in the text given the sentiment orientation of the text, and the joint probability of each text, its sentiment orientation and its sentiment words and/or symbols.
A sentiment model trained by the above method reflects the distribution relations among texts, sentiment orientations and sentiment words, and can therefore be used to calculate the sentiment feature of a text to be predicted. Details have been given in the description of the data processing device and are not repeated here.
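The same counting scheme can estimate the sentiment model of step S34. Treating each training text as carrying a sentiment-orientation annotation is an illustrative assumption; the patent only requires the dataset to be annotated as question / non-question, and the orientation labels would have to come from elsewhere.

```python
from collections import Counter, defaultdict

def train_sentiment_model(dataset):
    """Relative-frequency estimates for the probabilities named in S34.

    `dataset` is a hypothetical list of (orientation, tokens) pairs with
    orientation in {'positive', 'negative', 'neutral'}, tokens being the
    sentiment words/symbols extracted in S33. Returns the orientation
    priors P(o) and the token conditionals P(token | o).
    """
    orient_count = Counter()
    tok_count = defaultdict(Counter)
    for orientation, tokens in dataset:
        orient_count[orientation] += 1
        for tok in tokens:
            tok_count[orientation][tok] += 1
    n = sum(orient_count.values())
    orient_prior = {o: c / n for o, c in orient_count.items()}
    tok_given_orient = {o: {t: c / sum(cnt.values()) for t, c in cnt.items()}
                        for o, cnt in tok_count.items()}
    return orient_prior, tok_given_orient
```

As with the topic model, the returned tables feed the sentiment-feature computation of step S203 at prediction time.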
[5. Computing device for implementing the apparatus and methods of the present application]
In the apparatus described above, the modules and units may be configured by software, firmware, hardware or a combination thereof. The specific means or manners of such configuration are well known to those skilled in the art and are not repeated here. Where software or firmware is used, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (e.g. the general-purpose computer 1100 shown in Fig. 11), which, when various programs are installed thereon, is capable of performing various functions.
In Fig. 11, a central processing unit (CPU) 1101 performs various processes according to programs stored in a read-only memory (ROM) 1102 or loaded from a storage section 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores, as needed, data required when the CPU 1101 performs the various processes. The CPU 1101, ROM 1102 and RAM 1103 are connected to one another via a bus 1104, to which an input/output interface 1105 is also connected.
The following components are connected to the input/output interface 1105: an input section 1106 (including a keyboard, a mouse, etc.), an output section 1107 (including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, etc.), a storage section 1108 (including a hard disk, etc.), and a communication section 1109 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1109 performs communication via a network such as the Internet. A drive 1110 may also be connected to the input/output interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read therefrom is installed into the storage section 1108 as needed.
When the series of processes described above is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1111.
Those skilled in the art will appreciate that the storage medium is not limited to the removable medium 1111 shown in Fig. 11, which stores the program and is distributed separately from the device to provide the program to the user. Examples of the removable medium 1111 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 1102, a hard disk contained in the storage section 1108, or the like, which stores the program and is distributed to the user together with the device containing it.
The present invention also proposes a program product storing machine-readable instruction codes. When read and executed by a machine, the instruction codes perform the methods according to the embodiments of the present invention described above.
Accordingly, a storage medium carrying the above program product storing the machine-readable instruction codes is also included in the present disclosure. The storage medium includes but is not limited to a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick and the like.
Finally, it should also be noted that the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. In the absence of further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
Although embodiments of the present invention have been described above in detail with reference to the accompanying drawings, it should be understood that the embodiments described above are merely illustrative of the invention and do not limit the invention. Various changes and modifications may be made to the above embodiments by those skilled in the art without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is defined only by the appended claims and their equivalents.
From the above description, embodiments of the present invention provide the following technical solutions.
Supplementary note 1. A data processing device for judging whether a text published by a user in a social networking service is a question, comprising:
a topic feature acquisition unit configured to obtain a topic feature of the text using a pre-trained topic model;
a sentiment feature acquisition unit configured to obtain a sentiment feature of the text using a pre-trained sentiment model;
a question label feature extraction unit configured to obtain a question label feature of the text; and
a classifier configured to classify the text using the topic feature, the sentiment feature and the question label feature.
Supplementary note 2. The data processing device according to supplementary note 1, wherein the topic feature acquisition unit comprises:
a word segmentation module configured to perform word segmentation on the text;
a keyword extraction module configured to extract one or more content words from the text as keywords reflecting the topic of the text; and
a topic feature calculation module configured to calculate the topic feature of the text based on the keywords using the topic model,
wherein the topic model includes at least some of the following probabilities: the probabilities of texts, keywords and topics, and the joint or conditional probabilities of combinations thereof.
Supplementary note 3. The data processing device according to supplementary note 1, wherein the sentiment feature acquisition unit comprises:
a word segmentation module configured to perform word segmentation on the text;
a sentiment word and/or symbol extraction module configured to extract one or more non-nouns and/or symbols from the text as sentiment words and/or symbols reflecting the sentiment orientation of the text; and
a sentiment feature calculation module configured to calculate the sentiment feature of the text based on the sentiment words and/or symbols using the sentiment model,
wherein the sentiment model includes at least some of the following probabilities: the probabilities of texts, sentiment words and/or symbols, and sentiment orientations, and the joint or conditional probabilities of combinations thereof.
Supplementary note 4. The data processing device according to supplementary note 2, wherein the topic feature calculation module is configured to calculate the conditional probability of each topic given the text.
Supplementary note 5. The data processing device according to supplementary note 3, wherein the sentiment feature calculation module is configured to calculate the conditional probability of each sentiment orientation given the text.
Supplementary note 6. The data processing device according to supplementary note 4, wherein the conditional probability of a topic given the text is the product of the conditional probabilities of the keywords given that topic and the prior probability of that topic.
Supplementary note 7. The data processing device according to supplementary note 5, wherein the conditional probability of a sentiment orientation given the text is the product of the conditional probabilities of the sentiment words and/or symbols given that sentiment orientation and the prior probability of that sentiment orientation.
Supplementary note 8. The data processing device according to supplementary note 2, wherein the keyword extraction module is configured to extract at least part of the content bearing a topic tag as the keywords reflecting the topic of the text.
Supplementary note 9. The data processing device according to supplementary note 3, wherein the sentiment word and/or symbol extraction module is further configured to convert a sentiment word and/or symbol immediately adjacent to a negation word into its antonym.
Supplementary note 10. A data processing method for judging whether a text published by a user in a social networking service is a question, comprising:
obtaining a topic feature of the text using a pre-trained topic model;
obtaining a sentiment feature of the text using a pre-trained sentiment model;
obtaining a question label feature of the text; and
classifying the text with a classifier using the topic feature, the sentiment feature and the question label feature.
Supplementary note 11. The data processing method according to supplementary note 10, wherein obtaining the topic feature comprises:
performing word segmentation on the text;
extracting one or more content words from the text as keywords reflecting the topic of the text; and
calculating the topic feature of the text based on the keywords using the topic model,
wherein the topic model includes at least some of the following probabilities: the probabilities of texts, keywords and topics, and the joint or conditional probabilities of combinations thereof.
Supplementary note 12. The data processing method according to supplementary note 10, wherein obtaining the sentiment feature comprises:
performing word segmentation on the text;
extracting one or more non-nouns and/or symbols from the text as sentiment words and/or symbols reflecting the sentiment orientation of the text; and
calculating the sentiment feature of the text based on the sentiment words and/or symbols using the sentiment model,
wherein the sentiment model includes at least some of the following probabilities: the probabilities of texts, sentiment words and/or symbols, and sentiment orientations, and the joint or conditional probabilities of combinations thereof.
Supplementary note 13. The data processing method according to supplementary note 11, wherein calculating the topic feature comprises calculating the conditional probability of each topic given the text.
Supplementary note 14. The data processing method according to supplementary note 12, wherein calculating the sentiment feature comprises calculating the conditional probability of each sentiment orientation given the text.
Supplementary note 15. The data processing method according to supplementary note 13, wherein the conditional probability of a topic given the text is the product of the conditional probabilities of the keywords given that topic and the prior probability of that topic.
Supplementary note 16. The data processing method according to supplementary note 14, wherein the conditional probability of a sentiment orientation given the text is the product of the conditional probabilities of the sentiment words and/or symbols given that sentiment orientation and the prior probability of that sentiment orientation.
Supplementary note 17. A method of training a topic model for judging whether a text in a social networking service is a question, comprising:
preparing an expert corpus;
performing word segmentation on each text in the expert corpus;
extracting one or more content words from each text as keywords reflecting the topic of the text; and
calculating at least some of the following probabilities as the topic model: the probabilities of texts, keywords and topics, and the joint or conditional probabilities of combinations thereof.
Supplementary note 18. The method according to supplementary note 17, wherein the topic model includes: the probability of each text, the conditional probability of the topic of each text given that text, the conditional probability of each keyword in the text given the topic of the text, and the joint probability of each text, its topic and its keywords.
Supplementary note 19. A method of training a sentiment model for judging whether a text in a social networking service is a question, comprising:
preparing a question dataset annotated as to whether each text is a question;
performing word segmentation on each text in the question dataset;
extracting one or more non-nouns and/or symbols from each text as sentiment words and/or symbols reflecting the sentiment orientation of the text; and
calculating at least some of the following probabilities as the sentiment model: the probabilities of texts, sentiment words and/or symbols, and sentiment orientations, and the joint or conditional probabilities of combinations thereof.
Supplementary note 20. The method according to supplementary note 19, wherein the sentiment model includes: the probability of each text, the conditional probability of the sentiment orientation of each text given that text, the conditional probability of each sentiment word and/or symbol in the text given the sentiment orientation of the text, and the joint probability of each text, its sentiment orientation and its sentiment words and/or symbols.

Claims (10)

1. A data processing device for judging whether a text published by a user in a social networking service is a question, comprising:
a topic feature acquisition unit configured to obtain a topic feature of the text using a pre-trained topic model;
a sentiment feature acquisition unit configured to obtain a sentiment feature of the text using a pre-trained sentiment model;
a question label feature extraction unit configured to obtain a question label feature of the text; and
a classifier configured to classify the text using the topic feature, the sentiment feature and the question label feature.
2. The data processing device according to claim 1, wherein the topic feature acquisition unit comprises:
a word segmentation module configured to perform word segmentation on the text;
a keyword extraction module configured to extract one or more content words from the text as keywords reflecting the topic of the text; and
a topic feature calculation module configured to calculate the topic feature of the text based on the keywords using the topic model,
wherein the topic model includes at least some of the following probabilities: the probabilities of texts, keywords and topics, and the joint or conditional probabilities of combinations thereof.
3. The data processing device according to claim 1, wherein the sentiment feature acquisition unit comprises:
a word segmentation module configured to perform word segmentation on the text;
a sentiment word and/or symbol extraction module configured to extract one or more non-nouns and/or symbols from the text as sentiment words and/or symbols reflecting the sentiment orientation of the text; and
a sentiment feature calculation module configured to calculate the sentiment feature of the text based on the sentiment words and/or symbols using the sentiment model,
wherein the sentiment model includes at least some of the following probabilities: the probabilities of texts, sentiment words and/or symbols, and sentiment orientations, and the joint or conditional probabilities of combinations thereof.
4. The data processing device according to claim 2, wherein the topic feature calculation module is configured to calculate the conditional probability of each topic given the text.
5. The data processing device according to claim 3, wherein the sentiment feature calculation module is configured to calculate the conditional probability of each sentiment orientation given the text.
6. The data processing device according to claim 4, wherein the conditional probability of a topic given the text is the product of the conditional probabilities of the keywords given that topic and the prior probability of that topic.
7. The data processing device according to claim 5, wherein the conditional probability of a sentiment orientation given the text is the product of the conditional probabilities of the sentiment words and/or symbols given that sentiment orientation and the prior probability of that sentiment orientation.
8. A data processing method for judging whether a text published by a user in a social networking service is a question, comprising:
obtaining a topic feature of the text using a pre-trained topic model;
obtaining a sentiment feature of the text using a pre-trained sentiment model;
obtaining a question label feature of the text; and
classifying the text with a classifier using the topic feature, the sentiment feature and the question label feature.
9. A method of training a topic model for judging whether a text in a social networking service is a question, comprising:
preparing an expert corpus;
performing word segmentation on each text in the expert corpus;
extracting one or more content words from each text as keywords reflecting the topic of the text; and
calculating at least some of the following probabilities as the topic model: the probabilities of texts, keywords and topics, and the joint or conditional probabilities of combinations thereof.
10. The method according to claim 9, wherein the topic model includes: the probability of each text, the conditional probability of the topic of each text given that text, the conditional probability of each keyword in the text given the topic of the text, and the joint probability of each text, its topic and its keywords.
CN201310495278.6A 2013-10-21 2013-10-21 Data processing device, data processing method and program Pending CN104572613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310495278.6A CN104572613A (en) 2013-10-21 2013-10-21 Data processing device, data processing method and program


Publications (1)

Publication Number Publication Date
CN104572613A true CN104572613A (en) 2015-04-29

Family

ID=53088717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310495278.6A Pending CN104572613A (en) 2013-10-21 2013-10-21 Data processing device, data processing method and program

Country Status (1)

Country Link
CN (1) CN104572613A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315797A * 2017-06-19 2017-11-03 Jiangxi Hongdu Aviation Industry Group Co., Ltd. Internet news acquisition and text sentiment prediction system
CN108280164A * 2018-01-18 2018-07-13 Wuhan University Short text filtering and classification method based on category-related words
CN109684444A * 2018-11-02 2019-04-26 Xiamen Kuaishangtong Information Technology Co., Ltd. Intelligent customer service method and system
CN109783800A * 2018-12-13 2019-05-21 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, apparatus, device and storage medium for acquiring sentiment keywords
CN117151344A * 2023-10-26 2023-12-01 Chengmu Technology (Zhuhai) Co., Ltd. Digital twin city population management method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102737629A * 2011-11-11 2012-10-17 Southeast University Embedded speech emotion recognition method and device
US8337208B1 (en) * 2009-11-19 2012-12-25 The United States Of America As Represented By The Administrator Of The National Aeronautics & Space Administration Content analysis to detect high stress in oral interviews and text documents

Non-Patent Citations (2)

Title
Probability and Statistics Teaching and Research Group, Tongji University: "Probability and Statistics", 31 May 2013 *
Yin Hang: "Research on a Domain-Specific Chinese Opinion-Oriented Question Answering System", China Masters' Theses Full-text Database, Information Science and Technology *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150429