CN108090040B - Text information classification method and system - Google Patents


Info

Publication number
CN108090040B
Authority
CN
China
Prior art keywords
score
preset
text information
participle
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611044117.5A
Other languages
Chinese (zh)
Other versions
CN108090040A (en)
Inventor
郭秦龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201611044117.5A priority Critical patent/CN108090040B/en
Publication of CN108090040A publication Critical patent/CN108090040A/en
Application granted granted Critical
Publication of CN108090040B publication Critical patent/CN108090040B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Abstract

The embodiment of the invention discloses a text information classification method and a text information classification system, which are used for improving the accuracy of text emotion classification. The method provided by the embodiment of the invention comprises the following steps: acquiring text information; acquiring a first word segmentation, wherein the first word segmentation is obtained by performing word segmentation processing on the text information according to a first preset rule; placing the first word segmentation into a preset emotion score counter for calculation to obtain a first score; acquiring a second word segmentation, wherein the second word segmentation is obtained by performing word segmentation processing on the text information according to a second preset rule; placing the second word segmentation into a preset training model for calculation to obtain a second score; when the language environment of the text information is determined according to a preset text rule, performing weight distribution on the first score and the second score by using preset comprehensive logic; and obtaining a comprehensive score of the text information according to the weights distributed by the preset comprehensive logic, and obtaining a classification result of the text information according to the comprehensive score.

Description

Text information classification method and system
Technical Field
The present invention relates to the field of text information classification, and in particular, to a text information classification method and system.
Background
Emotion classification is a typical problem in the field of Natural Language Processing (NLP): given a segment of text (which may be a sentence or an article), determine whether the emotion expressed by the text is positive, negative, or neutral.
The sentiment classification problem itself is a topic that has been studied widely and deeply in both academia and industry. Using an emotion dictionary is one approach to solving it: scores are manually assigned to certain emotional words, such as positive emotional words and negative emotional words, and for an input text the emotion class is determined by examining the proportion of positive and negative emotion words it contains.
The classification effect of this prior art depends heavily on the quality of the emotion dictionary. Problems arise if the dictionary quality is not good enough, for example when some words are classified incorrectly or when a word's emotional class is ambiguous. A word such as 'unexpected', when used in the household-appliance domain, generally indicates that the appliance has some problem, whereas in the movie domain it generally indicates that the plot is engaging.
The prior art uses a single emotion classification algorithm, cannot score flexibly according to the specific domain, and therefore has low emotion classification accuracy.
Disclosure of Invention
The embodiment of the invention provides a text information classification method and a text information classification system, which are used for improving the accuracy of text emotion classification.
A first aspect of an embodiment of the present invention provides a text information classification method, which specifically includes:
acquiring text information;
acquiring a first word segmentation, wherein the first word segmentation is obtained by performing word segmentation processing on the text information according to a first preset rule;
placing the first word segmentation into a preset emotion score counter for calculation to obtain a first score;
acquiring a second word segmentation, wherein the second word segmentation is obtained by performing word segmentation processing on the text information according to a second preset rule;
placing the second word segmentation into a preset training model for calculation to obtain a second score;
performing weight distribution on the first score and the second score by using preset comprehensive logic;
obtaining the comprehensive score of the text information according to the weight distributed by the preset comprehensive logic,
and obtaining a classification result of the text information according to the comprehensive score.
A second aspect of the embodiments of the present invention provides a text classification system, which specifically includes:
a first acquisition unit configured to acquire text information;
the second acquisition unit is used for acquiring a first word segmentation, wherein the first word segmentation is obtained by performing word segmentation processing, according to a first preset rule, on the text information acquired by the first acquisition unit;
the first embedding unit is used for placing the first word segmentation acquired by the second acquisition unit into a preset emotion score counter for calculation to obtain a first score;
the third obtaining unit is used for obtaining a second word segmentation, wherein the second word segmentation is obtained by performing word segmentation processing, according to a second preset rule, on the text information acquired by the first acquisition unit;
the second embedding unit is used for embedding the second segmentation into a preset training model to obtain a second score through calculation;
the first distribution unit is used for carrying out weight distribution on the first score and the second score by utilizing preset comprehensive logic;
the calculation unit is used for obtaining the comprehensive score of the text information according to the weight distributed by the comprehensive logic;
and the processing unit is used for obtaining the classification result of the text information according to the comprehensive score obtained by the calculating unit.
A third aspect of the embodiments of the present invention provides a terminal, which specifically includes:
an input device, an output device, a processor, and a memory;
the input device performs the following steps:
acquiring text information;
acquiring a first word segmentation, wherein the first word segmentation is obtained by performing word segmentation processing on the text information according to a first preset rule;
acquiring a second word segmentation, wherein the second word segmentation is obtained by performing word segmentation processing on the text information according to a second preset rule;
the processor is used for executing the following steps by calling the operation instruction stored in the memory:
placing the first word segmentation into a preset emotion score counter for calculation to obtain a first score;
placing the second word segmentation into a preset training model for calculation to obtain a second score;
performing weight distribution on the first score and the second score by using preset comprehensive logic;
obtaining the comprehensive score of the text information according to the weight distributed by the preset comprehensive logic,
and obtaining a classification result of the text information according to the comprehensive score.
According to the technical scheme, the embodiment of the invention has the following advantages:
in the embodiment of the invention, text information is first acquired; word segmentation processing is performed on the text to obtain a first word segmentation; the first word segmentation is placed into a preset emotion score counter for calculation to obtain a first score; word segmentation processing is performed on the text to obtain a second word segmentation; the second word segmentation is placed into a preset training model for calculation to obtain a second score; weight distribution is performed on the first score and the second score by using preset comprehensive logic, the comprehensive score of the text information is obtained according to the weights distributed by the preset comprehensive logic, and the classification result of the text information is obtained according to the comprehensive score. The embodiment of the invention uses a series of emotion classification methods and, in combination with the language environment, distributes weights to the scores obtained by the different algorithms, thereby improving the accuracy of text classification.
Drawings
FIG. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention;
FIG. 2 is a diagram of an embodiment of a text information classification method according to an embodiment of the present invention;
FIG. 3 is a diagram of another embodiment of a text information classification method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment of a system in accordance with embodiments of the present invention;
FIG. 5 is a schematic diagram of another embodiment of the system in an embodiment of the invention;
fig. 6 is a schematic diagram of another embodiment of the system according to the embodiment of the invention.
Detailed Description
The embodiment of the invention provides a text information classification method and a text information classification system, which are used for improving the accuracy of text emotion classification.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the present invention may be applied to a network architecture as shown in fig. 1, in which a user may use a user device (e.g., a personal computer, a notebook computer, a tablet computer, a mobile phone, etc.) to obtain a text to be classified from a storage device or the like. The text requiring emotion classification is then analyzed by a text classification system on the user device to obtain an analysis result.
In the embodiment of the invention, the text information to be classified is first acquired; a first score of the text information is then obtained by using an emotion dictionary algorithm, and a second score of the text information is obtained by using a machine-learning-based algorithm. When the language environment of the text information is determined according to a preset text rule, weight distribution is performed on the first score and the second score by using comprehensive logic, where the comprehensive logic is obtained according to the language environment, and finally the classification result of the text information is obtained according to the weights distributed by the comprehensive logic. The embodiment of the invention uses a series of emotion classification methods and, in combination with the language environment, distributes weights to the scores obtained by the different algorithms, thereby improving the accuracy of text classification.
Referring to fig. 2, an embodiment of a text information classification method according to an embodiment of the present invention includes:
201. and acquiring text information.
In this embodiment, before the text information needs to be classified, the text information needs to be acquired first.
It should be noted that the system may obtain the text information through the internet, or may obtain the text information from other ways, for example, from a storage device, and the specific obtaining manner is not limited here.
202. And acquiring a first word segmentation.
In this embodiment, when the system acquires text information that needs emotion analysis, the first word segmentation is obtained by performing word segmentation processing on the text information according to a first preset rule, where the first preset rule is a rule for dividing the text into segments by words and/or sentences, and the first word segmentation is the word segmentation set containing all sub-participles of the text information.
It should be noted that the first word segmentation includes both words and sentences.
203. And placing the first score into a preset emotion score counter to calculate to obtain a first score.
In this embodiment, the system stores an emotion score counter; after the system obtains the first word segmentation, the first word segmentation is placed into the preset emotion score counter for calculation, and the first score is obtained.
204. And acquiring a second word segmentation.
In this embodiment, after the system puts the first word segmentation into the preset emotion score counter and calculates the first score, the first word segmentation is screened according to a second preset rule. The second preset rule is: all first sub-participles in the first word segmentation are compared with the words stored in the preset emotion dictionary, the first sub-participles that are stored in the preset emotion dictionary are screened out and removed, and the set of remaining first sub-participles is used as the second word segmentation.
It should be noted that the second word segmentation includes both words and sentences.
205. And placing the second segmentation into a preset training model to calculate to obtain a second score.
In this embodiment, after the system obtains the second word segmentation, the second word segmentation is placed into the preset training model and the second score is obtained through calculation, where the preset training model stores the correspondence between preset score vectors and scores.
206. And performing weight distribution on the first score and the second score by using preset comprehensive logic.
In this embodiment, when the first score and the second score have been obtained, the system performs weight distribution on them by using the comprehensive logic, where the preset comprehensive logic is a rule set according to the language environment, and the language environment is determined from special words in the text.
207. And obtaining the comprehensive score of the text information according to the weight distributed by the preset comprehensive logic.
In this embodiment, after the system has used the comprehensive logic to assign weights to the first score and the second score, the comprehensive score of the text information is obtained according to the weights assigned by the preset comprehensive logic.
The comprehensive score = first score × first weight + second score × second weight, where the first weight is the weight assigned to the first score by the comprehensive logic and the second weight is the weight assigned to the second score by the comprehensive logic, and the sum of the weights is 1. Typically, the first score is weighted higher than the second score.
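As an illustration only, the Python sketch below shows this weighted combination; the default weights 0.7 and 0.3 are example values of the kind the comprehensive logic might assign, not values fixed by the invention.

```python
# Illustrative sketch of the composite-score formula above. The default
# weights are assumptions; in the invention they are chosen by the preset
# comprehensive logic according to the language environment and sum to 1.
def composite_score(first_score, second_score, first_weight=0.7, second_weight=0.3):
    assert abs(first_weight + second_weight - 1.0) < 1e-9, "weights must sum to 1"
    return first_score * first_weight + second_score * second_weight

print(round(composite_score(7, -2), 2))  # 7 * 0.7 + (-2) * 0.3 = 4.3
```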
208. And obtaining a classification result of the text information according to the comprehensive score.
In this embodiment, after the comprehensive score of the text information is obtained according to the weight assigned by the preset comprehensive logic, the classification result of the text information is obtained according to the comprehensive score.
In the embodiment of the invention, text information is first acquired; word segmentation processing is performed on the text to obtain a first word segmentation; the first word segmentation is placed into a preset emotion score counter for calculation to obtain a first score; word segmentation processing is performed on the text to obtain a second word segmentation; the second word segmentation is placed into a preset training model for calculation to obtain a second score. When the language environment of the text information is determined according to the preset text rule, weight distribution is performed on the first score and the second score by using the preset comprehensive logic, the comprehensive score of the text information is obtained according to the weights distributed by the preset comprehensive logic, and the classification result of the text information is obtained according to the comprehensive score. The embodiment of the invention uses a series of emotion classification methods and, in combination with the language environment, distributes weights to the scores obtained by the different algorithms, thereby improving the accuracy of text classification.
Referring to fig. 3, another embodiment of the text information classification method according to the embodiment of the present invention includes:
301. and acquiring text information.
In this embodiment, before the text information needs to be classified, the text information needs to be acquired first.
It should be noted that the system may obtain the text information through the internet, or may obtain the text information from other ways, for example, from a storage device, and the specific obtaining manner is not limited here.
302. And acquiring a first word segmentation.
In this embodiment, when the system acquires text information that needs emotion analysis, the first word segmentation is obtained by performing word segmentation processing on the text information according to a first preset rule, where the first preset rule is a rule for dividing the text into segments by words and/or sentences, and the first word segmentation is the word segmentation set containing all sub-participles of the text information.
It should be noted that the first word segmentation includes both words and sentences.
303. And placing the first score into a preset emotion score counter to calculate to obtain a first score.
In this embodiment, the system stores an emotion score counter, and the preset emotion dictionary is built into the emotion score counter. The preset emotion dictionary stores a score value uniquely corresponding to each of a large number of words and sentences. The system compares each first sub-participle in the acquired first word segmentation with the words and sentences stored in the preset emotion dictionary; when a word or sentence identical to one in the preset emotion dictionary is found, the emotion score counter adds the score uniquely corresponding to that word or sentence, and the scores corresponding to all matched sub-participles are added up to obtain the first score.
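A minimal Python sketch of this dictionary-lookup step is given below; the dictionary contents and score values are illustrative assumptions, not data defined by the patent.

```python
# Sketch of the emotion score counter: sum the dictionary scores of every
# first sub-participle that appears in the preset emotion dictionary.
# Dictionary contents and scores are illustrative assumptions.
PRESET_EMOTION_DICT = {"good": 3, "happy": 4, "boring": -2}

def first_score(first_participle, emotion_dict=PRESET_EMOTION_DICT):
    return sum(emotion_dict[word] for word in first_participle if word in emotion_dict)

# Matches scenario 1 below: 'good' (3) + 'happy' (4) = 7.
print(first_score(["exam", "good", "happy", "beyond description"]))  # 7
```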
304. And acquiring a second word segmentation.
In this embodiment, after the system puts the first word segmentation into the preset emotion score counter and calculates the first score, the first word segmentation is screened according to a second preset rule. The second preset rule is: all first sub-participles in the first word segmentation are compared with the words stored in the preset emotion dictionary, the first sub-participles that are stored in the preset emotion dictionary are screened out and removed, and the set of remaining first sub-participles is used as the second word segmentation.
It should be noted that the second word segmentation includes both words and sentences.
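The screening rule can be sketched as follows, reusing the illustrative dictionary from the previous sketch: sub-participles already covered by the emotion dictionary are removed, and the remainder forms the second word segmentation.

```python
# Sketch of the second preset rule: drop every first sub-participle that
# is stored in the preset emotion dictionary; what remains is the second
# word segmentation.
def second_participle(first_participle, emotion_dict):
    return [word for word in first_participle if word not in emotion_dict]

print(second_participle(["exam", "good", "happy", "beyond description"],
                        {"good": 3, "happy": 4, "boring": -2}))
# ['exam', 'beyond description']
```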
305. And placing the second segmentation into a preset training model to calculate to obtain a second score.
In this embodiment, after the system obtains the second word segmentation, each second sub-participle in the second word segmentation is converted into a numerical vector through the preset training model; the preset score vector closest to the numerical vector corresponding to the second sub-participle is then looked up in the preset score vectorization database of the preset training model, and the score corresponding to that closest preset score vector is used as the score of the second sub-participle. Finally, the scores corresponding to the second sub-participles are added to obtain the second score. The preset score vectorization database stores the correspondence between preset score vectors and scores.
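The following sketch illustrates this nearest-vector scoring; the embedding function and the score-vectorization database are stand-ins (assumptions) for the preset training model, which the patent does not specify in code.

```python
# Sketch of the preset-training-model step: embed each second
# sub-participle as a numerical vector, find the closest preset score
# vector, take its score, and sum over all sub-participles.
# The embedding and the score-vector database are illustrative stand-ins.
import numpy as np

PRESET_SCORE_VECTORS = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
PRESET_SCORES = np.array([1, 0, -2])  # score paired with each preset vector

def embed(word):
    # Placeholder: a real system would use a trained model (e.g. word
    # embeddings) to produce the numerical vector for the sub-participle.
    rng = np.random.default_rng(sum(ord(ch) for ch in word))
    return rng.normal(size=PRESET_SCORE_VECTORS.shape[1])

def second_score(second_participle):
    total = 0
    for word in second_participle:
        distances = np.linalg.norm(PRESET_SCORE_VECTORS - embed(word), axis=1)
        total += PRESET_SCORES[np.argmin(distances)]  # score of the closest vector
    return int(total)

print(second_score(["exam", "beyond description"]))
```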
306. And acquiring a third score of the text information by using an emotion classification method.
In this embodiment, the system supports extension: if a new suitable algorithm is found in the future as the business scenario evolves, the new algorithm can be added as an algorithm sub-module through the algorithm customization function, and the third score of the text information is then obtained by using that emotion classification method.
It should be noted that various emotion classification methods may be added later; this is not limited here.
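One way to realize such an extension point is a simple classifier registry, sketched below; the registry API and the example third classifier are assumptions for illustration, not part of the patented method.

```python
# Sketch of the algorithm-customization function: new emotion
# classification algorithms are registered as sub-modules and each one
# contributes its own score. The registry and the example classifier are
# illustrative assumptions.
EXTRA_CLASSIFIERS = {}

def register(name):
    def decorator(fn):
        EXTRA_CLASSIFIERS[name] = fn
        return fn
    return decorator

@register("exclamation_heuristic")  # hypothetical third algorithm
def third_score(text):
    return 0.5 * text.count("!")

def extra_scores(text):
    return {name: fn(text) for name, fn in EXTRA_CLASSIFIERS.items()}

print(extra_scores("What a great movie!"))  # {'exclamation_heuristic': 0.5}
```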
307. And carrying out weight distribution on the first score, the second score and the third score by utilizing preset comprehensive logic.
In this embodiment, after the first score, the second score, and the third score are obtained, the system performs weight distribution on the first score, the second score, and the third score by using the comprehensive logic, where the preset comprehensive logic is a rule set according to the language environment, and the language environment is determined from special words in the text.
308. And obtaining the comprehensive score of the text information according to the weight distributed by the preset comprehensive logic.
In this embodiment, after the comprehensive logic has assigned weights to the first score, the second score, and the third score, the comprehensive score of the text information is obtained according to the weights assigned by the preset comprehensive logic.
The comprehensive score = first score × first weight + second score × second weight + third score × third weight, where the first weight is the weight assigned to the first score by the comprehensive logic, the second weight is the weight assigned to the second score, and the third weight is the weight assigned to the third score, and the sum of the weights is 1. Typically, the first score is weighted higher than the second score.
309. And obtaining a classification result of the text information according to the comprehensive score.
In the embodiment of the invention, the preset score threshold range in which the comprehensive score falls is determined to obtain a judgment result, and the classification result of the text information is then obtained according to the judgment result.
The preset score threshold range of positive emotion can be adjusted according to the specific situation, and the preset score threshold range of each emotion can also take other values; this is not limited here.
It should be noted that the system may determine the preset score threshold range in which the comprehensive score falls to obtain a judgment result and then obtain the classification result of the text information according to the judgment result; the classification result of the text information may also be obtained by other judgment methods, for example by determining which preset emotion score the comprehensive score is closest to. Which method is used to obtain the classification result is not limited here.
The preset emotion scores may be, for example, 2 points for positive emotion, 0 points for neutral emotion, and -2 points for negative emotion; the specific emotion score values may be adjusted according to the actual application and are not limited here.
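Both judgment methods can be sketched as below; the threshold ranges follow the example values used in the scenarios later in this description, and the preset emotion scores are the example values just given (2, 0, -2).

```python
# Sketch of the two judgment methods: (a) locate the threshold range
# containing the composite score; (b) pick the preset emotion score the
# composite score is closest to. Values are the examples from the text.
PRESET_EMOTION_SCORES = {"positive": 2, "neutral": 0, "negative": -2}

def classify_by_threshold(composite):
    if composite > 1:       # positive range (1, 100)
        return "positive"
    if composite < -1:      # negative range (-100, -1)
        return "negative"
    return "neutral"        # neutral range (-1, 1)

def classify_by_nearest(composite):
    return min(PRESET_EMOTION_SCORES,
               key=lambda label: abs(composite - PRESET_EMOTION_SCORES[label]))

print(classify_by_threshold(4.3), classify_by_nearest(1.2))  # positive positive
```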
It should be noted that, when the language environment of the text information cannot be determined according to the preset text rule, user-defined logic may be used to assign the weights, where the user-defined logic is logic input by the user through the parameter configuration port.
For example, if the emotion texts in a business scenario are mostly positive and only a small number are negative, custom logic can be input so that a text is considered negative only when all classifiers judge it to be negative, which improves the classification effect.
It should be noted that, when the language environment of the text information cannot be determined according to the preset text rule, besides using user-defined logic input by the user to assign weights for the text information, other assignment methods are possible, for example directly configuring the weight assignment as an average assignment. Which assignment method is used is not limited here.
The average assignment means that the first score is assigned a weight of 0.5 and the second score is also assigned a weight of 0.5, so that the final score = first score × 0.5 + second score × 0.5. The sum of the weights is 1; if there are multiple weights, each weight is 1 ÷ (the number of weights).
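A small sketch of this fallback follows; when no language environment is recognized, each score simply receives the weight 1 ÷ (number of scores).

```python
# Sketch of the average-assignment fallback: with n scores, each weight
# is 1 / n, so two scores each get 0.5.
def average_weighted_score(scores):
    weight = 1.0 / len(scores)
    return sum(score * weight for score in scores)

print(average_weighted_score([7, -2]))  # 7 * 0.5 + (-2) * 0.5 = 2.5
```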
In the embodiment of the invention, text information is first acquired; word segmentation processing is performed on the text to obtain a first word segmentation; the first word segmentation is placed into a preset emotion score counter for calculation to obtain a first score; word segmentation processing is performed on the text to obtain a second word segmentation; the second word segmentation is placed into a preset training model for calculation to obtain a second score; and a third score of the text information is obtained by using an emotion classification method. When the language environment of the text information is determined according to the preset text rule, the first score, the second score, and the third score are subjected to weight distribution using the preset comprehensive logic, the comprehensive score of the text information is obtained according to the weights distributed by the preset comprehensive logic, and the classification result of the text information is obtained according to the comprehensive score. The embodiment of the invention uses a series of emotion classification methods and, in combination with the language environment, distributes weights to the scores obtained by the different algorithms, thereby improving the accuracy of text classification.
For ease of understanding, the present embodiment is described below with reference to specific application scenarios:
Scenario 1: the system acquires a piece of text information, "Xiaoli did a good job on her exam; when her father learned the news, his happiness was beyond description."
The system performs word segmentation processing on the text and obtains four sub-words: 'exam', 'good', 'happy', and 'beyond description'. The four sub-words are then put into the emotion score counter for lookup; the two words 'good' and 'happy' are found, with corresponding scores of 3 points and 4 points respectively, so the first score is 3 + 4 = 7. The four sub-words 'exam', 'good', 'happy', and 'beyond description' are then screened, leaving the two words 'exam' and 'beyond description'. These two words are placed into the preset training model and converted into their corresponding numerical vectors; after the preset score vectors closest to the two numerical vectors are found through calculation, the corresponding scores are obtained: the score corresponding to the numerical vector of 'beyond description' is -2 points and the score corresponding to the numerical vector of 'exam' is 0 points, so the second score is calculated to be -2 points. The words 'exam' and 'good' are analyzed against the stored language environment template to determine the language environment of the text, which is found to be narrative text. The logic corresponding to narrative text assigns a weight of 0.7 to the first score and a weight of 0.3 to the second score, so the comprehensive score is 7 × 0.7 + (-2) × 0.3 = 4.3. The preset score threshold range of negative emotion is (-100, -1), the preset score threshold range of neutral emotion is (-1, 1), and the preset score threshold range of positive emotion is (1, 100); since 4.3 points falls within the preset score threshold range of positive emotion, the system classifies the text as a positive-emotion text.
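The arithmetic of scenario 1 can be reproduced directly (values as stated above; the weights are the narrative-text example values):

```python
# Reproducing scenario 1 (illustrative values from the description).
first = 3 + 4                            # 'good' + 'happy' from the dictionary
second = -2 + 0                          # 'beyond description' + 'exam' from the model
composite = first * 0.7 + second * 0.3   # narrative-text weights
print(round(composite, 1))               # 4.3 -> falls in (1, 100) -> positive emotion
```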
Scenario 2: the system acquires a piece of text information, "A few days ago I watched the movie 'Crazy Stone' with a friend. I had thought it would be boring, but the movie was really unexpected for me."
The system performs word segmentation processing on the text and obtains four sub-words: 'friend', 'boring', 'movie', and 'unexpected'. The four sub-words are then put into the emotion score counter for lookup; the word 'boring' is found, with a corresponding score of -2 points, so the first score is -2. The four sub-words 'friend', 'boring', 'movie', and 'unexpected' are then screened, leaving the three words 'friend', 'movie', and 'unexpected'. These three words are placed into the preset training model and converted into their corresponding numerical vectors; after the preset score vectors closest to the three numerical vectors are found through calculation, the corresponding scores are obtained: the score corresponding to the numerical vector of 'friend' is 1 point, the score corresponding to the numerical vector of 'movie' is 0 points, and the score corresponding to the numerical vector of 'unexpected' is 1 point, so the second score is 2 points. The words 'movie' and 'unexpected' are analyzed against the stored language environment template to determine the language environment of the text, which is found to belong to the movie domain. The logic corresponding to the movie domain assigns a weight of 0.2 to the first score and a weight of 0.8 to the second score, so the comprehensive score is -2 × 0.2 + 2 × 0.8 = 1.2. The preset score threshold range of negative emotion is (-100, -1), the preset score threshold range of neutral emotion is (-1, 1), and the preset score threshold range of positive emotion is (1, 100); since 1.2 falls within the preset score threshold range of positive emotion, the system classifies the text as a positive-emotion text.
The text information classification method in the embodiment of the present invention is described above, and a system in the embodiment of the present invention is described below with reference to fig. 4, where the system in the embodiment of the present invention includes:
a first acquisition unit 401 configured to acquire text information;
a second obtaining unit 402, configured to obtain a first word segmentation, where the first word segmentation is obtained by performing word segmentation processing on the text information obtained by the first obtaining unit according to a first preset rule;
a first embedding unit 403, configured to place the first word segmentation obtained by the second obtaining unit into a preset emotion score counter for calculation to obtain a first score;
a third obtaining unit 404, configured to obtain a second participle, where the second participle is obtained by performing participle processing on the text information obtained by the first obtaining unit according to a second preset rule;
a second embedding unit 405, configured to place the second participle into a preset training model for calculation to obtain a second score;
the first distributing unit 406 is configured to, when the language environment of the text information is determined according to the preset text rule, perform weight distribution on the first score and the second score by using a preset comprehensive logic;
the calculation unit 407 is configured to obtain a comprehensive score of the text information according to the weight assigned by the comprehensive logic;
the processing unit 408 is configured to obtain a classification result of the text information according to the comprehensive score obtained by the calculating unit.
In the embodiment of the present invention, the first obtaining unit 401 first obtains text information; the second obtaining unit 402 obtains a first word segmentation by performing word segmentation processing on the text; the first embedding unit 403 places the first word segmentation into a preset emotion score counter for calculation to obtain a first score; the third obtaining unit 404 obtains a second word segmentation by performing word segmentation processing on the text; the second embedding unit 405 places the second word segmentation into a preset training model for calculation to obtain a second score. When the language environment of the text information is determined according to the preset text rule, the first distribution unit 406 performs weight distribution on the first score and the second score by using preset comprehensive logic, and the calculation unit 407 obtains the comprehensive score of the text information according to the weights distributed by the comprehensive logic; the processing unit 408 then obtains the classification result of the text information according to the comprehensive score obtained by the calculation unit. The embodiment of the invention uses a series of emotion classification methods and, in combination with the language environment, distributes weights to the scores obtained by the different algorithms, thereby improving the accuracy of text classification.
Referring to fig. 5, another embodiment of the system according to the embodiment of the present invention includes:
a first obtaining unit 501, configured to obtain text information;
a second obtaining unit 502, configured to obtain a first word segmentation, where the first word segmentation is obtained by performing word segmentation processing on the text information obtained by the first obtaining unit according to a first preset rule;
a first embedding unit 503, configured to place the first word segmentation obtained by the second obtaining unit into a preset emotion score counter for calculation to obtain a first score;
the first embedding unit 503 includes:
a searching subunit 5031, configured to search, in the preset emotion dictionary, whether a first sub-participle exists, where the first sub-participle is included in the first participle;
an extracting subunit 5032, configured to, when the searching subunit finds that the first sub-participle exists, extract the score corresponding to the existing first sub-participle, where the preset emotion dictionary stores the correspondence between first sub-participles and scores;
the first calculating sub-unit 5033 is configured to calculate, according to the preset emotion score counter, a score corresponding to the first sub-participle to obtain a first score.
A third obtaining unit 504, configured to obtain a second participle, where the second participle is obtained by performing participle processing on the text information obtained by the first obtaining unit according to a second preset rule;
a second embedding unit 505, configured to place the second participle into a preset training model for calculation to obtain a second score;
the second embedding unit 505 includes:
a conversion subunit 5051, configured to convert the second sub-participle into a numerical vector according to a preset training model, where the second sub-participle is included in the second participle;
a second calculating subunit 5052, configured to calculate the distance between the numerical vector and a preset score vector;
a first determining subunit 5053, configured to take the score corresponding to the preset score vector closest to the numerical vector as the score of the second sub-participle;
and a third computing subunit 5054, configured to add the scores corresponding to the second sub-participles to obtain a second score.
A fourth obtaining unit 506, configured to obtain the third score of the text information by using an emotion classification method, where the emotion classification method is configured according to the change of the language environment.
A second assigning unit 507 for assigning a weight to the first score, the second score and the third score using a preset integration logic.
A calculating unit 508, configured to obtain a comprehensive score of the text information according to the weight assigned by the comprehensive logic;
a processing unit 509, configured to obtain a classification result of the text information according to the comprehensive score obtained by the calculating unit.
Wherein the processing unit 509 comprises:
a second determining subunit 5091, configured to determine a preset score threshold range where the comprehensive score is located, to obtain a determination result;
and the processing subunit 5092 is configured to obtain a text classification result according to the determination result.
In the embodiment of the present invention, the first obtaining unit 501 first obtains text information; the second obtaining unit 502 obtains a first word segmentation by performing word segmentation processing on the text; the first embedding unit 503 places the first word segmentation into a preset emotion score counter for calculation to obtain a first score; the third obtaining unit 504 obtains a second word segmentation by performing word segmentation processing on the text; the second embedding unit 505 places the second word segmentation into a preset training model for calculation to obtain a second score; and the fourth obtaining unit 506 obtains a third score of the text information by using an emotion classification method. When the language environment of the text information is determined according to the preset text rule, the second allocating unit 507 performs weight distribution on the first score, the second score, and the third score by using preset comprehensive logic, the calculating unit 508 obtains the comprehensive score of the text information according to the weights distributed by the preset comprehensive logic, and the processing unit 509 obtains the classification result of the text information according to the comprehensive score. The embodiment of the invention uses a series of emotion classification methods and, in combination with the language environment, distributes weights to the scores obtained by the different algorithms, thereby improving the accuracy of text classification.
Referring to fig. 6, fig. 6 is a schematic diagram of a server structure according to an embodiment of the present invention. The server 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 622 (e.g., one or more processors), a memory 632, and one or more storage media 630 (e.g., one or more mass storage devices) for storing applications 642 or data 644. The memory 632 and the storage medium 630 may be transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Furthermore, the central processing unit 622 may be configured to communicate with the storage medium 630 and execute, on the server 600, the series of instruction operations in the storage medium 630.
The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 6.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A text information classification method is characterized by comprising the following steps:
acquiring text information;
obtaining a first word segmentation, wherein the first word segmentation is obtained by performing word segmentation processing on the text information according to a first preset rule;
placing the first word segmentation into a preset emotion score counter for calculation to obtain a first score;
obtaining a second word segmentation, wherein the second word segmentation is obtained by screening the first word segmentation according to a second preset rule;
placing the second word segmentation into a preset training model for calculation to obtain a second score;
performing weight distribution on the first score and the second score by utilizing preset comprehensive logic based on the first score, wherein the preset comprehensive logic is a rule set according to a language environment in a text;
obtaining a comprehensive score of the text information according to the weight distributed by the preset comprehensive logic;
and obtaining a classification result of the text information according to the comprehensive score.
2. The method of claim 1, wherein the step of placing the first word segmentation into a preset emotion score counter for calculation to obtain a first score comprises:
searching whether a first sub-participle exists in a preset emotion dictionary arranged in the preset emotion score counter, wherein the first sub-participle is contained in the first participle, and the preset emotion dictionary stores the corresponding relation between the first sub-participle and the score;
if the first sub-participle exists, extracting a score corresponding to the existing first sub-participle;
and calculating the score corresponding to the first sub-participle according to the preset emotion score counter to obtain the first score.
3. The method for classifying text information according to claim 1, wherein the step of placing the second word segmentation into a preset training model for calculation to obtain a second score comprises:
converting a second sub-participle into a numerical vector according to a preset training model, wherein the second sub-participle is contained in the second participle;
calculating the distance between the numerical vector and a preset score vector;
taking the score corresponding to the preset score vector closest to the numerical vector as the score of the second sub-participle;
and adding the scores corresponding to the second sub-participles to obtain the second score.
4. The method of claim 1, wherein the deriving the classification result of the text message according to the composite score comprises:
judging a preset score threshold range in which the comprehensive score is positioned to obtain a judgment result;
and obtaining the classification result of the text information according to the judgment result.
5. The method according to any one of claims 1 to 4, wherein after the text information is acquired, the method further comprises:
and acquiring a third score of the text information by using an emotion classification method, wherein the emotion classification method is configured according to language environment change.
6. A text classification system, comprising:
a first acquisition unit configured to acquire text information;
the second obtaining unit is used for obtaining a first word segmentation, and the first word segmentation is obtained by performing word segmentation processing on the text information according to a first preset rule;
the first embedding unit is used for embedding the first segmentation into a preset emotion score counter to obtain a first score through calculation;
the third obtaining unit is used for obtaining a second participle, and the second participle is obtained by screening the first participle according to a second preset rule;
the second embedding unit is used for embedding the second segmentation into a preset training model to obtain a second score through calculation;
the first distribution unit is used for carrying out weight distribution on the first score and the second score by utilizing preset comprehensive logic based on the first score, and the preset comprehensive logic is a rule set according to a language environment in a text;
the calculation unit is used for obtaining the comprehensive score of the text information according to the weight distributed by the comprehensive logic;
and the processing unit is used for obtaining the classification result of the text information according to the comprehensive score.
7. The system of claim 6, wherein the first embedding unit comprises:
the searching subunit is used for searching whether a first sub-participle exists in a preset emotion dictionary, and the first sub-participle is contained in the first participle;
the extracting subunit is configured to, when the searching subunit finds that the first sub-participle exists, extract the score corresponding to the first sub-participle, wherein the preset emotion dictionary stores the correspondence between the first sub-participle and the score;
and the first calculating subunit is used for calculating the score corresponding to the first sub-participle according to the preset emotion score counter to obtain the first score.
8. The system of claim 6, wherein the second embedding unit comprises:
the conversion subunit is used for converting a second sub-participle into a numerical vector according to a preset training model, wherein the second sub-participle is contained in the second participle;
the second calculating subunit is used for calculating the distance between the numerical vector and a preset score vector;
the first determining subunit is used for taking the score corresponding to the preset score vector closest to the numerical vector as the score of the second sub-participle;
and the third calculating subunit is used for adding the scores corresponding to the second sub-participles to obtain the second score.
9. The system of claim 6, wherein the processing unit comprises:
the second determining subunit is used for judging the preset score threshold range in which the comprehensive score is positioned to obtain a judgment result;
and the processing subunit is used for obtaining a text classification result according to the judgment result.
10. The system according to any one of claims 6 to 8, further comprising:
and the fourth acquisition unit is used for acquiring a third score of the text information by utilizing an emotion classification method, wherein the emotion classification method is configured according to the language environment change.
CN201611044117.5A 2016-11-23 2016-11-23 Text information classification method and system Active CN108090040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611044117.5A CN108090040B (en) 2016-11-23 2016-11-23 Text information classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611044117.5A CN108090040B (en) 2016-11-23 2016-11-23 Text information classification method and system

Publications (2)

Publication Number Publication Date
CN108090040A CN108090040A (en) 2018-05-29
CN108090040B true CN108090040B (en) 2021-08-17

Family

ID=62170951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611044117.5A Active CN108090040B (en) 2016-11-23 2016-11-23 Text information classification method and system

Country Status (1)

Country Link
CN (1) CN108090040B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460550A (en) * 2018-10-22 2019-03-12 平安科技(深圳)有限公司 Report sentiment analysis method, apparatus and computer equipment are ground using the security of big data
CN110046342A (en) * 2019-02-19 2019-07-23 阿里巴巴集团控股有限公司 A kind of text quality's detection method
CN109829167B (en) * 2019-02-22 2023-11-21 维沃移动通信有限公司 Word segmentation processing method and mobile terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929861A (en) * 2012-10-22 2013-02-13 杭州东信北邮信息技术有限公司 Method and system for calculating text emotion index
CN104008091A (en) * 2014-05-26 2014-08-27 上海大学 Sentiment value based web text sentiment analysis method
CN104392006A (en) * 2014-12-17 2015-03-04 中国农业银行股份有限公司 Event query processing method and device
CN104462409A (en) * 2014-12-12 2015-03-25 重庆理工大学 Cross-language emotional resource data identification method based on AdaBoost
CN104951548A (en) * 2015-06-24 2015-09-30 烟台中科网络技术研究所 Method and system for calculating negative public opinion index

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7231399B1 (en) * 2003-11-14 2007-06-12 Google Inc. Ranking documents based on large data sets
CN102023986B (en) * 2009-09-22 2015-09-30 日电(中国)有限公司 The method and apparatus of text classifier is built with reference to external knowledge
CN103927302B (en) * 2013-01-10 2017-05-31 阿里巴巴集团控股有限公司 A kind of file classification method and system
US20150073774A1 (en) * 2013-09-11 2015-03-12 Avaya Inc. Automatic Domain Sentiment Expansion
CN103544255B (en) * 2013-10-15 2017-01-11 常州大学 Text semantic relativity based network public opinion information analysis method
CN105260356B (en) * 2015-10-10 2018-02-06 西安交通大学 Chinese interaction text emotion and topic detection method based on multi-task learning
CN105653649B (en) * 2015-12-28 2019-05-21 福建亿榕信息技术有限公司 Low accounting information identifying method and device in mass text
CN105740228B (en) * 2016-01-25 2019-06-04 云南大学 A kind of internet public feelings analysis method and system
CN106096623A (en) * 2016-05-25 2016-11-09 中山大学 A kind of crime identifies and Forecasting Methodology

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929861A (en) * 2012-10-22 2013-02-13 杭州东信北邮信息技术有限公司 Method and system for calculating text emotion index
CN104008091A (en) * 2014-05-26 2014-08-27 上海大学 Sentiment value based web text sentiment analysis method
CN104462409A (en) * 2014-12-12 2015-03-25 重庆理工大学 Cross-language emotional resource data identification method based on AdaBoost
CN104392006A (en) * 2014-12-17 2015-03-04 中国农业银行股份有限公司 Event query processing method and device
CN104951548A (en) * 2015-06-24 2015-09-30 烟台中科网络技术研究所 Method and system for calculating negative public opinion index

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
中文文本情感倾向性分类研究 (Research on sentiment orientation classification of Chinese text); 邓时滔 (Deng Shitao); 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology); 2013-03-15 (Issue 03, 2013); I138-1743 *

Also Published As

Publication number Publication date
CN108090040A (en) 2018-05-29

Similar Documents

Publication Publication Date Title
US11645517B2 (en) Information processing method and terminal, and computer storage medium
CN105912716B (en) A kind of short text classification method and device
CN107436875A (en) File classification method and device
CN111898643B (en) Semantic matching method and device
CN108090040B (en) Text information classification method and system
CN106649250B (en) A kind of recognition methods of emotion neologisms and device
JP7198408B2 (en) Trademark information processing device and method, and program
CN110110049A (en) Service consultation method, apparatus, system, service robot and storage medium
CN108519998A (en) The problem of knowledge based collection of illustrative plates bootstrap technique and device
CN110969172A (en) Text classification method and related equipment
CN110032736A (en) A kind of text analyzing method, apparatus and storage medium
CN109902284A (en) A kind of unsupervised argument extracting method excavated based on debate
KR101931624B1 (en) Trend Analyzing Method for Fassion Field and Storage Medium Having the Same
CN111475731A (en) Data processing method, device, storage medium and equipment
CN105512300A (en) Information filtering method and system
JP2013131075A (en) Classification model learning method, device, program, and review document classifying method
CN110162769B (en) Text theme output method and device, storage medium and electronic device
CN104408036B (en) It is associated with recognition methods and the device of topic
CN110019556B (en) Topic news acquisition method, device and equipment thereof
US10191786B2 (en) Application program interface mashup generation
CN106653006A (en) Search method and device based on voice interaction
CN110019832B (en) Method and device for acquiring language model
CN106782516B (en) Corpus classification method and apparatus
CN110413990A (en) The configuration method of term vector, device, storage medium, electronic device
CN110069780B (en) Specific field text-based emotion word recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant