CN106874291A - Processing method and processing device for text classification - Google Patents

Processing method and processing device for text classification

Info

Publication number
CN106874291A
Authority
CN
China
Prior art keywords
probability
text
subordinate
classification
sorting technique
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510921141.1A
Other languages
Chinese (zh)
Inventor
何鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510921141.1A
Priority to PCT/CN2016/107313 (WO2017097118A1)
Publication of CN106874291A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a processing method and a processing device for text classification. The method includes: classifying a text to be processed using a first classification method to obtain a first candidate text category and a first membership probability; calculating a first target probability from the first membership probability and a first historical membership probability; judging whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, successively classifying the text to be processed using at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained candidate text category as the target text category. The application solves the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification.

Description

Processing method and processing device for text classification
Technical field
The application relates to the field of text processing, and in particular to a processing method and a processing device for text classification.
Background
Text classification is one of the important tasks of natural language processing. Many natural language processing tasks, such as the industry classification of articles and sentiment analysis, are in essence text classification. At present there are many methods for handling text classification problems, whether rule-based or based on machine learning. Usually, a text is classified using a single classification method, and the resulting classification is output. However, classifying a text with only one classification method gives relatively low accuracy. To improve the accuracy of text classification, the related art adopts a series of classification methods: several less accurate classification methods each classify the text, producing several classification results. The results are then put to a vote, and the result with the most votes is selected as the output. This approach largely compensates for the shortcoming of using a single classification method, but it applies multiple classification methods to every input text regardless of whether that is necessary, which degrades text-processing performance.
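The voting approach of the related art can be sketched as follows. This is a minimal illustration, not code from the application: the `Classifier` signature, the `vote` function and the toy classifiers are hypothetical names introduced here. Note that every classifier runs on every text, which is the cost the application aims to avoid.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical signature: a classifier maps a text to a category label.
Classifier = Callable[[str], str]


def vote(text: str, classifiers: Sequence[Classifier]) -> str:
    """Run every classifier on the text and return the most common label."""
    labels = [clf(text) for clf in classifiers]  # all methods always run
    return Counter(labels).most_common(1)[0][0]


# Toy stand-in classifiers, for illustration only.
classifiers = [
    lambda t: "sentiment" if "love" in t else "other",
    lambda t: "sentiment" if "!" in t else "other",
    lambda t: "other",
]

result = vote("I love this!", classifiers)
print(result)
```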
No effective solution has yet been proposed for the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification.
Summary of the invention
The main purpose of the application is to provide a processing method and a processing device for text classification, so as to solve the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification.
To achieve this purpose, according to one aspect of the application, a processing method for text classification is provided. The method includes: classifying a text to be processed using a first classification method to obtain a first candidate text category and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first candidate text category; calculating a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first candidate text category; judging whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, successively classifying the text to be processed using at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained candidate text category as the target text category.
Further, before the text to be processed is classified using the first classification method, the method also includes: determining multiple classification methods for classifying the text to be processed; and obtaining a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
Further, calculating the first target probability from the first membership probability and the first historical membership probability includes: multiplying the first membership probability by the first historical membership probability to obtain a first target membership probability; multiplying a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as judged by the first classification method, that the text to be processed does not belong to the first candidate text category, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first candidate text category; adding the first target membership probability and the first target non-membership probability to obtain a first target sub-probability; and dividing the first target membership probability by the first target sub-probability to obtain the first target probability.
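The calculation just described can be written compactly. As a sketch, let $p$ denote the probability that the first classification method assigns to the candidate category and $h$ the corresponding historical probability stored in the preset database (both symbols are introduced here for illustration), and assume the two "not belonging" probabilities are the complements $1-p$ and $1-h$:

```latex
\[
P_{\text{target}} = \frac{p\,h}{p\,h + (1-p)(1-h)}
\]
```

The numerator is the product for the "belongs" case, and the denominator is the sum of the "belongs" and "does not belong" products, that is, the sub-probability by which the numerator is divided.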
Further, after the finally obtained candidate text category is taken as the target text category, the method also includes: updating, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
Further, after the finally obtained candidate text category is taken as the target text category, the method also includes: outputting the target text category to a destination address.
To achieve the above purpose, according to another aspect of the application, a processing device for text classification is provided. The device includes: a processing unit, configured to classify a text to be processed using a first classification method to obtain a first candidate text category and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first candidate text category; a computing unit, configured to calculate a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first candidate text category; a judging unit, configured to judge whether the first target probability is higher than a preset threshold; and a first determining unit, configured to, when the first target probability is lower than the preset threshold, successively classify the text to be processed using at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and to take the finally obtained candidate text category as the target text category.
Further, the device also includes: a second determining unit, configured to determine multiple classification methods for classifying the text to be processed; and an acquiring unit, configured to obtain a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
Further, the computing unit includes: a first computing module, configured to multiply the first membership probability by the first historical membership probability to obtain a first target membership probability; a second computing module, configured to multiply a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as judged by the first classification method, that the text to be processed does not belong to the first candidate text category, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first candidate text category; a third computing module, configured to add the first target membership probability and the first target non-membership probability to obtain a first target sub-probability; and a fourth computing module, configured to divide the first target membership probability by the first target sub-probability to obtain the first target probability.
Further, the device also includes: an updating unit, configured to update, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
Further, the device also includes: an output unit, configured to output the target text category to a destination address.
The application uses the following steps: classifying a text to be processed using a first classification method to obtain a first candidate text category and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first candidate text category; calculating a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first candidate text category; judging whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, successively classifying the text to be processed using at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained candidate text category as the target text category. These steps solve the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification. By introducing the target probability and determining from it the target text category of the text to be processed, the application compensates for the shortcoming of determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving its processing efficiency.
Brief description of the drawings
The accompanying drawings, which form a part of the application, are provided for further understanding of the application. The illustrative embodiments of the application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the processing method for text classification according to an embodiment of the application; and
Fig. 2 is a schematic diagram of the processing device for text classification according to an embodiment of the application.
Detailed description
It should be noted that, provided there is no conflict, the embodiments in the application and the features in the embodiments may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the application without creative effort shall fall within the scope of protection of the application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of the application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
According to an embodiment of the application, a processing method for text classification is provided.
Fig. 1 is a flowchart of the processing method for text classification according to an embodiment of the application. As shown in Fig. 1, the method includes the following steps:
Step S101: classify a text to be processed using a first classification method to obtain a first candidate text category and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first candidate text category.
Optionally, in the processing method for text classification provided by the embodiment of the application, before the text to be processed is classified using the first classification method, the method also includes: determining multiple classification methods for classifying the text to be processed; and obtaining a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
In natural language processing there are many methods for text classification, such as linguistic rules and various machine-learning classification methods, including logistic regression, naive Bayes, support vector machines and random forests. These classification methods make up the classification method set. For example, the logistic regression method in the set is chosen as the first classification method and used to classify the text to be processed, yielding the first candidate text category. The first candidate text category may be, for example, that the text to be processed belongs to the sentiment category. The system also judges the probability that the category obtained by classifying the text with the first classification method is accurate (that is, the first membership probability).
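A classification method set of this kind might be represented as an ordered list of callables, each returning a candidate category together with a probability. The sketch below is only an illustration under that assumption: the `Method` type, `keyword_rule` and `length_heuristic` are hypothetical stand-ins, not trained models and not names from the application.

```python
from typing import Callable, List, Tuple

# Assumed signature: a method maps a text to
# (candidate category, probability that the text belongs to it).
Method = Callable[[str], Tuple[str, float]]


def keyword_rule(text: str) -> Tuple[str, float]:
    """Toy rule-based method: fixed confidence when a keyword is present."""
    if "happy" in text.lower():
        return ("sentiment", 0.8)
    return ("other", 0.6)


def length_heuristic(text: str) -> Tuple[str, float]:
    """Toy stand-in for a trained model such as logistic regression."""
    return ("sentiment", 0.7) if len(text) < 40 else ("other", 0.7)


# The ordered set; its first element plays the role of the first method.
method_set: List[Method] = [keyword_rule, length_heuristic]

first_method = method_set[0]
category, membership = first_method("I am happy today")
print(category, membership)
```

Keeping the set ordered matters because the later steps try the remaining methods one after another only when the first result is not confident enough.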
Step S102: calculate a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first candidate text category.
Optionally, in the processing method for text classification provided by the embodiment of the application, calculating the first target probability from the first membership probability and the first historical membership probability includes: multiplying the first membership probability by the first historical membership probability to obtain a first target membership probability; multiplying a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as judged by the first classification method, that the text to be processed does not belong to the first candidate text category, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first candidate text category; adding the first target membership probability and the first target non-membership probability to obtain a first target sub-probability; and dividing the first target membership probability by the first target sub-probability to obtain the first target probability.
The first target probability is the calculated probability that the text to be processed belongs to the first candidate text category. The first historical membership probability is the probability, stored in the preset database, that the text to be processed belongs to the first candidate text category; the first membership probability is the probability, as judged by the system using the first classification method, that the text to be processed belongs to the first candidate text category. Therefore, the probability that the text to be processed belongs to the first candidate text category under both conditions is the product of the first historical membership probability and the first membership probability.
For example, suppose the probability stored in the preset database that the text to be processed belongs to the first candidate text category is 0.6 (the first historical membership probability), so the probability that it does not belong to the first candidate text category is 0.4 (the first historical non-membership probability). Suppose also that the system, using the first classification method, judges the probability that the text to be processed belongs to the first candidate text category to be 0.8 (the first membership probability), so the probability that it does not belong is 0.2 (the first non-membership probability). From these data, the first target probability (the probability that the text to be processed belongs to the first candidate text category) = (0.6 × 0.8) / (0.6 × 0.8 + 0.4 × 0.2) ≈ 0.857, and the probability that the text to be processed does not belong to the first candidate text category = (0.4 × 0.2) / (0.6 × 0.8 + 0.4 × 0.2) ≈ 0.143.
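The worked example can be reproduced with a few lines of code. This is a minimal sketch: the function name `target_probability` and its parameter names are chosen here for illustration, and the "does not belong" probabilities are taken as complements, as in the example.

```python
def target_probability(membership: float, historical: float) -> float:
    """Combine a classifier's probability with the stored historical one."""
    # Product for the "belongs" case.
    target_member = membership * historical
    # Product for the "does not belong" case (complements assumed).
    target_non_member = (1.0 - membership) * (1.0 - historical)
    # Sum of both cases, used as the normalizer.
    sub_probability = target_member + target_non_member
    if sub_probability == 0.0:
        raise ValueError("the two probabilities contradict each other completely")
    return target_member / sub_probability


# Values from the example: classifier says 0.8, database stores 0.6.
p = target_probability(0.8, 0.6)
print(round(p, 3), round(1.0 - p, 3))
```

The two printed values correspond to the "belongs" and "does not belong" results of the example.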
Step S103: judge whether the first target probability is higher than a preset threshold.
The preset threshold may be a value set by the user or the requesting party according to how satisfied they need to be with the classification. For example, the preset threshold is 0.8.
Step S104: when the first target probability is lower than the preset threshold, successively classify the text to be processed using at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and take the finally obtained candidate text category as the target text category.
Specifically, when the first target probability is lower than the preset threshold, the text to be processed is classified using a second classification method, for example the naive Bayes classification method, to obtain a second candidate text category and a second membership probability, wherein the second membership probability is the probability, as judged by the second classification method, that the text to be processed belongs to the second candidate text category. A second target probability is calculated from the second membership probability and a second historical membership probability, wherein the second historical membership probability is the probability, stored in the preset database, that the text to be processed belongs to the second candidate text category. Whether the second target probability is higher than the preset threshold is then judged. If so, the second candidate text category is taken as the target text category. If not, other classification methods, different from the first and second classification methods, continue to be used to classify the text to be processed following the process above until the calculated target probability is greater than or equal to the preset threshold, and the finally obtained candidate text category is taken as the target text category.
For example, if the preset threshold is 0.9 and the first target probability calculated above is 0.857, the first target probability is judged to be lower than the preset threshold. The system then considers that the current first candidate text category is not the target text category and, accordingly, classifies the text to be processed using the second classification method (for example, the naive Bayes classification method), and so on until the calculated target probability is greater than or equal to the preset threshold; the finally obtained candidate text category is taken as the target text category.
If the preset threshold is 0.8 and the first target probability calculated above is 0.857, the first target probability is judged to be higher than the preset threshold, and the first candidate text category is determined to be the target text category of the text to be processed. For example, the category to which the text to be processed belongs is determined to be the sentiment category.
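Steps S101 to S104 together form a cascade that stops as soon as one method is confident enough. The sketch below illustrates this under several assumptions that are not stated in the application: history is keyed by (method name, category), a neutral 0.5 is used when no history is stored, the last result is returned if no method reaches the threshold, and the two methods are constant toy stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# Assumed signature: a method returns (candidate category, probability).
Method = Callable[[str], Tuple[str, float]]


def target_probability(membership: float, historical: float) -> float:
    """Normalized product, assuming a nonzero denominator."""
    member = membership * historical
    non_member = (1.0 - membership) * (1.0 - historical)
    return member / (member + non_member)


def classify_cascade(
    text: str,
    methods: List[Tuple[str, Method]],
    history: Dict[Tuple[str, str], float],
    threshold: float,
) -> Tuple[str, float, str]:
    """Try methods in order; stop at the first target probability >= threshold."""
    if not methods:
        raise ValueError("at least one classification method is required")
    for name, method in methods:
        category, membership = method(text)
        # Assumption: fall back to a neutral 0.5 when nothing is stored.
        historical = history.get((name, category), 0.5)
        target = target_probability(membership, historical)
        if target >= threshold:
            break  # confident enough, so the remaining methods are skipped
    # Assumption: if no method reaches the threshold, return the last result.
    return category, target, name


# Toy methods with constant outputs, for illustration only.
methods = [
    ("logistic_regression", lambda text: ("sentiment", 0.8)),
    ("naive_bayes", lambda text: ("sentiment", 0.9)),
]
history = {
    ("logistic_regression", "sentiment"): 0.6,
    ("naive_bayes", "sentiment"): 0.7,
}

strict = classify_cascade("some text", methods, history, threshold=0.9)
lenient = classify_cascade("some text", methods, history, threshold=0.8)
print(strict, lenient)
```

With the threshold at 0.9 the first method falls short and the second is consulted; with the threshold at 0.8 the first method suffices and the second never runs, which is where the efficiency gain comes from.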
Optionally, in the processing method for text classification provided by the embodiment of the application, after the finally obtained candidate text category is taken as the target text category, the method also includes: outputting the target text category to a destination address.
The category to which the text to be processed belongs is output to the destination address, where it is displayed or where a user analyzes and processes it.
Optionally, in the processing method for text classification provided by the embodiment of the application, after the finally obtained candidate text category is taken as the target text category, the method also includes: updating, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
Updating the stored historical membership probability that corresponds to the classification method finally used with the finally calculated target probability keeps the historical membership probabilities stored in the preset database accurate.
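The update step could look like the following sketch, which uses an in-memory SQLite database as a stand-in for the preset database. The table layout, the column names and the `update_history` helper are assumptions made for illustration; the application does not specify how the database is organized.

```python
import sqlite3

# In-memory database standing in for the preset database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    " method TEXT, category TEXT, probability REAL,"
    " PRIMARY KEY (method, category))"
)
conn.execute(
    "INSERT INTO history VALUES (?, ?, ?)",
    ("logistic_regression", "sentiment", 0.6),
)


def update_history(db: sqlite3.Connection, method: str,
                   category: str, target: float) -> None:
    """Overwrite the stored historical probability with the final target."""
    db.execute(
        "INSERT OR REPLACE INTO history VALUES (?, ?, ?)",
        (method, category, target),
    )
    db.commit()


# After a text is finally classified with a target probability of 0.857:
update_history(conn, "logistic_regression", "sentiment", 0.857)

stored = conn.execute(
    "SELECT probability FROM history WHERE method = ? AND category = ?",
    ("logistic_regression", "sentiment"),
).fetchone()[0]
print(stored)
```

The next text handled by the same method would then be combined with the updated value rather than the original 0.6.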
In the application, the above steps introduce the target probability and determine from it the target text category of the text to be processed. This compensates for the shortcoming of determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving its processing efficiency.
The processing method for text classification provided by the embodiment of the application classifies a text to be processed using a first classification method to obtain a first candidate text category and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first candidate text category; calculates a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first candidate text category; judges whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, successively classifies the text to be processed using at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and takes the finally obtained candidate text category as the target text category. This solves the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification. By introducing the target probability and determining from it the target text category of the text to be processed, the method compensates for the shortcoming of determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving its processing efficiency.
It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one given here.
The embodiment of the application also provides a processing device for text classification. It should be noted that the processing device for text classification of the embodiment of the application can be used to execute the processing method for text classification provided by the embodiment of the application. The processing device for text classification provided by the embodiment of the application is introduced below.
Fig. 2 is the schematic diagram of the processing unit of the text classification according to the embodiment of the present application.As shown in Fig. 2 the device bag Include:Processing unit 10, computing unit 20, the determining unit 40 of judging unit 30 and first.
Processing unit 10, for carrying out classification treatment to pending text using the first sorting technique, obtains first to be confirmed Text categories and the first subordinate probability, wherein, the first subordinate probability is to judge pending text category according to the first sorting technique In the probability of the first text categories to be confirmed.
Computing unit 20, for according to the first subordinate probability and the first history subordinate probability calculation first object probability, wherein, First history subordinate probability belongs to the probability of the first text categories to be confirmed for the pending text stored in presetting database.
Judging unit 30, for judging first object probability whether higher than predetermined threshold value.
First determining unit 40, for when first object probability be less than predetermined threshold value when, successively using with the first sorting technique Different at least one sorting techniques carry out classification treatment to pending text, until the destination probability for calculating is higher than or waits Untill predetermined threshold value, and the text categories to be confirmed that will be finally given are used as target text classification.
The text classification processing device provided by this embodiment of the application works as follows. Processing unit 10 classifies the text to be processed using a first classification method, obtaining a first text category to be confirmed and a first membership probability, where the first membership probability is the probability, as determined by the first classification method, that the text to be processed belongs to the first text category to be confirmed. Computing unit 20 calculates a first target probability from the first membership probability and a first historical membership probability, where the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed. Judging unit 30 judges whether the first target probability is higher than a preset threshold. When the first target probability is lower than the preset threshold, first determining unit 40 classifies the text to be processed using, in turn, at least one classification method different from the first classification method, until the calculated target probability is greater than or equal to the preset threshold, and takes the finally obtained text category to be confirmed as the target text category. This solves the problem in the related art that improving the accuracy of text classification lowers its processing efficiency. By introducing a target probability and determining the target text category of the text to be processed according to it, the device compensates for the shortcomings of determining the target text category with only one classification method, and effectively avoids determining the target text category through unnecessary additional classification methods. It thereby improves the accuracy of text classification while also improving its processing efficiency.
Optionally, in the text classification processing device provided by this embodiment of the application, the device further includes: a second determining unit, configured to determine multiple classification methods for classifying the text to be processed; and an acquiring unit, configured to obtain a classification method set composed of the multiple classification methods, where the classification method set includes the first classification method.
Optionally, in the text classification processing device provided by this embodiment of the application, computing unit 20 includes: a first computing module, configured to multiply the first membership probability by the first historical membership probability to obtain a first target membership probability; a second computing module, configured to multiply a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, where the first non-membership probability is the probability, as determined by the first classification method, that the text to be processed does not belong to the first text category to be confirmed, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first text category to be confirmed; a third computing module, configured to add the first target membership probability and the first target non-membership probability to obtain a first target sub-probability; and a fourth computing module, configured to divide the first target membership probability by the first target sub-probability to obtain the first target probability.
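The four computing modules can be illustrated with a minimal Python sketch. The function and variable names are hypothetical, and the sketch assumes that each non-membership probability is simply one minus the corresponding membership probability, which this passage does not state explicitly.

```python
def target_probability(p_method: float, p_history: float) -> float:
    """Combine a classifier's membership probability with the stored historical one.

    Assumption (not stated in the text): non-membership = 1 - membership.
    """
    # First computing module: target membership probability.
    target_member = p_method * p_history
    # Second computing module: target non-membership probability.
    target_non_member = (1.0 - p_method) * (1.0 - p_history)
    # Third computing module: target sub-probability (the normalizer).
    target_sub = target_member + target_non_member
    if target_sub == 0.0:
        # Happens only when the two inputs fully contradict each other (e.g. 1.0 and 0.0).
        raise ValueError("target probability is undefined for these inputs")
    # Fourth computing module: the target probability itself.
    return target_member / target_sub
```

For example, with a membership probability of 0.8 and a historical membership probability of 0.6, the sketch computes 0.48 / (0.48 + 0.08), so agreement between the two inputs pushes the result above either of them.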
Optionally, in the text classification processing device provided by this embodiment of the application, the device further includes: an updating unit, configured to update, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
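As a rough illustration of the updating unit, the sketch below stands in for the preset database with an in-memory dictionary. The key layout (classification method name plus text category) and all names are assumptions made for the example; the text only says that the stored value corresponds to the classification method finally used.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the preset database:
# (classification method name, text category) -> historical membership probability.
HistoryStore = Dict[Tuple[str, str], float]


def update_history(store: HistoryStore, method_name: str, category: str,
                   final_target_probability: float) -> None:
    """Overwrite the stored historical probability with the final target probability."""
    if not 0.0 <= final_target_probability <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    store[(method_name, category)] = final_target_probability
```

A real deployment would presumably persist this value in a database table rather than a dictionary; the dictionary only keeps the example self-contained.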
Optionally, in the text classification processing device provided by this embodiment of the application, the device further includes: an output unit, configured to output the target text category to a destination address.
The text classification processing device includes a processor and a memory. The processing unit, computing unit, judging unit, first determining unit and so on described above are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions. The preset threshold and the preset database may also be stored in the memory.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and text classification is processed by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: classifying a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, where the first membership probability is the probability, as determined by the first classification method, that the text to be processed belongs to the first text category to be confirmed; calculating a first target probability from the first membership probability and a first historical membership probability, where the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed; judging whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, classifying the text to be processed using, in turn, at least one classification method different from the first classification method, until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained text category to be confirmed as the target text category.
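The method steps above can be sketched as a simple cascade in Python. This is only an illustrative sketch: the classifier interface, the dictionary used in place of the preset database, the default historical probability of 0.5, the assumption that non-membership equals one minus membership, and the behaviour when every method falls short of the threshold (the last candidate is returned) are all assumptions made for the example, not details given in the text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical classifier interface: text -> (category to be confirmed, membership probability).
Classifier = Callable[[str], Tuple[str, float]]


def combine(p_method: float, p_history: float) -> float:
    """Target probability from the method's and the historical membership probabilities."""
    member = p_method * p_history
    non_member = (1.0 - p_method) * (1.0 - p_history)  # assumes non-membership = 1 - membership
    total = member + non_member
    if total == 0.0:
        raise ValueError("target probability is undefined for these inputs")
    return member / total


def classify_with_fallback(text: str,
                           methods: List[Tuple[str, Classifier]],
                           history: Dict[Tuple[str, str], float],
                           threshold: float,
                           default_history: float = 0.5) -> Tuple[str, float, str]:
    """Try each classification method in turn until the target probability reaches the threshold."""
    if not methods:
        raise ValueError("at least one classification method is required")
    for name, classify in methods:
        category, p_method = classify(text)
        # Look up the stored historical probability; fall back to a neutral default (assumption).
        p_history = history.get((name, category), default_history)
        target = combine(p_method, p_history)
        if target >= threshold:
            break  # confident enough, so later methods are skipped
    # If no method reached the threshold, the last candidate is returned (assumption).
    return category, target, name
```

Because the loop stops at the first method whose target probability reaches the threshold, texts that the first method handles confidently never incur the cost of the remaining methods.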
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Obviously, those skilled in the art should understand that each module or step of the application described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be fabricated as an individual integrated circuit module, or multiple modules or steps among them can be fabricated as a single integrated circuit module. Thus, the application is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the application and are not intended to limit it. For those skilled in the art, the application may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the application shall fall within its scope of protection.

Claims (10)

1. A processing method for text classification, characterized by comprising:
classifying a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as determined by the first classification method, that the text to be processed belongs to the first text category to be confirmed;
calculating a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed;
judging whether the first target probability is higher than a preset threshold; and
when the first target probability is lower than the preset threshold, performing the classification on the text to be processed using, in turn, at least one classification method different from the first classification method, until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained text category to be confirmed as the target text category.
2. The method according to claim 1, characterized in that, before the text to be processed is classified using the first classification method, the method further comprises:
determining multiple classification methods for classifying the text to be processed; and
obtaining a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
3. The method according to claim 1, characterized in that calculating the first target probability from the first membership probability and the first historical membership probability comprises:
multiplying the first membership probability by the first historical membership probability to obtain a first target membership probability;
multiplying a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as determined by the first classification method, that the text to be processed does not belong to the first text category to be confirmed, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first text category to be confirmed;
adding the first target membership probability and the first target non-membership probability to obtain a first target sub-probability; and
dividing the first target membership probability by the first target sub-probability to obtain the first target probability.
4. The method according to claim 1, characterized in that, after the finally obtained text category to be confirmed is taken as the target text category, the method further comprises:
updating, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
5. The method according to claim 1, characterized in that, after the finally obtained text category to be confirmed is taken as the target text category, the method further comprises:
outputting the target text category to a destination address.
6. A processing device for text classification, characterized by comprising:
a processing unit, configured to classify a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as determined by the first classification method, that the text to be processed belongs to the first text category to be confirmed;
a computing unit, configured to calculate a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed;
a judging unit, configured to judge whether the first target probability is higher than a preset threshold; and
a first determining unit, configured to, when the first target probability is lower than the preset threshold, perform the classification on the text to be processed using, in turn, at least one classification method different from the first classification method, until the calculated target probability is greater than or equal to the preset threshold, and to take the finally obtained text category to be confirmed as the target text category.
7. The device according to claim 6, characterized in that the device further comprises:
a second determining unit, configured to determine multiple classification methods for classifying the text to be processed; and
an acquiring unit, configured to obtain a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
8. The device according to claim 6, characterized in that the computing unit comprises:
a first computing module, configured to multiply the first membership probability by the first historical membership probability to obtain a first target membership probability;
a second computing module, configured to multiply a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as determined by the first classification method, that the text to be processed does not belong to the first text category to be confirmed, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first text category to be confirmed;
a third computing module, configured to add the first target membership probability and the first target non-membership probability to obtain a first target sub-probability; and
a fourth computing module, configured to divide the first target membership probability by the first target sub-probability to obtain the first target probability.
9. The device according to claim 6, characterized in that the device further comprises: an updating unit, configured to update, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
10. The device according to claim 6, characterized in that the device further comprises: an output unit, configured to output the target text category to a destination address.
CN201510921141.1A 2015-12-11 2015-12-11 The processing method and processing device of text classification Pending CN106874291A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510921141.1A CN106874291A (en) 2015-12-11 2015-12-11 The processing method and processing device of text classification
PCT/CN2016/107313 WO2017097118A1 (en) 2015-12-11 2016-11-25 Text classification processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510921141.1A CN106874291A (en) 2015-12-11 2015-12-11 The processing method and processing device of text classification

Publications (1)

Publication Number Publication Date
CN106874291A true CN106874291A (en) 2017-06-20

Family

ID=59013723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510921141.1A Pending CN106874291A (en) 2015-12-11 2015-12-11 The processing method and processing device of text classification

Country Status (2)

Country Link
CN (1) CN106874291A (en)
WO (1) WO2017097118A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1701324A (en) * 2001-11-02 2005-11-23 Dba西方集团西方出版社 Systems, methods, and software for classifying text
CN101059796A (en) * 2006-04-19 2007-10-24 中国科学院自动化研究所 Two-stage combined file classification method based on probability subject
CN101587493A (en) * 2009-06-29 2009-11-25 中国科学技术大学 Text classification method
CN102033964A (en) * 2011-01-13 2011-04-27 北京邮电大学 Text classification method based on block partition and position weight
CN103514174A (en) * 2012-06-18 2014-01-15 北京百度网讯科技有限公司 Text categorization method and device
US20140314311A1 (en) * 2013-04-23 2014-10-23 Wal-Mart Stores, Inc. System and method for classification with effective use of manual data input
US9104972B1 (en) * 2009-03-13 2015-08-11 Google Inc. Classifying documents using multiple classifiers

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102141977A (en) * 2010-02-01 2011-08-03 阿里巴巴集团控股有限公司 Text classification method and device
CN103473356B (en) * 2013-09-26 2017-01-25 苏州大学 Document-level emotion classifying method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597985A (en) * 2019-08-15 2019-12-20 重庆金融资产交易所有限责任公司 Data classification method, device, terminal and medium based on data analysis
CN111191447A (en) * 2019-12-18 2020-05-22 东软集团股份有限公司 Equipment defect classification method, device and equipment
CN111191447B (en) * 2019-12-18 2023-07-14 东软集团股份有限公司 Equipment defect classification method, device and equipment
CN112380346A (en) * 2020-11-23 2021-02-19 宁波深擎信息科技有限公司 Financial news emotion analysis method and device, computer equipment and storage medium
CN112380346B (en) * 2020-11-23 2023-04-25 宁波深擎信息科技有限公司 Financial news emotion analysis method and device, computer equipment and storage medium
CN113806542A (en) * 2021-09-18 2021-12-17 上海幻电信息科技有限公司 Text analysis method and system
CN113806542B (en) * 2021-09-18 2024-05-17 上海幻电信息科技有限公司 Text analysis method and system

Also Published As

Publication number Publication date
WO2017097118A1 (en) 2017-06-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170620