CN106874291A - Processing method and processing device for text classification - Google Patents
- Publication number
- CN106874291A (application CN201510921141.1A)
- Authority
- CN
- China
- Prior art keywords
- probability
- text
- membership
- classification
- classification method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a processing method and a processing device for text classification. The method includes: classifying a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability; calculating a first target probability from the first membership probability and a first historical membership probability; judging whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, classifying the text to be processed using, in turn, at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained text category to be confirmed as the target text category. The application solves the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification.
Description
Technical Field
The present application relates to the field of text processing, and in particular to a processing method and a processing device for text classification.
Background
Text classification is one of the essential tasks of natural language processing; many natural language processing tasks, such as industry classification of articles and sentiment analysis, are in essence text classification. At present there are many methods for handling text classification problems, whether rule-based or based on machine learning. Typically, a text is classified using a single classification method, and the resulting classification is output. However, classifying a text with only one classification method gives relatively low accuracy. To improve the accuracy of text classification, the related art employs a series of classification methods: several less accurate classification methods are each used to classify the text, yielding multiple classification results; the results are then voted on, and the result with the most votes is selected as the output. This approach largely compensates for the shortcomings of using only one classification method, but, whether or not it is necessary, it applies multiple classification methods to every input text, which degrades text-processing performance.
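The voting scheme described above can be sketched as follows. This is a minimal illustration in Python, not code from the application: the `Classifier` type, the `vote` function and the toy classifiers are hypothetical names chosen for the example.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a classifier maps a text to a category label.
Classifier = Callable[[str], str]


def vote(text: str, classifiers: Sequence[Classifier]) -> str:
    """Run every classifier on the text and return the most common label."""
    labels = [classify(text) for classify in classifiers]
    return Counter(labels).most_common(1)[0][0]


# Toy stand-ins for real classifiers (assumptions for illustration only).
toy_classifiers = [
    lambda text: "sentiment",
    lambda text: "industry",
    lambda text: "sentiment",
]
winner = vote("great product", toy_classifiers)
```

Every classifier runs on every input regardless of how confident the first one already is, which is the performance cost this background section points out.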
No effective solution has yet been proposed for the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification.
Summary of the Invention
The main purpose of the present application is to provide a processing method and a processing device for text classification, so as to solve the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification.
To achieve the above purpose, according to one aspect of the present application, a processing method for text classification is provided. The method includes: classifying a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first text category to be confirmed; calculating a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed; judging whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, classifying the text to be processed using, in turn, at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained text category to be confirmed as the target text category.
Further, before the text to be processed is classified using the first classification method, the method also includes: determining multiple classification methods for classifying the text to be processed; and obtaining a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
Further, calculating the first target probability from the first membership probability and the first historical membership probability includes: multiplying the first membership probability by the first historical membership probability to obtain a first target membership probability; multiplying a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as judged by the first classification method, that the text to be processed does not belong to the first text category to be confirmed, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first text category to be confirmed; adding the first target membership probability to the first target non-membership probability to obtain a first target sub-probability; and dividing the first target membership probability by the first target sub-probability to obtain the first target probability.
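The four steps above amount to a single normalized product. As a sketch in LaTeX, with notation introduced here rather than taken from the application: $p_1$ is the first classification method's probability that the text belongs to the category, $h_1$ is the stored historical probability for that category, and $\bar{p}_1$, $\bar{h}_1$ are the corresponding "does not belong" probabilities.

```latex
\[
P_1 \;=\; \frac{p_1 \, h_1}{p_1 \, h_1 + \bar{p}_1 \, \bar{h}_1}
\]
```

In the worked example later in the description, the "does not belong" probabilities are simply the complements, $\bar{p}_1 = 1 - p_1$ and $\bar{h}_1 = 1 - h_1$; under that assumption the expression has the same form as a two-hypothesis Bayes update in which $h_1$ plays the role of the prior.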
Further, after the finally obtained text category to be confirmed is taken as the target text category, the method also includes: updating, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
Further, after the finally obtained text category to be confirmed is taken as the target text category, the method also includes: outputting the target text category to a destination address.
To achieve the above purpose, according to another aspect of the present application, a processing device for text classification is provided. The device includes: a processing unit, configured to classify a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first text category to be confirmed; a calculation unit, configured to calculate a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed; a judging unit, configured to judge whether the first target probability is higher than a preset threshold; and a first determining unit, configured to, when the first target probability is lower than the preset threshold, classify the text to be processed using, in turn, at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and to take the finally obtained text category to be confirmed as the target text category.
Further, the device also includes: a second determining unit, configured to determine multiple classification methods for classifying the text to be processed; and an acquiring unit, configured to obtain a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
Further, the calculation unit includes: a first calculation module, configured to multiply the first membership probability by the first historical membership probability to obtain a first target membership probability; a second calculation module, configured to multiply a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as judged by the first classification method, that the text to be processed does not belong to the first text category to be confirmed, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first text category to be confirmed; a third calculation module, configured to add the first target membership probability to the first target non-membership probability to obtain a first target sub-probability; and a fourth calculation module, configured to divide the first target membership probability by the first target sub-probability to obtain the first target probability.
Further, the device also includes: an updating unit, configured to update, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
Further, the device also includes: an output unit, configured to output the target text category to a destination address.
The present application adopts the following steps: classifying a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first text category to be confirmed; calculating a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed; judging whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, classifying the text to be processed using, in turn, at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and taking the finally obtained text category to be confirmed as the target text category. This solves the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification. By introducing the target probability and determining the target text category of the text to be processed according to it, the application compensates for the shortcomings of determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving its processing efficiency.
Brief Description of the Drawings
The accompanying drawings, which form a part of the present application, are provided for further understanding of the application. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the processing method for text classification according to an embodiment of the present application; and
Fig. 2 is a schematic diagram of the processing device for text classification according to an embodiment of the present application.
Detailed Description
It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the application without creative effort shall fall within the scope of protection of the application.
It should be noted that the terms "first", "second" and the like in the description, claims and above drawings of the present application are used to distinguish similar objects, and not to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate for the purposes of the embodiments described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
According to an embodiment of the present application, a processing method for text classification is provided.
Fig. 1 is a flow chart of the processing method for text classification according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S101: classify a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first text category to be confirmed.
Optionally, in the processing method for text classification provided by the embodiment of the present application, before the text to be processed is classified using the first classification method, the method also includes: determining multiple classification methods for classifying the text to be processed; and obtaining a classification method set composed of the multiple classification methods, wherein the classification method set includes the first classification method.
In natural language processing there are many methods for text classification, such as linguistic rules and machine-learning classification methods including logistic regression, naive Bayes, support vector machines and random forests; these classification methods make up the classification method set. For example, the logistic regression classification method in the set is selected as the first classification method to classify the text to be processed, obtaining the first text category to be confirmed. For example, the first text category to be confirmed may be that the text to be processed belongs to the sentiment category. The system also determines the probability that the text category obtained by classifying the text to be processed with the first classification method is accurate (i.e., the first membership probability).
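One way to picture step S101 is as a set of interchangeable classifiers that each return a category together with a probability. The sketch below is an illustration only: `Prediction`, `keyword_rule`, the keyword list and the fixed probabilities are all hypothetical, and a real system would plug in trained models such as logistic regression instead of the toy rule.

```python
from typing import Callable, Dict, NamedTuple


class Prediction(NamedTuple):
    category: str       # text category to be confirmed
    probability: float  # classifier's probability that the text belongs to it


# Hypothetical interface shared by every method in the classification method set.
Classifier = Callable[[str], Prediction]


def keyword_rule(text: str) -> Prediction:
    """Toy linguistic-rule classifier with made-up confidence values."""
    emotional_words = ("love", "hate", "great", "awful")
    if any(word in text.lower() for word in emotional_words):
        return Prediction("sentiment", 0.8)
    return Prediction("other", 0.6)


# The classification method set, keyed by an illustrative method name.
classifier_set: Dict[str, Classifier] = {"keyword_rule": keyword_rule}

first = classifier_set["keyword_rule"]("I love this phone")
```

Keeping every method behind the same call signature is what lets later steps swap in a second or third method without changing the surrounding logic.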
Step S102: calculate a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed.
Optionally, in the processing method for text classification provided by the embodiment of the present application, calculating the first target probability from the first membership probability and the first historical membership probability includes: multiplying the first membership probability by the first historical membership probability to obtain a first target membership probability; multiplying a first non-membership probability by a first historical non-membership probability to obtain a first target non-membership probability, wherein the first non-membership probability is the probability, as judged by the first classification method, that the text to be processed does not belong to the first text category to be confirmed, and the first historical non-membership probability is the probability, stored in the preset database, that the text to be processed does not belong to the first text category to be confirmed; adding the first target membership probability to the first target non-membership probability to obtain a first target sub-probability; and dividing the first target membership probability by the first target sub-probability to obtain the first target probability.
The first target probability is the calculated probability that the text to be processed belongs to the first text category to be confirmed. The first historical membership probability is the probability, stored in the preset database, that the text to be processed belongs to the first text category to be confirmed; the first membership probability is the probability, as judged by the system according to the first classification method, that the text to be processed belongs to the first text category to be confirmed. Therefore, the probability that the text to be processed is considered to belong to the first text category to be confirmed under both conditions is the product of the first historical membership probability and the first membership probability.
For example, the probability, stored in the preset database, that the text to be processed belongs to the first text category to be confirmed is 0.6 (the first historical membership probability), so the probability that it does not belong to the first text category to be confirmed is 0.4 (the first historical non-membership probability). The probability, as judged by the system according to the first classification method, that the text to be processed belongs to the first text category to be confirmed is 0.8 (the first membership probability), so the system judges the probability that it does not belong to be 0.2 (the first non-membership probability). From these data, the first target probability (the probability that the text to be processed belongs to the first text category to be confirmed) = (0.6*0.8)/(0.6*0.8+0.4*0.2) = 0.857, and the probability that the text to be processed does not belong to the first text category to be confirmed = (0.4*0.2)/(0.6*0.8+0.4*0.2) = 0.143.
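The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the function name `target_probability` is hypothetical, and it assumes, as the example does, that each "does not belong" probability is one minus the corresponding "belongs" probability.

```python
def target_probability(p_method: float, p_history: float) -> float:
    """Combine the classifier's probability with the stored historical one.

    Assumes the 'does not belong' probabilities are the complements
    (1 - p_method) and (1 - p_history), as in the worked example.
    """
    member = p_method * p_history                       # belongs under both
    non_member = (1.0 - p_method) * (1.0 - p_history)   # does not belong under both
    total = member + non_member
    if total == 0.0:
        # Happens only when the two sources contradict each other with certainty.
        raise ValueError("probabilities are contradictory; target is undefined")
    return member / total


belongs = target_probability(0.8, 0.6)   # 0.48 / 0.56
not_belongs = 1.0 - belongs              # 0.08 / 0.56
```

Rounded to three decimal places, `belongs` and `not_belongs` reproduce the 0.857 and 0.143 given in the text.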
Step S103: judge whether the first target probability is higher than a preset threshold.
The preset threshold may be a value set by the user or the requesting party according to the level of satisfaction they require from the classification function. For example, the preset threshold is 0.8.
Step S104: when the first target probability is lower than the preset threshold, classify the text to be processed using, in turn, at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and take the finally obtained text category to be confirmed as the target text category.
Specifically, when the first target probability is lower than the preset threshold, the text to be processed is classified using a second classification method, for example the naive Bayes classification method, to obtain a second text category to be confirmed and a second membership probability, wherein the second membership probability is the probability, as judged by the second classification method, that the text to be processed belongs to the second text category to be confirmed. A second target probability is calculated from the second membership probability and a second historical membership probability, wherein the second historical membership probability is the probability, stored in the preset database, that the text to be processed belongs to the second text category to be confirmed. It is then judged whether the second target probability is higher than the preset threshold. If so, the second text category to be confirmed is taken as the target text category. If not, the text to be processed is classified in the same way using further classification methods other than the first and second classification methods until the calculated target probability is greater than or equal to the preset threshold, and the finally obtained text category to be confirmed is taken as the target text category.
For example, if the preset threshold is 0.9 and the first target probability calculated above is 0.857, the first target probability is judged to be lower than the preset threshold, so the system considers that the current first text category to be confirmed is not the target text category. Accordingly, the system classifies the text to be processed using a second classification method (for example, the naive Bayes classification method), and continues in this way until the calculated target probability is greater than or equal to the preset threshold, and takes the finally obtained text category to be confirmed as the target text category.
If the preset threshold is 0.8 and the first target probability calculated above is 0.857, the first target probability is judged to be higher than the preset threshold, and the first text category to be confirmed is determined to be the target text category of the text to be processed. For example, the text category of the text to be processed is determined to be the sentiment category.
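Steps S101 to S104 together form a cascade that stops as soon as one method is confident enough. The sketch below illustrates that control flow under stated assumptions; it is not the application's implementation. The names `classify_cascade` and `target_probability`, the `(method name, category)` key for historical probabilities, the neutral default of 0.5 when nothing is stored, and the behavior when every method falls short are all choices made for this example, and the second classifier's 0.9 and 0.7 are invented numbers.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interface: a classifier maps a text to (category, probability).
Classifier = Callable[[str], Tuple[str, float]]
HistoryKey = Tuple[str, str]  # (method name, category); this keying is an assumption


def target_probability(p_method: float, p_history: float) -> float:
    member = p_method * p_history
    non_member = (1.0 - p_method) * (1.0 - p_history)
    total = member + non_member
    if total == 0.0:
        raise ValueError("probabilities are contradictory; target is undefined")
    return member / total


def classify_cascade(
    text: str,
    methods: Sequence[Tuple[str, Classifier]],
    history: Dict[HistoryKey, float],
    threshold: float,
) -> Optional[Tuple[str, str, float]]:
    """Try methods in order and stop at the first one whose target
    probability reaches the threshold.

    Returns (method name, category, target probability), or None if
    no methods were supplied.
    """
    result = None
    for name, classify in methods:
        category, p_method = classify(text)
        # Assumption: use a neutral 0.5 when no historical value is stored.
        p_history = history.get((name, category), 0.5)
        target = target_probability(p_method, p_history)
        result = (name, category, target)
        if target >= threshold:
            break
    # Assumption: if no method reaches the threshold, the last result is kept.
    return result


# Toy classifiers with fixed outputs, standing in for real models.
methods = [
    ("logistic_regression", lambda text: ("sentiment", 0.8)),
    ("naive_bayes", lambda text: ("sentiment", 0.9)),
]
history = {
    ("logistic_regression", "sentiment"): 0.6,
    ("naive_bayes", "sentiment"): 0.7,
}

stops_early = classify_cascade("I love this phone", methods, history, 0.8)
falls_through = classify_cascade("I love this phone", methods, history, 0.9)
```

With a threshold of 0.8 the first method's 0.857 is enough and the second method never runs; with 0.9 the cascade moves on to the second method, mirroring the two cases in the text.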
Optionally, in the processing method for text classification provided by the embodiment of the present application, after the finally obtained text category to be confirmed is taken as the target text category, the method also includes: outputting the target text category to a destination address.
The text category of the text to be processed is output to the destination address, where it is displayed or analyzed and processed by the user.
Optionally, in the processing method for text classification provided by the embodiment of the present application, after the finally obtained text category to be confirmed is taken as the target text category, the method also includes: updating, with the finally calculated target probability, the historical membership probability stored in the preset database that corresponds to the classification method finally used.
Updating the historical membership probability that corresponds to the classification method finally used with the finally calculated target probability ensures the accuracy of the historical membership probabilities stored in the preset database.
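The update step can be sketched with Python's standard `sqlite3` module standing in for the preset database. Everything about the storage layout is an assumption made for illustration: the application does not specify a schema, so the `history` table, its `(method, category)` key, the default of 0.5 for missing entries, and the choice to overwrite the stored value with the new target probability are all hypothetical.

```python
import sqlite3


def open_history_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the preset database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history ("
        " method TEXT, category TEXT, probability REAL,"
        " PRIMARY KEY (method, category))"
    )
    return conn


def read_history(conn: sqlite3.Connection, method: str, category: str,
                 default: float = 0.5) -> float:
    """Return the stored historical probability, or a neutral default."""
    row = conn.execute(
        "SELECT probability FROM history WHERE method = ? AND category = ?",
        (method, category),
    ).fetchone()
    return row[0] if row else default


def update_history(conn: sqlite3.Connection, method: str, category: str,
                   target: float) -> None:
    """Overwrite the stored value with the finally calculated target probability."""
    conn.execute(
        "INSERT OR REPLACE INTO history (method, category, probability)"
        " VALUES (?, ?, ?)",
        (method, category, target),
    )
    conn.commit()


conn = open_history_store()
before = read_history(conn, "logistic_regression", "sentiment")
update_history(conn, "logistic_regression", "sentiment", 0.857)
after = read_history(conn, "logistic_regression", "sentiment")
```

Because the stored value feeds into the next target probability for the same method, a run of confident classifications raises the historical probability over time; whether a plain overwrite or some smoothed update is intended is not stated in the text.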
In the present application, the above steps introduce the target probability, and the target text category of the text to be processed is determined according to it. This compensates for the shortcomings of determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving its processing efficiency.
The processing method for text classification provided by the embodiment of the present application classifies a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first text category to be confirmed; calculates a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed; judges whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, classifies the text to be processed using, in turn, at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and takes the finally obtained text category to be confirmed as the target text category. This solves the problem in the related art that improving the accuracy of text classification lowers the processing efficiency of text classification. By introducing the target probability and determining the target text category of the text to be processed according to it, the method compensates for the shortcomings of determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving its processing efficiency.
It should be noted that the steps illustrated in the flow chart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flow chart, in some cases the steps shown or described may be executed in an order different from the one shown here.
The embodiment of the present application also provides a processing device for text classification. It should be noted that the processing device for text classification of the embodiment may be used to execute the processing method for text classification provided by the embodiment. The processing device for text classification provided by the embodiment is described below.
Fig. 2 is a schematic diagram of the processing device for text classification according to an embodiment of the present application. As shown in Fig. 2, the device includes: a processing unit 10, a calculation unit 20, a judging unit 30 and a first determining unit 40.
The processing unit 10 is configured to classify a text to be processed using a first classification method to obtain a first text category to be confirmed and a first membership probability, wherein the first membership probability is the probability, as judged by the first classification method, that the text to be processed belongs to the first text category to be confirmed.
The calculation unit 20 is configured to calculate a first target probability from the first membership probability and a first historical membership probability, wherein the first historical membership probability is the probability, stored in a preset database, that the text to be processed belongs to the first text category to be confirmed.
The judging unit 30 is configured to judge whether the first target probability is higher than a preset threshold.
The first determining unit 40 is configured to, when the first target probability is lower than the preset threshold, classify the text to be processed using, in turn, at least one classification method different from the first classification method until the calculated target probability is greater than or equal to the preset threshold, and to take the finally obtained text category to be confirmed as the target text category.
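The division of labor among units 10 to 40 can be pictured as four methods on one object. The class below is a rough sketch under assumptions, not the device of Fig. 2: the class and method names, the in-memory `history` dictionary keyed by `(method name, category)` in place of the preset database, and the 0.5 default for missing entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface: a classifier maps a text to (category, probability).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class TextClassificationDevice:
    classifiers: Dict[str, Classifier]  # tried in insertion order
    history: Dict[Tuple[str, str], float] = field(default_factory=dict)
    threshold: float = 0.8

    def process(self, name: str, text: str) -> Tuple[str, float]:
        """Processing unit 10: run one classification method."""
        return self.classifiers[name](text)

    def compute(self, name: str, category: str, p_method: float) -> float:
        """Calculation unit 20: combine with the historical probability."""
        p_history = self.history.get((name, category), 0.5)  # assumed default
        member = p_method * p_history
        total = member + (1.0 - p_method) * (1.0 - p_history)
        return member / total if total else 0.0

    def judge(self, target: float) -> bool:
        """Judging unit 30: compare the target probability with the threshold."""
        return target >= self.threshold

    def determine(self, text: str) -> Optional[str]:
        """First determining unit 40: try methods in turn until one passes."""
        category = None
        for name in self.classifiers:
            category, p_method = self.process(name, text)
            if self.judge(self.compute(name, category, p_method)):
                break
        return category


device = TextClassificationDevice(
    classifiers={"logistic_regression": lambda text: ("sentiment", 0.8)},
    history={("logistic_regression", "sentiment"): 0.6},
    threshold=0.8,
)
category = device.determine("I love this phone")
```

Separating the units this way keeps the threshold check and the probability calculation independent of which classification methods are registered.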
In the processing device for text classification provided by the embodiment of the present application, processing unit 10 classifies the pending text using the first sorting technique and obtains the first text category to be confirmed and the first subordinate probability, where the first subordinate probability is the probability, judged according to the first sorting technique, that the pending text belongs to the first text category to be confirmed. Computing unit 20 calculates the first object probability from the first subordinate probability and the first history subordinate probability, where the first history subordinate probability is the probability, stored in the presetting database, that the pending text belongs to the first text category to be confirmed. Judging unit 30 judges whether the first object probability is higher than the predetermined threshold value. When the first object probability is lower than the predetermined threshold value, first determining unit 40 classifies the pending text successively using at least one sorting technique different from the first sorting technique until the calculated destination probability is greater than or equal to the predetermined threshold value, and takes the finally obtained text category to be confirmed as the target text category. This solves the problem in the related art that improving the accuracy of text classification lowers the efficiency of text classification processing. By introducing the destination probability and determining the target text category of the pending text according to it, the device compensates for the shortcomings of determining the target text category with only one sorting technique and effectively reduces the use of unnecessary additional sorting techniques, so that the accuracy of text classification is improved while the processing efficiency is improved as well.
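The flow carried out by units 10 to 40 can be sketched as a loop over a list of sorting techniques that stops as soon as the threshold is reached. The sketch below is illustrative only: the names `classify_with_cascade`, `combine`, `methods` and `history`, the representation of each sorting technique as a callable returning a (category, probability) pair, the complement-based combination, the default history value of 0.5, and the fallback when no technique reaches the threshold are all assumptions rather than details taken from the text.

```python
from typing import Callable, Dict, List, Tuple

# A sorting technique maps a text to (category to be confirmed, subordinate probability).
Classifier = Callable[[str], Tuple[str, float]]


def combine(p_method: float, p_history: float) -> float:
    """Normalized product of the two probabilities (complements assumed for 'not belonging')."""
    belongs = p_method * p_history
    not_belongs = (1.0 - p_method) * (1.0 - p_history)
    total = belongs + not_belongs
    return belongs / total if total > 0.0 else 0.0


def classify_with_cascade(
    text: str,
    methods: List[Tuple[str, Classifier]],
    history: Dict[Tuple[str, str], float],
    threshold: float,
) -> Tuple[str, float, str]:
    """Try each sorting technique in order until the combined probability reaches the threshold."""
    if not methods:
        raise ValueError("at least one sorting technique is required")
    result = ("", 0.0, "")
    for name, classify in methods:
        category, p_method = classify(text)
        # Hypothetical default of 0.5 when no history is stored (leaves p_method unchanged).
        p_history = history.get((name, category), 0.5)
        result = (category, combine(p_method, p_history), name)
        if result[1] >= threshold:
            break  # later, possibly slower, techniques are skipped
    # If no technique reaches the threshold, the last result is returned (an assumption).
    return result
```

Ordering `methods` from cheapest to most expensive is one way to obtain the efficiency benefit described above, since later techniques run only when earlier ones are not confident enough.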
Optionally, in the processing device for text classification provided by the embodiment of the present application, the device further includes: a second determining unit, configured to determine a plurality of sorting techniques for classifying the pending text; and an acquiring unit, configured to obtain a sorting technique set composed of the plurality of sorting techniques, where the sorting technique set includes the first sorting technique.
Optionally, in the processing device for text classification provided by the embodiment of the present application, computing unit 20 includes: a first computing module, configured to multiply the first subordinate probability by the first history subordinate probability to obtain a first object subordinate probability; a second computing module, configured to multiply a first non-dependent probability by a first history non-dependent probability to obtain a first object non-dependent probability, where the first non-dependent probability is the probability, judged according to the first sorting technique, that the pending text does not belong to the first text category to be confirmed, and the first history non-dependent probability is the probability, stored in the presetting database, that the pending text does not belong to the first text category to be confirmed; a third computing module, configured to add the first object subordinate probability and the first object non-dependent probability to obtain a first object sub-probability; and a fourth computing module, configured to divide the first object subordinate probability by the first object sub-probability to obtain the first object probability.
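Taken together, the four computing modules amount to a normalized product of the probability given by the sorting technique and the stored history probability. A minimal sketch in Python, assuming (the text does not state this explicitly) that each non-dependent probability is the complement of the corresponding subordinate probability; the function name `target_probability` is hypothetical:

```python
def target_probability(p_method: float, p_history: float) -> float:
    """Combine a subordinate probability with the history subordinate probability."""
    # First computing module: product of the two subordinate probabilities.
    target_subordinate = p_method * p_history
    # Second computing module: product of the two non-dependent probabilities
    # (assumed here to be the complements 1 - p).
    target_non_dependent = (1.0 - p_method) * (1.0 - p_history)
    # Third computing module: their sum is the first object sub-probability.
    sub_probability = target_subordinate + target_non_dependent
    if sub_probability == 0.0:
        raise ValueError("probabilities are contradictory (one is 0 and the other is 1)")
    # Fourth computing module: the ratio is the first object probability.
    return target_subordinate / sub_probability
```

For example, with a subordinate probability of 0.6 and a history subordinate probability of 0.7, the result is 0.42 / (0.42 + 0.12), roughly 0.78. A history value of 0.5 leaves the subordinate probability unchanged.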
Optionally, in the processing device for text classification provided by the embodiment of the present application, the device further includes: an updating unit, configured to update, with the finally calculated destination probability, the history subordinate probability that is stored in the presetting database and corresponds to the sorting technique finally used.
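The update performed by the updating unit can be pictured as overwriting one stored value. A minimal sketch, assuming the presetting database is modeled as an in-memory dictionary keyed by a hypothetical (method name, category) pair; the text does not specify the storage layout, and the name `update_history` is illustrative:

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the presetting database.
HistoryStore = Dict[Tuple[str, str], float]


def update_history(store: HistoryStore, method_name: str, category: str,
                   final_probability: float) -> None:
    """Replace the stored history subordinate probability with the finally calculated one."""
    if not 0.0 <= final_probability <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    store[(method_name, category)] = final_probability
```

With this layout, the value written after one classification becomes the history subordinate probability read the next time the same sorting technique proposes the same category.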
Optionally, in the processing device for text classification provided by the embodiment of the present application, the device further includes: an output unit, configured to output the target text category to a destination address.
The processing device for text classification includes a processor and a memory. The above processing unit, computing unit, judging unit, first determining unit and so on are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions. The predetermined threshold value and the presetting database may also be stored in the memory.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and text classification is processed by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: classifying the pending text using the first sorting technique to obtain the first text category to be confirmed and the first subordinate probability, where the first subordinate probability is the probability, judged according to the first sorting technique, that the pending text belongs to the first text category to be confirmed; calculating the first object probability according to the first subordinate probability and the first history subordinate probability, where the first history subordinate probability is the probability, stored in the presetting database, that the pending text belongs to the first text category to be confirmed; judging whether the first object probability is higher than the predetermined threshold value; and, when the first object probability is lower than the predetermined threshold value, classifying the pending text successively using at least one sorting technique different from the first sorting technique until the calculated destination probability is greater than or equal to the predetermined threshold value, and taking the finally obtained text category to be confirmed as the target text category.
It should be noted that, for brevity, each of the foregoing method embodiments is expressed as a series of action combinations, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. In addition, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be fabricated as an individual integrated circuit module, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus the present application is not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present application and is not intended to limit the present application; for those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included within the protection scope of the present application.
Claims (10)
1. A processing method for text classification, characterized by comprising:
classifying a pending text using a first sorting technique to obtain a first text category to be confirmed and a first subordinate probability, wherein the first subordinate probability is the probability, judged according to the first sorting technique, that the pending text belongs to the first text category to be confirmed;
calculating a first object probability according to the first subordinate probability and a first history subordinate probability, wherein the first history subordinate probability is the probability, stored in a presetting database, that the pending text belongs to the first text category to be confirmed;
judging whether the first object probability is higher than a predetermined threshold value; and
when the first object probability is lower than the predetermined threshold value, classifying the pending text successively using at least one sorting technique different from the first sorting technique until the calculated destination probability is greater than or equal to the predetermined threshold value, and taking the finally obtained text category to be confirmed as a target text category.
2. The method according to claim 1, characterized in that, before the pending text is classified using the first sorting technique, the method further comprises:
determining a plurality of sorting techniques for classifying the pending text; and
obtaining a sorting technique set composed of the plurality of sorting techniques, wherein the sorting technique set includes the first sorting technique.
3. The method according to claim 1, characterized in that calculating the first object probability according to the first subordinate probability and the first history subordinate probability comprises:
multiplying the first subordinate probability by the first history subordinate probability to obtain a first object subordinate probability;
multiplying a first non-dependent probability by a first history non-dependent probability to obtain a first object non-dependent probability, wherein the first non-dependent probability is the probability, judged according to the first sorting technique, that the pending text does not belong to the first text category to be confirmed, and the first history non-dependent probability is the probability, stored in the presetting database, that the pending text does not belong to the first text category to be confirmed;
adding the first object subordinate probability and the first object non-dependent probability to obtain a first object sub-probability; and
dividing the first object subordinate probability by the first object sub-probability to obtain the first object probability.
4. The method according to claim 1, characterized in that, after the finally obtained text category to be confirmed is taken as the target text category, the method further comprises:
updating, with the finally calculated destination probability, the history subordinate probability that is stored in the presetting database and corresponds to the sorting technique finally used.
5. The method according to claim 1, characterized in that, after the finally obtained text category to be confirmed is taken as the target text category, the method further comprises:
outputting the target text category to a destination address.
6. A processing device for text classification, characterized by comprising:
a processing unit, configured to classify a pending text using a first sorting technique to obtain a first text category to be confirmed and a first subordinate probability, wherein the first subordinate probability is the probability, judged according to the first sorting technique, that the pending text belongs to the first text category to be confirmed;
a computing unit, configured to calculate a first object probability according to the first subordinate probability and a first history subordinate probability, wherein the first history subordinate probability is the probability, stored in a presetting database, that the pending text belongs to the first text category to be confirmed;
a judging unit, configured to judge whether the first object probability is higher than a predetermined threshold value; and
a first determining unit, configured to, when the first object probability is lower than the predetermined threshold value, classify the pending text successively using at least one sorting technique different from the first sorting technique until the calculated destination probability is greater than or equal to the predetermined threshold value, and to take the finally obtained text category to be confirmed as a target text category.
7. The device according to claim 6, characterized in that the device further comprises:
a second determining unit, configured to determine a plurality of sorting techniques for classifying the pending text; and
an acquiring unit, configured to obtain a sorting technique set composed of the plurality of sorting techniques, wherein the sorting technique set includes the first sorting technique.
8. The device according to claim 6, characterized in that the computing unit comprises:
a first computing module, configured to multiply the first subordinate probability by the first history subordinate probability to obtain a first object subordinate probability;
a second computing module, configured to multiply a first non-dependent probability by a first history non-dependent probability to obtain a first object non-dependent probability, wherein the first non-dependent probability is the probability, judged according to the first sorting technique, that the pending text does not belong to the first text category to be confirmed, and the first history non-dependent probability is the probability, stored in the presetting database, that the pending text does not belong to the first text category to be confirmed;
a third computing module, configured to add the first object subordinate probability and the first object non-dependent probability to obtain a first object sub-probability; and
a fourth computing module, configured to divide the first object subordinate probability by the first object sub-probability to obtain the first object probability.
9. The device according to claim 6, characterized in that the device further comprises: an updating unit, configured to update, with the finally calculated destination probability, the history subordinate probability that is stored in the presetting database and corresponds to the sorting technique finally used.
10. The device according to claim 6, characterized in that the device further comprises: an output unit, configured to output the target text category to a destination address.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510921141.1A CN106874291A (en) | 2015-12-11 | 2015-12-11 | The processing method and processing device of text classification |
PCT/CN2016/107313 WO2017097118A1 (en) | 2015-12-11 | 2016-11-25 | Text classification processing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510921141.1A CN106874291A (en) | 2015-12-11 | 2015-12-11 | The processing method and processing device of text classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874291A (en) | 2017-06-20 |
Family
ID=59013723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510921141.1A Pending CN106874291A (en) | 2015-12-11 | 2015-12-11 | The processing method and processing device of text classification |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106874291A (en) |
WO (1) | WO2017097118A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110597985A (en) * | 2019-08-15 | 2019-12-20 | 重庆金融资产交易所有限责任公司 | Data classification method, device, terminal and medium based on data analysis |
CN111191447A (en) * | 2019-12-18 | 2020-05-22 | 东软集团股份有限公司 | Equipment defect classification method, device and equipment |
CN112380346A (en) * | 2020-11-23 | 2021-02-19 | 宁波深擎信息科技有限公司 | Financial news emotion analysis method and device, computer equipment and storage medium |
CN113806542A (en) * | 2021-09-18 | 2021-12-17 | 上海幻电信息科技有限公司 | Text analysis method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1701324A (en) * | 2001-11-02 | 2005-11-23 | Dba西方集团西方出版社 | Systems, methods, and software for classifying text |
CN101059796A (en) * | 2006-04-19 | 2007-10-24 | 中国科学院自动化研究所 | Two-stage combined file classification method based on probability subject |
CN101587493A (en) * | 2009-06-29 | 2009-11-25 | 中国科学技术大学 | Text classification method |
CN102033964A (en) * | 2011-01-13 | 2011-04-27 | 北京邮电大学 | Text classification method based on block partition and position weight |
CN103514174A (en) * | 2012-06-18 | 2014-01-15 | 北京百度网讯科技有限公司 | Text categorization method and device |
US20140314311A1 (en) * | 2013-04-23 | 2014-10-23 | Wal-Mart Stores, Inc. | System and method for classification with effective use of manual data input |
US9104972B1 (en) * | 2009-03-13 | 2015-08-11 | Google Inc. | Classifying documents using multiple classifiers |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102141977A (en) * | 2010-02-01 | 2011-08-03 | 阿里巴巴集团控股有限公司 | Text classification method and device |
CN103473356B (en) * | 2013-09-26 | 2017-01-25 | 苏州大学 | Document-level emotion classifying method and device |
- 2015-12-11 CN CN201510921141.1A patent/CN106874291A/en active Pending
- 2016-11-25 WO PCT/CN2016/107313 patent/WO2017097118A1/en active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110597985A (en) * | 2019-08-15 | 2019-12-20 | 重庆金融资产交易所有限责任公司 | Data classification method, device, terminal and medium based on data analysis |
CN111191447A (en) * | 2019-12-18 | 2020-05-22 | 东软集团股份有限公司 | Equipment defect classification method, device and equipment |
CN111191447B (en) * | 2019-12-18 | 2023-07-14 | 东软集团股份有限公司 | Equipment defect classification method, device and equipment |
CN112380346A (en) * | 2020-11-23 | 2021-02-19 | 宁波深擎信息科技有限公司 | Financial news emotion analysis method and device, computer equipment and storage medium |
CN112380346B (en) * | 2020-11-23 | 2023-04-25 | 宁波深擎信息科技有限公司 | Financial news emotion analysis method and device, computer equipment and storage medium |
CN113806542A (en) * | 2021-09-18 | 2021-12-17 | 上海幻电信息科技有限公司 | Text analysis method and system |
CN113806542B (en) * | 2021-09-18 | 2024-05-17 | 上海幻电信息科技有限公司 | Text analysis method and system |
Also Published As
Publication number | Publication date |
---|---|
WO2017097118A1 (en) | 2017-06-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing; Applicant after: Beijing Guoshuang Technology Co.,Ltd.; Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing; Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 |