WO2017097118A1 - Method and apparatus for processing text classification - Google Patents

Method and apparatus for processing text classification Download PDF

Info

Publication number
WO2017097118A1
WO2017097118A1, PCT/CN2016/107313, CN2016107313W
Authority
WO
WIPO (PCT)
Prior art keywords
probability
text
target
classification
dependent
Prior art date
Application number
PCT/CN2016/107313
Other languages
English (en)
French (fr)
Inventor
何鑫
Original Assignee
北京国双科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京国双科技有限公司 filed Critical 北京国双科技有限公司
Publication of WO2017097118A1 publication Critical patent/WO2017097118A1/zh

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00  Information retrieval; Database structures therefor; File system structures therefor
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00  Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30  Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35  Clustering; Classification

Definitions

  • the present application relates to the field of text processing, and in particular to a method and apparatus for processing text classification.
  • Text categorization is one of the important tasks of natural language processing. Many natural language processing tasks, such as industry classification of articles and sentiment analysis, are in essence text classification. At present there are many ways to handle text classification, whether based on rules or on machine learning. Usually, a single classification method is applied to the text, a classification result is obtained, and the result is output. However, classifying text with only one classification method is not very accurate. To improve the accuracy of text classification, the related art applies a series of classification methods: the text is classified with several less precise methods to obtain multiple classification results, the results are then voted on, and the result with the most votes is selected as the output. This approach largely compensates for the shortcoming of using only one classification method, but, whether or not it is necessary, it applies multiple classification methods to every input text, which degrades text processing performance.
  • The main purpose of the present application is to provide a method and an apparatus for processing text classification, so as to solve the related-art problem that improving the accuracy of text classification leads to low processing efficiency.
  • According to one aspect, a method of processing text classification includes: classifying the to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; calculating a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category.
  • Further, before the to-be-processed text is classified by the first classification method, the method further includes: determining a plurality of classification methods for classifying the to-be-processed text; and obtaining a set of classification methods composed of the plurality of classification methods, where the set of classification methods includes the first classification method.
  • Further, calculating the first target probability according to the first dependent probability and the first historical dependent probability includes: multiplying the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; multiplying a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; adding the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and dividing the first target dependent probability by the first target sub-probability to obtain the first target probability.
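  • For illustration only, the calculation described above can be written as the following short Python sketch. It is not part of the application; the function and variable names are assumptions made for this example, and it assumes the non-dependent probabilities are the complements of the dependent probabilities, as in the worked example later in the text.

```python
def first_target_probability(dependent_p, historical_dependent_p):
    """Combine a classifier's dependent probability with the stored historical
    dependent probability, as described in the paragraph above."""
    non_dependent_p = 1.0 - dependent_p                          # first non-dependent probability
    historical_non_dependent_p = 1.0 - historical_dependent_p    # first historical non-dependent probability

    target_dependent = dependent_p * historical_dependent_p                # first target dependent probability
    target_non_dependent = non_dependent_p * historical_non_dependent_p    # first target non-dependent probability

    target_sub = target_dependent + target_non_dependent    # first target sub-probability
    return target_dependent / target_sub                    # first target probability
```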
  • the method further includes: updating the historical dependent probability stored in the preset database corresponding to the finally adopted classification method with the finally calculated target probability.
  • the method further includes: outputting the target text category to the target address.
  • According to another aspect, a processing apparatus for text classification includes: a processing unit, configured to classify the to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; a calculating unit, configured to calculate a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; a determining unit, configured to determine whether the first target probability is higher than a preset threshold; and a first determining unit, configured to, when the first target probability is lower than the preset threshold, sequentially use at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and take the finally obtained to-be-confirmed text category as the target text category.
  • Further, the apparatus includes: a second determining unit, configured to determine a plurality of classification methods for classifying the to-be-processed text; and an obtaining unit, configured to obtain a set of classification methods composed of the plurality of classification methods, where the set of classification methods includes the first classification method.
  • Further, the calculating unit includes: a first calculating module, configured to multiply the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; a second calculating module, configured to multiply a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; a third calculating module, configured to add the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and a fourth calculating module, configured to divide the first target dependent probability by the first target sub-probability to obtain the first target probability.
  • the apparatus further includes: an updating unit, configured to update, according to the finally calculated target probability, a historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
  • the apparatus further includes: an output unit configured to output the target text category to the target address.
  • In the present application, the to-be-processed text is classified by using the first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; a first target probability is calculated according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; whether the first target probability is higher than a preset threshold is determined; and when the first target probability is lower than the preset threshold, at least one classification method different from the first classification method is used in turn to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and the finally obtained to-be-confirmed text category is taken as the target text category. This solves the related-art problem that improving the accuracy of text classification leads to low processing efficiency: by introducing the target probability and determining the target text category of the to-be-processed text according to it, the method compensates for the inaccuracy of using only one classification method while effectively reducing unnecessary repeated classification, thereby improving both the accuracy and the processing efficiency of text classification.
  • FIG. 1 is a flowchart of a method of processing text classification according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a processing device for text classification according to an embodiment of the present application.
  • a method of processing text classification is provided.
  • FIG. 1 is a flow chart of a method of processing text categorization according to an embodiment of the present application. As shown in Figure 1, the method includes the following steps:
  • Step S101: Classify the to-be-processed text by using the first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category.
  • Optionally, before the to-be-processed text is classified by using the first classification method, the method further includes: determining a plurality of classification methods for classifying the to-be-processed text; and obtaining a set of classification methods composed of the plurality of classification methods, where the set includes the first classification method.
  • In natural language processing there are many methods for text classification, for example linguistic rules and the various classification methods of machine learning, such as logistic regression, naive Bayes, support vector machines, and random forests. These classification methods together form the set of classification methods.
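  • As a minimal sketch (not part of the application), such a set of classification methods could be assembled with scikit-learn; the library choice and the names used here are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Each entry pairs a name with a text classifier that can report per-category
# probabilities (needed later to obtain the dependent probability).
classification_methods = [
    ("logistic_regression", make_pipeline(TfidfVectorizer(), LogisticRegression())),
    ("naive_bayes", make_pipeline(TfidfVectorizer(), MultinomialNB())),
    ("svm", make_pipeline(TfidfVectorizer(), SVC(probability=True))),
    ("random_forest", make_pipeline(TfidfVectorizer(), RandomForestClassifier())),
]
# Each pipeline is assumed to be fitted on labelled training texts before use,
# e.g. pipeline.fit(train_texts, train_labels).
```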
  • For example, the logistic regression classification method in the set is selected as the first classification method to classify the to-be-processed text, and a first to-be-confirmed text category is obtained.
  • For example, the first to-be-confirmed text category may be that the text type of the to-be-processed text is the sentiment category.
  • The system determines the probability that the text type obtained by classifying the to-be-processed text with the first classification method is correct (that is, the first dependent probability).
  • Step S102 Calculate a first target probability according to the first dependent probability and the first historical dependent probability, wherein the first historical dependent probability is a probability that the to-be-processed text stored in the preset database belongs to the first to-be-confirmed text category.
  • Optionally, calculating the first target probability according to the first dependent probability and the first historical dependent probability includes: multiplying the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; multiplying the first non-dependent probability by the first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; adding the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and dividing the first target dependent probability by the first target sub-probability to obtain the first target probability.
  • The first target probability is the calculated probability that the to-be-processed text belongs to the first to-be-confirmed text category.
  • The first historical dependent probability is the probability, stored in the preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; the first dependent probability is the probability, determined by the system according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category. Therefore, the probability that the to-be-processed text belongs to the first to-be-confirmed text category under both conditions is the product of the first historical dependent probability and the first dependent probability.
  • For example, suppose the probability stored in the preset database that the to-be-processed text belongs to the first to-be-confirmed text category is 0.6 (the first historical dependent probability), so the probability that it does not belong to that category is 0.4 (the first historical non-dependent probability); and the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category is 0.8 (the first dependent probability), so the system's probability that it does not belong to that category is 0.2 (the first non-dependent probability). From these values, the first target probability (the probability that the to-be-processed text belongs to the first to-be-confirmed text category) is (0.6×0.8)/(0.6×0.8+0.4×0.2) = 0.857, and the probability that the to-be-processed text does not belong to the first to-be-confirmed text category is (0.4×0.2)/(0.6×0.8+0.4×0.2) = 0.143.
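  • Plugging these figures into the first_target_probability sketch given earlier (an illustrative helper, not part of the application) reproduces the same result:

```python
p = first_target_probability(dependent_p=0.8, historical_dependent_p=0.6)
# (0.6 * 0.8) / (0.6 * 0.8 + 0.4 * 0.2) = 0.48 / 0.56
print(round(p, 3))  # 0.857
```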
  • Step S103 determining whether the first target probability is higher than a preset threshold.
  • the preset threshold may be a value set by the user or the demander based on the degree of satisfaction with the classification function.
  • the preset threshold is 0.8.
  • Step S104 When the first target probability is lower than the preset threshold, the processed text is classified by using at least one classification method different from the first classification method, until the calculated target probability is higher than or equal to a preset threshold. And the final text category to be confirmed is the target text category.
  • Specifically, when the first target probability is lower than the preset threshold, a second classification method is used to classify the to-be-processed text; for example, the naive Bayes classification method may be used to obtain a second to-be-confirmed text category and a second dependent probability, where the second dependent probability is the probability, determined according to the second classification method, that the to-be-processed text belongs to the second to-be-confirmed text category. A second target probability is then calculated according to the second dependent probability and a second historical dependent probability, where the second historical dependent probability is the probability, stored in the preset database, that the to-be-processed text belongs to the second to-be-confirmed text category. Whether the second target probability is higher than the preset threshold is determined; if yes, the second to-be-confirmed text category is taken as the target text category; if no, other classification methods that are neither the first nor the second classification method continue to be applied to the to-be-processed text according to the above procedure, until the calculated target probability is higher than or equal to the preset threshold, and the finally obtained to-be-confirmed text category is taken as the target text category.
  • For example, if the preset threshold is 0.9 and the first target probability calculated above is 0.857, the first target probability is determined to be lower than the preset threshold; the system then considers that the current first to-be-confirmed text category is not the target text category, and accordingly classifies the to-be-processed text with a second classification method (such as the naive Bayes classification method) until the calculated target probability is higher than or equal to the preset threshold, taking the finally obtained to-be-confirmed text category as the target text category.
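  • A minimal sketch of this cascade follows, assuming the classification_methods list and the first_target_probability helper introduced above; all names are illustrative assumptions, not part of the application.

```python
def classify_text(text, methods, history, threshold=0.8):
    """Try classifiers in order; stop as soon as the target probability
    reaches the preset threshold instead of always running every method."""
    category, target_p = None, 0.0
    for name, clf in methods:
        probs = clf.predict_proba([text])[0]    # per-category probabilities
        best = probs.argmax()
        category = clf.classes_[best]           # to-be-confirmed text category
        dependent_p = probs[best]               # dependent probability
        # Historical dependent probability from the preset database
        # (a plain dict here; 0.5 is a neutral default when no history exists).
        historical_p = history.get((name, category), 0.5)
        target_p = first_target_probability(dependent_p, historical_p)
        if target_p >= threshold:
            break                               # confident enough; skip remaining methods
    return category, target_p
```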
  • If instead the preset threshold is 0.8 and the first target probability calculated above is 0.857, the first target probability is determined to be higher than the preset threshold, and the first to-be-confirmed text category is determined to be the target text category to which the to-be-processed text belongs. For example, it is determined that the text type of the to-be-processed text is the sentiment category.
  • the method further includes: outputting the target text category to the target address.
  • The text type to which the to-be-processed text belongs is output to the target address, where it is displayed or analyzed by the user.
  • Optionally, after the finally obtained to-be-confirmed text category is taken as the target text category, the method further includes: updating, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
  • Updating this historical dependent probability with the finally calculated target probability keeps the historical dependent probabilities stored in the preset database accurate.
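  • A one-line sketch of that update, continuing the illustrative dict-based storage used above; the real preset database and key scheme are not specified by the application, and the variable names are placeholders.

```python
# After the target text category is accepted, write the newly calculated target
# probability back so it serves as the historical dependent probability the next
# time this classification method proposes this category.
history[(accepted_method_name, target_category)] = final_target_probability
```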
  • In the present application, the target probability is introduced through the above steps, and the target text category of the to-be-processed text is determined according to the target probability. This compensates for determining the target text category with only one classification method, while effectively reducing unnecessary repeated classification, and thus improves both the accuracy of text classification and the processing efficiency of text classification.
  • The text classification processing method provided by the embodiment of the present application classifies the to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; calculates a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; determines whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially uses at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, taking the finally obtained to-be-confirmed text category as the target text category. This solves the related-art problem that improving the accuracy of text classification leads to low processing efficiency: by introducing the target probability and determining the target text category of the to-be-processed text according to it, the method compensates for using only one classification method while effectively reducing unnecessary repeated classification, thereby improving the accuracy of text classification and the processing efficiency at the same time.
  • The embodiment of the present application further provides a processing apparatus for text classification. It should be noted that the processing apparatus for text classification of the embodiment of the present application may be used to execute the processing method for text classification provided by the embodiment of the present application. The processing apparatus for text classification provided by the embodiment of the present application is introduced below.
  • As shown in FIG. 2, the apparatus includes a processing unit 10, a calculating unit 20, a determining unit 30, and a first determining unit 40.
  • the processing unit 10 is configured to perform classification processing on the text to be processed by using the first classification method, to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is determined according to the first classification method that the to-be-processed text belongs to the first The probability of the text category to be confirmed.
  • the calculating unit 20 is configured to calculate a first target probability according to the first dependent probability and the first historical dependent probability, wherein the first historical dependent probability is a probability that the to-be-processed text stored in the preset database belongs to the first to-be-confirmed text category.
  • the determining unit 30 is configured to determine whether the first target probability is higher than a preset threshold.
  • The first determining unit 40 is configured to, when the first target probability is lower than the preset threshold, sequentially use at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and take the finally obtained to-be-confirmed text category as the target text category.
  • It should be noted that the processing unit 10, the calculating unit 20, the determining unit 30, and the first determining unit 40 may run in a computer terminal as part of the apparatus, and the functions implemented by these modules may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.
  • In the processing apparatus for text classification provided by the embodiment of the present application, the processing unit 10 classifies the to-be-processed text by using the first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; the calculating unit 20 calculates a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; the determining unit 30 determines whether the first target probability is higher than a preset threshold; and the first determining unit 40, when the first target probability is lower than the preset threshold, sequentially uses at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, taking the finally obtained to-be-confirmed text category as the target text category. This solves the related-art problem that improving the accuracy of text classification leads to low processing efficiency: by introducing the target probability and determining the target text category of the to-be-processed text according to it, the apparatus compensates for using only one classification method while effectively reducing unnecessary repeated classification, thereby improving the accuracy of text classification and also the processing efficiency of text classification.
  • Optionally, the apparatus further includes: a second determining unit, configured to determine a plurality of classification methods for classifying the to-be-processed text; and an obtaining unit, configured to obtain a set of classification methods composed of the plurality of classification methods, where the set of classification methods includes the first classification method.
  • Optionally, the calculating unit 20 includes: a first calculating module, configured to multiply the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; a second calculating module, configured to multiply the first non-dependent probability by the first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; a third calculating module, configured to add the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and a fourth calculating module, configured to divide the first target dependent probability by the first target sub-probability to obtain the first target probability.
  • Optionally, the apparatus further includes: an updating unit, configured to update, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
  • the device further includes: an output unit, configured to output the target text category to the target address.
  • the various functional units provided by the embodiments of the present application may be operated in a mobile terminal, a computer terminal, or the like, or may be stored as part of a storage medium.
  • embodiments of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals.
  • a computer terminal may also be replaced with a terminal device such as a mobile terminal.
  • the computer terminal may be located in at least one network device of the plurality of network devices of the computer network.
  • the computer terminal may execute the program code of the following steps in the processing method of the text classification: classifying the processed text by using the first classification method, and obtaining the first to-be-confirmed text category and the first dependent probability; Calculating a first target probability according to a dependent probability and a first historical dependent probability; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than a preset threshold, using at least a different method from the first classification method A classification method classifies the processed text until the calculated target probability is higher than or equal to a preset threshold, and uses the finally obtained text category to be confirmed as the target text category.
  • the computer terminal can include: one or more processors, memory, and transmission means.
  • The memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the text classification processing method and apparatus in the embodiment of the present invention; the processor executes various functional applications and data processing, that is, implements the above text classification processing method, by running the software programs and modules stored in the memory.
  • the memory may include a high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • the memory can further include memory remotely located relative to the processor, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the above transmission device is for receiving or transmitting data via a network.
  • Specific examples of the above network may include a wired network and a wireless network.
  • the transmission device includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
  • the transmission device is a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • the memory is used to store preset action conditions and information of the preset rights user, and an application.
  • The processor can call, through the transmission device, the information and applications stored in the memory to execute the program code of the method steps of each optional or preferred embodiment of the above method embodiments.
  • The computer terminal can also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be used to save the program code executed by the processing method of the text classification provided by the foregoing method embodiment and the device embodiment.
  • the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.
  • the storage medium is configured to store program code for performing the following steps: performing classification processing on the processed text by using the first classification method, to obtain a first to-be-confirmed text category and a first dependent probability; Calculating a first target probability according to the first dependent probability and the first historical dependent probability; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than a preset threshold, sequentially adopting different from the first classification method At least one classification method performs classification processing on the processed text until the calculated target probability is higher than or equal to a preset threshold, and the final obtained text category to be confirmed is the target text category.
  • The storage medium may also be configured to store program code for the various preferred or optional method steps provided by the text classification processing method.
  • The processing apparatus for text classification includes a processor and a memory; the above processing unit, calculating unit, determining unit, first determining unit, and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
  • the preset threshold and the preset database may all be stored in the memory.
  • The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be set, and text classification is processed by adjusting kernel parameters.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory (flash RAM), the memory including at least one Memory chip.
  • The present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: classifying the to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; calculating a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division of the units is only a logical function division; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • The modules or steps of the present application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented by program code executable by the computing device, so that they may be stored in a storage device and executed by the computing device, or they may be fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any particular combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a method and an apparatus for processing text classification. The method includes: classifying a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability; calculating a first target probability according to the first dependent probability and a first historical dependent probability; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category. The present application solves the related-art problem that improving the accuracy of text classification leads to low processing efficiency.

Description

Method and apparatus for processing text classification
Technical Field
The present application relates to the field of text processing, and in particular to a method and an apparatus for processing text classification.
Background
Text classification is one of the important tasks of natural language processing; many natural language processing tasks, such as industry classification of articles and sentiment analysis, are in essence text classification. At present there are many methods for handling text classification, whether based on rules or on machine learning. Usually, one classification method is applied to the text, a classification result is obtained, and the result is output. However, classifying text with only one classification method is not very accurate. To improve the accuracy of text classification, the related art applies a series of classification methods: several less precise classification methods are used to classify the text and obtain multiple classification results, each result is then voted on, and the result with the most votes is selected as the output. This approach largely compensates for the shortcoming of using only one classification method, but, whether or not it is necessary, it applies multiple classification methods to every input text, which degrades text processing performance.
No effective solution has yet been proposed for the related-art problem that improving the accuracy of text classification leads to low efficiency in processing text classification.
Summary
The main purpose of the present application is to provide a method and an apparatus for processing text classification, so as to solve the related-art problem that improving the accuracy of text classification leads to low processing efficiency.
To achieve the above purpose, according to one aspect of the present application, a method for processing text classification is provided. The method includes: classifying a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; calculating a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category.
Further, before the to-be-processed text is classified by using the first classification method, the method further includes: determining a plurality of classification methods for classifying the to-be-processed text; and obtaining a set of classification methods composed of the plurality of classification methods, where the set of classification methods includes the first classification method.
Further, calculating the first target probability according to the first dependent probability and the first historical dependent probability includes: multiplying the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; multiplying a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; adding the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and dividing the first target dependent probability by the first target sub-probability to obtain the first target probability.
Further, after the finally obtained to-be-confirmed text category is taken as the target text category, the method further includes: updating, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
Further, after the finally obtained to-be-confirmed text category is taken as the target text category, the method further includes: outputting the target text category to a target address.
To achieve the above purpose, according to another aspect of the present application, an apparatus for processing text classification is provided. The apparatus includes: a processing unit, configured to classify a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; a calculating unit, configured to calculate a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; a determining unit, configured to determine whether the first target probability is higher than a preset threshold; and a first determining unit, configured to, when the first target probability is lower than the preset threshold, sequentially use at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and take the finally obtained to-be-confirmed text category as the target text category.
Further, the apparatus further includes: a second determining unit, configured to determine a plurality of classification methods for classifying the to-be-processed text; and an obtaining unit, configured to obtain a set of classification methods composed of the plurality of classification methods, where the set of classification methods includes the first classification method.
Further, the calculating unit includes: a first calculating module, configured to multiply the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; a second calculating module, configured to multiply a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; a third calculating module, configured to add the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and a fourth calculating module, configured to divide the first target dependent probability by the first target sub-probability to obtain the first target probability.
Further, the apparatus further includes: an updating unit, configured to update, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
Further, the apparatus further includes: an output unit, configured to output the target text category to a target address.
Through the present application, the following steps are adopted: classifying a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; calculating a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category. This solves the related-art problem that improving the accuracy of text classification leads to low processing efficiency. By introducing the target probability and determining the target text category of the to-be-processed text according to it, the application compensates for determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving the processing efficiency of text classification.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the present application, are provided to give a further understanding of the present application; the illustrative embodiments of the present application and their description are used to explain the present application and do not constitute an improper limitation of it. In the drawings:
FIG. 1 is a flowchart of a method for processing text classification according to an embodiment of the present application; and
FIG. 2 is a schematic diagram of an apparatus for processing text classification according to an embodiment of the present application.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in combination with the embodiments.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second" and the like in the specification, claims and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product or device.
According to an embodiment of the present application, a method for processing text classification is provided.
FIG. 1 is a flowchart of a method for processing text classification according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101: classify a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category.
Optionally, in the method for processing text classification provided by the embodiment of the present application, before the to-be-processed text is classified by using the first classification method, the method further includes: determining a plurality of classification methods for classifying the to-be-processed text; and obtaining a set of classification methods composed of the plurality of classification methods, where the set of classification methods includes the first classification method.
In natural language processing, there are many methods for text classification, for example linguistic rules and the various classification methods of machine learning, such as logistic regression, naive Bayes, support vector machines, and random forests; these classification methods together form the set of classification methods. For example, the logistic regression classification method in the set is selected as the first classification method to classify the to-be-processed text, and a first to-be-confirmed text category is obtained. For example, the first to-be-confirmed text category may be that the text type of the to-be-processed text is the sentiment category. The system determines the probability that the text type obtained by classifying the to-be-processed text with the first classification method is correct (that is, the first dependent probability).
Step S102: calculate a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category.
Optionally, in the method for processing text classification provided by the embodiment of the present application, calculating the first target probability according to the first dependent probability and the first historical dependent probability includes: multiplying the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; multiplying a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; adding the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and dividing the first target dependent probability by the first target sub-probability to obtain the first target probability.
The first target probability is the calculated probability that the to-be-processed text belongs to the first to-be-confirmed text category. The first historical dependent probability is the probability, stored in the preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; the first dependent probability is the probability, determined in the system according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category. Therefore, the probability that the to-be-processed text belongs to the first to-be-confirmed text category under both conditions is the product of the first historical dependent probability and the first dependent probability.
For example, the probability stored in the preset database that the to-be-processed text belongs to the first to-be-confirmed text category is 0.6 (the first historical dependent probability), that is, the probability that the to-be-processed text does not belong to the first to-be-confirmed text category is 0.4 (the first historical non-dependent probability); the probability, determined in the system according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category is 0.8 (the first dependent probability), that is, the probability that the system determines the to-be-processed text does not belong to the first to-be-confirmed text category is 0.2 (the first non-dependent probability). From these data, the first target probability (the probability that the to-be-processed text belongs to the first to-be-confirmed text category) = (0.6×0.8)/(0.6×0.8+0.4×0.2) = 0.857, and the probability that the to-be-processed text does not belong to the first to-be-confirmed text category = (0.4×0.2)/(0.6×0.8+0.4×0.2) = 0.143.
Step S103: determine whether the first target probability is higher than a preset threshold.
The preset threshold may be a value set by the user or the requesting party according to the degree of satisfaction required of the classification function; for example, the preset threshold is 0.8.
Step S104: when the first target probability is lower than the preset threshold, sequentially use at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and take the finally obtained to-be-confirmed text category as the target text category.
Specifically, when the first target probability is lower than the preset threshold, a second classification method is used to classify the to-be-processed text; for example, the naive Bayes classification method may be used, and a second to-be-confirmed text category and a second dependent probability are obtained, where the second dependent probability is the probability, determined according to the second classification method, that the to-be-processed text belongs to the second to-be-confirmed text category. A second target probability is calculated according to the second dependent probability and a second historical dependent probability, where the second historical dependent probability is the probability, stored in the preset database, that the to-be-processed text belongs to the second to-be-confirmed text category. Whether the second target probability is higher than the preset threshold is determined; if yes, the second to-be-confirmed text category is taken as the target text category; if no, other classification methods that are neither the first nor the second classification method continue to be used to classify the to-be-processed text according to the above procedure, until the calculated target probability is higher than or equal to the preset threshold, and the finally obtained to-be-confirmed text category is taken as the target text category.
For example, if the preset threshold is 0.9 and the first target probability calculated above is 0.857, it is determined that the first target probability is lower than the preset threshold, and the system considers that the current first to-be-confirmed text category is not the target text category; accordingly, the system classifies the to-be-processed text with a second classification method (such as the naive Bayes classification method) until the calculated target probability is higher than or equal to the preset threshold, and takes the finally obtained to-be-confirmed text category as the target text category.
If the preset threshold is 0.8 and the first target probability calculated above is 0.857, it is determined that the first target probability is higher than the preset threshold, and the first to-be-confirmed text category is determined to be the target text category to which the to-be-processed text belongs; for example, it is determined that the text type of the to-be-processed text is the sentiment category.
Optionally, in the method for processing text classification provided by the embodiment of the present application, after the finally obtained to-be-confirmed text category is taken as the target text category, the method further includes: outputting the target text category to a target address.
The text type to which the to-be-processed text belongs is output to the target address, where it is displayed or analyzed by the user.
Optionally, in the method for processing text classification provided by the embodiment of the present application, after the finally obtained to-be-confirmed text category is taken as the target text category, the method further includes: updating, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
Updating the historical dependent probability stored in the preset database corresponding to the finally adopted classification method with the finally calculated target probability keeps the historical dependent probabilities stored in the preset database accurate.
In the present application, the target probability is introduced through the above steps, and the target text category of the to-be-processed text is determined according to the target probability; this compensates for determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving the processing efficiency of text classification.
The method for processing text classification provided by the embodiment of the present application classifies the to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; calculates a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; determines whether the first target probability is higher than a preset threshold; and, when the first target probability is lower than the preset threshold, sequentially uses at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, taking the finally obtained to-be-confirmed text category as the target text category. This solves the related-art problem that improving the accuracy of text classification leads to low processing efficiency. By introducing the target probability and determining the target text category of the to-be-processed text according to it, the method compensates for determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving the processing efficiency of text classification.
It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from that described here.
The embodiment of the present application further provides an apparatus for processing text classification. It should be noted that the apparatus for processing text classification of the embodiment of the present application may be used to execute the method for processing text classification provided by the embodiment of the present application. The apparatus for processing text classification provided by the embodiment of the present application is introduced below.
FIG. 2 is a schematic diagram of an apparatus for processing text classification according to an embodiment of the present application. As shown in FIG. 2, the apparatus includes: a processing unit 10, a calculating unit 20, a determining unit 30, and a first determining unit 40.
The processing unit 10 is configured to classify a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category.
The calculating unit 20 is configured to calculate a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category.
The determining unit 30 is configured to determine whether the first target probability is higher than a preset threshold.
The first determining unit 40 is configured to, when the first target probability is lower than the preset threshold, sequentially use at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and take the finally obtained to-be-confirmed text category as the target text category.
It should be noted here that the processing unit 10, the calculating unit 20, the determining unit 30, and the first determining unit 40 may run in a computer terminal as part of the apparatus, and the functions implemented by the above modules may be executed by a processor in the computer terminal; the computer terminal may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.
In the apparatus for processing text classification provided by the embodiment of the present application, the processing unit 10 classifies the to-be-processed text by using the first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; the calculating unit 20 calculates a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; the determining unit 30 determines whether the first target probability is higher than a preset threshold; and the first determining unit 40, when the first target probability is lower than the preset threshold, sequentially uses at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, taking the finally obtained to-be-confirmed text category as the target text category. This solves the related-art problem that improving the accuracy of text classification leads to low processing efficiency: by introducing the target probability and determining the target text category of the to-be-processed text according to it, the apparatus compensates for determining the target text category with only one classification method and effectively reduces unnecessary repeated classification, thereby improving the accuracy of text classification while also improving the processing efficiency of text classification.
Optionally, in the apparatus for processing text classification provided by the embodiment of the present application, the apparatus further includes: a second determining unit, configured to determine a plurality of classification methods for classifying the to-be-processed text; and an obtaining unit, configured to obtain a set of classification methods composed of the plurality of classification methods, where the set of classification methods includes the first classification method.
Optionally, in the apparatus for processing text classification provided by the embodiment of the present application, the calculating unit 20 includes: a first calculating module, configured to multiply the first dependent probability by the first historical dependent probability to obtain a first target dependent probability; a second calculating module, configured to multiply a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, where the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category; a third calculating module, configured to add the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and a fourth calculating module, configured to divide the first target dependent probability by the first target sub-probability to obtain the first target probability.
Optionally, in the apparatus for processing text classification provided by the embodiment of the present application, the apparatus further includes: an updating unit, configured to update, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
Optionally, in the apparatus for processing text classification provided by the embodiment of the present application, the apparatus further includes: an output unit, configured to output the target text category to a target address.
The functional units provided by the embodiments of the present application may run in a mobile terminal, a computer terminal, or a similar computing device, or may be stored as part of a storage medium.
Thus, an embodiment of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program code of the following steps of the method for processing text classification: classifying a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability; calculating a first target probability according to the first dependent probability and a first historical dependent probability; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category.
Optionally, the computer terminal may include: one or more processors, a memory, and a transmission device.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for processing text classification in the embodiment of the present invention; the processor executes various functional applications and data processing, that is, implements the above method for processing text classification, by running the software programs and modules stored in the memory. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located relative to the processor, and such remote memory may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device is used to receive or send data via a network. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device includes a network interface controller (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device is a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
Specifically, the memory is used to store preset action conditions, information of preset authorized users, and applications.
The processor may call, through the transmission device, the information and applications stored in the memory to execute the program code of the method steps of each optional or preferred embodiment of the above method embodiments.
Those of ordinary skill in the art can understand that the computer terminal may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing hardware related to a terminal device, and the program may be stored in a computer-readable storage medium; the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be used to save the program code executed by the method for processing text classification provided by the above method embodiment and apparatus embodiment.
Optionally, in this embodiment, the storage medium may be located in any computer terminal of a group of computer terminals in a computer network, or in any mobile terminal of a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: classifying a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability; calculating a first target probability according to the first dependent probability and a first historical dependent probability; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category.
Optionally, in this embodiment, the storage medium may also be configured to store program code for the various preferred or optional method steps provided by the method for processing text classification.
The method and apparatus for processing text classification according to the present invention have been described above by way of example with reference to the drawings. However, those skilled in the art should understand that various improvements may be made to the method and apparatus for processing text classification proposed by the present invention without departing from the content of the present invention. Therefore, the protection scope of the present invention shall be determined by the content of the appended claims.
The apparatus for processing text classification includes a processor and a memory; the above processing unit, calculating unit, determining unit, first determining unit and the like are all stored in the memory as program units, and the processor executes the above program units stored in the memory to implement the corresponding functions. The above preset threshold and preset database may both be stored in the memory.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be set, and text classification is processed by adjusting kernel parameters.
The memory may include a non-persistent memory, a random access memory (RAM) and/or a non-volatile memory in a computer-readable medium, such as a read-only memory (ROM) or a flash memory (flash RAM); the memory includes at least one memory chip.
The present application further provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: classifying a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, where the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category; calculating a first target probability according to the first dependent probability and a first historical dependent probability, where the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category; determining whether the first target probability is higher than a preset threshold; and when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to classify the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category.
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Obviously, those skilled in the art should understand that the above modules or steps of the present application may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented with program code executable by the computing device, so that they may be stored in a storage device and executed by the computing device, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the present application and are not intended to limit the present application; for those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

  1. A method for processing text classification, characterized by comprising:
    classifying a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, wherein the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category;
    calculating a first target probability according to the first dependent probability and a first historical dependent probability, wherein the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category;
    determining whether the first target probability is higher than a preset threshold; and
    when the first target probability is lower than the preset threshold, sequentially using at least one classification method different from the first classification method to perform the classification processing on the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and taking the finally obtained to-be-confirmed text category as the target text category.
  2. The method according to claim 1, characterized in that, before the to-be-processed text is classified by using the first classification method, the method further comprises:
    determining a plurality of classification methods for classifying the to-be-processed text; and
    obtaining a set of classification methods composed of the plurality of classification methods, wherein the set of classification methods comprises the first classification method.
  3. The method according to claim 1, characterized in that calculating the first target probability according to the first dependent probability and the first historical dependent probability comprises:
    multiplying the first dependent probability by the first historical dependent probability to obtain a first target dependent probability;
    multiplying a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, wherein the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category;
    adding the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and
    dividing the first target dependent probability by the first target sub-probability to obtain the first target probability.
  4. The method according to claim 1, characterized in that, after taking the finally obtained to-be-confirmed text category as the target text category, the method further comprises:
    updating, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
  5. The method according to claim 1, characterized in that, after taking the finally obtained to-be-confirmed text category as the target text category, the method further comprises:
    outputting the target text category to a target address.
  6. An apparatus for processing text classification, characterized by comprising:
    a processing unit, configured to classify a to-be-processed text by using a first classification method to obtain a first to-be-confirmed text category and a first dependent probability, wherein the first dependent probability is the probability, determined according to the first classification method, that the to-be-processed text belongs to the first to-be-confirmed text category;
    a calculating unit, configured to calculate a first target probability according to the first dependent probability and a first historical dependent probability, wherein the first historical dependent probability is the probability, stored in a preset database, that the to-be-processed text belongs to the first to-be-confirmed text category;
    a determining unit, configured to determine whether the first target probability is higher than a preset threshold; and
    a first determining unit, configured to, when the first target probability is lower than the preset threshold, sequentially use at least one classification method different from the first classification method to perform the classification processing on the to-be-processed text until the calculated target probability is higher than or equal to the preset threshold, and take the finally obtained to-be-confirmed text category as the target text category.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a second determining unit, configured to determine a plurality of classification methods for classifying the to-be-processed text; and
    an obtaining unit, configured to obtain a set of classification methods composed of the plurality of classification methods, wherein the set of classification methods comprises the first classification method.
  8. The apparatus according to claim 6, characterized in that the calculating unit comprises:
    a first calculating module, configured to multiply the first dependent probability by the first historical dependent probability to obtain a first target dependent probability;
    a second calculating module, configured to multiply a first non-dependent probability by a first historical non-dependent probability to obtain a first target non-dependent probability, wherein the first non-dependent probability is the probability, determined according to the first classification method, that the to-be-processed text does not belong to the first to-be-confirmed text category, and the first historical non-dependent probability is the probability, stored in the preset database, that the to-be-processed text does not belong to the first to-be-confirmed text category;
    a third calculating module, configured to add the first target dependent probability and the first target non-dependent probability to obtain a first target sub-probability; and
    a fourth calculating module, configured to divide the first target dependent probability by the first target sub-probability to obtain the first target probability.
  9. The apparatus according to claim 6, characterized in that the apparatus further comprises: an updating unit, configured to update, with the finally calculated target probability, the historical dependent probability stored in the preset database corresponding to the finally adopted classification method.
  10. The apparatus according to claim 6, characterized in that the apparatus further comprises: an output unit, configured to output the target text category to a target address.
PCT/CN2016/107313 2015-12-11 2016-11-25 Method and apparatus for processing text classification WO2017097118A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510921141.1 2015-12-11
CN201510921141.1A CN106874291A (zh) 2015-12-11 2015-12-11 Method and apparatus for processing text classification

Publications (1)

Publication Number Publication Date
WO2017097118A1 (zh) 2017-06-15

Family

ID=59013723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/107313 WO2017097118A1 (zh) 2015-12-11 2016-11-25 文本分类的处理方法及装置

Country Status (2)

Country Link
CN (1) CN106874291A (zh)
WO (1) WO2017097118A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597985A (zh) * 2019-08-15 2019-12-20 重庆金融资产交易所有限责任公司 基于数据分析的数据分类方法、装置、终端及介质
CN111191447B (zh) * 2019-12-18 2023-07-14 东软集团股份有限公司 一种设备缺陷的分类方法、装置及设备
CN112380346B (zh) * 2020-11-23 2023-04-25 宁波深擎信息科技有限公司 金融新闻情感分析方法、装置、计算机设备及存储介质
CN113806542B (zh) * 2021-09-18 2024-05-17 上海幻电信息科技有限公司 文本分析方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101059796A (zh) * 2006-04-19 2007-10-24 中国科学院自动化研究所 基于概率主题词的两级组合文本分类方法
WO2011093925A1 (en) * 2010-02-01 2011-08-04 Alibaba Group Holding Limited Method and apparatus of text classification
CN103473356A (zh) * 2013-09-26 2013-12-25 苏州大学 一种篇章级情感分类方法及装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7062498B2 (en) * 2001-11-02 2006-06-13 Thomson Legal Regulatory Global Ag Systems, methods, and software for classifying text from judicial opinions and other documents
US8713007B1 (en) * 2009-03-13 2014-04-29 Google Inc. Classifying documents using multiple classifiers
CN101587493B (zh) * 2009-06-29 2012-07-04 中国科学技术大学 文本分类方法
CN102033964B (zh) * 2011-01-13 2012-05-09 北京邮电大学 基于块划分及位置权重的文本分类方法
CN103514174B (zh) * 2012-06-18 2019-01-15 北京百度网讯科技有限公司 一种文本分类方法和装置
US9195910B2 (en) * 2013-04-23 2015-11-24 Wal-Mart Stores, Inc. System and method for classification with effective use of manual data input and crowdsourcing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101059796A (zh) * 2006-04-19 2007-10-24 中国科学院自动化研究所 基于概率主题词的两级组合文本分类方法
WO2011093925A1 (en) * 2010-02-01 2011-08-04 Alibaba Group Holding Limited Method and apparatus of text classification
CN103473356A (zh) * 2013-09-26 2013-12-25 苏州大学 一种篇章级情感分类方法及装置

Also Published As

Publication number Publication date
CN106874291A (zh) 2017-06-20

Similar Documents

Publication Publication Date Title
US20200272913A1 (en) Recommendation Method and Apparatus
US11308405B2 (en) Human-computer dialogue method and apparatus
CN110619423B (zh) 多任务预测方法、装置、电子设备及存储介质
US10218716B2 (en) Technologies for analyzing uniform resource locators
WO2017097118A1 (zh) 文本分类的处理方法及装置
CN106484777B (zh) 一种多媒体数据处理方法以及装置
US10679012B1 (en) Techniques to add smart device information to machine learning for increased context
CN110598802A (zh) 一种内存检测模型训练的方法、内存检测的方法及装置
WO2021151336A1 (zh) 基于注意力机制的道路图像目标检测方法及相关设备
WO2017092623A1 (zh) 文本向量表示方法及装置
US10747961B2 (en) Method and device for identifying a sentence
US10546574B2 (en) Voice recognition apparatus and method
CN111178537B (zh) 一种特征提取模型训练方法及设备
CN111914159A (zh) 一种信息推荐方法及终端
US20140279815A1 (en) System and Method for Generating Greedy Reason Codes for Computer Models
CN118043802A (zh) 一种推荐模型训练方法及装置
CN111787042A (zh) 用于推送信息的方法和装置
LU102633B1 (en) Ticket troubleshooting support system
CN116579380A (zh) 一种数据处理方法以及相关设备
CN113407866A (zh) 应用于数字化社交的大数据话题推送方法及服务器
CN110232393B (zh) 数据的处理方法、装置、存储介质和电子装置
CN113051126A (zh) 画像构建方法、装置、设备及存储介质
CN112307319A (zh) 一种页面生成方法及装置
WO2018069078A1 (en) Optimization of deep learning models
CN112925982B (zh) 用户重定向方法及装置、存储介质、计算机设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16872310

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16872310

Country of ref document: EP

Kind code of ref document: A1