CN109726288A - Text classification method and device based on artificial intelligence processing
- Publication number
- CN109726288A (application number CN201811625414.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- classification
- texts
- history
- mark
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Embodiments of the present disclosure disclose a text classification method based on artificial intelligence processing. The method comprises: classifying each text in a first text set of texts not yet labeled with a category using a text classification model, to determine a confidence level for each text in the first text set, wherein the text classification model is generated based on a historical text set labeled with categories; determining one or more texts from the first text set based on the confidence level of each text in the first text set, and labeling the one or more texts with categories; and, when the labeled one or more texts include a text of a new category different from the categories in the historical text set, updating the historical text set using the labeled one or more texts. The method of the embodiments can discover new text categories automatically and improve the classification accuracy of the text classification model.
Description
Technical field
The present disclosure belongs to the technical field of information processing, and in particular relates to a text classification method and device based on artificial intelligence processing, and a corresponding computer-readable storage medium.
Background
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Text classification refers to automatically labeling a collection of texts (samples) according to a given classification system or standard using natural language processing (NLP) technology. Text classification is widely used in many fields, such as monitoring positive and negative public opinion, intelligent customer service, spam detection, sentiment recognition in film reviews, and any other task that can be cast as classification. A traditional text classification method includes two processes: (1) training a model with a machine learning method on a large number of samples labeled with categories; and (2) using the model to classify samples not yet labeled with a category. However, this approach assumes that the set of categories is fixed; when a new sample belongs to none of the predefined categories, the classification performance of the model degrades.
Summary of the invention
Embodiments of the present disclosure provide a text classification method and device based on artificial intelligence processing, and a corresponding computer-readable storage medium, to at least partially solve the above and other potential problems.
A first aspect of embodiments of the present disclosure provides a text classification method based on artificial intelligence processing, the text classification method comprising the following steps:
A. classifying each text in a first text set of texts not yet labeled with a category using a text classification model, to determine a confidence level for each text in the first text set, wherein the text classification model is generated based on a historical text set labeled with categories;
B. determining one or more texts from the first text set based on the confidence level of each text in the first text set, and labeling the one or more texts with categories;
C. when the labeled one or more texts include a text of a new category different from the categories in the historical text set, updating the historical text set using the labeled one or more texts; and
D. generating a new text classification model using the updated historical text set, for classifying the other unlabeled texts in the first text set.
A second aspect of embodiments of the present disclosure provides a text classification device based on artificial intelligence processing, the text classification device comprising:
a processor; and
a memory for storing instructions which, when executed, cause the processor to perform the following steps:
A. classifying each text in a first text set of texts not yet labeled with a category using a text classification model, to determine a confidence level for each text in the first text set, wherein the text classification model is generated based on a historical text set labeled with categories;
B. determining one or more texts from the first text set based on the confidence level of each text in the first text set, and labeling the one or more texts with categories;
C. when the labeled one or more texts include a text of a new category different from the categories in the historical text set, updating the historical text set using the labeled one or more texts; and
D. generating a new text classification model using the updated historical text set, for classifying the other unlabeled texts in the first text set.
A third aspect of embodiments of the present disclosure provides a computer-readable storage medium comprising computer-executable instructions which, when run in a device, cause the device to perform the text classification method based on artificial intelligence processing according to the first aspect of embodiments of the present disclosure.
The text classification method and device based on artificial intelligence processing and the corresponding computer-readable storage medium according to embodiments of the present disclosure make it possible to perform text classification using a small number of samples labeled with categories, to discover new text categories automatically through incremental iteration, and to update the text classification model promptly, thereby improving the classification accuracy of the model.
Brief description of the drawings
The features, advantages, and other aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which several embodiments of the disclosure are shown by way of example and not limitation:
Fig. 1 shows a flowchart of a text classification method 100 based on artificial intelligence processing according to an embodiment of the present disclosure;
Fig. 2 shows a flowchart of a text classification method 200 based on artificial intelligence processing according to an embodiment of the present disclosure;
Fig. 3 shows a flowchart of an exemplary method 300 for selecting new samples to label according to new classification results, according to an embodiment of the present disclosure; and
Fig. 4 shows a schematic diagram of a text classification device 400 based on artificial intelligence processing according to an embodiment of the present disclosure.
Detailed description
Exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of methods and systems according to various embodiments of the present disclosure. Note that each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which may include one or more executable instructions for implementing the logical functions specified in the respective embodiments. Note also that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. Note further that each block of the flowcharts and/or block diagrams, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The terms "include", "comprise", and similar terms used herein are open-ended, i.e., "including but not limited to", meaning that other content may also be included. The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment"; and so on.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification. The lines between units in the drawings are merely for ease of illustration; they indicate that at least the units at both ends of a line communicate with each other, and are not intended to imply that units not connected by a line cannot communicate.
For ease of description, some terms appearing in this disclosure are explained below. It should be understood that the terms used in this application are to be interpreted as having meanings consistent with their meanings in the context of this specification and in the relevant field.
As mentioned above, a traditional text classification method needs a large number of samples labeled with categories to train a model, and then uses the model to classify samples not yet labeled. However, this approach only suits scenarios in which the set of categories is fixed; if a new sample belongs to none of the predefined categories, the performance of the model degrades.
To solve such problems, embodiments of the present disclosure provide an improved text classification method that makes it possible to perform text classification using a small number of samples labeled with categories, to discover new text categories automatically through incremental iteration, and to update the text classification model promptly, thereby improving the classification accuracy of the model.
Fig. 1 shows a flowchart of a text classification method 100 based on artificial intelligence processing according to an embodiment of the present disclosure. As shown in the flowchart, method 100 includes the following steps:
Step 101: classify each text in a first text set of texts not yet labeled with a category using a text classification model, to determine a confidence level for each text in the first text set, wherein the text classification model is generated based on a historical text set labeled with categories. In this step, each unlabeled text can be classified with the trained text classification model to produce a confidence level for that text; the confidence level can indicate the likelihood that the text belongs to a particular category. For example, the text classification model can be generated by training on the historical text set with a machine learning method (including, but not limited to, naive Bayes, support vector machines (SVM), and deep learning methods such as convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM) models). The training process may include, for example, extracting feature information about the texts (e.g., TF-IDF (term frequency-inverse document frequency) features, bag-of-words features, etc.) and feeding it into the model for training.
Step 102: determine one or more texts from the first text set based on the confidence level of each text in the first text set, and label the one or more texts with categories. In this step, the one or more texts to be labeled (for example, pushed to a human annotator for labeling) can be determined based on confidence. For example, one or more texts with low confidence can be selected from the first text set for labeling, or the one or more texts can be selected in some other way.
Step 103: when the labeled one or more texts include a text of a new category different from the categories in the historical text set, update the historical text set using the labeled one or more texts. In this step, the historical text set can be expanded with the labeled one or more texts.
Step 104: generate a new text classification model using the updated historical text set, for classifying the other unlabeled texts in the first text set. In this step, the expanded historical text set can be used to train a new classification model that classifies the remaining unlabeled texts; the process of generating the new text classification model is similar to the generation process described in step 101.
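Steps 101 to 104 can be sketched as a single round in Python. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the model is a tiny multinomial naive Bayes classifier over whitespace tokens (naive Bayes is one of the methods the text names as an option), confidence is taken to be the posterior probability of the predicted category, and the names `train`, `classify`, `one_round`, and the `annotate` callback (a stand-in for a human annotator) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(history):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of texts
    vocab = set()
    for text, label in history:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return (predicted_label, confidence); confidence is the posterior."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        score = math.log(n / total)
        for w in text.lower().split():
            if w in vocab:               # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    probs = {l: math.exp(s - top) for l, s in log_scores.items()}
    best = max(probs, key=probs.get)
    return best, probs[best] / sum(probs.values())

def one_round(history, unlabeled, annotate, k=1):
    """One pass over steps 101-104; `annotate` is a hypothetical labeling callback."""
    model = train(history)
    # Step 101: classify every unlabeled text and rank by confidence.
    ranked = sorted(unlabeled, key=lambda t: classify(model, t)[1])
    # Step 102: pick the k least confident texts and have them labeled.
    picked = ranked[:k]
    labeled = [(t, annotate(t)) for t in picked]
    # Step 103: update the history when a previously unseen category shows up.
    known = {label for _, label in history}
    if any(label not in known for _, label in labeled):
        history = history + labeled
        # Step 104: retrain on the expanded history.
        model = train(history)
    remaining = [t for t in unlabeled if t not in picked]
    return history, model, remaining
```

Calling `one_round` repeatedly on the returned `remaining` list gives the incremental iteration described in the text; the choice of `k` and of the lowest-confidence rule are assumptions of this sketch.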
In some embodiments, the historical text set includes multiple subsets of different categories, each of which contains texts of the same category. In other words, the historical text set can contain texts of different categories. A text may be, for example, a short text or a long text (including a sentence).
In some embodiments, step 102 may include: generating confidence thresholds corresponding to the categories in the historical text set; and selecting the one or more texts from the first text set based on the confidence thresholds. In this step, the one or more texts to be labeled can be selected based on the confidence thresholds; for example, one or more texts whose confidence is below the confidence threshold can be selected from the first text set for labeling, or the texts to be labeled can be selected in some other way. Because a confidence threshold is set for each category, the thresholds provide an effective control over how texts of different categories are selected for labeling.
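Selection against per-category thresholds can be sketched as follows. The function name `select_for_labeling`, the `(text, predicted_label, confidence)` tuple layout, and the fallback `default` threshold are assumptions made for illustration; the text only states that a threshold exists per category and that texts below it may be selected.

```python
def select_for_labeling(predictions, thresholds, default=0.5):
    """Return the texts whose confidence falls below their category's threshold.

    predictions: list of (text, predicted_label, confidence) tuples.
    thresholds:  dict mapping a category to its confidence threshold.
    default:     threshold used for categories missing from `thresholds`
                 (an assumption of this sketch).
    """
    return [
        text
        for text, label, confidence in predictions
        if confidence < thresholds.get(label, default)
    ]
```

With a threshold of 0.7 for category "x" and 0.5 for category "y", a text predicted as "x" with confidence 0.6 is pushed for labeling, while a text predicted as "y" with the same confidence is not.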
In some embodiments, step 102 may further include: adjusting the confidence thresholds based on the classification results of the texts. In this step, the confidence thresholds can be adjusted dynamically based on the classification results, so that texts that may belong to a particular category are selected for labeling more often.
In some embodiments, adjusting the confidence thresholds based on the classification results of the texts may include: calculating the proportions of texts of different categories based on the classification results of the texts in the first text set; and adjusting the confidence thresholds based on those proportions. In this step, the confidence thresholds are adjusted dynamically according to the proportions of texts in the classification results. For example, if the proportion of a particular category is low (e.g., the proportion of a new category is low at first), the confidence threshold for that category can be raised, so that texts that may belong to that category (e.g., the new category) are selected for labeling more often. This increases the likelihood of discovering that category, effectively balances the proportions of samples across categories, and improves the classification accuracy of the model.
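The text does not give a formula for the adjustment, only its direction: a lower category proportion leads to a higher threshold. The sketch below shows one possible rule under that constraint. The linear form, the comparison against an even share, and the parameters `base`, `gain`, and `cap` are all assumptions of this illustration.

```python
from collections import Counter

def adjust_thresholds(predicted_labels, base=0.5, gain=0.5, cap=0.95):
    """Raise the threshold of under-represented categories.

    predicted_labels: the category predicted for each text in the first text set.
    base: threshold for categories at or above an even share (assumed value).
    gain: how strongly a shortfall in proportion raises the threshold (assumed).
    cap:  upper bound on any threshold (assumed).
    """
    if not predicted_labels:
        return {}
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    even_share = 1 / len(counts)          # share each category would have if balanced
    thresholds = {}
    for label, n in counts.items():
        shortfall = max(0.0, even_share - n / total)
        thresholds[label] = min(cap, base + gain * shortfall)
    return thresholds
```

For example, with eight texts predicted as "a" and two as "b", category "b" falls 0.3 short of an even share, so its threshold rises to about 0.65 while "a" stays at 0.5; texts predicted as "b" are then more likely to be pushed for labeling.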
In some embodiments, step 103 may include: adding the labeled one or more texts to the historical text set. In this step, the text set is expanded by adding the labeled one or more texts to the historical text set.
In some embodiments, adding the labeled one or more texts to the historical text set may include: based on the number of texts of the new category, calculating the similarity between each text in the one or more texts and the texts of the new category; determining at least one text in the one or more texts based on the similarity, and relabeling the at least one text with a category; and adding the one or more texts, including the relabeled at least one text, to the historical text set. In this step, if the number of texts of the newly added category is still too small, the similarity to the texts of the newly added category can be calculated, and at least one text (for example, some or all of the texts) can be selected for relabeling based on the similarity, which further increases the likelihood of discovering texts of the new category. For example, the similarity between texts can be calculated using TF-IDF, PMI (pointwise mutual information), and the like; at least one text whose similarity is higher than a certain similarity threshold can then be selected for labeling, or the at least one text can be selected in some other way.
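A similarity check of this kind can be sketched with TF-IDF vectors and cosine similarity, one of the options the text mentions. The whitespace tokenization, the smoothed IDF formula, the default `threshold`, and the function names are assumptions of this sketch rather than details from the disclosure.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF."""
    docs = [Counter(t.lower().split()) for t in texts]
    df = Counter(w for d in docs for w in d)      # document frequency per word
    n = len(docs)
    return [
        {w: tf * (math.log((1 + n) / (1 + df[w])) + 1) for w, tf in d.items()}
        for d in docs
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similar_to_new_category(new_texts, candidates, threshold=0.3):
    """Return candidates whose best similarity to any new-category text
    reaches `threshold`; these would be pushed for relabeling."""
    if not new_texts:
        return []
    vecs = tfidf_vectors(new_texts + candidates)
    new_vecs, cand_vecs = vecs[:len(new_texts)], vecs[len(new_texts):]
    return [
        c for c, cv in zip(candidates, cand_vecs)
        if max(cosine(cv, nv) for nv in new_vecs) >= threshold
    ]
```

Given one new-category text such as "you have the wrong number", a candidate that shares "wrong" and "number" clears the threshold, while a candidate with no words in common scores zero and is left alone.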
In some embodiments, new texts not yet labeled with a category can be received over time to update the first text set. It should be understood that steps 101 to 104 above can be performed iteratively to classify texts and thereby identify all categories. Because the first text set keeps growing, all categories are discovered incrementally; and if the distribution of subsequent texts changes, the model can be updated promptly so that texts of newly appearing categories are judged correctly.
Compared with traditional text classification methods, the embodiment described with reference to Fig. 1 provides an improved text classification method that makes it possible to perform text classification using a small number of samples labeled with categories and to discover new text categories automatically through incremental iteration. If the distribution of subsequent samples changes, the model can be updated promptly so that samples of newly appearing categories are judged correctly, thereby improving the classification accuracy of the model.
For example, the text classification model of method 100 can be applied in scenarios such as debt collection, sales, or customer service. In a debt collection scenario, for instance, based on conversations between an employee of an institution (e.g., an internet financial institution), i.e., the collector, and a user, i.e., the person being collected from, the text classification model of method 100 can be used to classify the intent of all of the user's sentences according to the historical dialogue information. The classification may, for example, determine whether the user has the ability and/or the willingness to repay, with categories such as: able and willing to repay; willing but unable to repay; able but unwilling to repay; neither able nor willing to repay; or categories of various other types. Corresponding response strategies are then established for these categories, so that the employee knows how to respond when encountering a sentence of a given category. The text classification model can similarly be applied to sales and customer service scenarios; the process is similar and is not described in detail again.
Fig. 2 shows a flowchart of a text classification method 200 based on artificial intelligence processing according to an embodiment of the present disclosure. As shown in the flowchart, method 200 includes the following steps:
Step 201: train a text classification model using a small number of samples (texts) labeled with categories. In this step, the initial text classification model can be trained in the manner described for step 101 of method 100.
Step 202: classify the samples not yet labeled with a category using the text classification model to produce a confidence level for each sample, and select one or more low-confidence samples to be labeled with categories. In this step, samples whose confidence is below a confidence threshold can be selected for labeling, for example, or low-confidence samples can be selected in some other way.
Step 203: train a new text classification model using the newly labeled samples together with the previously labeled samples. In this step, the set of labeled samples is expanded in order to update the text classification model.
Step 204: classify the other unlabeled texts using the new text classification model. In this step, the updated text classification model can classify the unlabeled texts more accurately.
Step 205: select new samples to label according to the new classification results. After step 205 is completed, the method can return to step 203 and proceed iteratively until no new category appears.
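The loop formed by steps 201 to 205 can be sketched as below. The loop function takes the training, prediction, and annotation routines as parameters, so any model can be plugged in; the `train` and `predict` functions shown are deliberately trivial toy stand-ins, and all names, the batch size `k`, and the `max_rounds` safety limit are assumptions of this sketch.

```python
def train(labeled):
    """Toy model: remember the label each word was last seen with."""
    model = {}
    for text, label in labeled:
        for word in text.split():
            model[word] = label
    return model

def predict(model, text):
    """Toy prediction: confidence is the share of words the model has seen."""
    words = text.split()
    if not words:
        return None, 0.0
    hits = [model[w] for w in words if w in model]
    label = max(set(hits), key=hits.count) if hits else None
    return label, len(hits) / len(words)

def incremental_labeling(seed, pool, train, predict, annotate, k=1, max_rounds=10):
    """Iterate until a labeling round yields no new category or the pool is empty."""
    labeled, pool = list(seed), list(pool)
    model = train(labeled)                               # step 201
    for _ in range(max_rounds):
        if not pool:
            break
        # Steps 202/204/205: classify the pool and take the least confident texts.
        pool.sort(key=lambda t: predict(model, t)[1])
        batch, pool = pool[:k], pool[k:]
        known = {label for _, label in labeled}
        newly_labeled = [(t, annotate(t)) for t in batch]  # hypothetical annotator
        labeled.extend(newly_labeled)
        model = train(labeled)                           # step 203
        if all(label in known for _, label in newly_labeled):
            break                                        # no new category appeared
    return model, labeled, pool
```

Stopping as soon as one round produces no new category is a literal reading of "until no new category appears"; a production loop might instead require several consecutive rounds without a new category before stopping.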
The embodiment described with reference to Fig. 2 provides an improved text classification method that makes it possible to perform text classification using a small number of samples labeled with categories and to discover new text categories automatically through incremental iteration. If the distribution of subsequent samples changes, the model can be updated promptly so that samples of newly appearing categories are judged correctly, thereby improving the classification accuracy of the model.
For example, the text classification model of method 200 can be applied in scenarios such as debt collection, sales, or customer service. In a debt collection scenario, for instance, based on conversations between an employee of an institution (e.g., an internet financial institution), i.e., the collector, and a user, i.e., the person being collected from, the text classification model of method 200 can be used to classify the intent of all of the user's sentences according to the historical dialogue information. Corresponding response strategies are then established for these categories, so that the employee knows how to respond when encountering a sentence of a given category. The text classification model can similarly be applied to sales and customer service scenarios; the process is similar and is not described in detail again.
Fig. 3 shows a flowchart of an exemplary method 300 for selecting new samples to label according to new classification results, according to an embodiment of the present disclosure; that is, Fig. 3 shows an example implementation of block 205 in method 200 of Fig. 2. As shown in the flowchart, method 300 includes:
Step 301: starting from per-category confidence thresholds, dynamically adjust the confidence thresholds according to the proportions of samples in the classification results, and select samples whose confidence is below the confidence threshold for labeling. In this step, the confidence thresholds can be adjusted dynamically based on the classification results, so that samples that may belong to a particular category are selected for labeling more often. For example, if the proportion of a particular category is low (e.g., the proportion of a new category is low at first), the confidence threshold for that category can be raised, so that samples that may belong to that category (e.g., the new category) are selected for labeling more often. This increases the likelihood of discovering that category, effectively balances the proportions of samples across categories, and improves the classification accuracy of the model.
Step 302: determine whether the number of samples of the new category is too small. In this step, it can be determined whether the number of samples labeled as the new category that were obtained through step 301 is too small.
Step 303: if it is determined at step 302 that the number of samples of the new category is too small, calculate the similarity to the samples of the new category and select samples whose similarity is greater than a certain similarity threshold for labeling. Conversely, if it is determined at step 302 that the number of samples of the new category is not too small, step 303 is skipped. In step 303, if the number of texts of the newly added category is still too small, the similarity to the samples of the newly added category can be calculated, and the samples to be labeled can be selected based on the similarity, for example by selecting samples whose similarity is greater than a certain similarity threshold, which further increases the likelihood of discovering samples of the new category. For example, the similarity between samples can be calculated using TF-IDF, PMI (pointwise mutual information), and the like; samples whose similarity is higher than a certain similarity threshold can then be selected for labeling, or the samples can be selected in some other way.
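The control flow of steps 301 to 303 can be sketched as one selection function. To keep the example short, token-overlap (Jaccard) similarity stands in for the TF-IDF or PMI measures the text mentions; that substitution, the function names, and the parameters `min_new`, `sim_threshold`, and `default` are all assumptions of this sketch. The thresholds passed in are taken to have been adjusted already according to the category proportions.

```python
def jaccard(a, b):
    """Token-overlap similarity; a simple stand-in for TF-IDF or PMI."""
    x, y = set(a.split()), set(b.split())
    return len(x & y) / len(x | y) if x | y else 0.0

def select_samples(predictions, thresholds, new_category_samples,
                   min_new=5, sim_threshold=0.3, similarity=jaccard, default=0.5):
    """Pick samples to push for labeling.

    predictions:          list of (text, predicted_label, confidence).
    thresholds:           per-category confidence thresholds (already adjusted).
    new_category_samples: texts labeled so far as belonging to a new category.
    """
    # Step 301: select samples below their category's confidence threshold.
    chosen = [t for t, label, conf in predictions
              if conf < thresholds.get(label, default)]
    # Step 302: is the new category still under-represented?
    if new_category_samples and len(new_category_samples) < min_new:
        # Step 303: also select samples that resemble the new-category samples.
        for t, _, _ in predictions:
            if t not in chosen and any(
                similarity(t, s) >= sim_threshold for s in new_category_samples
            ):
                chosen.append(t)
    return chosen
```

Note that step 303 can pull in a sample the classifier was confident about: a text predicted as an existing category with confidence 0.8 is still selected if it overlaps strongly with the few known new-category samples.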
According to the embodiment described with reference to Fig. 3, samples that may belong to a new category are pushed for labeling more often, which increases the likelihood of discovering new categories and effectively balances the proportions of samples across categories. Like methods 100 and 200, method 300 can also be applied in scenarios such as debt collection, sales, or customer service.
Fig. 4 shows a schematic diagram of a text classification device 400 based on artificial intelligence processing according to an embodiment of the present disclosure. The device 400 may include a memory 401 and a processor 402 coupled to the memory 401. The memory 401 stores instructions which, when executed, cause the processor 402 to perform one or more actions or steps of the various methods described herein (such as method 100 of Fig. 1, method 200 of Fig. 2, and method 300 of Fig. 3).
The memory 401 may include volatile memory and non-volatile memory, such as ROM (read-only memory), RAM (random access memory), removable disks, magnetic disks, optical discs, and USB flash drives. The processor 402 may be a central processing unit (CPU), a microcontroller, an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, or one or more integrated circuits configured to implement embodiments of the present disclosure.
Additionally or alternatively, the above methods may be implemented by a computer program product, i.e., a computer-readable storage medium. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for carrying out aspects of the present disclosure. The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
In general, the various example embodiments of the present disclosure may be implemented in hardware or special-purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executable by a controller, microprocessor, or other computing device. Where aspects of embodiments of the present disclosure are illustrated or described as block diagrams, flowcharts, or other graphical representations, it will be understood that the blocks, devices, systems, techniques, or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special-purpose circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.
It should be noted that although several modules or units of the device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules described above may be embodied in a single module. Conversely, the features and functions of one module described above may be further divided among multiple modules.
The foregoing describes merely optional embodiments of the present disclosure and is not intended to limit the embodiments of the present disclosure; for those skilled in the art, various modifications and variations of the embodiments are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the embodiments of the present disclosure shall fall within the scope of protection of the embodiments of the present disclosure.
Although embodiments of the present disclosure have been described with reference to several specific embodiments, it should be understood that the embodiments of the present disclosure are not limited to the specific embodiments disclosed. The embodiments of the present disclosure are intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the appended claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Claims (15)
1. A text classification method based on artificial intelligence processing, characterized by comprising the following steps:
A. classifying each text in a first text set of texts not yet labeled with a category using a text classification model, to determine a confidence level for each text in the first text set, wherein the text classification model is generated based on a historical text set labeled with categories;
B. determining one or more texts from the first text set based on the confidence level of each text in the first text set, and labeling the one or more texts with categories;
C. when the labeled one or more texts include a text of a new category different from the categories in the historical text set, updating the historical text set using the labeled one or more texts; and
D. generating a new text classification model using the updated historical text set, for classifying the other unlabeled texts in the first text set.
2. The method according to claim 1, characterized in that the historical text set includes multiple subsets of different categories, each subset of the multiple subsets including texts of the same category.
3. The text classification method according to claim 1, characterized in that, in step B, determining one or more texts from the first text set based on the confidence level of each text in the first text set comprises:
generating a confidence threshold corresponding to the categories in the history text set; and
selecting the one or more texts from the second text set based on the confidence threshold.
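A small Python sketch of threshold-based selection follows. The claim does not say how the threshold is generated or in which direction it is applied, so the uniform starting value `base` and the "below threshold" rule are assumptions, as are the function names.

```python
from typing import Dict, Iterable, List, Tuple

Prediction = Tuple[str, str, float]  # (text, predicted category, confidence)


def make_thresholds(categories: Iterable[str], base: float = 0.6) -> Dict[str, float]:
    """Hypothetical rule: start every known category at the same base threshold."""
    return {cat: base for cat in categories}


def select_for_labeling(preds: List[Prediction],
                        thresholds: Dict[str, float]) -> List[str]:
    """Pick texts whose confidence is below the threshold of their predicted category."""
    return [text for text, cat, conf in preds if conf < thresholds[cat]]


thresholds = make_thresholds(["sports", "finance"])
thresholds["finance"] = 0.8  # thresholds may differ from category to category
preds = [("a", "sports", 0.7), ("b", "finance", 0.7), ("c", "sports", 0.3)]
selected = select_for_labeling(preds, thresholds)
```

Keeping one threshold per category means the same confidence value can pass for one category and be selected for labeling in another, as texts "a" and "b" show.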
4. The text classification method according to claim 3, characterized by further comprising:
adjusting the confidence threshold based on the classification result of each text.
5. The text classification method according to claim 4, characterized in that adjusting the confidence threshold based on the classification result of each text comprises:
calculating the proportion of texts of different categories based on the classification result of each text in the first text set; and
adjusting the confidence threshold based on the proportion of texts of the different categories.
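The proportion-based adjustment can be illustrated as below. The claim only states that the threshold is adjusted based on the category proportions; the linear update rule, the `rate` parameter, and the clamping bounds `lo` and `hi` are hypothetical choices made for this sketch.

```python
from collections import Counter
from typing import Dict, List


def category_proportions(predicted: List[str]) -> Dict[str, float]:
    """Share of the classified texts assigned to each category."""
    if not predicted:
        return {}
    counts = Counter(predicted)
    return {cat: n / len(predicted) for cat, n in counts.items()}


def adjust_thresholds(thresholds: Dict[str, float], proportions: Dict[str, float],
                      rate: float = 0.5, lo: float = 0.1, hi: float = 0.95) -> Dict[str, float]:
    """Assumed rule: raise the threshold of over-represented categories and
    lower it for under-represented ones, relative to a uniform share."""
    uniform = 1.0 / len(thresholds)
    adjusted = {}
    for cat, thr in thresholds.items():
        share = proportions.get(cat, 0.0)
        adjusted[cat] = min(hi, max(lo, thr + rate * (share - uniform)))
    return adjusted


predicted = ["sports", "sports", "sports", "finance"]
proportions = category_proportions(predicted)
adjusted = adjust_thresholds({"sports": 0.6, "finance": 0.6}, proportions)
```

Under this rule a category that absorbs most predictions gets a stricter threshold, so more of its texts are sent for labeling; other rules would fit the claim equally well.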
6. The text classification method according to claim 1, characterized in that, in step C, updating the history text set using the one or more labeled texts when the one or more labeled texts include a text of a new category different from the categories in the history text set comprises:
adding the one or more labeled texts to the history text set.
7. The text classification method according to claim 6, characterized in that adding the one or more labeled texts to the history text set comprises:
calculating, based on the number of texts of the new category, a similarity between each text of the one or more texts and the texts of the new category;
determining at least one text among the one or more texts based on the similarity, and relabeling the category of the at least one text; and
adding the one or more texts, including the relabeled at least one text, to the history text set.
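One possible reading of claim 7 is sketched below in Python: when the new category has only a few labeled texts, other labeled texts that closely resemble them are relabeled into the new category before everything is added to the history set. The Jaccard word-overlap similarity, the `min_count` and `sim_threshold` parameters, and this interpretation of "based on the number of texts" are assumptions, not details taken from the claim.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def relabel_similar(labeled: Dict[str, str], new_cat: str,
                    min_count: int = 3, sim_threshold: float = 0.5) -> Dict[str, str]:
    """If the new category has fewer than min_count texts, move texts that are
    similar enough to any of its texts into the new category."""
    seeds = [t for t, c in labeled.items() if c == new_cat]
    result = dict(labeled)
    if not seeds or len(seeds) >= min_count:
        return result
    for text, cat in labeled.items():
        if cat != new_cat and max(jaccard(text, s) for s in seeds) >= sim_threshold:
            result[text] = new_cat
    return result


labeled = {
    "vaccine trial results": "health",
    "vaccine trial delayed": "finance",
    "stock price falls": "finance",
}
relabeled = relabel_similar(labeled, new_cat="health")

# Add the texts, including the relabeled one, to the history text set.
history: Dict[str, List[str]] = {"finance": ["stock market price"]}
for text, cat in relabeled.items():
    history.setdefault(cat, []).append(text)
```

Here "vaccine trial delayed" shares two of four distinct words with the single new-category text, reaches the assumed threshold of 0.5, and is relabeled; the unrelated finance text keeps its label.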
8. A text classification apparatus based on artificial intelligence processing, characterized by comprising:
a processor; and
a memory for storing instructions which, when executed, cause the processor to perform the following steps:
A. classifying each text in a first text set of unlabeled texts using a text classification model, to determine a confidence level for each text in the first text set, wherein the text classification model is generated based on a history text set whose texts have been labeled with categories;
B. determining one or more texts from the first text set based on the confidence level of each text in the first text set, and labeling the one or more texts with categories;
C. when the one or more labeled texts include a text of a new category different from the categories in the history text set, updating the history text set using the one or more labeled texts; and
D. generating a new text classification model using the updated history text set, for classifying the other unlabeled texts in the first text set.
9. The text classification apparatus according to claim 8, characterized in that the history text set comprises a plurality of subsets of different categories, each subset of the plurality of subsets comprising texts of the same category.
10. The text classification apparatus according to claim 8, characterized in that, in step B, determining one or more texts from the first text set based on the confidence level of each text in the first text set comprises:
generating a confidence threshold corresponding to the categories in the history text set; and
selecting the one or more texts from the second text set based on the confidence threshold.
11. The text classification apparatus according to claim 10, characterized in that the instructions, when executed, further cause the processor to perform the following step:
adjusting the confidence threshold based on the classification result of each text.
12. The text classification apparatus according to claim 11, characterized in that adjusting the confidence threshold based on the classification result of each text comprises:
calculating the proportion of texts of different categories based on the classification result of each text in the first text set; and
adjusting the confidence threshold based on the proportion of texts of the different categories.
13. The text classification apparatus according to claim 8, characterized in that, in step C, updating the history text set using the one or more labeled texts when the one or more labeled texts include a text of a new category different from the categories in the history text set comprises:
adding the one or more labeled texts to the history text set.
14. The text classification apparatus according to claim 13, characterized in that adding the one or more labeled texts to the history text set comprises:
calculating, based on the number of texts of the new category, a similarity between each text of the one or more texts and the texts of the new category;
determining at least one text among the one or more texts based on the similarity, and relabeling the category of the at least one text; and
adding the one or more texts, including the relabeled at least one text, to the history text set.
15. A computer-readable storage medium comprising computer-executable instructions which, when run in an apparatus, cause the apparatus to perform the text classification method based on artificial intelligence processing according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811625414.8A CN109726288A (en) | 2018-12-28 | 2018-12-28 | File classification method and device based on artificial intelligence process |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811625414.8A CN109726288A (en) | 2018-12-28 | 2018-12-28 | File classification method and device based on artificial intelligence process |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109726288A (en) | 2019-05-07 |
Family
ID=66297432
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811625414.8A Pending CN109726288A (en) | 2018-12-28 | 2018-12-28 | File classification method and device based on artificial intelligence process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109726288A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561805A (en) * | 2008-04-18 | 2009-10-21 | 日电(中国)有限公司 | Document classifier generation method and system |
CN105260738A (en) * | 2015-09-15 | 2016-01-20 | 武汉大学 | Method and system for detecting change of high-resolution remote sensing image based on active learning |
CN106407333A (en) * | 2016-09-05 | 2017-02-15 | 北京百度网讯科技有限公司 | Artificial intelligence-based spoken language query identification method and apparatus |
CN107358257A (en) * | 2017-07-07 | 2017-11-17 | 华南理工大学 | Under a kind of big data scene can incremental learning image classification training method |
CN108830332A (en) * | 2018-06-22 | 2018-11-16 | 安徽江淮汽车集团股份有限公司 | A kind of vision vehicle checking method and system |
Non-Patent Citations (2)
Title |
---|
张磊: "《特定领域命名实体识别通用方法的研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
涂盛慧: "《基于关键词的非法实验申请分类系统的设计与实现》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110457675A (en) * | 2019-06-26 | 2019-11-15 | 平安科技(深圳)有限公司 | Prediction model training method, device, storage medium and computer equipment |
CN110457675B (en) * | 2019-06-26 | 2024-01-19 | 平安科技(深圳)有限公司 | Predictive model training method and device, storage medium and computer equipment |
CN112579768A (en) * | 2019-09-30 | 2021-03-30 | 北京国双科技有限公司 | Emotion classification model training method, text emotion classification method and text emotion classification device |
CN111464707A (en) * | 2020-03-30 | 2020-07-28 | 中国建设银行股份有限公司 | Outbound call processing method, device and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tur et al. | Combining active and semi-supervised learning for spoken language understanding | |
CN109492101B (en) | Text classification method, system and medium based on label information and text characteristics | |
Muangkammuen et al. | Automated thai-faq chatbot using rnn-lstm | |
CN107766929B (en) | Model analysis method and device | |
CN110309514A (en) | A kind of method for recognizing semantics and device | |
US20120136812A1 (en) | Method and system for machine-learning based optimization and customization of document similarities calculation | |
CN106296195A (en) | A kind of Risk Identification Method and device | |
CN106126751A (en) | A kind of sorting technique with time availability and device | |
CN106294344A (en) | Video retrieval method and device | |
CN109726288A (en) | File classification method and device based on artificial intelligence process | |
CN112559734B (en) | Brief report generating method, brief report generating device, electronic equipment and computer readable storage medium | |
CN109948735A (en) | A kind of multi-tag classification method, system, device and storage medium | |
CN111339260A (en) | BERT and QA thought-based fine-grained emotion analysis method | |
CA3123387C (en) | Method and system for generating an intent classifier | |
CN112528031A (en) | Work order intelligent distribution method and system | |
CN109948160A (en) | Short text classification method and device | |
CN110399467A (en) | The method and apparatus of training data for natural language question answering system is provided | |
CN108665158A (en) | A kind of method, apparatus and equipment of trained air control model | |
CN114077836A (en) | Text classification method and device based on heterogeneous neural network | |
Merello et al. | Investigating timing and impact of news on the stock market | |
CN111160959A (en) | User click conversion estimation method and device | |
US20220414344A1 (en) | Method and system for generating an intent classifier | |
US20230004581A1 (en) | Computer-Implemented Method for Improving Classification of Labels and Categories of a Database | |
CN115953080A (en) | Engineer service level determination method, apparatus and storage medium | |
CN114547264A (en) | News diagram data identification method based on Mahalanobis distance and comparison learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190507 |