CN107766371A - Text information classification method and device thereof - Google Patents

Text information classification method and device thereof

Info

Publication number
CN107766371A
Authority
CN
China
Prior art keywords
text
key word
information
word information
sample
Prior art date
Legal status
Granted
Application number
CN201610693358.6A
Other languages
Chinese (zh)
Other versions
CN107766371B (en)
Inventor
周晶
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201610693358.6A
Priority to PCT/CN2017/093896 (WO2018032937A1)
Publication of CN107766371A
Application granted
Publication of CN107766371B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

Embodiments of the present invention provide a text information classification method and device. A sample keyword information set is preset for each text category, and a correspondence between each sample keyword information set and its text category information is established, which provides the basis for matching when text information to be classified is later classified. When text information to be classified is processed, keyword information is extracted from it according to preset rules, and the text category information corresponding to the text information is matched according to the correspondence between the sample keyword information sets and the text categories. Because classification in this manner requires only automatic matching by the system, it greatly improves classification efficiency, shortens the analysis cycle, reduces errors from manual assignment, and improves matching accuracy.

Description

Text information classification method and device thereof
Technical field
The present invention relates to the field of text information classification, and in particular to a text information classification method and a device thereof.
Background art
With the development of information technology, the information processing department of every enterprise receives or accumulates massive amounts of information every day. In some cases, information of a particular category needs to be extracted from it, but because no direct correspondence has been established between the information and its categories, it cannot be retrieved directly with a search engine. Existing methods typically classify information by manual analysis, item by item, which consumes considerable manpower. Moreover, as the volume of interactive information and the related daily workload keep growing, processing all of it with high quality in the same amount of time requires either faster staff or more staff. Manual classification therefore struggles to achieve both efficiency and quality. Because it relies on human judgment, there is no guarantee that every staff member understands the categories in the same way; recall differs from person to person, and classification accuracy is low as a result.
Summary of the invention
The text information classification method and device provided by embodiments of the present invention mainly solve the technical problems in the prior art that classifying text information manually leads to a long analysis cycle, low working efficiency, and low recall.
To solve the above technical problems, an embodiment of the present invention provides a text information classification method, including:
obtaining text information to be classified;
extracting a keyword information set from the text information to be classified according to preset rules, the keyword information set including at least one piece of keyword information;
matching the text category information corresponding to the keyword information set according to the keyword information set and a preset correspondence between sample keyword information sets and text category information; and
classifying the text information to be classified according to the matched text category information.
An embodiment of the present invention further provides a text information classification device, including an acquisition module, an extraction module, a matching module, and a classification module, wherein:
the acquisition module is configured to obtain text information to be classified;
the extraction module is configured to extract a keyword information set from the text information to be classified according to preset rules, the keyword information set including at least one piece of keyword information;
the matching module is configured to match the text category information corresponding to the keyword information set according to the keyword information set and the preset correspondence between sample keyword information sets and text category information; and
the classification module is configured to classify the text information to be classified according to the matched text category information.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the foregoing text information classification method.
The beneficial effects of the present invention are as follows:
According to the text information classification method, device, and computer-readable storage medium provided by embodiments of the present invention, a sample keyword information set is preset for each text category, and the correspondence between the sample keyword information sets and the text category information is established. This provides the basis for matching when text information to be classified is later classified, and makes automatic matching and classification possible. Further, when text information to be classified is classified, a keyword information set is extracted from it according to preset rules, and the extracted information that best reflects the text category is matched against the preset sample keyword information sets to obtain the corresponding text category information. The system thus identifies and matches the text information to be classified automatically; because only automatic system matching is needed, classification efficiency is greatly improved and the analysis cycle is shortened. Classifying by matching against sample keyword information sets with a fixed correspondence also reduces manual assignment errors and improves matching accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of the text information classification method according to the first embodiment of the invention;
Fig. 2 is a flowchart of a user classifying text information through a client using the text information classification method, according to the second embodiment of the invention;
Fig. 3 is a flowchart of extending the keyword information set of each text category, according to the second embodiment of the invention;
Fig. 4 is a flowchart of the learning process of the classification model, according to the second embodiment of the invention;
Fig. 5 is a schematic diagram of implementing text information classification through interaction between a browser and a server, according to the second embodiment of the invention;
Fig. 6 is a flowchart of classifying a single text message, according to the third embodiment of the invention;
Fig. 7 is a flowchart of classifying text information in batches, according to the third embodiment of the invention;
Fig. 8 is a schematic structural diagram of the text information classification device according to the fourth embodiment of the invention.
Detailed description of the embodiments
The present invention is described in further detail below through specific embodiments with reference to the accompanying drawings.
First embodiment:
In the prior art, information is usually classified manually, which results in low working efficiency and low accuracy. To address this, an embodiment of the present invention discloses a text information classification method and system. A keyword information set is extracted from the obtained text information to be classified according to preset rules; according to the extracted keyword information set and the preset correspondence between sample keyword information sets and text category information, the text category information corresponding to the keyword information set is matched; and finally the text information is classified according to that text category information. This achieves automatic classification of text information and greatly improves working efficiency and classification accuracy.
Referring to Fig. 1, Fig. 1 is the flowchart of the text information classification method provided in this embodiment.
The steps of the method provided in this embodiment are as follows:
S101: Obtain the text information to be classified.
Preferably, the obtained text information to be classified includes at least one text message; multiple messages may belong to the same text category or to different text categories.
In this embodiment, the text information to be classified may also be converted from other types of information, such as voice or video. When the obtained information is voice, classification operates on the text corresponding to the voice, which is produced by a speech-to-text recognition plug-in. Similarly, other types of non-text information must first be converted into corresponding text information by a conversion plug-in.
S102: Extract a keyword information set from the text information to be classified according to preset rules, the keyword information set including at least one piece of keyword information.
In this embodiment, after the text information to be classified is obtained, it is processed according to preset rules: it is segmented into words using word segmentation technology, which divides it into at least one piece of keyword information, and the keyword information obtained from segmentation is collected to form the keyword information set of the text information to be classified.
Preferably, when the text is segmented, the punctuation marks in the text information to be classified are removed first, and keyword segmentation is then performed sequentially over the text in its original order.
In this embodiment, segmentation yields at least one keyword, but not every keyword contributes substantively to classification. Some words, such as forms of address, numerals, measure words, and time words, occur in all categories and appear in almost every exchange of information. Therefore, after segmentation the keywords must also be filtered, selecting those that best reflect the category of the text information to form a keyword information set.
A specific example illustrates keyword extraction by word segmentation. Take the text message "Tonight, Joyful Hotel, box No. 2, 18:00; Mr. Wang is coming too." Before segmentation, useless symbols such as punctuation marks and abnormal symbols are removed, giving "Tonight Joyful Hotel box No. 2 18:00 Mr. Wang is coming too". After segmentation it becomes "evening / Joyful Hotel / No. 2 / (particle) / box / 18:00 / o'clock / Mr. Wang / also / come". In this text message, the keywords that reflect its category are "hotel" and "box"; they are extracted from the text message to form a keyword information set.
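The filtering step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the embodiment's implementation: segmentation itself is delegated to an external word segmenter that is not shown, the token list is a hypothetical segmenter output for the hotel example, and `STOP_WORDS` is a hypothetical stop-word list. Unlike the embodiment, which removes punctuation before segmentation, the sketch drops punctuation-only tokens afterwards, which has a comparable effect on the keyword list. Its crude filter also keeps candidates such as "mr. wang" that the example does not count as category keywords; these are harmless later, because only words found in a sample keyword information set receive a mark during matching.

```python
import unicodedata

# Hypothetical stop-word list: time words, measure words, adverbs and similar
# words that occur in messages of every category and carry no category signal.
STOP_WORDS = {"evening", "o'clock", "also"}


def is_punctuation(token: str) -> bool:
    """True if every character of the token is a Unicode punctuation mark."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def extract_keywords(tokens: list[str]) -> list[str]:
    """Filter word-segmenter output down to candidate keywords."""
    keywords = []
    for token in tokens:
        token = token.strip().lower()
        if not token or is_punctuation(token):
            continue  # drop empty and punctuation-only tokens
        if token in STOP_WORDS:
            continue  # drop words that appear in every category
        if any(ch.isdigit() for ch in token):
            continue  # drop numerals, room numbers and clock times such as 18:00
        keywords.append(token)
    return keywords


# Hypothetical segmenter output for the hotel example above.
tokens = ["evening", "Joyful Hotel", "No. 2", "box", ",", "18:00",
          "o'clock", ",", "Mr. Wang", "also", "come"]
print(extract_keywords(tokens))
# ['joyful hotel', 'box', 'mr. wang', 'come']
```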
S103: Match the text category information corresponding to the keyword information set according to the keyword information set and the preset correspondence between sample keyword information sets and text category information.
After the keyword information set has been extracted from the text information to be classified, the text information is classified according to that set. Preferably, the preset sample keyword information set of each text category is obtained, and the keyword information set extracted from the text information to be classified is matched against the sample keyword information set of each text category in turn; the text category information is then obtained from the preset correspondence between sample keyword information sets and text categories. Specifically, each piece of keyword information of the text information to be classified is looked up in the sample keyword information sets of the text categories; if it is present, the matching entry in that category's sample keyword information set is marked, and the corresponding text category information is finally identified from these marks.
In this embodiment, the correspondence between the preset keyword information and the text categories is obtained as follows: multiple sample text messages obtained in advance are classified; the keyword information of each sample text message in each resulting text category is extracted to form the sample keyword information set; and a correspondence is established between the sample keyword information set extracted from the sample text messages of each text category and that category's text category information.
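A minimal Python sketch of how the per-category sample keyword information sets could be assembled from labeled samples. It assumes keywords have already been extracted from each labeled message; the function name `build_sample_sets` and the keyword lists are hypothetical. First-seen order is preserved on purpose, because the position of each keyword in its set later fixes the position of its 0/1 mark in the matching string.

```python
def build_sample_sets(labeled_samples):
    """Group keywords of labeled sample messages into per-category sets.

    labeled_samples: iterable of (category, keywords) pairs, one pair per
    manually labeled sample message. Returns category -> ordered keyword list.
    """
    sample_sets = {}
    for category, keywords in labeled_samples:
        bucket = sample_sets.setdefault(category, [])
        for kw in keywords:
            if kw not in bucket:  # skip duplicates, keep first-seen order
                bucket.append(kw)
    return sample_sets


# Hypothetical keywords extracted from labeled sample messages.
labeled = [
    ("engineering project", ["project", "payment"]),
    ("bank transaction", ["credit card", "consumption", "bank"]),
    ("dinner party", ["hotel", "box", "come"]),
    ("bank transaction", ["bank", "account"]),
]
print(build_sample_sets(labeled))
```

A plain `dict` is enough here because Python dictionaries keep insertion order, so the category order also stays stable.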
Specifically, the existing sample text information in the system is obtained first. It may be historical text information in the system, or already-classified text message templates that the system downloads from the Internet.
When the sample text information is historical text information in the system, staff first label each sample text message with a category according to its content. After labeling, the sample text information is grouped and counted by label, and all sample text messages are stored separately by category. Word segmentation is then performed on the sample text information of each category to extract the category's keyword information set, and finally a correspondence is established between the keyword information set extracted from the sample text information and the corresponding text category information.
In this embodiment, step S103 includes: matching each piece of keyword information in the keyword information set against the preset sample keyword information set corresponding to each piece of text category information, to obtain either original first character strings corresponding one-to-one to the sample keyword information sets, or an original second character string formed by concatenating the original first character strings in a preset order. An original first character string consists of the characters 0 and/or 1, and its positions correspond one-to-one, in order, to the keywords of the corresponding category's sample keyword information set. A 0 at a position indicates that the sample keyword at that position does not occur in the keyword information of the text information to be classified; a 1 indicates that it does. The text category information corresponding to the keyword information set is then identified from the resulting original character string.
Specifically, each preset sample keyword information set is queried with the keyword information of the text to be classified. If a keyword is present, the corresponding keyword in the sample keyword information set currently being queried is marked "present" and the others are marked "absent". When the queries are complete, an original character string of 0s and 1s is output, in which the order and positions of the 0s and 1s follow the original order of the keywords in the sample keyword information set. For example, if the keywords of the sample keyword information set of the "bank transaction" category are ordered as 【transfer-in, income, bank, expenditure, consumption, withdrawal, account】 and the keyword information set of the text to be classified is "account", "consumption", "bank", then the output follows the order 【transfer-in, income, bank, expenditure, consumption, withdrawal, account】 and the resulting string is 【0 0 1 0 1 0 1】.
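The marking rule in this example can be sketched as a short Python function. The name `match_vector` is hypothetical; the keyword order is the "bank transaction" ordering given above.

```python
def match_vector(keywords, sample_set):
    """Original first character string for one category.

    Position i is 1 if the i-th keyword of the category's sample keyword set
    occurs among the keywords of the message, otherwise 0.
    """
    present = set(keywords)  # set lookup keeps each check O(1)
    return [1 if kw in present else 0 for kw in sample_set]


# Keyword order of the "bank transaction" sample set from the example above.
bank_set = ["transfer-in", "income", "bank", "expenditure",
            "consumption", "withdrawal", "account"]
print(match_vector(["account", "consumption", "bank"], bank_set))
# [0, 0, 1, 0, 1, 0, 1]
```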
Further, identifying the corresponding text category information from the marks means analyzing the output character string and obtaining the text category information from the result of that analysis. Preferably, the analysis classifies according to the number of keywords marked "present" in the keyword information set of each text category: the more marks a category receives, the more likely the text belongs to it.
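A sketch of this counting rule, assuming abbreviated, hypothetical sample keyword information sets. The tie-breaking and the "no match" result are choices made for the sketch, not taken from the source: ties go to the category listed first, and `None` is returned when no sample keyword is marked.

```python
def score_categories(keywords, sample_sets):
    """Number of "present" marks (1s) each category's sample set receives."""
    present = set(keywords)
    return {category: sum(kw in present for kw in sample_set)
            for category, sample_set in sample_sets.items()}


def best_category(keywords, sample_sets):
    """Category with the most marks, or None if no sample keyword matched.

    Ties go to the category listed first, because max() returns the first maximum.
    """
    scores = score_categories(keywords, sample_sets)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical, abbreviated sample keyword sets.
sample_sets = {
    "bank transaction": ["transfer-in", "income", "bank", "expenditure",
                         "consumption", "withdrawal", "account"],
    "dinner party": ["dining", "get-together", "box", "restaurant", "hotel", "come"],
    "engineering project": ["project", "payment", "remit", "refund", "loan"],
}
print(best_category(["hotel", "box", "come"], sample_sets))  # dinner party
print(best_category(["account", "consumption", "bank"], sample_sets))  # bank transaction
```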
Further, in this embodiment, the original character string is either an original first character string or an original second character string. An original first character string is the string output by matching against the sample keyword information set of a single text category; an original second character string is the string output by matching against multiple text categories. When original first character strings corresponding one-to-one to the sample keyword information sets are obtained, the original character string is the original first character string, and classification analysis is performed on it.
When an original second character string, formed by concatenating the original first character strings in a preset order, is obtained, the original character string is the original second character string, and classification analysis is performed on it.
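A sketch of how an original second character string could be built by concatenating the per-category strings in a preset order. The category order, the abbreviated sample sets, and the function name `second_string` are hypothetical. The function also records which slice of the string belongs to each category, since the concatenated string is only interpretable if those boundaries are known.

```python
def second_string(keywords, sample_sets, order):
    """Concatenate the per-category 0/1 strings in the preset category order.

    Returns the concatenated bits and, for each category, the (start, end)
    slice that its segment occupies.
    """
    present = set(keywords)
    bits, offsets = [], {}
    for category in order:
        start = len(bits)
        bits.extend(1 if kw in present else 0 for kw in sample_sets[category])
        offsets[category] = (start, len(bits))
    return bits, offsets


# Hypothetical preset order and abbreviated sample keyword sets.
ORDER = ["bank transaction", "dinner party", "engineering project"]
sample_sets = {
    "bank transaction": ["transfer-in", "income", "bank", "expenditure",
                         "consumption", "withdrawal", "account"],
    "dinner party": ["dining", "get-together", "box", "restaurant", "hotel", "come"],
    "engineering project": ["project", "payment", "remit", "refund", "loan"],
}
bits, offsets = second_string(["hotel", "box", "come"], sample_sets, ORDER)
print("".join(map(str, bits)))  # 000000000101100000
```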
For example, suppose three text categories are preset, "bank transaction", "dinner party", and "engineering project", with the keyword information sets shown in Table 1 below:
Table 1 Correspondence between sample keyword information sets and text category information
For citing " evening/joyful hotel/No. 2//box/18 above:00/ point/Wang is total/also/it is next ".This sentence is After the participle technique for having carried out text message, the matching of keyword is carried out using sample key word information collection mentioned above, Such as three class text classifications " bank transaction " above, " dinner party ", the large sample key word information collection of " engineering project " composition【Turn Enter to take in bank's expenditure consumption withdrawal account credit card and produce contact ... dining it is small gather there is free box to meet and discuss restaurant of hotel private room The dinner party wine row Room is poly- to have a meal ..., and project payment beats money and beats account to account refund loan floatation ten thousand ...】
To above-mentioned text message --- " evening/joyful hotel/No. 2//box/18:00/ point/Wang is total/also/to be come " progress Analysis, extract key word information therein " hotel " " box " " next " and be present in default sample key word information concentration Content.
The character string exported after the completion of matching is:
【0 0 0 0 0 0 0 0 0……0 0 0 1 0 1 0 0 0 0 0 1……0 0 0 0 0 0 0 0 0……】
Analyzing this string shows that the text message is assigned to the "dinner party" category.
To further improve the accuracy of the text category information matched for the text information to be classified, after matching the keyword information set this embodiment further includes:
correcting the obtained original character string with a classification model learned in advance to obtain a final character string, and replacing the original character string with the final character string.
Specifically, the keyword information set is first matched against the preset sample keyword information sets to obtain the character string of the keyword information set of the text information to be classified; the character string is then processed by the classification model learned in advance; and the corresponding text category information is obtained from the model's output.
In this embodiment, the classification model is obtained as follows: the sample keyword information sets of the sample text information of each text category are obtained; each keyword information set is matched against the sample keyword information set of the corresponding text category, and the corresponding character strings are output; the character strings are learned with a preset training algorithm to obtain the classification model; and a correspondence is established between the classification model and the text category information. Preferably, the training algorithm is a random forest classification algorithm, and the resulting classification model is a random forest classification model.
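The analysis of a concatenated string like the one above can be sketched by splitting it back into per-category segments and counting the 1s in each. This assumes the segment lengths (the sizes of the sample keyword information sets) are known and listed in the same preset order; the function name and the toy 18-position string are hypothetical, since the full vocabulary behind the string above is elided in the source.

```python
def classify_second_string(bits, segment_lengths):
    """Split a concatenated 0/1 string into per-category segments and
    return the category with the most 1s, together with all counts.

    segment_lengths maps category -> size of its sample keyword set and must
    be listed in the same preset order used to build the string.
    """
    counts, start = {}, 0
    for category, length in segment_lengths.items():
        counts[category] = sum(bits[start:start + length])
        start += length
    if start != len(bits):
        raise ValueError("segment lengths do not add up to the string length")
    return max(counts, key=counts.get), counts


# Toy 18-position string: 7 bank, 6 dinner and 5 project keyword positions.
bits = [0] * 7 + [0, 0, 1, 0, 1, 1] + [0] * 5
lengths = {"bank transaction": 7, "dinner party": 6, "engineering project": 5}
category, counts = classify_second_string(bits, lengths)
print(category, counts)
# dinner party {'bank transaction': 0, 'dinner party': 3, 'engineering project': 0}
```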
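A minimal sketch of training a random forest on 0/1 matching strings, using scikit-learn's `RandomForestClassifier`. The source does not specify how the model "corrects" the original string; this sketch assumes the simplest reading, in which the model maps a message's 0/1 string directly to a category label. The training vectors, the 18-position toy vocabulary, and the helper `one_hot` are all hypothetical. Each keyword profile is repeated three times to mimic several sample messages sharing the same keywords.

```python
from sklearn.ensemble import RandomForestClassifier


def one_hot(positions, size=18):
    """0/1 string of length `size` with 1s at the given keyword positions."""
    row = [0] * size
    for p in positions:
        row[p] = 1
    return row


# Hypothetical keyword profiles over a toy vocabulary:
# positions 0-6 bank keywords, 7-12 dinner keywords, 13-17 project keywords.
profiles = [
    (one_hot([2, 4, 6]), "bank transaction"),
    (one_hot([0, 1, 2]), "bank transaction"),
    (one_hot([9, 11, 12]), "dinner party"),
    (one_hot([7, 8]), "dinner party"),
    (one_hot([13, 14]), "engineering project"),
    (one_hot([15, 16, 17]), "engineering project"),
]
X = [row for row, _ in profiles] * 3  # each profile seen in three sample messages
y = [label for _, label in profiles] * 3

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Category predicted for the 0/1 string of a message matching "box", "hotel", "come".
print(model.predict([one_hot([9, 11, 12])])[0])
```

A random forest suits this input because each tree can split on individual keyword positions, and averaging many trees trained on bootstrap samples smooths out noise from any single keyword.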
S104: Classify the text information to be classified according to the matched text category information.
In this embodiment, the sample keyword information set of a text category may not cover all of the category's keywords, which lowers recall when keyword information sets are matched and leads to classification errors. To address this, when creating the sample keyword information set of each text category, this embodiment also expands the keyword information set with domain vocabulary, so that each category's sample keyword information set covers the category's keywords more completely.
In this embodiment, each of the above steps may also be implemented by a processor on a mobile terminal: program code implementing the functions of the steps is written into memory and read and executed by the processor.
The text information classification method provided by embodiments of the present invention creates sample keyword information sets for the text categories from sample text information and establishes the correspondence between the sample keyword information sets and the text categories. When text information to be classified is processed, keyword information is extracted from it according to preset rules, and the corresponding text category information is matched according to the correspondence between the preset sample keyword information sets and the text categories. Matching text category information through keywords simplifies classification and solves the problems of long analysis cycles and low efficiency caused by classifying text information manually.
Further, embodiments of the present invention also classify text information with a classification model, using a trained model to identify and classify text information automatically. This automatic classification greatly improves the efficiency of classification and also improves its accuracy.
Second embodiment:
Referring to Fig. 2, Fig. 2 is the flowchart of a user classifying text information through a client using the text information classification method, according to this embodiment.
This embodiment applies the text information classification method to a client in a specific application scenario. The steps are as follows:
S201: Obtain the sample text information in the client, label it by category, and establish the correspondence between the sample keyword information sets and the text category information.
In this step, text information is collected in order to create the keyword information sets of the text categories, and the collected messages serve as the sample text information for creating them. The collected text may be historical text messages previously received by the client, previously classified short messages from applications or terminals, chat text, and so on, such as WeChat or QQ chat messages.
In this example, assume the following sample text information is obtained:
A. " project payment 1,000,000 has been beaten ".
B. " your tail number 44XX credits card 21 days 02 month 13:56 19,089.59 yuan of consumption【Construction Bank】”.
C. " upper first happy festival time, but dessert is fragrant, the heart is also reunited, and people also reunites ".
D. " you reminds in China Mobile:The moon had moderate rain and gradually stopped tonight, and tomorrow is overcast to cloudy ".
E. " joyful No. 2 boxes 18 in hotel at night:00 point, king always comes ".
The sample text information obtained above is labeled to distinguish categories; labeling may be manual or automatic.
In essence, this labeling captures the knowledge of industry experts and fixes the correspondence between categories and keywords, so that subsequent classification of text information is more accurate. In this embodiment, the sample text information above is divided into three categories, "bank transaction", "engineering project", and "dinner party"; the labels are shown in Table 2 below.
Table 2 Correspondence between sample text information and category labels
The labels in Table 2 may also be numeric: a correspondence between each category and a number is preset, and samples are labeled with the numbers.
In this embodiment, sample text information may be selected according to the category scope that business staff require. In principle, the more samples used to create the sample keyword information sets the better, because more sample text information yields more complete keywords in the resulting sets. In practice, however, the workload involved means that only a small fraction of samples may be obtained, so the resulting sample keyword information sets cannot fully represent the characteristics of each category.
Therefore, to obtain more complete sample keyword information sets, the keywords are also extended after the sample keyword information set of each category has been obtained from the sample text information.
S202: Extend the sample keyword information sets classified in step S201.
In this step, the sample text information in each category is examined further. Preferably, keywords are extracted from the sample text information in each category, and external text information containing these keywords is obtained according to the extracted keywords. The steps for extending the keyword information set are shown in Fig. 3:
S301: Extract the keywords in the sample information of each category.
For example, from the sample text message "Your credit card ending in 44XX had a consumption of 19,089.59 yuan at 13:56 on February 21. 【China Construction Bank】" in the bank transaction category, words such as "consumption" and "bank" are extracted as keywords.
From the sample text message "Tonight, Joyful Hotel, box No. 2, 18:00; Mr. Wang is coming too." in the dinner party category, words such as "hotel", "box", and "come" are extracted as keywords.
S302: According to the keywords extracted from each sample text message, collect external text information containing the keywords of that category's sample information.
In this example, three topics are set up: "bank transaction", "dinner party", and "engineering project". In practice, the text information of each category is expressed in many different ways, so each category needs to be mined broadly. For example, the short message formats of different banks differ for "bank transaction" messages; in this case, short messages of the major banks containing "consumption" and "bank" should be searched for on the Internet, using the keywords "consumption" and "bank" extracted from the sample information.
S303: Extract the other keywords in the text information obtained in step S302 and add them to the keyword information set of the corresponding category. After the keywords obtained by extension are collected, the resulting sample keyword information set of each category is shown in Table 3 below.
Table 3: Per-category sample keyword information sets after expansion
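The expansion in S301 to S303 can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the external texts have already been segmented into word lists, and the function name `expand_keyword_sets` and the optional stop-word filter are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, List, Set


def expand_keyword_sets(
    seed_sets: Dict[str, Set[str]],
    external_docs: Iterable[List[str]],
    stop_words: AbstractSet[str] = frozenset(),
) -> Dict[str, Set[str]]:
    """Grow each category's keyword set with words that co-occur with its seeds."""
    docs = [set(tokens) for tokens in external_docs]  # reusable across categories
    expanded: Dict[str, Set[str]] = {}
    for category, seeds in seed_sets.items():
        grown = set(seeds)  # copy so the caller's seed set is left untouched
        for tokens in docs:
            # S302: only external texts containing at least one seed keyword count
            if tokens & seeds:
                # S303: add the remaining words of that text, minus stop words
                grown.update(tokens - stop_words)
        expanded[category] = grown
    return expanded


seeds = {"bank transaction": {"consumption", "bank"}, "dinner party": {"hotel", "box"}}
external = [["bank", "withdrawal", "balance"], ["hotel", "dinner", "the"], ["weather"]]
print(expand_keyword_sets(seeds, external, stop_words={"the"}))
```

Adding every co-occurring word is deliberately naive; a real system would likely filter the added words further (for example by frequency) before accepting them into a category set.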
S203: Obtain the text information to be classified, segment it into words, and extract the keyword information. Specifically, after the punctuation marks in the text have been removed, the content is segmented into keywords in its original order, yielding at least one piece of keyword information.
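The punctuation removal and in-order segmentation of S203 can be sketched as follows. The patent does not name a segmentation algorithm; this sketch swaps in simple forward maximum matching against a known vocabulary, and `strip_punctuation`, `segment` and the `max_len` limit are hypothetical names chosen for illustration. Production systems for Chinese text usually rely on a dedicated segmenter instead.

```python
import unicodedata
from typing import List, Set


def strip_punctuation(text: str) -> str:
    # Unicode categories starting with "P" cover ASCII and full-width CJK punctuation.
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def segment(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: scan left to right, keeping the original order."""
    text = strip_punctuation(text)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

Characters not covered by the vocabulary fall back to single-character tokens, so the output always preserves the original order of the text.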
S204: Match the keyword information set against the keyword information set of each text category and output the corresponding string.
In this step, the keywords of the segmented text are looked up in the category keyword sets, and the matching result for the text is output. The specific steps are as follows:
Step A: After the text has been segmented, each keyword of the category sets that appears in the text is represented by 1, and each keyword that does not appear is represented by 0.
For example, the sentence "evening / Joyful Hotel / No. 2 / box / 18:00 / o'clock / Mr. Wang / also / come" has already been segmented, and its keywords are matched against the keyword information sets described above. If, say, the keyword information sets of the three topic categories "bank transaction", "dinner party" and "engineering project" contain 500 keywords in total, the matching result for this text is a 500-dimensional vector.
【transfer in, income, bank, expenditure, consumption, withdrawal, account, credit card, contact, ……, dining, get-together, free time, box, meet, hotel, restaurant, private room, dinner party, banquet, hall, gathering, having a meal, ……, project, payment, remittance, transfer to account, funds received, refund, loan, financing, ……】
Analyzing the text "evening / Joyful Hotel / No. 2 / box / 18:00 / o'clock / Mr. Wang / also / come" shows that its keywords "hotel", "box" and "come" are present in the keyword information set of the "dinner party" category.
The 500-dimensional string is: 【0 0 0 0 0 0 0 0 0……0 0 0 1 0 1 0 0 0 0 0 1……0 0 0 0……】
Step B: Form the keyword string of the text, which is the 500-dimensional string above:
【0 0 0 0 0 0 0 0 0……0 0 0 1 0 1 0 0 0 0 0 1……0 0 0 0……】
The dimension is determined by the actual keyword information sets of the text categories.
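Steps A and B amount to a binary bag-of-words encoding over an ordered keyword list. A minimal sketch, assuming the category keyword sets are ordered lists and that a keyword shared by two categories keeps its first position; the helper names `build_vocabulary` and `to_binary_vector` are hypothetical:

```python
from typing import Dict, List, Sequence


def build_vocabulary(category_keywords: Dict[str, Sequence[str]],
                     order: Sequence[str]) -> List[str]:
    """Concatenate the category keyword lists in a fixed order, dropping repeats."""
    vocab: List[str] = []
    seen = set()
    for category in order:
        for word in category_keywords[category]:
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


def to_binary_vector(tokens: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """1 where the vocabulary word occurs among the text's tokens, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


keywords = {
    "bank transaction": ["transfer", "bank", "consumption"],
    "dinner party": ["hotel", "box", "come"],
}
vocab = build_vocabulary(keywords, ["bank transaction", "dinner party"])
print(to_binary_vector(["evening", "hotel", "No. 2", "box", "come"], vocab))
# prints [0, 0, 0, 1, 1, 1]
```

The vector length always equals the size of the combined keyword list, which is why the dimension follows the actual category keyword sets.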
S205: Analyze the output string to obtain the text category information of the text to be classified, and classify the text accordingly.
In this embodiment, the classification in step S205 is performed with a classification model learned in advance from sample text information. The processing steps are as follows:
Step 1: Obtain the string output by matching each keyword information set against the keyword information set of the corresponding text category.
Step 2: Process the string with the classification model learned in advance.
Step 3: Obtain the corresponding text category information from the model's output.
In this embodiment, the classification model is learned as follows; the processing steps are shown in Fig. 4.
S401: Obtain the keyword information set of the sample text information of each text category.
S402: Match each keyword information set against the keyword information set of the corresponding text category, and output the corresponding string.
S403: Use the string of each sample text as training input to the model, and learn the classification model of the corresponding category.
S404: Establish a correspondence between the classification model and the text category information.
In this embodiment, a random forest model is preferably used for learning. In the training phase, the model receives the string of each text together with its text category information as input samples and is trained by a stochastic gradient method until the training error reaches a set threshold; once the model has converged, the random forest model parameters are output and saved.
Random forest is a widely used machine learning model that performs well. It trains several relatively simple decision trees in parallel, combines the classification results of the individual trees by majority vote, and takes the result of the vote as the model's final output.
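The majority vote can be illustrated with stand-in trees. This sketch only shows how the votes of already-trained trees are combined; it does not train anything, and the three one-split "trees" below are hypothetical. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would supply the trees (note that it averages the trees' predicted class probabilities rather than counting hard votes).

```python
from collections import Counter
from typing import Callable, List, Sequence

# A trained decision tree is modelled here as any function from a 0/1 vector to a label.
Tree = Callable[[Sequence[int]], str]


def forest_predict(trees: List[Tree], vector: Sequence[int]) -> str:
    """Return the label predicted by the most trees (ties go to the label seen first)."""
    votes = Counter(tree(vector) for tree in trees)
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical one-split trees over a 6-slot keyword vector:
# slots 0-2 are bank keywords, slots 3-5 are dinner keywords.
stumps: List[Tree] = [
    lambda v: "dinner party" if v[3] else "bank transaction",  # splits on slot 3
    lambda v: "bank transaction" if v[1] else "dinner party",  # splits on slot 1
    lambda v: "bank transaction",                               # degenerate tree
]
print(forest_predict(stumps, [0, 0, 0, 1, 1, 1]))  # prints dinner party
```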
In this embodiment, besides being applied on a client to classify the information there, the text information classification method can also be applied to a system accessed through a browser-based interface, as shown in Fig. 5:
S501: The user accesses the user interface through a browser.
S502: The user interface exchanges messages with the web server and issues the corresponding commands, including creating the text category keyword information sets and expanding the keyword information sets; the interface also visualizes the analysis results.
S503: The web server dispatches the instructions to the REST service for execution, including algorithm training, single-text analysis and batch text analysis.
S504: The REST service calls the machine learning algorithm processing for training, using a random forest model.
S505: Machine learning training produces multiple classifiers, one per topic category, for later use in classifying text information.
S506: The REST service analyzes the text to be analyzed with the different classifiers.
S507: Through the cooperation of the components above and the internal algorithms, the text is analyzed and classified.
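S505 and S506 train one classifier per topic and apply each of them to the incoming text. A minimal one-vs-rest sketch of that dispatch, assuming each per-topic classifier returns a score in [0, 1] for "belongs to this topic"; the function name, the stand-in scorers and the 0.5 threshold are hypothetical, and returning `None` for "no topic matched" is a design choice of the sketch:

```python
from typing import Callable, Dict, Optional, Sequence

Scorer = Callable[[Sequence[int]], float]


def classify_one_vs_rest(classifiers: Dict[str, Scorer],
                         vector: Sequence[int],
                         threshold: float = 0.5) -> Optional[str]:
    """Score the vector with every per-topic classifier and keep the best topic."""
    scores = {topic: clf(vector) for topic, clf in classifiers.items()}
    best = max(scores, key=scores.__getitem__)
    # Reject the text if even the best-scoring topic is not confident enough.
    return best if scores[best] >= threshold else None


# Hypothetical scorers: fraction of each topic's keyword slots that are set.
classifiers: Dict[str, Scorer] = {
    "bank transaction": lambda v: sum(v[0:3]) / 3,
    "dinner party": lambda v: sum(v[3:6]) / 3,
}
print(classify_one_vs_rest(classifiers, [0, 0, 0, 1, 1, 0]))  # prints dinner party
```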
Third embodiment:
Referring to Fig. 6, Fig. 6 is a flowchart of classifying a single text message according to this embodiment. The processing steps are as follows:
S601: Input a single text message, for example "Dinner tomorrow evening in the presidential private room of a Nanjing restaurant."
S602: Segment the input text. After the punctuation marks are removed, the segmented text is: "tomorrow / evening / Nanjing / restaurant / president / private room / have a meal /".
S603: Each text is split into multiple words; the text above has been split into "tomorrow", "evening", "Nanjing", "restaurant", "president", "private room" and "have a meal".
S604: Extract the keywords from the text and vectorize them against the category keyword information sets in the system: keywords that are present are represented by 1 and keywords that are absent by 0.
Using the category keyword information sets above, the keywords "restaurant", "private room" and "have a meal" are found to be present. The keyword string is, for example:
【0 0 0 0 0 0 0 0 0……0 0 0 0 0 0 1 0 1 0 0 0 1 0……0 0 0 0……】
S605: Form the keyword string for the subsequent classification analysis.
The string is:
【0 0 0 0 0 0 0 0 0……0 0 0 0 0 0 1 0 1 0 0 0 1 0……0 0 0 0……】
S606: Perform classification analysis on the single text.
For example, the input text "Dinner tomorrow evening in the presidential private room of a Nanjing restaurant." can be parsed as case-related information, with the text category "dinner party".
As shown in Fig. 7, which is a flowchart of classifying batch text information according to this embodiment, batch text analysis is the single-text process run in a loop. It first requires the batch of texts to be uploaded; the processing steps are as follows:
S701: Upload the batch of texts to the analysis system.
A simple example:
1. "Your card ending in 65XX: 130,000 yuan paid out (withdrawal) at a branch at 16:13 on the 2nd; balance 3,125.97 yuan, available balance 53,125.97 yuan. [ICBC]"
2. "The 1,000,000 in funds has arrived in the account."
3. "Dinner tomorrow evening. Venue: Xinglin Restaurant, private room No. 1." …………
S702: Segment each text message.
S703: Form the keyword vector table for the subsequent classification analysis.
S704: Perform classification analysis on each text message.
For the example batch above, the texts can be classified into the three categories "bank transaction", "dinner party" and "engineering project", as shown in Table 4 below.
Table 4: Text information after classification
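Since batch analysis is the single-text process run in a loop, it can be sketched as a thin wrapper that also tallies the categories for a report. The `classify_one` argument stands for the whole single-text pipeline; the keyword rule `keyword_rule` and the sample texts below are hypothetical stand-ins that only make the example runnable and do not reproduce Table 4:

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def classify_batch(texts: Iterable[str],
                   classify_one: Callable[[str], str]) -> Tuple[List[Tuple[str, str]], Counter]:
    """Run the single-text classifier on every text and count texts per category."""
    rows = [(text, classify_one(text)) for text in texts]
    summary = Counter(category for _, category in rows)
    return rows, summary


def keyword_rule(text: str) -> str:
    # Stand-in for segmentation + matching + model; not the patent's classifier.
    lowered = text.lower()
    if "dinner" in lowered or "restaurant" in lowered:
        return "dinner party"
    if "account" in lowered or "balance" in lowered:
        return "bank transaction"
    return "engineering project"


rows, summary = classify_batch(
    ["Card balance: 10 yuan", "Dinner at the restaurant", "Site survey"], keyword_rule)
print(summary)
```

Keeping the per-text rows alongside the summary makes it straightforward to save the results and generate the downloadable analysis report described for batch analysis.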
Once the system has learned automatically, newly arriving text information is classified automatically.
The scheme provided by this embodiment replaces the largely manual operation used in practice with an automated approach: by building category keyword information sets and training with machine learning, the system classifies and analyzes text information, which greatly improves work efficiency.
Fourth embodiment:
Referring to Fig. 8, Fig. 8 is a schematic structural diagram of the text information classification system according to this embodiment. The text information classification apparatus 8 of this embodiment includes an acquisition module 81, an extraction module 82, a matching module 83 and a classification module 84, where:
The acquisition module 81 obtains the text information to be classified. Preferably, this text information includes at least one text message, and multiple messages may belong to the same text category or to different categories. The text information may also be converted from other types of information, such as voice or video: when the obtained information is voice, the text corresponding to the voice is what gets classified, the voice being converted to text by a speech-to-text recognition plug-in.
The extraction module 82 extracts a keyword information set from the text information to be classified according to preset rules; the keyword information set includes at least one piece of keyword information.
The matching module 83 matches the text category information corresponding to the keyword information set, based on the keyword information set and the preset correspondence between sample keyword information sets and text category information.
The classification module 84 classifies the text information to be classified according to the matched text category information.
In this embodiment, to extract the keyword information set from the text to be classified, the extraction module 82 segments the text into words using word segmentation, dividing it into at least one piece of keyword information; the pieces obtained by segmentation form the keyword information set of the text. Preferably, the punctuation marks in the text are removed first, and the keywords are then segmented in the original order of the text.
In this embodiment, the modules above may be integrated in the processor of a mobile terminal, with software dividing the processor into modules that provide the functions described.
In this embodiment, the apparatus further includes a correspondence establishment module, which classifies multiple sample text messages obtained in advance, extracts the keyword information of the sample text information of each text category after classification to form the sample keyword information sets, and establishes a correspondence between the sample keyword information set extracted from the sample text information of the same text category and the text category information.
In this embodiment, when matching the text category information corresponding to the keyword information of the text to be classified, the matching module 83 checks each preset sample keyword information set for each keyword of the text. If a keyword is present, the corresponding keyword in the sample keyword information set currently being queried is marked "present"; all other keywords are marked "absent". When the query is complete, a string is output, and the corresponding text category information is identified from the marks by analyzing the output string. Preferably, the string is analyzed by counting the "present" marks in the keyword information set of each text category: the more marks a category has, the more likely it is to be the text's category.
In this embodiment, the matching module 83 matches each piece of keyword information in the keyword information set against the sample keyword information set corresponding to each preset text category, obtaining one original first string per sample keyword information set, or an original second string formed by arranging the original first strings in a preset order. Each original first string consists of the characters 0 and/or 1, and the positions of these characters correspond one-to-one to the positions of the keywords of each text category in the corresponding sample keyword information set: 0 means the keyword information of the text to be classified is not present in the sample keyword information set, and 1 means it is present. The text category information corresponding to the keyword information set is then identified from the resulting original string.
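The per-category "original first strings", the concatenated "original second string" and the preferred rule of counting "present" marks, all described in the two paragraphs above, can be sketched as follows. The function names, the use of Python strings of "0"/"1" characters, and returning `None` when no keyword matches are assumptions of this sketch; ties go to the category that comes first in the preset order.

```python
from typing import Dict, Optional, Sequence, Tuple


def original_strings(tokens: Sequence[str],
                     sample_sets: Dict[str, Sequence[str]],
                     order: Sequence[str]) -> Tuple[Dict[str, str], str]:
    """Build one 0/1 string per category, then join them in the preset order."""
    present = set(tokens)
    first = {
        category: "".join("1" if word in present else "0" for word in sample_sets[category])
        for category in order
    }
    second = "".join(first[category] for category in order)
    return first, second


def pick_category(first: Dict[str, str]) -> Optional[str]:
    """Preferred rule: the category with the most 'present' (1) marks wins."""
    counts = {category: bits.count("1") for category, bits in first.items()}
    best = max(counts, key=counts.__getitem__)
    return best if counts[best] > 0 else None


sample_sets = {
    "bank transaction": ["transfer", "bank", "consumption"],
    "dinner party": ["hotel", "box", "come"],
    "engineering project": ["payment", "loan"],
}
order = ["bank transaction", "dinner party", "engineering project"]
first, second = original_strings(["evening", "hotel", "box", "come"], sample_sets, order)
print(second, pick_category(first))  # prints 00011100 dinner party
```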
In this embodiment, the system further includes a correction module;
The matching module 83 matches the keyword information against the preset sample keyword information sets to obtain the original string of the keyword information set of the text to be classified; the original string is the original first string or the original second string;
The correction module corrects the obtained original string using the classification model learned in advance to obtain a final string, and replaces the original string with the final string;
The matching module 83 obtains the corresponding text category information from the final string.
In this embodiment, the correction module also obtains the classification model through model training in advance. Specifically, the correction module obtains the keyword information set of the sample text information of each text category; matches each keyword information set against the sample keyword information set of the corresponding text category and outputs the corresponding string; trains a model on the strings with a preset training algorithm to learn the classification model; and establishes a correspondence between the classification model and the text category information.
In this embodiment, the correction module generates a separate classification model from the output strings of each category. For example, this embodiment defines three categories, "bank transaction", "dinner party" and "engineering project", so the correction module builds three random forest classification models during model learning. These models can then be used to analyze new text information and case information, and the system automatically identifies information related to the cases covered by these classes.
In text classification analysis mode, single-text analysis can be performed with the per-category classification models: the user inputs one text message, which is then analyzed.
For example, the input text "Dinner tomorrow evening in the presidential private room of a Nanjing restaurant." can be parsed as case-related information, with the text category "dinner party".
In this embodiment, batch text analysis can also be performed: the batch of texts to be analyzed is uploaded, and the texts undergo classification analysis and category relevance analysis.
Multiple classification models can be used for various kinds of classification analysis, such as per-category analysis (for example, whether a text is a bank transaction), which suits many application scenarios.
The results of batch analysis are saved and an analysis report can be downloaded, so the user can easily see how the batch of texts was classified.
In this embodiment, the functions of the modules of the text information classification system above can be implemented as program code: specifically, a processor in the terminal reads pre-stored code for text information classification from memory and executes it, thereby acquiring and classifying the information.
In summary, the text information classification method and apparatus provided by the embodiments of the present invention preset sample keyword information sets for the text categories and establish the correspondence between these sets and the text category information. This provides a matching basis for classifying text later and makes automatic matching and classification possible. When text is classified, keyword information is extracted from it according to preset rules, and the corresponding text category information is matched using the correspondence between the preset sample keyword information sets and the text categories. Matching text categories by keyword simplifies the classification steps, greatly improves the efficiency of classification, and also improves its accuracy.
Obviously, those skilled in the art should understand that the modules or steps of the embodiments of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, stored in a computer storage medium (ROM/RAM, magnetic disk, optical disc) and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one given here. Alternatively, they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above is a further detailed description of the embodiments of the present invention in connection with specific implementations, and the specific implementation of the present invention should not be regarded as limited to these descriptions. A person of ordinary skill in the technical field of the present invention can make a number of simple deductions or substitutions without departing from the inventive concept, and all of them should be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A text information classification method, comprising:
obtaining text information to be classified;
extracting a keyword information set from the text information to be classified according to a preset rule, the keyword information set comprising at least one piece of keyword information;
matching, according to the keyword information set and a preset correspondence between sample keyword information sets and text category information, the text category information corresponding to the keyword information set;
classifying the text information to be classified according to the matched text category information.
2. The text information classification method according to claim 1, wherein extracting the keyword information set from the text information to be classified according to the preset rule comprises: after removing the punctuation marks from the text information to be classified, segmenting keywords in the original order of the content of the text information to be classified, the segmentation yielding at least one piece of keyword information.
3. The text information classification method according to claim 1, further comprising obtaining the correspondence between the sample keyword information sets and the text category information as follows:
classifying a plurality of sample text messages obtained in advance, and extracting, after classification, the keyword information of each sample text message in each text category to form the sample keyword information sets;
establishing a correspondence between the sample keyword information set extracted from the sample text information of the same text category and the text category information.
4. The text information classification method according to any one of claims 1 to 3, wherein matching, according to the keyword information set and the preset correspondence between sample keyword information sets and text category information, the text category information corresponding to the keyword information set comprises:
matching each piece of keyword information in the keyword information set against the sample keyword information set corresponding to each preset piece of text category information, to obtain original first strings in one-to-one correspondence with the sample keyword information sets, or an original second string formed by arranging the original first strings in a preset order; wherein each original first string comprises the character 0 and/or the character 1, the positions of the characters 0 and 1 correspond one-to-one to the positions of the keywords of each text category in the corresponding sample keyword information set, the character 0 indicates that the keyword information of the text information to be classified is not present in the sample keyword information set, and the character 1 indicates that the keyword information of the text information to be classified is present in the sample keyword information set;
identifying the text category information corresponding to the keyword information set according to the obtained original string.
5. The text information classification method according to claim 4, further comprising, after obtaining the original first string or the original second string and before identifying the text category information corresponding to the keyword information set according to the obtained original string:
correcting the obtained original string according to a classification model learned in advance to obtain a final string, and replacing the original string with the final string.
6. A text information classification apparatus, comprising: an acquisition module, an extraction module, a matching module and a classification module;
the acquisition module is configured to obtain text information to be classified;
the extraction module is configured to extract a keyword information set from the text information to be classified according to a preset rule, the keyword information set comprising at least one piece of keyword information;
the matching module is configured to match, according to the keyword information set and a preset correspondence between sample keyword information sets and text category information, the text category information corresponding to the keyword information set;
the classification module is configured to classify the text information to be classified according to the matched text category information.
7. The text information classification apparatus according to claim 6, wherein the extraction module is configured to, after removing the punctuation marks from the text information to be classified, segment keywords in the original order of the content of the text information to be classified, the segmentation yielding at least one piece of keyword information.
8. The text information classification apparatus according to claim 6, further comprising: a correspondence establishment module configured to classify a plurality of sample text messages obtained in advance, extract the keyword information of the sample text information of each text category after classification to form the sample keyword information sets, and establish a correspondence between the sample keyword information set extracted from the sample text information of the same text category and the text category information.
9. The text information classification apparatus according to any one of claims 6 to 8, wherein the matching module is configured to match each piece of keyword information in the keyword information set against the sample keyword information set corresponding to each preset piece of text category information, to obtain original first strings in one-to-one correspondence with the sample keyword information sets, or an original second string formed by arranging the original first strings in a preset order; wherein each original first string comprises the character 0 and/or the character 1, the positions of the characters 0 and 1 correspond one-to-one to the positions of the keywords of each text category in the corresponding sample keyword information set, the character 0 indicates that the keyword information of the text information to be classified is not present in the sample keyword information set, and the character 1 indicates that the keyword information of the text information to be classified is present in the sample keyword information set; and to identify the text category information corresponding to the keyword information set according to the obtained original string.
10. The text information classification apparatus according to claim 9, further comprising: a correction module configured to correct the obtained original string according to a classification model learned in advance to obtain a final string, and to replace the original string with the final string.
CN201610693358.6A 2016-08-19 2016-08-19 Text information classification method and device Active CN107766371B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610693358.6A CN107766371B (en) 2016-08-19 2016-08-19 Text information classification method and device
PCT/CN2017/093896 WO2018032937A1 (en) 2016-08-19 2017-07-21 Method and apparatus for classifying text information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610693358.6A CN107766371B (en) 2016-08-19 2016-08-19 Text information classification method and device

Publications (2)

Publication Number Publication Date
CN107766371A true CN107766371A (en) 2018-03-06
CN107766371B CN107766371B (en) 2023-11-17

Family

ID=61196311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610693358.6A Active CN107766371B (en) 2016-08-19 2016-08-19 Text information classification method and device

Country Status (2)

Country Link
CN (1) CN107766371B (en)
WO (1) WO2018032937A1 (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549723A (en) * 2018-04-28 2018-09-18 北京神州泰岳软件股份有限公司 Text concept classification method, device and server
CN108875067A (en) * 2018-06-29 2018-11-23 北京百度网讯科技有限公司 Text data classification method, device, equipment and storage medium
CN109271481A (en) * 2018-08-31 2019-01-25 国网河北省电力有限公司沧州供电分公司 Classification method and system for electric power demand information, and terminal device
CN109559744A (en) * 2018-12-12 2019-04-02 泰康保险集团股份有限公司 Voice data processing method, device and readable storage medium
CN109657010A (en) * 2018-10-30 2019-04-19 百度在线网络技术(北京)有限公司 Document processing method, device and storage medium
CN110046341A (en) * 2018-12-29 2019-07-23 中国银联股份有限公司 Method and system for matching information
CN110060317A (en) * 2019-03-16 2019-07-26 平安城市建设科技(深圳)有限公司 Automatic poster configuration method, equipment, storage medium and device
CN110348021A (en) * 2019-07-17 2019-10-18 湖北亿咖通科技有限公司 Character string identification method based on named entity model, electronic equipment and storage medium
CN110413774A (en) * 2019-06-21 2019-11-05 厦门美域中央信息科技有限公司 Information classification method based on genetic algorithm
CN110795561A (en) * 2019-10-24 2020-02-14 北京华宇信息技术有限公司 Automatic identification system for electronic file material types and autonomous learning method thereof
CN110851598A (en) * 2019-10-30 2020-02-28 深圳价值在线信息科技股份有限公司 Text classification method and device, terminal equipment and storage medium
CN110941638A (en) * 2018-09-21 2020-03-31 武汉安天信息技术有限责任公司 Application classification rule base construction method, application classification method and device
CN110955796A (en) * 2019-11-26 2020-04-03 北京明略软件系统有限公司 Case characteristic information extraction method and device based on record information
CN111199170A (en) * 2018-11-16 2020-05-26 长鑫存储技术有限公司 Formula file identification method and device, electronic equipment and storage medium
CN111324735A (en) * 2020-02-20 2020-06-23 湖南芒果听见科技有限公司 Method and terminal for automatically classifying hourly essentials
CN111339290A (en) * 2018-11-30 2020-06-26 北京嘀嘀无限科技发展有限公司 Text classification method and system
CN111782601A (en) * 2020-06-08 2020-10-16 北京海泰方圆科技股份有限公司 Electronic file processing method and device, electronic equipment and machine readable medium
CN112364169A (en) * 2021-01-13 2021-02-12 北京云真信科技有限公司 NLP-based Wi-Fi identification method, electronic device and medium
CN112417158A (en) * 2020-12-15 2021-02-26 中国联合网络通信集团有限公司 Training method, classification method, device and equipment of text data classification model
CN113254595A (en) * 2021-06-22 2021-08-13 北京沃丰时代数据科技有限公司 Chatting recognition method and device, electronic equipment and storage medium
CN113312913A (en) * 2021-07-30 2021-08-27 北京惠每云科技有限公司 Case book segmentation method and device, electronic device and readable storage medium
CN113486149A (en) * 2021-07-09 2021-10-08 深圳证券时报社有限公司 Keyword matching-based listed company announcement classification and sentiment analysis method
WO2021243575A1 (en) * 2020-06-02 2021-12-09 深圳市欢太科技有限公司 Text information classification method, mobile terminal, and computer-readable storage medium
WO2022036998A1 (en) * 2020-08-20 2022-02-24 广东电网有限责任公司清远供电局 Power system violation management method and apparatus, and power device

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN110874534B (en) * 2018-08-31 2023-04-28 阿里巴巴集团控股有限公司 Data processing method and data processing device
CN109471922A (en) * 2018-09-29 2019-03-15 平安科技(深圳)有限公司 Case type recognition methods, device, equipment and medium based on deep learning model
CN109446525B (en) * 2018-10-26 2023-03-24 腾讯科技(深圳)有限公司 Text processing method and device, computer readable storage medium and computer equipment
CN111126053B (en) * 2018-10-31 2023-07-04 北京国双科技有限公司 Information processing method and related equipment
CN110750643B (en) * 2019-09-29 2024-02-09 上证所信息网络有限公司 Method, device and storage medium for classifying non-periodic announcements of listed companies
CN111192692B (en) * 2020-01-02 2023-12-08 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111222316B (en) * 2020-01-03 2023-08-29 北京小米移动软件有限公司 Text detection method, device and storage medium
CN111428037B (en) * 2020-03-24 2022-09-20 合肥科捷通科技信息服务有限公司 Method for analyzing matching performance of behavior policy
CN111460149B (en) * 2020-03-27 2023-07-25 科大讯飞股份有限公司 Text classification method, related device and readable storage medium
CN111860657A (en) * 2020-07-23 2020-10-30 中国建设银行股份有限公司 Image classification method and device, electronic equipment and storage medium
CN112163088A (en) * 2020-09-02 2021-01-01 中国人民解放军战略支援部队信息工程大学 Method, system and equipment for mining short message user information of telecommunication network based on DenseNet
CN113609860B (en) * 2021-08-05 2023-09-19 湖南特能博世科技有限公司 Text segmentation method and device and computer equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
JP2000250925A (en) * 1999-02-26 2000-09-14 Matsushita Electric Ind Co Ltd Document retrieval and sorting method and device
JP2008027431A (en) * 2006-06-22 2008-02-07 Nec Corp Information analyzing apparatus, information analyzing method, and information analyzing program
CN101593200A (en) * 2009-06-19 2009-12-02 淮海工学院 Chinese Web page classification method based on keyword frequency analysis
CN102279887A (en) * 2011-08-18 2011-12-14 北京百度网讯科技有限公司 Method, device and system for classifying documents
CN103324621A (en) * 2012-03-21 2013-09-25 北京百度网讯科技有限公司 Method and device for correcting spelling of Thai texts

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN1741013A (en) * 2004-08-27 2006-03-01 英业达股份有限公司 Automatic adaptive system for customer service and its method
CN101184259B (en) * 2007-11-01 2010-06-23 浙江大学 Method for automatically learning and updating keywords in spam short messages
EP2332039A4 (en) * 2008-08-11 2012-12-05 Collective Inc Method and system for classifying text
CN103577423B (en) * 2012-07-23 2016-12-07 阿里巴巴集团控股有限公司 Keyword classification method and system
CN103294817A (en) * 2013-06-13 2013-09-11 华东师范大学 Text feature extraction method based on categorical distribution probability


Cited By (30)

Publication number Priority date Publication date Assignee Title
CN108549723B (en) * 2018-04-28 2022-04-05 北京神州泰岳软件股份有限公司 Text concept classification method and device and server
CN108549723A (en) * 2018-04-28 2018-09-18 北京神州泰岳软件股份有限公司 Text concept classification method, device and server
CN108875067A (en) * 2018-06-29 2018-11-23 北京百度网讯科技有限公司 Text data classification method, device, equipment and storage medium
CN109271481A (en) * 2018-08-31 2019-01-25 国网河北省电力有限公司沧州供电分公司 Classification method and system for electric power demand information, and terminal device
CN110941638A (en) * 2018-09-21 2020-03-31 武汉安天信息技术有限责任公司 Application classification rule base construction method, application classification method and device
CN110941638B (en) * 2018-09-21 2023-09-08 武汉安天信息技术有限责任公司 Application classification rule base construction method, application classification method and device
CN109657010A (en) * 2018-10-30 2019-04-19 百度在线网络技术(北京)有限公司 Document processing method, device and storage medium
CN111199170B (en) * 2018-11-16 2022-04-01 长鑫存储技术有限公司 Formula file identification method and device, electronic equipment and storage medium
CN111199170A (en) * 2018-11-16 2020-05-26 长鑫存储技术有限公司 Formula file identification method and device, electronic equipment and storage medium
CN111339290A (en) * 2018-11-30 2020-06-26 北京嘀嘀无限科技发展有限公司 Text classification method and system
CN109559744A (en) * 2018-12-12 2019-04-02 泰康保险集团股份有限公司 Voice data processing method, device and readable storage medium
CN110046341A (en) * 2018-12-29 2019-07-23 中国银联股份有限公司 Method and system for matching information
CN110046341B (en) * 2018-12-29 2023-06-09 中国银联股份有限公司 Method and system for matching information
CN110060317A (en) * 2019-03-16 2019-07-26 平安城市建设科技(深圳)有限公司 Automatic poster configuration method, equipment, storage medium and device
CN110413774A (en) * 2019-06-21 2019-11-05 厦门美域中央信息科技有限公司 Information classification method based on genetic algorithm
CN110348021A (en) * 2019-07-17 2019-10-18 湖北亿咖通科技有限公司 Character string identification method based on named entity model, electronic equipment and storage medium
CN110795561A (en) * 2019-10-24 2020-02-14 北京华宇信息技术有限公司 Automatic identification system for electronic file material types and autonomous learning method thereof
CN110851598A (en) * 2019-10-30 2020-02-28 深圳价值在线信息科技股份有限公司 Text classification method and device, terminal equipment and storage medium
CN110955796A (en) * 2019-11-26 2020-04-03 北京明略软件系统有限公司 Case characteristic information extraction method and device based on record information
CN110955796B (en) * 2019-11-26 2023-05-02 北京明略软件系统有限公司 Case feature information extraction method and device based on record information
CN111324735A (en) * 2020-02-20 2020-06-23 湖南芒果听见科技有限公司 Method and terminal for automatically classifying hourly essentials
WO2021243575A1 (en) * 2020-06-02 2021-12-09 深圳市欢太科技有限公司 Text information classification method, mobile terminal, and computer-readable storage medium
CN111782601A (en) * 2020-06-08 2020-10-16 北京海泰方圆科技股份有限公司 Electronic file processing method and device, electronic equipment and machine readable medium
WO2022036998A1 (en) * 2020-08-20 2022-02-24 广东电网有限责任公司清远供电局 Power system violation management method and apparatus, and power device
CN112417158A (en) * 2020-12-15 2021-02-26 中国联合网络通信集团有限公司 Training method, classification method, device and equipment of text data classification model
CN112364169B (en) * 2021-01-13 2022-03-04 北京云真信科技有限公司 NLP-based Wi-Fi identification method, electronic device and medium
CN112364169A (en) * 2021-01-13 2021-02-12 北京云真信科技有限公司 NLP-based Wi-Fi identification method, electronic device and medium
CN113254595A (en) * 2021-06-22 2021-08-13 北京沃丰时代数据科技有限公司 Chatting recognition method and device, electronic equipment and storage medium
CN113486149A (en) * 2021-07-09 2021-10-08 深圳证券时报社有限公司 Keyword matching-based listed company announcement classification and sentiment analysis method
CN113312913A (en) * 2021-07-30 2021-08-27 北京惠每云科技有限公司 Case book segmentation method and device, electronic device and readable storage medium

Also Published As

Publication number Publication date
WO2018032937A1 (en) 2018-02-22
CN107766371B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
CN107766371A (en) Text information classification method and device
CN109189901B (en) Method for automatically discovering new categories and corresponding corpora in an intelligent customer service system
CN108280064A (en) Combined processing method for word segmentation, part-of-speech tagging, entity recognition and syntactic analysis
CN107562918A (en) Method for knowledge point discovery and batch label acquisition for mathematical problems
CN108984530A (en) Detection method and detection system for sensitive network content
CN103577989B (en) Information classification method and information classification system based on product identification
CN106776538A (en) Information extraction method for non-standard-format enterprise documents
CN106095928A (en) Event type recognition method and device
CN107301170A (en) Method and apparatus for sentence segmentation based on artificial intelligence
CN107688576B (en) Construction and tendency classification method of CNN-SVM model
CN111274814B (en) Novel semi-supervised text entity information extraction method
CN107343223A (en) Video segment recognition method and device
CN109740159B (en) Processing method and device for named entity recognition
CN107145573A (en) Question answering method and system for an artificial intelligence customer service robot
WO2021036439A1 (en) Method for responding to complaint, and device
CN107357785A (en) Topic feature word extraction method and system, and sentiment polarity determination method and system
CN106682236A (en) Machine learning based patent data processing method and processing system adopting same
CN113434688B (en) Data processing method and device for public opinion classification model training
CN109947934A (en) Data mining method and system for short text
CN106997339A (en) Text feature, file classification method and device
CN115858758A (en) Intelligent customer service knowledge graph system with multiple unstructured data identification
CN111144116B (en) Document knowledge structured extraction method and device
CN110209772B (en) Text processing method, device and equipment and readable storage medium
CN112287240A (en) Case microblog evaluation object extraction method and device based on double-embedded multilayer convolutional neural network
CN111967267A (en) XLNet-based news text region extraction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant