CN108804512A - Device and method for generating a text classification model, and computer-readable storage medium - Google Patents

Device and method for generating a text classification model, and computer-readable storage medium

Info

Publication number
CN108804512A
Authority
CN
China
Prior art keywords
word
word segmentation
dictionary
candidate
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810361702.0A
Other languages
Chinese (zh)
Other versions
CN108804512B (en)
Inventor
王健宗
吴天博
黄章成
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201810361702.0A (patent CN108804512B)
Priority to PCT/CN2018/102400 (patent WO2019200806A1)
Publication of CN108804512A
Application granted
Publication of CN108804512B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a device for generating a text classification model. The device comprises a memory and a processor; the memory stores a model generation program that can run on the processor, and the program, when executed by the processor, implements the following steps: obtaining a word segmentation dictionary for the financial domain and a text corpus of the financial domain; selecting candidate new words from the text corpus and adding them to the word segmentation dictionary; obtaining a sample set and labeling the training samples in the sample set with categories; based on the word segmentation dictionary to which the candidate new words have been added, segmenting the training samples in the sample set with a preset word segmentation algorithm and extracting word vectors; and, based on the AdaBoost algorithm, inputting the word vectors and the labeled category information into multiple weak classifiers for training to obtain a text classification model. The invention also provides a method for generating a text classification model and a computer-readable storage medium. The invention solves the prior-art problem that the sentiment orientation of financial-domain text cannot be classified.

Description

Device and method for generating a text classification model, and computer-readable storage medium
Technical field
The present invention relates to the field of text classification technology, and in particular to a device and a method for generating a text classification model, and a computer-readable storage medium.
Background
With the development of the internet and information technology, more and more institutions and individuals use the internet to express, in various ways, their views, attitudes and positions on all kinds of matters, for example through news commentary, forums and social networking sites. This mass of information has commercial value for e-commerce, market forecasting and other areas. The financial industry in particular is the industry in which internet information grows fastest and has the greatest impact, so sentiment orientation analysis of financial text, as a basis for deeper research, is becoming an important topic.
Text sentiment orientation analysis is a part of text sentiment analysis; it reveals whether a text is commendatory or derogatory. In the financial domain, public opinion in the news is an important indicator of the prosperity of markets and industries and of investors' enthusiasm for trading, so analyzing the sentiment orientation of financial-domain text carries considerable weight in financial research. However, the prior art lacks a scheme for classifying the sentiment orientation of financial-domain text, so such classification cannot be performed.
Summary of the invention
The present invention provides a device and a method for generating a text classification model, and a computer-readable storage medium. Its main purpose is to provide a device for generating a text classification model that can be used to classify the sentiment orientation of financial-domain text, so as to solve the prior-art problem that such classification cannot be performed.
To achieve the above object, the present invention provides a device for generating a text classification model. The device comprises a memory and a processor; the memory stores a model generation program that can run on the processor, and the model generation program, when executed by the processor, implements the following steps:
obtaining a financial-domain word segmentation dictionary built from collected financial-domain vocabulary, and a preset financial-domain text corpus;
selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary;
obtaining a sample set and labeling the training samples in the sample set with categories according to a preset sentiment orientation classification scheme;
based on the word segmentation dictionary to which the candidate new words have been added, performing word segmentation on the training samples in the sample set using a preset word segmentation algorithm;
extracting word vectors from the word segmentation result and, based on the AdaBoost algorithm, inputting the word vectors of the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combining the trained weak classifiers into a text classification model for the financial domain.
Optionally, the step of selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary comprises:
based on the word segmentation dictionary, performing word segmentation on the text corpus using the word segmentation algorithm, and obtaining a candidate word set from the word segmentation result;
calculating the information gain of each candidate word in the candidate word set, selecting the candidate words whose information gain exceeds a first preset threshold as first candidate new words, and adding the first candidate new words to the word segmentation dictionary;
based on the word segmentation dictionary to which the first candidate new words have been added, segmenting the text corpus using the word segmentation algorithm, and training a word vector model on the segmented text corpus;
using the trained word vector model to calculate the semantic similarity between the words in the word segmentation result and the first candidate new words;
taking the words whose semantic similarity exceeds a second preset threshold as second candidate new words, and adding the second candidate new words to the word segmentation dictionary.
Optionally, the processor is further configured to execute the model generation program so as to implement the following step after the step of taking the words whose semantic similarity exceeds the second preset threshold as second candidate new words and adding the second candidate new words to the word segmentation dictionary:
calculating the word frequency of the second candidate new words in the text corpus, and using the calculated word frequency as the weight of the second candidate new words in the word segmentation dictionary.
Optionally, the step of obtaining a sample set and labeling the training samples in the sample set with categories according to a preset sentiment orientation classification scheme comprises:
obtaining a sample set; obtaining the multiple annotations made by multiple annotators for each training sample in the sample set according to the preset sentiment orientation classification scheme; and selecting, from the multiple annotations, the annotation that occurs most often as the labeling result of the corresponding training sample.
Optionally, the weak classifiers include a classifier based on a convolutional neural network algorithm, a classifier based on a recurrent neural network algorithm, and a classifier based on a long short-term memory network algorithm.
In addition, to achieve the above object, the present invention also provides a method for generating a text classification model, the method comprising:
obtaining a financial-domain word segmentation dictionary built from collected financial-domain vocabulary, and a preset financial-domain text corpus;
selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary;
obtaining a sample set and labeling the training samples in the sample set with categories according to a preset sentiment orientation classification scheme;
based on the word segmentation dictionary to which the candidate new words have been added, performing word segmentation on the training samples in the sample set using a preset word segmentation algorithm;
extracting word vectors from the word segmentation result and, based on the AdaBoost algorithm, inputting the word vectors of the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combining the trained weak classifiers into a text classification model for the financial domain.
Optionally, the step of selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary comprises:
based on the word segmentation dictionary, performing word segmentation on the text corpus using the word segmentation algorithm, and obtaining a candidate word set from the word segmentation result;
calculating the information gain of each candidate word in the candidate word set, selecting the candidate words whose information gain exceeds a first preset threshold as first candidate new words, and adding the first candidate new words to the word segmentation dictionary;
based on the word segmentation dictionary to which the first candidate new words have been added, segmenting the text corpus using the word segmentation algorithm, and training a word vector model on the segmented text corpus;
using the trained word vector model to calculate the semantic similarity between the words in the word segmentation result and the first candidate new words;
taking the words whose semantic similarity exceeds a second preset threshold as second candidate new words, and adding the second candidate new words to the word segmentation dictionary.
Optionally, after the step of taking the words whose semantic similarity exceeds the second preset threshold as second candidate new words and adding the second candidate new words to the word segmentation dictionary, the method further comprises the step of:
calculating the word frequency of the second candidate new words in the text corpus, and using the calculated word frequency as the weight of the second candidate new words in the word segmentation dictionary.
Optionally, the step of obtaining a sample set and labeling the training samples in the sample set with categories according to a preset sentiment orientation classification scheme comprises:
obtaining a sample set; obtaining the multiple annotations made by multiple annotators for each training sample in the sample set according to the preset sentiment orientation classification scheme; and selecting, from the multiple annotations, the annotation that occurs most often as the labeling result of the corresponding training sample.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium. A model generation program is stored on the computer-readable storage medium, and the model generation program can be executed by one or more processors to implement the steps of the method for generating a text classification model described above.
In the device and method for generating a text classification model and the computer-readable storage medium proposed by the present invention, a financial-domain word segmentation dictionary is built from collected financial-domain vocabulary, a preset financial-domain text corpus is obtained, a candidate word set is obtained from the text corpus, and candidate new words are selected from the candidate word set and added to the word segmentation dictionary. A sample set is obtained, and the training samples in the sample set are labeled with categories according to a preset sentiment orientation classification scheme. Based on the word segmentation dictionary to which the candidate new words have been added, the training samples in the sample set are segmented using a preset word segmentation algorithm, and word vectors are extracted from the word segmentation result. Based on the AdaBoost algorithm, the word vectors of the training samples and the labeled category information are input into a plurality of preset weak classifiers for training, and the trained weak classifiers are combined into a text classification model for the financial domain. In the scheme of the present invention, candidate new words are mined from the financial-domain text corpus and added to the word segmentation dictionary, the samples in the sample set are segmented with the updated dictionary, and the sample data in the sample set are labeled according to the preset sentiment orientation classification scheme. The text classification model finally obtained by training can be applied to sentiment orientation classification problems in the financial domain.
Description of the drawings
Fig. 1 is a schematic diagram of a preferred embodiment of the device for generating a text classification model of the present invention;
Fig. 2 is a schematic diagram of the program modules of the model generation program in an embodiment of the device for generating a text classification model of the present invention;
Fig. 3 is a flowchart of a preferred embodiment of the method for generating a text classification model of the present invention.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
The present invention provides a device for generating a text classification model. Fig. 1 is a schematic diagram of a preferred embodiment of the device for generating a text classification model of the present invention.
In this embodiment, the device for generating a text classification model may be a PC (personal computer), or a terminal device such as a smartphone, a tablet computer or a portable computer. The device 1 for generating a text classification model includes at least a memory 11, a processor 12, a communication bus 13 and a network interface 14.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk or an optical disc. In some embodiments, the memory 11 may be an internal storage unit of the device 1, for example the hard disk of the device 1. In other embodiments, the memory 11 may be an external storage device of the device 1, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card fitted to the device 1. Further, the memory 11 may include both an internal storage unit of the device 1 and an external storage device. The memory 11 can be used not only to store the application software installed on the device 1 and various kinds of data, such as the code of the model generation program 01, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example to execute the model generation program 01.
The communication bus 13 is used to implement connection and communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the device 1 and other electronic equipment.
Fig. 1 shows only the device 1 for generating a text classification model with the components 11-14 and the model generation program 01. It should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
Optionally, the device 1 may also include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device or the like. The display may also be referred to as a display screen or display unit, and is used to display the information processed in the device 1 and to display a visual user interface.
In the embodiment of the device 1 shown in Fig. 1, the model generation program 01 is stored in the memory 11; the processor 12 implements the following steps when executing the model generation program 01 stored in the memory 11:
A1. Obtain a financial-domain word segmentation dictionary built from collected financial-domain vocabulary, and a preset financial-domain text corpus.
First, a general-domain word segmentation dictionary is obtained, and the collected financial-domain vocabulary is added to it to form the financial-domain word segmentation dictionary. The financial-domain vocabulary comes mainly from three sources: professional financial terms, such as "Williams index", "moving average" and "convertible bond"; terms used in financial forums, such as the terms users of stock trading forums use when commenting on stocks; and internet slang and special symbols used in the financial domain, such as "junk stock".
A2. Select candidate new words from the text corpus according to a preset algorithm and add them to the word segmentation dictionary.
On the basis of the above word segmentation dictionary, candidate new words are selected from the text corpus and added to the current word segmentation dictionary. Specifically, step A2 comprises:
A21. Based on the word segmentation dictionary, perform word segmentation on the text corpus using the word segmentation algorithm, and obtain a candidate word set from the word segmentation result.
A22. Calculate the information gain of each candidate word in the candidate word set, select the candidate words whose information gain exceeds the first preset threshold as first candidate new words, and add the first candidate new words to the word segmentation dictionary.
A23. Based on the word segmentation dictionary to which the first candidate new words have been added, segment the text corpus using the word segmentation algorithm, and train a word vector model on the segmented text corpus.
A24. Use the trained word vector model to calculate the semantic similarity between the words in the word segmentation result and the first candidate new words.
A25. Take the words whose semantic similarity exceeds the second preset threshold as second candidate new words, and add the second candidate new words to the word segmentation dictionary.
A text corpus for expanding the word segmentation dictionary is obtained. Specifically, a web crawler is used to fetch from financial websites a large number of financial article texts related to the financial topic to be analyzed. The crawled data is preprocessed to remove useless content such as garbled characters and web escape characters, and the remaining text data is kept as the text corpus. Next, a large amount of the text data in the corpus is classified by sentiment orientation through manual annotation; that is, category labels are added to the text data.
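The preprocessing described above can be sketched as follows. This is a minimal illustration only: the patent does not specify the cleaning rules, so the function name `clean_crawled_text` and the particular patterns (HTML tags, HTML entities, control characters and the U+FFFD replacement character) are assumptions.

```python
import html
import re

# Hypothetical cleaning rules; the source text only says that garbled
# characters and web escape characters are removed.
_TAG_RE = re.compile(r"<[^>]+>")
_GARBLED_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffd]")
_SPACE_RE = re.compile(r"\s+")


def clean_crawled_text(raw: str) -> str:
    """Return plain text with markup, escapes and garbled characters removed."""
    text = _TAG_RE.sub(" ", raw)        # drop HTML tags
    text = html.unescape(text)          # decode entities such as &amp; and &nbsp;
    text = _GARBLED_RE.sub("", text)    # drop control characters and U+FFFD
    return _SPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace


print(clean_crawled_text("<p>Shares&nbsp;rose &amp; volume grew\ufffd</p>"))
```

Tags are stripped before entities are decoded so that an escaped `&lt;` in ordinary text is not mistaken for markup.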
With the current word segmentation dictionary serving as the dictionary of the preset word segmentation algorithm, word segmentation is performed on the text corpus. The stop words in the word segmentation result are then filtered out according to a preset stop word list in order to remove irrelevant words, and the remaining words form the candidate word set. The category label of each word segmentation result is the same as the category label of the text data from which it was obtained.
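A minimal sketch of this step is given below. The patent does not name the preset word segmentation algorithm, so forward maximum matching is used here purely as a stand-in; the function names, the `max_len` parameter and the sample data are all hypothetical.

```python
def segment(text, dictionary, max_len=6):
    """Forward maximum matching (a stand-in for the unspecified algorithm).

    At each position the longest dictionary entry is taken; a character
    that starts no dictionary entry becomes a one-character token.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_candidates(labeled_texts, dictionary, stop_words):
    """Segment each (text, label) pair and drop stop words and whitespace.

    Each token list inherits the category label of its source text.
    """
    result = []
    for text, label in labeled_texts:
        tokens = [t for t in segment(text, dictionary)
                  if t not in stop_words and not t.isspace()]
        result.append((tokens, label))
    return result


dictionary = {"可转债", "价格", "上涨"}
print(build_candidates([("可转债的价格上涨", "positive")], dictionary, {"的"}))
```

The candidate word set is then the union of the tokens across all documents, with each document's label available for the information gain statistics.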
Next, candidate new words are selected from the candidate word set according to information gain. Information gain is an entropy-based evaluation measure; when used for feature selection, it measures how much information the presence or absence of a word provides for deciding whether a text belongs to a given class. It is defined as the difference in information content before and after a feature appears in a document, and is calculated as:

$$IG(t_i) = -\sum_{j=1}^{|C|} P(C_j)\log P(C_j) + P(t_i)\sum_{j=1}^{|C|} P(C_j \mid t_i)\log P(C_j \mid t_i) + P(\bar{t}_i)\sum_{j=1}^{|C|} P(C_j \mid \bar{t}_i)\log P(C_j \mid \bar{t}_i)$$

In the above formula, $P(C_j)$ is the probability that class $C_j$ occurs in the data set, $P(t_i)$ is the probability that feature $t_i$ appears in the data set, $P(C_j \mid t_i)$ is the probability that a document containing feature $t_i$ belongs to class $C_j$, $P(\bar{t}_i)$ is the probability that feature $t_i$ does not appear, $P(C_j \mid \bar{t}_i)$ is the probability that a document not containing feature $t_i$ belongs to class $C_j$, and $|C|$ is the total number of classes. Here, a class is a sentiment orientation category and a feature is a candidate word. These probabilities can be computed from statistics of the candidate words in the text corpus.
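The information gain of a candidate word can be estimated from document counts, as in the following sketch. It assumes that each probability is estimated as a relative document frequency and that logarithms are taken to base 2; neither choice is stated in the source, and the function names and threshold value are hypothetical.

```python
import math
from collections import Counter


def information_gain(docs, term):
    """Information gain of `term` in bits.

    docs: list of (tokens, label) pairs, one per labeled document.
    """
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    with_term = Counter(label for tokens, label in docs if term in tokens)
    n_with = sum(with_term.values())
    n_without = n - n_with

    def plogp(p):
        # By convention 0 * log 0 is treated as 0.
        return p * math.log2(p) if p > 0 else 0.0

    gain = -sum(plogp(c / n) for c in class_counts.values())  # class entropy
    for label, count in class_counts.items():
        if n_with:
            gain += (n_with / n) * plogp(with_term[label] / n_with)
        if n_without:
            gain += (n_without / n) * plogp((count - with_term[label]) / n_without)
    return gain


def select_first_candidates(docs, vocabulary, threshold):
    """Keep the words whose information gain exceeds the first threshold."""
    return {w for w in vocabulary if information_gain(docs, w) > threshold}


docs = [({"rally", "the"}, "pos"), ({"rally", "the"}, "pos"),
        ({"slump", "the"}, "neg"), ({"slump", "the"}, "neg")]
print(select_first_candidates(docs, {"rally", "the"}, threshold=0.5))
```

In this toy data "rally" separates the two classes perfectly and receives the full class entropy of one bit, while "the" appears in every document and receives zero.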
The usefulness of a candidate word is judged from its calculated information gain: the larger the information gain, the more useful the word is for classification. The candidate words in the candidate word set whose information gain exceeds the first preset threshold are added to the current word segmentation dictionary as first candidate new words, which expands the dictionary.
Based on the word segmentation dictionary expanded in this way, the same text corpus is segmented again with the same word segmentation algorithm to obtain a word segmentation result. A word vector model is trained on the segmented text corpus, and the trained model is used to compute the word vector of each word obtained by segmentation. The semantic similarity between each word obtained by segmentation and the first candidate new words is then calculated from the word vectors. Words whose semantic similarity exceeds the second preset threshold are taken as second candidate new words, and the second candidate new words selected from the word segmentation result are added to the word segmentation dictionary, which expands the dictionary again.
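The similarity-based selection can be sketched as below. The sketch assumes that the word vectors have already been produced by a trained word vector model and are supplied as a plain mapping, that cosine similarity is the similarity measure, and that words already in the dictionary are skipped; the source specifies none of these details, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_second_candidates(vectors, first_candidates, dictionary, threshold):
    """Pick words close to any first candidate new word.

    vectors: mapping word -> vector, assumed to come from the trained model.
    """
    selected = set()
    for word, vec in vectors.items():
        if word in dictionary or word in first_candidates:
            continue  # assumption: only words not yet in the dictionary qualify
        if any(cosine(vec, vectors[c]) > threshold
               for c in first_candidates if c in vectors):
            selected.add(word)
    return selected


vectors = {"bull": [1.0, 0.0], "rally": [0.9, 0.1], "table": [0.0, 1.0]}
print(select_second_candidates(vectors, {"bull"}, {"bull"}, threshold=0.8))
```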
It can be understood that, after the text corpus has been segmented with the word segmentation algorithm, the stop words in the word segmentation result can be deleted using a stop word list. Stop words are noisy and carry no meaning for text classification, so deleting them improves the accuracy of text classification and also reduces the amount of computation needed to select candidate new words.
Through the above steps, the word segmentation dictionary is in effect expanded three times. The first expansion adds manually collected financial-domain vocabulary, the second selects new words by calculating information gain, and the third selects further new words by computing semantic similarity from word vectors. The second and third expansions are each performed on the dictionary produced by the previous expansion. Through this dynamic expansion, as many new financial-domain words as possible are screened out of the corpus. The word segmentation algorithm with the expanded dictionary is used to segment the training samples of the classification model; the richer the financial-domain vocabulary in the dictionary, the more accurate the segmentation of financial-domain text and the higher the classification accuracy of the trained classification model.
Optionally, in one implementation, after the second candidate new words have been selected and added to the word segmentation dictionary, the word frequency of each second candidate new word in the text corpus is calculated and used as the weight of that word in the word segmentation dictionary. The word frequency of the first candidate new words can be calculated in the same way and used as their weight in the dictionary.
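A small sketch of this weighting follows. It assumes that the dictionary is a mapping from word to weight and that "word frequency" means the raw number of occurrences in the segmented corpus; a relative frequency would be an equally plausible reading. The function name is hypothetical.

```python
from collections import Counter


def add_with_frequency_weights(dictionary, new_words, segmented_corpus):
    """Add each new word to the dictionary with its corpus count as weight."""
    counts = Counter(tok for doc in segmented_corpus for tok in doc)
    for word in new_words:
        dictionary[word] = counts[word]  # Counter returns 0 for unseen words
    return dictionary


corpus = [["rally", "bond"], ["rally"]]
print(add_with_frequency_weights({"bond": 5}, {"rally"}, corpus))
```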
A3. Obtain a sample set, and label the training samples in the sample set with categories according to a preset sentiment orientation classification scheme.
The sample set used to train the text classification model is obtained. For each training sample in the sample set, the annotations made by multiple annotators according to the preset sentiment orientation classification scheme are obtained, and the annotation that occurs most often among them is selected as the labeling result of that training sample. The user can set the sentiment orientation classification scheme according to the financial matter to be analyzed: for example, texts in a stock forum can be divided into hold, sell and buy; stock discussion texts on microblogs or forums can be divided into positive, negative and neutral; and financial news texts can be divided into positive, negative and neutral.
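The majority vote over annotators can be sketched in a few lines. The source does not say how ties are resolved; in this sketch the label seen first among the tied ones wins, which is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def majority_label(annotations):
    """Return the annotation that occurs most often for one training sample.

    Ties go to the label that appears first in the input (an assumption).
    """
    return Counter(annotations).most_common(1)[0][0]


print(majority_label(["positive", "neutral", "positive"]))
```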
A4. Based on the word segmentation dictionary to which the candidate new words have been added, perform word segmentation on the training samples in the sample set using the preset word segmentation algorithm.
A5. Extract word vectors from the word segmentation result and, based on the AdaBoost algorithm, input the word vectors of the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combine the trained weak classifiers into a text classification model for the financial domain.
After the training samples have been labeled, they are segmented using the preset word segmentation algorithm and the repeatedly expanded word segmentation dictionary, and the word vectors of the word segmentation result are extracted with the trained word vector model. Note that the same word segmentation algorithm is used throughout the scheme of this embodiment.
In this embodiment, to improve the accuracy of the text classification model, word vectors are extracted with both a word2vec model and a GloVe (Global Vectors for Word Representation) model, so each word segmentation result yields two kinds of word vectors. In addition, a classifier based on a convolutional neural network algorithm, a classifier based on a recurrent neural network algorithm and a classifier based on a long short-term memory network algorithm are used as weak classifiers. Each weak classifier takes each of the two kinds of word vectors as input, so six weak classification models can in fact be built. Each weak classifier is trained on the samples in the sample set based on the AdaBoost algorithm. During training, if a sample has been classified correctly, its weight is reduced when the next sample set is constructed; if a sample has not been classified correctly, its weight is increased when the next sample set is constructed. The sample set with updated weights is used to train the next classifier, and the whole training process proceeds iteratively in this way. In addition, after each weak classifier has been trained, the weight of weak classifiers with a small classification error rate is increased so that they play a larger role in the final classification function, and the weight of weak classifiers with a large classification error rate is reduced so that they play a smaller role. The weak classifiers are trained iteratively according to the above procedure, and the trained weak classifiers are fused into the final text classification model. This text classification model can be used to classify the sentiment orientation of financial-domain text, for example to judge whether a stock discussion text in a forum is negative, positive or neutral.
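The reweighting loop described above can be sketched as follows. The patent names AdaBoost but gives no formulas, so this sketch uses the multi-class SAMME variant as an assumption. The `Stump` class is a toy stand-in for the CNN, RNN and LSTM classifiers, and every name here is hypothetical; any learner exposing `fit(X, y, w)` and `predict(X)` could be plugged in.

```python
import math


class Stump:
    """Toy weak learner that thresholds a single feature."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y, w):
        best, labels = None, sorted(set(y))
        for thr in sorted({x[self.feature] for x in X}):
            for lo in labels:
                for hi in labels:
                    pred = [hi if x[self.feature] >= thr else lo for x in X]
                    err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, thr, lo, hi)
        _, self.thr, self.lo, self.hi = best
        return self

    def predict(self, X):
        return [self.hi if x[self.feature] >= self.thr else self.lo for x in X]


def adaboost_fit(learners, X, y):
    """Train the learners in sequence on reweighted samples (SAMME)."""
    n, k = len(X), max(len(set(y)), 2)
    w = [1.0 / n] * n
    ensemble = []
    for learner in learners:
        pred = learner.fit(X, y, w).predict(X)
        err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = math.log((1 - err) / err) + math.log(k - 1)
        if alpha <= 0:
            continue  # no better than chance: leave it out of the ensemble
        # Raise the weight of misclassified samples, then renormalize, which
        # lowers the relative weight of correctly classified samples.
        w = [wi * math.exp(alpha) if p != t else wi for wi, p, t in zip(w, pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, learner))  # low error gives a large alpha
    return ensemble


def adaboost_predict(ensemble, X):
    """Weighted vote; assumes at least one learner was kept."""
    out = []
    for x in X:
        votes = {}
        for alpha, learner in ensemble:
            label = learner.predict([x])[0]
            votes[label] = votes.get(label, 0.0) + alpha
        out.append(max(votes, key=votes.get))
    return out


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.0]]
y = ["neg", "neg", "pos", "pos"]
model = adaboost_fit([Stump(0), Stump(1)], X, y)
print(adaboost_predict(model, X))
```

In the scheme of the embodiment, the list of learners would hold the six weak classification models, each wrapping one network architecture and one kind of word vector.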
The device for generating a text classification model proposed in this embodiment mines the text corpus of the financial domain, screens as many new financial-domain words from the corpus as possible, and adds them to the word segmentation dictionary, thereby expanding the financial-domain word segmentation dictionary. It then performs word segmentation on the training samples in the sample set using the dictionary expanded with financial vocabulary, and labels the sample data in the sample set by category according to the preset sentiment tendency classification mode. The text classification model finally obtained by training can be applied to sentiment tendency classification problems in the financial domain.
Optionally, in other embodiments, the model generation program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. A module in the sense of the present invention is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution of the model generation program in the device for generating a text classification model.
Fig. 2 is a schematic diagram of the program modules of the model generation program in one embodiment of the device for generating a text classification model of the present invention. In this embodiment, the model generation program may be divided into a data acquisition module 10, a new word selection module 20, a sample labeling module 30, a sample segmentation module 40, and a model training module 50. Illustratively:
The data acquisition module 10 is configured to: obtain a word segmentation dictionary of the financial domain, built from collected financial-domain vocabulary, and a preset text corpus of the financial domain;
The new word selection module 20 is configured to: select candidate new words from the text corpus according to a preset algorithm and add them to the word segmentation dictionary;
The sample labeling module 30 is configured to: obtain a sample set and label the training samples in the sample set by category according to a preset sentiment tendency classification mode;
The sample segmentation module 40 is configured to: perform word segmentation on the training samples in the sample set with a preset word segmentation algorithm, based on the word segmentation dictionary to which the candidate new words have been added;
The model training module 50 is configured to: extract word vectors from the segmentation results and, based on the AdaBoost algorithm, input the word vectors corresponding to the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combine the trained weak classifiers into a text classification model of the financial domain.
The functions or operation steps implemented when program modules such as the data acquisition module 10, new word selection module 20, sample labeling module 30, sample segmentation module 40, and model training module 50 are executed are substantially the same as in the above embodiment and are not repeated here.
In addition, the present invention also provides a method for generating a text classification model. Fig. 3 is a flowchart of a preferred embodiment of the method for generating a text classification model of the present invention. The method may be executed by a device, and the device may be implemented in software and/or hardware.
In this embodiment, the method for generating a text classification model includes:
Step S10: obtain a word segmentation dictionary of the financial domain, built from collected financial-domain vocabulary, and a preset text corpus of the financial domain.
First, a general-domain word segmentation dictionary is obtained, and the collected financial-domain vocabulary is added to it to form the financial-domain word segmentation dictionary. The financial-domain vocabulary mainly comes from three sources: professional financial terms, such as "Williams index", "moving average", and "convertible bond"; financial forum terms, such as the terms used by users of stock trading forums when commenting on stocks; and internet slang and special symbols used in the financial domain, such as "junk stock".
Step S20: select candidate new words from the text corpus according to a preset algorithm and add them to the word segmentation dictionary.
On the basis of the above word segmentation dictionary, candidate new words are selected from the new text corpus and added to the current word segmentation dictionary. Specifically, step S20 includes: performing word segmentation on the text corpus with the word segmentation algorithm, based on the word segmentation dictionary, and obtaining a candidate word set from the segmentation results; calculating the information gain of each candidate word in the candidate word set, selecting the candidate words whose information gain exceeds a first preset threshold as first candidate new words, and adding the first candidate new words to the word segmentation dictionary; segmenting the text corpus with the word segmentation algorithm, based on the word segmentation dictionary to which the first candidate new words have been added, and training a word vector model on the segmented text corpus; using the trained word vector model to calculate the semantic similarity between the words in the segmentation results and the first candidate new words; and taking the words whose semantic similarity exceeds a second preset threshold as second candidate new words and adding the second candidate new words to the word segmentation dictionary.
A text corpus for expanding the word segmentation dictionary is obtained. Specifically, a web crawler is used to fetch from financial websites a large amount of financial article text related to the financial topic to be analyzed, forming the text corpus. The crawled data is preprocessed to remove useless information such as garbled symbols and web escape characters, and the remaining text data is kept as the text corpus. Next, the sentiment tendency of a large amount of text data in the corpus is classified by manual labeling, that is, category label information is added to the text data.
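The preprocessing of crawled pages can be illustrated with a minimal sketch. The function name `clean_text` and the exact cleaning rules (stripping tags, decoding HTML entities, dropping the Unicode replacement character) are assumptions made for illustration; the embodiment only states that garbled symbols and web escape characters are removed.

```python
import html
import re


def clean_text(raw):
    """Hypothetical cleaner for one crawled page fragment."""
    # Drop anything that looks like an HTML tag.
    text = re.sub(r"<[^>]+>", " ", raw)
    # Decode web escape sequences such as &amp; or &nbsp;.
    text = html.unescape(text)
    # U+FFFD is what a failed character decode usually leaves behind.
    text = text.replace("\ufffd", "")
    # Collapse runs of whitespace (including non-breaking spaces).
    return re.sub(r"\s+", " ", text).strip()
```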
With the current word segmentation dictionary as the dictionary of the preset word segmentation algorithm, word segmentation is performed on the text corpus. Stop words in the segmentation results are then filtered out according to a preset stop-word list to remove irrelevant vocabulary, and the remaining segmentation results form the candidate word set. The category label information of a segmentation result is the same as that of the text data it comes from.
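How a dictionary drives segmentation, and how the candidate word set is then formed, can be sketched as follows. The embodiment does not name its preset word segmentation algorithm; forward maximum matching is used here purely as an assumed stand-in, and the names `segment`, `candidate_words`, and `max_len` are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan makes progress.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def candidate_words(texts, dictionary, stop_words):
    """Segment every text and keep the tokens that are not stop words."""
    candidates = set()
    for text in texts:
        for token in segment(text, dictionary):
            if token not in stop_words and not token.isspace():
                candidates.add(token)
    return candidates
```

Because the dictionary is passed in as an argument, adding new words to it and calling `segment` again changes the segmentation without changing the algorithm, which is the property the scheme relies on.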
Next, candidate new words are selected from the candidate word set according to information gain. Information gain is an entropy-based evaluation measure; when used for feature selection, it measures how much information the presence or absence of a word provides for deciding whether a text belongs to a given class. It is defined as the difference in information content before and after a feature is observed in a document, and is calculated as:

$$IG(t_i) = -\sum_{j=1}^{|C|} P(C_j)\log P(C_j) + P(t_i)\sum_{j=1}^{|C|} P(C_j \mid t_i)\log P(C_j \mid t_i) + P(\bar{t}_i)\sum_{j=1}^{|C|} P(C_j \mid \bar{t}_i)\log P(C_j \mid \bar{t}_i)$$

In the above formula, $P(C_j)$ is the probability that class $C_j$ occurs in the data set, $P(t_i)$ is the probability that feature $t_i$ occurs in the data set, $P(C_j \mid t_i)$ is the probability that a document belongs to class $C_j$ given that feature $t_i$ occurs in it, $P(\bar{t}_i)$ is the probability that feature $t_i$ does not occur, $P(C_j \mid \bar{t}_i)$ is the probability that a document belongs to class $C_j$ given that feature $t_i$ does not occur in it, and $|C|$ is the total number of classes. Here a class is a sentiment tendency class and a feature is a candidate word. These probabilities can be computed from statistics of the candidate words in the text corpus.
The usefulness of a candidate word is judged from the computed information gain: the larger the information gain, the more useful the word is for classification. Candidate words in the candidate word set whose information gain exceeds the first preset threshold are added to the current word segmentation dictionary as first candidate new words, thereby expanding the dictionary.
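The information-gain selection described above can be sketched as follows, estimating each probability by counting labeled documents. The function names, the document representation (a set of tokens paired with a label), and the base-2 logarithm are assumptions made for illustration, not details fixed by the embodiment.

```python
import math
from collections import Counter


def information_gain(docs, word):
    """docs: list of (set_of_tokens, label) pairs; returns IG of `word` in bits."""
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    with_word = Counter(label for tokens, label in docs if word in tokens)
    n_with = sum(with_word.values())
    n_without = n - n_with

    def plogp(p):
        # By convention 0 * log 0 is treated as 0.
        return p * math.log2(p) if p > 0 else 0.0

    # Class entropy before the word is observed.
    gain = -sum(plogp(c / n) for c in class_counts.values())
    for label, count in class_counts.items():
        if n_with:
            gain += (n_with / n) * plogp(with_word[label] / n_with)
        if n_without:
            gain += (n_without / n) * plogp((count - with_word[label]) / n_without)
    return gain


def select_first_candidates(docs, candidates, threshold):
    """Keep the candidate words whose information gain exceeds the threshold."""
    return {w for w in candidates if information_gain(docs, w) > threshold}
```

A word that appears only in documents of one class scores high, while a word spread evenly across all classes scores close to zero and is filtered out by the threshold.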
Based on the word segmentation dictionary expanded as described above, the same text corpus is segmented again with the same word segmentation algorithm to obtain segmentation results. A word vector model is trained on the segmented text corpus, and the trained model is used to compute the word vector of each word obtained by segmentation. The semantic similarity between each word obtained by segmentation and the first candidate new words is then computed from the word vectors. If the semantic similarity exceeds the second preset threshold, the word is taken as a second candidate new word. The second candidate new words selected from the segmentation results are added to the word segmentation dictionary, expanding it again.
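The similarity-based selection can be sketched as follows. The embodiment does not state which similarity measure is used; cosine similarity between word vectors is assumed here, and the word vectors are taken as an already trained lookup table. The names `cosine` and `select_second_candidates` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_second_candidates(vectors, first_candidates, threshold):
    """vectors: dict mapping each segmented word to its word vector."""
    selected = set()
    for word, vec in vectors.items():
        if word in first_candidates:
            continue  # already in the dictionary
        for seed in first_candidates:
            if seed in vectors and cosine(vec, vectors[seed]) > threshold:
                selected.add(word)
                break
    return selected
```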
It can be understood that, after the text corpus has been segmented with the word segmentation algorithm, stop words in the segmentation results can be deleted using the stop-word list. Stop words are noisy and carry no meaning for text classification, so deleting them improves the accuracy of text classification and reduces the amount of computation when selecting candidate new words.
Through the above steps, the word segmentation dictionary is in effect expanded three times: the first time, a preliminary expansion using manually collected financial-domain vocabulary; the second time, by selecting new words according to information gain; and the third time, by selecting further new words according to semantic similarity computed from word vectors. The second and third expansions are each carried out on the dictionary as expanded the previous time. Through this dynamic expansion, as many new financial-domain words as possible are screened from the corpus. The word segmentation algorithm with the expanded dictionary is used to segment the training samples of the classification model; the richer the financial-domain vocabulary in the dictionary, the more accurate the segmentation of financial-domain text and the higher the classification accuracy of the trained model.
Optionally, as one implementation, after the second candidate new words have been selected and added to the word segmentation dictionary, the word frequency of each second candidate new word in the text corpus is calculated and used as the weight of that word in the word segmentation dictionary. The word frequency of the first candidate new words can be calculated in the same way and used as their weight in the word segmentation dictionary.
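A minimal sketch of the weighting step is given below. It assumes that "word frequency" means the raw occurrence count in the segmented corpus; a relative frequency would work the same way. The name `word_weights` is hypothetical.

```python
from collections import Counter


def word_weights(segmented_corpus, new_words):
    """Map each new word to its occurrence count in the segmented corpus."""
    counts = Counter(token for doc in segmented_corpus for token in doc)
    return {word: counts[word] for word in new_words}
```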
Step S30: obtain a sample set and label the training samples in the sample set by category according to a preset sentiment tendency classification mode.
A sample set for training the text classification model is obtained. For each training sample in the sample set, the labels given by multiple annotators according to the preset sentiment tendency classification mode are obtained, and the label that occurs most often among them is selected as the annotation result for that training sample. The user can set the sentiment tendency classification mode according to the financial matter to be analyzed. For example, stock forum texts are classified as hold, sell, or buy; microblog stock discussion texts are classified as positive, negative, or neutral; and financial news texts are classified as positive, negative, or neutral.
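Selecting the most frequent label among several annotators is a majority vote, which can be sketched as follows. The function name is hypothetical, and the tie-breaking behavior (the label seen first wins) is a property of this sketch rather than something the embodiment specifies.

```python
from collections import Counter


def majority_label(labels):
    """Return the label that occurs most often among the annotators' labels."""
    if not labels:
        raise ValueError("at least one label is required")
    return Counter(labels).most_common(1)[0][0]
```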
Step S40: perform word segmentation on the training samples in the sample set with a preset word segmentation algorithm, based on the word segmentation dictionary to which the candidate new words have been added.
Step S50: extract word vectors from the segmentation results and, based on the AdaBoost algorithm, input the word vectors corresponding to the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combine the trained weak classifiers into a text classification model of the financial domain.
After the training samples have been labeled, the training samples are segmented with the preset word segmentation algorithm, based on the word segmentation dictionary that has been expanded several times, and the word vectors of the segmentation results are extracted with the trained word vector model. Note that the same word segmentation algorithm is used throughout the scheme of this embodiment.
In this embodiment, to improve the accuracy of the text classification model, word vectors are extracted with both a word2vec model and a GloVe model, so each segmentation result yields two kinds of word vectors. In addition, a classifier based on a convolutional neural network algorithm, a classifier based on a recurrent neural network algorithm, and a classifier based on a long short-term memory network algorithm serve as weak classifiers. Each weak classifier takes each of the two kinds of word vectors as input, so six weak classification models can in effect be built. Based on the AdaBoost algorithm, each weak classifier is trained with the samples in the sample set. During training, if a sample is classified correctly, its weight is reduced when the next sample set is constructed; if a sample is misclassified, its weight is increased when the next sample set is constructed. The sample set with updated weights is used to train the next classifier, and the whole training process proceeds iteratively in this way. In addition, after each weak classifier has been trained, the weight of a weak classifier with a low classification error rate is increased, so that it plays a larger role in the final classification function, and the weight of a weak classifier with a high classification error rate is reduced, so that it plays a smaller role in the final classification function. Each weak classifier is trained iteratively according to the above process. The trained weak classifiers are then fused to form the final text classification model. This text classification model can be used to classify the sentiment tendency of financial-domain text, for example to judge whether a stock discussion text in a forum is negative, positive, or neutral.
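The sample reweighting and classifier weighting described above can be sketched as follows. The embodiment does not say which AdaBoost variant is used; this sketch assumes the multi-class SAMME variant, and it treats each weak classifier as an opaque training function rather than an actual CNN, RNN, or LSTM. The names `adaboost_samme`, `classify`, and `train_fns` are hypothetical.

```python
import math
from collections import defaultdict


def adaboost_samme(train_fns, X, y, n_classes):
    """train_fns: callables train(X, y, weights) -> predict(x), one per weak classifier."""
    n = len(X)
    weights = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []  # list of (classifier weight, predict function)
    for train in train_fns:
        predict = train(X, y, weights)
        wrong = [predict(x) != label for x, label in zip(X, y)]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        # A lower weighted error gives a larger say in the final vote.
        alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
        if alpha <= 0:
            continue  # no better than random guessing, so leave it out
        ensemble.append((alpha, predict))
        # Raise the weights of misclassified samples; after normalization the
        # correctly classified samples carry relatively less weight.
        weights = [w * math.exp(alpha) if bad else w for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble, x):
    """Weighted vote of all weak classifiers in the ensemble."""
    votes = defaultdict(float)
    for alpha, predict in ensemble:
        votes[predict(x)] += alpha
    return max(votes, key=votes.get)
```

In the scheme of this embodiment, `train_fns` would hold six entries, one for each pairing of the three network types with the two kinds of word vectors.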
The method for generating a text classification model proposed in this embodiment mines the text corpus of the financial domain, screens as many new financial-domain words from the corpus as possible, and adds them to the word segmentation dictionary, thereby expanding the financial-domain word segmentation dictionary. It then performs word segmentation on the training samples in the sample set using the dictionary expanded with financial vocabulary, and labels the sample data in the sample set by category according to the preset sentiment tendency classification mode. The text classification model finally obtained by training can be applied to sentiment tendency classification problems in the financial domain.
In addition, an embodiment of the present invention also proposes a computer readable storage medium on which a model generation program is stored. The model generation program can be executed by one or more processors to implement the following operations:
obtaining a word segmentation dictionary of the financial domain, built from collected financial-domain vocabulary, and a preset text corpus of the financial domain;
selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary;
obtaining a sample set and labeling the training samples in the sample set by category according to a preset sentiment tendency classification mode;
performing word segmentation on the training samples in the sample set with a preset word segmentation algorithm, based on the word segmentation dictionary to which the candidate new words have been added;
extracting word vectors from the segmentation results and, based on the AdaBoost algorithm, inputting the word vectors corresponding to the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combining the trained weak classifiers into a text classification model of the financial domain.
The specific implementations of the computer readable storage medium of the present invention are essentially the same as the embodiments of the above device and method for generating a text classification model, and are not repeated here.
It should be noted that the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. Moreover, the terms "include", "comprise", or any other variant thereof herein are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article, or method that includes the element.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A device for generating a text classification model, characterized in that the device comprises a memory and a processor, the memory storing a model generation program executable on the processor, wherein the model generation program, when executed by the processor, implements the following steps:
obtaining a word segmentation dictionary of the financial domain, built from collected financial-domain vocabulary, and a preset text corpus of the financial domain;
selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary;
obtaining a sample set and labeling the training samples in the sample set by category according to a preset sentiment tendency classification mode;
performing word segmentation on the training samples in the sample set with a preset word segmentation algorithm, based on the word segmentation dictionary to which the candidate new words have been added;
extracting word vectors from the segmentation results and, based on the AdaBoost algorithm, inputting the word vectors corresponding to the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combining the trained weak classifiers into a text classification model of the financial domain.
2. The device for generating a text classification model according to claim 1, characterized in that the step of selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary comprises:
performing word segmentation on the text corpus with the word segmentation algorithm, based on the word segmentation dictionary, and obtaining a candidate word set from the segmentation results;
calculating the information gain of each candidate word in the candidate word set, selecting the candidate words whose information gain exceeds a first preset threshold as first candidate new words, and adding the first candidate new words to the word segmentation dictionary;
segmenting the text corpus with the word segmentation algorithm, based on the word segmentation dictionary to which the first candidate new words have been added, and training a word vector model on the segmented text corpus;
using the trained word vector model to calculate the semantic similarity between the words in the segmentation results and the first candidate new words;
taking the words whose semantic similarity exceeds a second preset threshold as second candidate new words, and adding the second candidate new words to the word segmentation dictionary.
3. The device for generating a text classification model according to claim 2, characterized in that the processor is further configured to execute the model generation program so as to implement, after the step of taking the words whose semantic similarity exceeds the second preset threshold as second candidate new words and adding the second candidate new words to the word segmentation dictionary, the following step:
calculating the word frequency of the second candidate new words in the text corpus, and using the calculated word frequency as the weight of the second candidate new words in the word segmentation dictionary.
4. The device for generating a text classification model according to any one of claims 1 to 3, characterized in that the step of obtaining a sample set and labeling the training samples in the sample set by category according to a preset sentiment tendency classification mode comprises:
obtaining the sample set, obtaining a plurality of labels given by a plurality of annotators to the training samples in the sample set according to the preset sentiment tendency classification mode, and selecting, from the plurality of labels, the label that occurs most often as the annotation result of the corresponding training sample.
5. The device for generating a text classification model according to any one of claims 1 to 3, characterized in that the weak classifiers comprise a classifier based on a convolutional neural network algorithm, a classifier based on a recurrent neural network algorithm, and a classifier based on a long short-term memory network algorithm.
6. A method for generating a text classification model, characterized in that the method comprises:
obtaining a word segmentation dictionary of the financial domain, built from collected financial-domain vocabulary, and a preset text corpus of the financial domain;
selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary;
obtaining a sample set and labeling the training samples in the sample set by category according to a preset sentiment tendency classification mode;
performing word segmentation on the training samples in the sample set with a preset word segmentation algorithm, based on the word segmentation dictionary to which the candidate new words have been added;
extracting word vectors from the segmentation results and, based on the AdaBoost algorithm, inputting the word vectors corresponding to the training samples and the labeled category information into a plurality of preset weak classifiers for training, and combining the trained weak classifiers into a text classification model of the financial domain.
7. The method for generating a text classification model according to claim 6, characterized in that the step of selecting candidate new words from the text corpus according to a preset algorithm and adding them to the word segmentation dictionary comprises:
performing word segmentation on the text corpus with the word segmentation algorithm, based on the word segmentation dictionary, and obtaining a candidate word set from the segmentation results;
calculating the information gain of each candidate word in the candidate word set, selecting the candidate words whose information gain exceeds a first preset threshold as first candidate new words, and adding the first candidate new words to the word segmentation dictionary;
segmenting the text corpus with the word segmentation algorithm, based on the word segmentation dictionary to which the first candidate new words have been added, and training a word vector model on the segmented text corpus;
using the trained word vector model to calculate the semantic similarity between the words in the segmentation results and the first candidate new words;
taking the words whose semantic similarity exceeds a second preset threshold as second candidate new words, and adding the second candidate new words to the word segmentation dictionary.
8. The method for generating a text classification model according to claim 7, characterized in that, after the step of taking the words whose semantic similarity exceeds the second preset threshold as second candidate new words and adding the second candidate new words to the word segmentation dictionary, the method further comprises the step of:
calculating the word frequency of the second candidate new words in the text corpus, and using the calculated word frequency as the weight of the second candidate new words in the word segmentation dictionary.
9. The method for generating a text classification model according to any one of claims 6 to 8, characterized in that the step of obtaining a sample set and labeling the training samples in the sample set by category according to a preset sentiment tendency classification mode comprises:
obtaining the sample set, obtaining a plurality of labels given by a plurality of annotators to the training samples in the sample set according to the preset sentiment tendency classification mode, and selecting, from the plurality of labels, the label that occurs most often as the annotation result of the corresponding training sample.
10. A computer readable storage medium, characterized in that a model generation program is stored on the computer readable storage medium, and the model generation program can be executed by one or more processors to implement the steps of the method for generating a text classification model according to any one of claims 6 to 9.
CN201810361702.0A 2018-04-20 2018-04-20 Text classification model generation device and method and computer readable storage medium Active CN108804512B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810361702.0A CN108804512B (en) 2018-04-20 2018-04-20 Text classification model generation device and method and computer readable storage medium
PCT/CN2018/102400 WO2019200806A1 (en) 2018-04-20 2018-08-27 Device for generating text classification model, method, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810361702.0A CN108804512B (en) 2018-04-20 2018-04-20 Text classification model generation device and method and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN108804512A true CN108804512A (en) 2018-11-13
CN108804512B CN108804512B (en) 2020-11-24

Family

ID=64093733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810361702.0A Active CN108804512B (en) 2018-04-20 2018-04-20 Text classification model generation device and method and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN108804512B (en)
WO (1) WO2019200806A1 (en)

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299276A (en) * 2018-11-15 2019-02-01 阿里巴巴集团控股有限公司 One kind converting the text to word insertion, file classification method and device
CN109614499A (en) * 2018-11-22 2019-04-12 阿里巴巴集团控股有限公司 A kind of dictionary generating method, new word discovery method, apparatus and electronic equipment
CN109685156A (en) * 2018-12-30 2019-04-26 浙江新铭智能科技有限公司 A kind of acquisition methods of the classifier of mood for identification
CN109684634A (en) * 2018-12-17 2019-04-26 北京百度网讯科技有限公司 Sentiment analysis method, apparatus, equipment and storage medium
CN109741190A (en) * 2018-12-27 2019-05-10 清华大学 A kind of method, system and the equipment of the classification of personal share bulletin
CN109783632A (en) * 2019-02-15 2019-05-21 腾讯科技(深圳)有限公司 Customer service information-pushing method, device, computer equipment and storage medium
CN109783800A (en) * 2018-12-13 2019-05-21 北京百度网讯科技有限公司 Acquisition methods, device, equipment and the storage medium of emotion keyword
CN109871889A (en) * 2019-01-31 2019-06-11 内蒙古工业大学 Mass psychology appraisal procedure under emergency event
CN109871444A (en) * 2019-01-16 2019-06-11 北京邮电大学 A kind of file classification method and system
CN110008464A (en) * 2019-01-02 2019-07-12 阿里巴巴集团控股有限公司 Construction method, device, server and the readable storage medium storing program for executing of business dictionary
CN110059187A (en) * 2019-04-10 2019-07-26 华侨大学 A kind of deep learning file classification method of integrated shallow semantic anticipation mode
CN110188204A (en) * 2019-06-11 2019-08-30 腾讯科技(深圳)有限公司 A kind of extension corpora mining method, apparatus, server and storage medium
CN110210028A (en) * 2019-05-30 2019-09-06 杭州远传新业科技有限公司 For domain feature words extracting method, device, equipment and the medium of speech translation text
CN110232914A (en) * 2019-05-20 2019-09-13 平安普惠企业管理有限公司 A kind of method for recognizing semantics, device and relevant device
CN110347821A (en) * 2019-05-29 2019-10-18 华东理工大学 A kind of method, electronic equipment and the readable storage medium storing program for executing of text categories mark
CN110457475A (en) * 2019-07-25 2019-11-15 阿里巴巴集团控股有限公司 A kind of method and system expanded for text classification system construction and mark corpus
CN110489556A (en) * 2019-08-22 2019-11-22 重庆锐云科技有限公司 Quality evaluating method, device, server and storage medium about follow-up record
CN110597958A (en) * 2019-09-12 2019-12-20 苏州思必驰信息科技有限公司 Text classification model training and using method and device
CN110674289A (en) * 2019-07-04 2020-01-10 南瑞集团有限公司 Method, device and storage medium for judging article belonged classification based on word segmentation weight
CN110704581A (en) * 2019-09-11 2020-01-17 阿里巴巴集团控股有限公司 Computer-executed text emotion analysis method and device
CN110990567A (en) * 2019-11-25 2020-04-10 国家电网有限公司 Electric power audit text classification method for enhancing domain features
CN111104510A (en) * 2019-11-15 2020-05-05 南京中新赛克科技有限责任公司 Word embedding-based text classification training sample expansion method
CN111144097A (en) * 2019-12-25 2020-05-12 华中科技大学鄂州工业技术研究院 Modeling method and device for emotion tendency classification model of dialog text
CN111143569A (en) * 2019-12-31 2020-05-12 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN111159589A (en) * 2019-12-30 2020-05-15 中国银联股份有限公司 Classification dictionary establishing method, merchant data classification method, device and equipment
CN111177378A (en) * 2019-12-20 2020-05-19 北京淇瑀信息科技有限公司 Text mining method and device and electronic equipment
CN111177403A (en) * 2019-12-16 2020-05-19 恩亿科(北京)数据科技有限公司 Sample data processing method and device
CN111325033A (en) * 2020-03-20 2020-06-23 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN111339268A (en) * 2020-02-19 2020-06-26 北京百度网讯科技有限公司 Entity word recognition method and device
CN111368555A (en) * 2020-05-27 2020-07-03 腾讯科技(深圳)有限公司 Data identification method and device, storage medium and electronic equipment
CN111401030A (en) * 2018-12-28 2020-07-10 北京嘀嘀无限科技发展有限公司 Service abnormity identification method, device, server and readable storage medium
CN111444326A (en) * 2020-03-30 2020-07-24 腾讯科技(深圳)有限公司 Text data processing method, device, equipment and storage medium
CN111523308A (en) * 2020-03-18 2020-08-11 大箴(杭州)科技有限公司 Chinese word segmentation method and device and computer equipment
CN111782803A (en) * 2020-06-05 2020-10-16 京东数字科技控股有限公司 Work order processing method and device, electronic equipment and storage medium
CN112417860A (en) * 2020-12-08 2021-02-26 携程计算机技术(上海)有限公司 Training sample enhancement method, system, device and storage medium
CN112445907A (en) * 2019-09-02 2021-03-05 顺丰科技有限公司 Text emotion classification method, device and equipment and storage medium
CN112579768A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Emotion classification model training method, text emotion classification method and text emotion classification device
CN112632971A (en) * 2020-12-18 2021-04-09 上海明略人工智能(集团)有限公司 Word vector training method and system for entity matching
CN112926631A (en) * 2021-02-01 2021-06-08 大箴(杭州)科技有限公司 Financial text classification method and device and computer equipment
CN113051401A (en) * 2021-04-06 2021-06-29 明品云(北京)数据科技有限公司 Text structured labeling method, system, device and medium
WO2021134524A1 (en) * 2019-12-31 2021-07-08 深圳市欢太科技有限公司 Data processing method, apparatus, electronic device, and storage medium
CN113111175A (en) * 2020-04-28 2021-07-13 北京明亿科技有限公司 Extreme behavior identification method, device, equipment and medium based on deep learning model
CN113177109A (en) * 2021-05-27 2021-07-27 中国平安人寿保险股份有限公司 Text weak labeling method, device, equipment and storage medium
CN113240485A (en) * 2021-05-10 2021-08-10 北京沃东天骏信息技术有限公司 Training method of text generation model, and text generation method and device
CN113642678A (en) * 2021-10-12 2021-11-12 南京山猫齐动信息技术有限公司 Method, device and storage medium for generating confrontation message sample
CN113723114A (en) * 2021-08-31 2021-11-30 平安普惠企业管理有限公司 Semantic analysis method, device and equipment based on multi-intent recognition and storage medium
CN114091469A (en) * 2021-11-23 2022-02-25 杭州萝卜智能技术有限公司 Sample expansion based network public opinion analysis method
CN115861606A (en) * 2022-05-09 2023-03-28 北京中关村科金技术有限公司 Method and device for classifying long-tail distribution documents and storage medium

Families Citing this family (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110879934B (en) * 2019-10-31 2023-05-23 杭州电子科技大学 Text prediction method based on Wide & Deep learning model
CN110837732B (en) * 2019-10-31 2024-01-26 北京奇艺世纪科技有限公司 Method and device for identifying intimacy between target persons, electronic equipment and storage medium
CN111125323B (en) * 2019-11-21 2024-01-19 腾讯科技(深圳)有限公司 Chat corpus labeling method and device, electronic equipment and storage medium
CN112861533A (en) * 2019-11-26 2021-05-28 阿里巴巴集团控股有限公司 Entity word recognition method and device
CN111046177A (en) * 2019-11-26 2020-04-21 方正璞华软件(武汉)股份有限公司 Automatic arbitration case prejudging method and device
CN110991612A (en) * 2019-11-29 2020-04-10 交通银行股份有限公司 Message analysis method of international routine real-time reasoning model based on word vector
CN110968702B (en) * 2019-11-29 2023-05-09 北京明略软件系统有限公司 Method and device for extracting rational relation
CN111078546B (en) * 2019-12-05 2023-06-16 北京云聚智慧科技有限公司 Page feature expression method and electronic equipment
CN111078883A (en) * 2019-12-13 2020-04-28 北京明略软件系统有限公司 Risk index analysis method and device, electronic equipment and storage medium
CN111191119B (en) * 2019-12-16 2023-12-12 绍兴市上虞区理工高等研究院 Neural network-based scientific and technological achievement self-learning method and device
CN112989032A (en) * 2019-12-17 2021-06-18 医渡云(北京)技术有限公司 Entity relationship classification method, apparatus, medium and electronic device
CN111309855A (en) * 2019-12-24 2020-06-19 中国银行股份有限公司 Text information processing method and system
CN113302683B (en) * 2019-12-24 2023-08-04 深圳市优必选科技股份有限公司 Multi-tone word prediction method, disambiguation method, device, apparatus, and computer-readable storage medium
CN113052191A (en) * 2019-12-26 2021-06-29 航天信息股份有限公司 Training method, device, equipment and medium of neural language network model
CN111125317A (en) * 2019-12-27 2020-05-08 携程计算机技术(上海)有限公司 Model training, classification, system, device and medium for conversational text classification
CN111221950A (en) * 2019-12-30 2020-06-02 航天信息股份有限公司 Method and device for analyzing weak emotion of user
CN111259148B (en) * 2020-01-19 2024-03-26 北京小米松果电子有限公司 Information processing method, device and storage medium
CN111309859B (en) * 2020-01-21 2023-07-07 上饶市中科院云计算中心大数据研究院 Scenic spot network public praise emotion analysis method and device
CN111310464B (en) * 2020-02-17 2024-02-02 北京明略软件系统有限公司 Word vector acquisition model generation method and device and word vector acquisition method and device
CN111325562B (en) * 2020-02-17 2023-08-01 武汉轻工大学 Grain safety tracing system and method
CN111367962B (en) * 2020-02-28 2024-01-30 北京金堤科技有限公司 Database updating method and device, computer readable storage medium and electronic equipment
CN113449097A (en) * 2020-03-24 2021-09-28 百度在线网络技术(北京)有限公司 Method and device for generating countermeasure sample, electronic equipment and storage medium
CN111309920B (en) * 2020-03-26 2023-03-24 清华大学深圳国际研究生院 Text classification method, terminal equipment and computer readable storage medium
CN111680225B (en) * 2020-04-26 2023-08-18 国家计算机网络与信息安全管理中心 WeChat financial message analysis method and system based on machine learning
CN111652281B (en) * 2020-04-30 2023-08-18 中国平安财产保险股份有限公司 Information data classification method, device and readable storage medium
CN111680155A (en) * 2020-05-13 2020-09-18 新华网股份有限公司 Text classification method and device, electronic equipment and computer storage medium
CN111737993B (en) * 2020-05-26 2024-04-02 浙江华云电力工程设计咨询有限公司 Method for extracting equipment health state from fault defect text of power distribution network equipment
CN111709233B (en) * 2020-05-27 2023-04-18 西安交通大学 Intelligent diagnosis guiding method and system based on multi-attention convolutional neural network
CN111601314B (en) * 2020-05-27 2023-04-28 北京亚鸿世纪科技发展有限公司 Method and device for double judging bad short message by pre-training model and short message address
CN111680804B (en) * 2020-06-02 2023-09-01 中国电力科学研究院有限公司 Method, equipment and computer readable medium for generating operation checking work ticket
CN111680803B (en) * 2020-06-02 2023-09-01 中国电力科学研究院有限公司 Operation checking work ticket generation system
CN111832292B (en) * 2020-06-03 2024-02-02 北京百度网讯科技有限公司 Text recognition processing method, device, electronic equipment and storage medium
CN113761882A (en) * 2020-06-08 2021-12-07 北京沃东天骏信息技术有限公司 Dictionary construction method and device
CN111737999A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Sequence labeling method, device and equipment and readable storage medium
CN111783451A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Method and apparatus for enhancing text samples
CN114004234A (en) * 2020-07-28 2022-02-01 深圳Tcl数字技术有限公司 Semantic recognition method, storage medium and terminal equipment
CN111930942B (en) * 2020-08-07 2023-08-15 腾讯云计算(长沙)有限责任公司 Text classification method, language model training method, device and equipment
CN111966944B (en) * 2020-08-17 2024-04-09 中电科大数据研究院有限公司 Model construction method for multi-level user comment security audit
CN112015895A (en) * 2020-08-26 2020-12-01 广东电网有限责任公司 Patent text classification method and device
CN112016319B (en) * 2020-09-08 2023-12-15 平安科技(深圳)有限公司 Pre-training model acquisition and disease entity labeling method, device and storage medium
CN112101015B (en) * 2020-09-08 2024-01-26 腾讯科技(深圳)有限公司 Method and device for identifying multi-label object
CN114281928A (en) * 2020-09-28 2022-04-05 中国移动通信集团广西有限公司 Model generation method, device and equipment based on text data
CN113392209B (en) * 2020-10-26 2023-09-19 腾讯科技(深圳)有限公司 Text clustering method based on artificial intelligence, related equipment and storage medium
CN112287639A (en) * 2020-10-30 2021-01-29 上海中通吉网络技术有限公司 Intelligent customer service work order classification method
CN112529743B (en) * 2020-12-18 2023-08-08 平安银行股份有限公司 Contract element extraction method, device, electronic equipment and medium
CN112650837B (en) * 2020-12-28 2023-12-12 上海秒针网络科技有限公司 Text quality control method and system combining classification algorithm and unsupervised algorithm
CN112765936B (en) * 2020-12-31 2024-02-23 出门问问(武汉)信息科技有限公司 Training method and device for operation based on language model
CN112784061A (en) * 2021-01-27 2021-05-11 数贸科技(北京)有限公司 Knowledge graph construction method and device, computing equipment and storage medium
CN112948573B (en) * 2021-02-05 2024-04-02 北京百度网讯科技有限公司 Text label extraction method, device, equipment and computer storage medium
CN112948583A (en) * 2021-02-26 2021-06-11 中国光大银行股份有限公司 Data classification method and device, storage medium and electronic device
CN113011183B (en) * 2021-03-23 2023-09-05 北京科东电力控制系统有限责任公司 Unstructured text data processing method and system in electric power regulation and control field
CN113033198B (en) * 2021-03-25 2022-08-26 平安国际智慧城市科技股份有限公司 Similar text pushing method and device, electronic equipment and computer storage medium
CN113032573B (en) * 2021-04-30 2024-01-23 同方知网数字出版技术股份有限公司 Large-scale text classification method and system combining topic semantics and TF-IDF algorithm
CN113377965B (en) * 2021-06-30 2024-02-23 中国农业银行股份有限公司 Method and related device for sensing text keywords
CN113627530B (en) * 2021-08-11 2023-09-15 中国平安人寿保险股份有限公司 Similar problem text generation method, device, equipment and medium
CN114090601B (en) * 2021-11-23 2023-11-03 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium
CN114638195B (en) * 2022-01-21 2022-11-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-task learning-based ground detection method
CN114443849B (en) * 2022-02-09 2023-10-27 北京百度网讯科技有限公司 Labeling sample selection method and device, electronic equipment and storage medium
CN116307792B (en) * 2022-10-12 2024-03-12 广州市阿尔法软件信息技术有限公司 Urban physical examination subject scene-oriented evaluation method and device
CN115952290B (en) * 2023-03-09 2023-06-02 太极计算机股份有限公司 Case characteristic labeling method, device and equipment based on active learning and semi-supervised learning
CN116361463B (en) * 2023-03-27 2023-12-08 应急管理部国家减灾中心(应急管理部卫星减灾应用中心) Earthquake disaster information extraction method, device, equipment and medium
CN117093715B (en) * 2023-10-18 2023-12-29 湖南财信数字科技有限公司 Word stock expansion method, system, computer equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104142913A (en) * 2013-05-07 2014-11-12 株式会社日立制作所 Distinguishing method and distinguishing system for polarities of words and expressions
CN104331506A (en) * 2014-11-20 2015-02-04 北京理工大学 Multiclass emotion analyzing method and system facing bilingual microblog text
CN105022725A (en) * 2015-07-10 2015-11-04 河海大学 Text emotional tendency analysis method applied to field of financial Web
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method capable of combining Doc2vce with convolutional neural network
CN106598940A (en) * 2016-11-01 2017-04-26 四川用联信息技术有限公司 Text similarity solution algorithm based on global optimization of keyword quality
CN107122382A (en) * 2017-02-16 2017-09-01 江苏大学 A kind of patent classification method based on specification
WO2017202125A1 (en) * 2016-05-25 2017-11-30 华为技术有限公司 Text classification method and apparatus
CN107491531A (en) * 2017-08-18 2017-12-19 华南师范大学 Chinese network comment sensibility classification method based on integrated study framework
US20180032508A1 (en) * 2016-07-28 2018-02-01 Abbyy Infopoisk Llc Aspect-based sentiment analysis using machine learning methods
CN107729374A (en) * 2017-09-13 2018-02-23 厦门快商通科技股份有限公司 A kind of extending method of sentiment dictionary and text emotion recognition methods

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023967A (en) * 2010-11-11 2011-04-20 清华大学 Text emotion classifying method in stock field
US8676730B2 (en) * 2011-07-11 2014-03-18 Accenture Global Services Limited Sentiment classifiers based on feature extraction
CN103559174B (en) * 2013-09-30 2016-03-09 东软集团股份有限公司 Semantic emotion classification characteristic value extraction and system
SG11201704150WA (en) * 2014-11-24 2017-06-29 Agency Science Tech & Res A method and system for sentiment classification and emotion classification
CN106547738B (en) * 2016-11-02 2019-05-07 北京亿美软通科技有限公司 A kind of overdue short message intelligent method of discrimination of financial class based on text mining

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104142913A (en) * 2013-05-07 2014-11-12 株式会社日立制作所 Distinguishing method and distinguishing system for polarities of words and expressions
CN104331506A (en) * 2014-11-20 2015-02-04 北京理工大学 Multiclass emotion analyzing method and system facing bilingual microblog text
CN105022725A (en) * 2015-07-10 2015-11-04 河海大学 Text emotional tendency analysis method applied to field of financial Web
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method capable of combining Doc2vce with convolutional neural network
WO2017202125A1 (en) * 2016-05-25 2017-11-30 华为技术有限公司 Text classification method and apparatus
US20180032508A1 (en) * 2016-07-28 2018-02-01 Abbyy Infopoisk Llc Aspect-based sentiment analysis using machine learning methods
CN106598940A (en) * 2016-11-01 2017-04-26 四川用联信息技术有限公司 Text similarity solution algorithm based on global optimization of keyword quality
CN107122382A (en) * 2017-02-16 2017-09-01 江苏大学 A kind of patent classification method based on specification
CN107491531A (en) * 2017-08-18 2017-12-19 华南师范大学 Chinese network comment sensibility classification method based on integrated study framework
CN107729374A (en) * 2017-09-13 2018-02-23 厦门快商通科技股份有限公司 A kind of extending method of sentiment dictionary and text emotion recognition methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
爱暖手的苦咖啡: "adaboost", 《HTTPS://BAIKE.BAIDU.COM/HISTORY/ADABOOST/4531273/122627277》 *

Cited By (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299276B (en) * 2018-11-15 2021-11-19 创新先进技术有限公司 Method and device for converting text into word embedding and text classification
CN109299276A (en) * 2018-11-15 2019-02-01 阿里巴巴集团控股有限公司 One kind converting the text to word insertion, file classification method and device
CN109614499A (en) * 2018-11-22 2019-04-12 阿里巴巴集团控股有限公司 A kind of dictionary generating method, new word discovery method, apparatus and electronic equipment
CN109614499B (en) * 2018-11-22 2023-02-17 创新先进技术有限公司 Dictionary generation method, new word discovery method, device and electronic equipment
CN109783800A (en) * 2018-12-13 2019-05-21 北京百度网讯科技有限公司 Acquisition methods, device, equipment and the storage medium of emotion keyword
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109684634A (en) * 2018-12-17 2019-04-26 北京百度网讯科技有限公司 Sentiment analysis method, apparatus, equipment and storage medium
CN109684634B (en) * 2018-12-17 2023-07-25 北京百度网讯科技有限公司 Emotion analysis method, device, equipment and storage medium
CN109741190A (en) * 2018-12-27 2019-05-10 清华大学 A kind of method, system and the equipment of the classification of personal share bulletin
CN111401030A (en) * 2018-12-28 2020-07-10 北京嘀嘀无限科技发展有限公司 Service abnormity identification method, device, server and readable storage medium
CN111401030B (en) * 2018-12-28 2024-01-09 北京嘀嘀无限科技发展有限公司 Method and device for identifying service abnormality, server and readable storage medium
CN109685156A (en) * 2018-12-30 2019-04-26 浙江新铭智能科技有限公司 A kind of acquisition methods of the classifier of mood for identification
CN110008464A (en) * 2019-01-02 2019-07-12 阿里巴巴集团控股有限公司 Construction method, device, server and the readable storage medium storing program for executing of business dictionary
CN109871444A (en) * 2019-01-16 2019-06-11 北京邮电大学 A kind of file classification method and system
CN109871889A (en) * 2019-01-31 2019-06-11 内蒙古工业大学 Mass psychology appraisal procedure under emergency event
CN109783632A (en) * 2019-02-15 2019-05-21 腾讯科技(深圳)有限公司 Customer service information-pushing method, device, computer equipment and storage medium
CN110059187B (en) * 2019-04-10 2022-06-07 华侨大学 Deep learning text classification method integrating shallow semantic pre-judging mode
CN110059187A (en) * 2019-04-10 2019-07-26 华侨大学 A kind of deep learning file classification method of integrated shallow semantic anticipation mode
CN110232914A (en) * 2019-05-20 2019-09-13 平安普惠企业管理有限公司 A kind of method for recognizing semantics, device and relevant device
CN110347821B (en) * 2019-05-29 2023-08-25 华东理工大学 Text category labeling method, electronic equipment and readable storage medium
CN110347821A (en) * 2019-05-29 2019-10-18 华东理工大学 A kind of method, electronic equipment and the readable storage medium storing program for executing of text categories mark
CN110210028B (en) * 2019-05-30 2023-04-28 杭州远传新业科技股份有限公司 Method, device, equipment and medium for extracting domain feature words aiming at voice translation text
CN110210028A (en) * 2019-05-30 2019-09-06 杭州远传新业科技有限公司 For domain feature words extracting method, device, equipment and the medium of speech translation text
CN110188204A (en) * 2019-06-11 2019-08-30 腾讯科技(深圳)有限公司 A kind of extension corpora mining method, apparatus, server and storage medium
CN110188204B (en) * 2019-06-11 2022-10-04 腾讯科技(深圳)有限公司 Extended corpus mining method and device, server and storage medium
CN110674289A (en) * 2019-07-04 2020-01-10 南瑞集团有限公司 Method, device and storage medium for judging article belonged classification based on word segmentation weight
CN110457475B (en) * 2019-07-25 2023-06-30 创新先进技术有限公司 Method and system for text classification system construction and annotation corpus expansion
CN110457475A (en) * 2019-07-25 2019-11-15 阿里巴巴集团控股有限公司 A kind of method and system expanded for text classification system construction and mark corpus
CN110489556A (en) * 2019-08-22 2019-11-22 重庆锐云科技有限公司 Quality evaluating method, device, server and storage medium about follow-up record
CN112445907A (en) * 2019-09-02 2021-03-05 顺丰科技有限公司 Text emotion classification method, device and equipment and storage medium
CN110704581B (en) * 2019-09-11 2024-03-08 创新先进技术有限公司 Text emotion analysis method and device executed by computer
CN110704581A (en) * 2019-09-11 2020-01-17 阿里巴巴集团控股有限公司 Computer-executed text emotion analysis method and device
CN110597958A (en) * 2019-09-12 2019-12-20 苏州思必驰信息科技有限公司 Text classification model training and using method and device
CN110597958B (en) * 2019-09-12 2022-03-25 思必驰科技股份有限公司 Text classification model training and using method and device
CN112579768A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Emotion classification model training method, text emotion classification method and text emotion classification device
CN111104510A (en) * 2019-11-15 2020-05-05 南京中新赛克科技有限责任公司 Word embedding-based text classification training sample expansion method
CN111104510B (en) * 2019-11-15 2023-05-09 南京中新赛克科技有限责任公司 Text classification training sample expansion method based on word embedding
CN110990567A (en) * 2019-11-25 2020-04-10 国家电网有限公司 Electric power audit text classification method for enhancing domain features
CN111177403A (en) * 2019-12-16 2020-05-19 恩亿科(北京)数据科技有限公司 Sample data processing method and device
CN111177403B (en) * 2019-12-16 2023-06-23 恩亿科(北京)数据科技有限公司 Sample data processing method and device
CN111177378A (en) * 2019-12-20 2020-05-19 北京淇瑀信息科技有限公司 Text mining method and device and electronic equipment
CN111177378B (en) * 2019-12-20 2023-09-26 北京淇瑀信息科技有限公司 Text mining method and device and electronic equipment
CN111144097A (en) * 2019-12-25 2020-05-12 华中科技大学鄂州工业技术研究院 Modeling method and device for emotion tendency classification model of dialog text
CN111144097B (en) * 2019-12-25 2023-08-18 华中科技大学鄂州工业技术研究院 Modeling method and device for emotion tendency classification model of dialogue text
CN111159589A (en) * 2019-12-30 2020-05-15 中国银联股份有限公司 Classification dictionary establishing method, merchant data classification method, device and equipment
CN111159589B (en) * 2019-12-30 2023-10-20 中国银联股份有限公司 Classification dictionary establishment method, merchant data classification method, device and equipment
WO2021134524A1 (en) * 2019-12-31 2021-07-08 深圳市欢太科技有限公司 Data processing method, apparatus, electronic device, and storage medium
CN111143569B (en) * 2019-12-31 2023-05-02 腾讯科技(深圳)有限公司 Data processing method, device and computer readable storage medium
CN111143569A (en) * 2019-12-31 2020-05-12 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN111339268A (en) * 2020-02-19 2020-06-26 北京百度网讯科技有限公司 Entity word recognition method and device
CN111339268B (en) * 2020-02-19 2023-08-15 北京百度网讯科技有限公司 Entity word recognition method and device
CN111523308A (en) * 2020-03-18 2020-08-11 大箴(杭州)科技有限公司 Chinese word segmentation method and device and computer equipment
CN111523308B (en) * 2020-03-18 2024-01-26 大箴(杭州)科技有限公司 Chinese word segmentation method and device and computer equipment
CN111325033A (en) * 2020-03-20 2020-06-23 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN111444326A (en) * 2020-03-30 2020-07-24 腾讯科技(深圳)有限公司 Text data processing method, device, equipment and storage medium
CN111444326B (en) * 2020-03-30 2023-10-20 腾讯科技(深圳)有限公司 Text data processing method, device, equipment and storage medium
CN113111175A (en) * 2020-04-28 2021-07-13 北京明亿科技有限公司 Extreme behavior identification method, device, equipment and medium based on deep learning model
CN111368555A (en) * 2020-05-27 2020-07-03 腾讯科技(深圳)有限公司 Data identification method and device, storage medium and electronic equipment
CN111782803A (en) * 2020-06-05 2020-10-16 京东数字科技控股有限公司 Work order processing method and device, electronic equipment and storage medium
CN112417860A (en) * 2020-12-08 2021-02-26 携程计算机技术(上海)有限公司 Training sample enhancement method, system, device and storage medium
CN112632971B (en) * 2020-12-18 2023-08-25 上海明略人工智能(集团)有限公司 Word vector training method and system for entity matching
CN112632971A (en) * 2020-12-18 2021-04-09 上海明略人工智能(集团)有限公司 Word vector training method and system for entity matching
CN112926631A (en) * 2021-02-01 2021-06-08 大箴(杭州)科技有限公司 Financial text classification method and device and computer equipment
CN113051401A (en) * 2021-04-06 2021-06-29 明品云(北京)数据科技有限公司 Text structured labeling method, system, device and medium
CN113240485A (en) * 2021-05-10 2021-08-10 北京沃东天骏信息技术有限公司 Training method of text generation model, and text generation method and device
CN113177109A (en) * 2021-05-27 2021-07-27 中国平安人寿保险股份有限公司 Text weak labeling method, device, equipment and storage medium
CN113723114A (en) * 2021-08-31 2021-11-30 平安普惠企业管理有限公司 Semantic analysis method, device and equipment based on multi-intent recognition and storage medium
CN113642678A (en) * 2021-10-12 2021-11-12 南京山猫齐动信息技术有限公司 Method, device and storage medium for generating confrontation message sample
CN114091469B (en) * 2021-11-23 2022-08-19 杭州萝卜智能技术有限公司 Network public opinion analysis method based on sample expansion
CN114091469A (en) * 2021-11-23 2022-02-25 杭州萝卜智能技术有限公司 Sample expansion based network public opinion analysis method
CN115861606B (en) * 2022-05-09 2023-09-08 北京中关村科金技术有限公司 Classification method, device and storage medium for long-tail distributed documents
CN115861606A (en) * 2022-05-09 2023-03-28 北京中关村科金技术有限公司 Method and device for classifying long-tail distribution documents and storage medium

Also Published As

Publication number Publication date
CN108804512B (en) 2020-11-24
WO2019200806A1 (en) 2019-10-24

Similar Documents

Publication Publication Date Title
CN108804512A (en) Generating means, method and the computer readable storage medium of textual classification model
CN110287479B (en) Named entity recognition method, electronic device and storage medium
CN108717406A (en) Text mood analysis method, device and storage medium
CN104268197B (en) A kind of industry comment data fine granularity sentiment analysis method
CN108629043A (en) Extracting method, device and the storage medium of webpage target information
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN108416384A (en) A kind of image tag mask method, system, equipment and readable storage medium storing program for executing
CN108664473A (en) Recognition methods, electronic device and the readable storage medium storing program for executing of text key message
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN107818105A (en) The recommendation method and server of application program
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN107943847A (en) Business connection extracting method, device and storage medium
CN107169001A (en) A kind of textual classification model optimization method based on mass-rent feedback and Active Learning
CN107251060A (en) For the pre-training and/or transfer learning of sequence label device
CN110110335A (en) A kind of name entity recognition method based on Overlay model
CN109376240A (en) A kind of text analyzing method and terminal
CN107633036A (en) A kind of microblog users portrait method, electronic equipment, storage medium, system
CN111309910A (en) Text information mining method and device
CN106095845A (en) File classification method and device
CN107797989A (en) Enterprise name recognition methods, electronic equipment and computer-readable recording medium
CN111783468A (en) Text processing method, device, equipment and medium
CN108304373A (en) Construction method, device, storage medium and the electronic device of semantic dictionary
CN111522908A (en) Multi-label text classification method based on BiGRU and attention mechanism
CN111475615A (en) Fine-grained emotion prediction method, device and system for emotion enhancement and storage medium
CN107169061A (en) A kind of text multi-tag sorting technique for merging double information sources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant