CN110516041A - A text classification method for a human-machine dialogue system - Google Patents

A text classification method for a human-machine dialogue system

Info

Publication number
CN110516041A
Authority
CN
China
Prior art keywords
prediction model
class
prediction
model
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910802162.XA
Other languages
Chinese (zh)
Inventor
吴龙飞
孙艺斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yong Yida Robot Co Ltd
Original Assignee
Shenzhen Yong Yida Robot Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yong Yida Robot Co Ltd
Priority to CN201910802162.XA
Publication of CN110516041A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text classification method for a human-machine dialogue system, comprising two parts, model training and model prediction. In model training, for a human-machine dialogue system containing databases of tens to hundreds of different domains, a binary classification prediction model Ma is trained on all of the databases; the databases of the different domains are divided into two major classes, and a first-class prediction model Mb for the classes in the first major class and a second-class prediction model Mc for the classes in the second major class are trained separately. In model prediction, the text obtained by recognizing the user's speech is first predicted with the binary classification prediction model Ma; if the result belongs to the first major class, the first-class prediction model Mb is used for prediction, and whether the score of its prediction result is greater than a threshold is checked in order to select the specific prediction model. The method addresses the poor accuracy and real-time performance of existing machine learning algorithms for text classification in the human-machine dialogue field.

Description

A text classification method for a human-machine dialogue system
Technical field
The present invention relates to a training method for human-machine dialogue text, and in particular to a text classification method for a human-machine dialogue system.
Background
In recent years, with the rapid development of artificial intelligence technology, the human-machine dialogue system, as one of the core technologies of the artificial intelligence field, has improved the efficiency of communication between people and machines and has greatly facilitated people's lives and work. Effectively obtaining the intent behind what the user says is the key technology of human-machine dialogue.
Because natural language is complex and diverse, a human-machine dialogue system usually contains tens or even hundreds of domains. When machine learning methods are used to classify across so many domains, neither the classification accuracy nor the time needed to train the classification model is satisfactory.
For existing machine learning classification algorithms, with the text corpus held constant, the time needed to train the classification model is positively correlated with the number of classes: the more classes there are, the longer training takes. A human-machine dialogue system uses many domains, so training a classifier on a large text corpus spanning those domains takes more than ten hours or even several days, which seriously hinders model debugging and iterative updates of the system.
It is therefore necessary to optimize machine learning text classification methods so that they can be better applied to human-machine dialogue systems, with better results and greater practicality.
Summary of the invention
The purpose of the present invention is to provide a text classification method for a human-machine dialogue system, so as to solve the problem that existing machine learning algorithms have poor accuracy and real-time performance when classifying text in the human-machine dialogue field.
To achieve the above purpose, the present invention adopts the following technical solution:
A text classification method for a human-machine dialogue system, the classification method comprising two parts, model training and model prediction:
Model training: in a human-machine dialogue system containing databases of tens to hundreds of different domains, a binary classification prediction model Ma is trained using all of the databases; the databases of the different domains are divided into two major classes, and a prediction model is trained over the classes within each major class, yielding the first-class prediction model Mb for the classes in the first major class and the second-class prediction model Mc for the classes in the second major class;
Model prediction: the text obtained by recognizing the user's speech is predicted with the binary classification prediction model Ma to obtain a prediction result. If the result belongs to the first major class (the one handled by Mb), the first-class prediction model Mb is used for prediction and the score of its prediction result is compared with a threshold. If the score is greater than the threshold, the result predicted by Mb is used; otherwise the second-class prediction model Mc is used for prediction. If the Mc score is greater than the threshold, the result predicted by Mc is used; otherwise whichever of the Mb and Mc results has the higher score is taken as the prediction result;
Model prediction: the text obtained by recognizing the user's speech is predicted with the binary classification prediction model Ma to obtain a prediction result. If the result belongs to the second major class (the one handled by Mc), the second-class prediction model Mc is used for prediction and the score of its prediction result is compared with the threshold. If the score is greater than the threshold, the result predicted by Mc is used; otherwise the first-class prediction model Mb is used for prediction. If the Mb score is greater than the threshold, the result predicted by Mb is used; otherwise whichever of the Mb and Mc results has the higher score is taken as the prediction result.
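The two prediction branches above are symmetric, so they can be sketched as a single routine. The following is a minimal illustration only, not the applicant's implementation: the `Model` callable interface, the `FIRST`/`SECOND` labels, and the tie-breaking rule (the source does not say what happens when both scores are equal) are all assumptions.

```python
from typing import Callable, Tuple

# Assumed interface: a model maps text to (label, score).
# The patent does not specify how models are exposed.
Model = Callable[[str], Tuple[str, float]]

FIRST, SECOND = "first", "second"  # hypothetical labels emitted by Ma


def cascade_predict(text: str, ma: Model, mb: Model, mc: Model,
                    threshold: float) -> Tuple[str, float]:
    """Route text through Ma, then Mb/Mc with a threshold fallback."""
    major, _ = ma(text)
    # The model for the major class chosen by Ma is tried first.
    primary, fallback = (mb, mc) if major == FIRST else (mc, mb)

    p_label, p_score = primary(text)
    if p_score > threshold:
        return p_label, p_score

    f_label, f_score = fallback(text)
    if f_score > threshold:
        return f_label, f_score

    # Neither score exceeds the threshold: keep the higher-scoring result.
    # On a tie the primary model wins (an assumption, not from the source).
    if p_score >= f_score:
        return p_label, p_score
    return f_label, f_score
```

Note that the fallback model is only evaluated when the primary score does not exceed the threshold, so in the common case a query costs one binary prediction plus one prediction over roughly half of the domains.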
Preferably, the threshold is an empirical value, obtained by the designer through repeated testing in the actual product.
Preferably, in model training, the databases of the different domains are divided into two major classes as follows: the databases are first numbered consecutively and the median of the numbers is taken; the databases from the first through the median are assigned to the first class, and those after the median through the last are assigned to the second class.
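The median split can be sketched in a few lines. This is an illustration under assumptions: numbering starts at 1, and for an odd number of databases the middle one goes to the first class, which is one reading of "from the first through the median". The function name is hypothetical.

```python
def split_by_median(domains):
    """Split an ordered list of domains at the median of the numbers 1..n.

    Items numbered 1..median form the first class, the rest the second.
    """
    n = len(domains)
    median = (n + 1) // 2  # odd n: the middle item joins the first class
    return domains[:median], domains[median:]
```

For example, `split_by_median(["a", "b", "c", "d"])` yields `(["a", "b"], ["c", "d"])`.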
Preferably, in model training, the databases of the different domains are divided into two major classes as follows: the databases are first numbered consecutively; those with even numbers are assigned to the first class, and those with odd numbers are assigned to the second class.
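The odd/even split can be sketched in the same way. The function name is hypothetical, and numbering is assumed to start at 1, which the source does not state.

```python
def split_by_parity(domains):
    """Even-numbered domains form the first class, odd-numbered the second.

    Numbering is assumed to start at 1.
    """
    first = [d for i, d in enumerate(domains, start=1) if i % 2 == 0]
    second = [d for i, d in enumerate(domains, start=1) if i % 2 == 1]
    return first, second
```

With an odd number of databases the second class receives one more member than the first.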
Preferably, the binary classification prediction model (Ma), the first-class prediction model (Mb) and the second-class prediction model (Mc) are obtained with the same training method.
Preferably, the binary classification prediction model (Ma), the first-class prediction model (Mb) and the second-class prediction model (Mc) use the same model prediction method.
Preferably, the text classification method runs on a vector machine.
A support vector machine comprising at least a memory and a processor, a computer program being stored on the memory, wherein the processor implements the above method steps when executing the computer program on the memory.
The present invention has the following advantages:
The present invention optimizes the text classification method of machine learning algorithms in the human-machine dialogue field, classifies text efficiently and accurately, and effectively improves the efficiency and accuracy of human-machine dialogue text classification.
Brief description of the drawings
Fig. 1 is a flowchart of model training in an embodiment of the text classification method for a human-machine dialogue system of the present invention.
Fig. 2 is a flowchart of model prediction in an embodiment of the text classification method for a human-machine dialogue system of the present invention.
Specific embodiments
The embodiments of the present invention are illustrated below by specific examples; those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification.
It should be noted that the structures, proportions, sizes and the like depicted in the drawings accompanying this specification serve only to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art, and are not intended to limit the conditions under which the invention can be practiced, so they have no essential technical significance. Any modification of structure, change of proportion or adjustment of size that does not affect the effects the invention can produce or the purposes it can achieve shall still fall within the scope covered by the technical content disclosed by the invention. Likewise, terms such as "upper", "lower", "left", "right" and "middle" cited in this specification are used only for clarity of description and not to limit the scope in which the invention can be practiced; changes or adjustments of their relative relationships, without substantive change to the technical content, shall also be regarded as within the practicable scope of the invention.
Embodiment 1
A text classification method for a human-machine dialogue system, the classification method comprising two parts, model training and model prediction:
Referring to Fig. 1, model training is as follows: in a human-machine dialogue system containing databases of tens to hundreds of different domains, a binary classification prediction model Ma is trained using all of the databases; the databases of the different domains are divided into two major classes, and a prediction model is trained over the classes within each major class, yielding the first-class prediction model Mb for the classes in the first major class and the second-class prediction model Mc for the classes in the second major class. Suppose there are n domains S1, S2, ..., Sn, whose training text corpora are C1, C2, ..., Cn respectively. The binary domain classification model Ma is trained using all of the domains S1, S2, ..., Sn. The domains are divided evenly into two major classes, S1 to S(n/2) and S(n/2+1) to Sn. An n/2-class classification model is trained on the text corpora C1, C2, ..., C(n/2) corresponding to S1, S2, ..., S(n/2), yielding the first-class prediction model Mb. An n/2-class classification model is trained on the text corpora C(n/2+1), ..., Cn corresponding to S(n/2+1), ..., Sn, yielding the second-class prediction model Mc.
Referring to Fig. 2, model prediction is as follows: the text obtained by recognizing the user's speech is predicted with the binary classification prediction model Ma to obtain a prediction result. If the result belongs to the first major class (the one handled by Mb), the first-class prediction model Mb is used for prediction and the score of its prediction result is compared with a threshold. If the score is greater than the threshold, the result predicted by Mb is used; otherwise the second-class prediction model Mc is used for prediction. If the Mc score is greater than the threshold, the result predicted by Mc is used; otherwise whichever of the Mb and Mc results has the higher score is taken as the prediction result;
Model prediction: the text obtained by recognizing the user's speech is predicted with the binary classification prediction model Ma to obtain a prediction result. If the result belongs to the second major class (the one handled by Mc), the second-class prediction model Mc is used for prediction and the score of its prediction result is compared with the threshold. If the score is greater than the threshold, the result predicted by Mc is used; otherwise the first-class prediction model Mb is used for prediction. If the Mb score is greater than the threshold, the result predicted by Mb is used; otherwise whichever of the Mb and Mc results has the higher score is taken as the prediction result.
Preferably, the threshold is an empirical value, obtained by the designer through repeated testing in the actual product.
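The source does not describe how the repeated tests are carried out. One hypothetical procedure, shown purely as an illustration, is to record the score and correctness of each prediction on held-out test utterances and pick the lowest threshold at which the accepted predictions reach a target precision. The function name, the record format, and the precision criterion are all assumptions.

```python
def pick_threshold(records, target_precision=0.9):
    """Pick the lowest threshold whose accepted predictions are precise enough.

    records: list of (score, is_correct) pairs collected from test runs.
    A prediction is "accepted" when its score is greater than the threshold.
    Returns None if no candidate threshold reaches the target precision.
    """
    candidates = sorted({score for score, _ in records})
    for t in candidates:
        accepted = [ok for score, ok in records if score > t]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return t
    return None
```

A lower threshold means the fallback model is consulted less often, so a designer would likely weigh precision against the cost of the second prediction.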
Preferably, in model training, the databases of the different domains are divided into two major classes as follows: the databases are first numbered consecutively and the median of the numbers is taken; the databases from the first through the median are assigned to the first class, and those after the median through the last are assigned to the second class.
Preferably, in model training, the databases of the different domains are divided into two major classes as follows: the databases are first numbered consecutively; those with even numbers are assigned to the first class, and those with odd numbers are assigned to the second class.
Preferably, the binary classification prediction model (Ma), the first-class prediction model (Mb) and the second-class prediction model (Mc) are obtained with the same training method.
Preferably, the binary classification prediction model (Ma), the first-class prediction model (Mb) and the second-class prediction model (Mc) use the same model prediction method.
Preferably, the text classification method runs on a vector machine.
Take a text classification method for children's entertainment and education human-machine dialogue as an example. The full set of databases comprises music, story, weather, crosstalk, storytelling, Beijing opera, perpetual calendar, menu, news, poem, Chinese classics, character learning, English translation, area conversion, volume conversion, synonyms and antonyms, explanation, history, and idiom explanation. The domains are divided into two major classes. One is the entertainment class: music, story, weather, crosstalk, storytelling, Beijing opera, perpetual calendar, menu, news. The other is the education class: poem, Chinese classics, character learning, English translation, area conversion, volume conversion, synonyms and antonyms, explanation, history, idiom explanation.
Model training stage:
All corpora under the entertainment class are labeled "entertainment" and all corpora under the education class are labeled "education", and the binary classification prediction model Ma is trained on them. The first-class prediction model Mb is trained on the corpora of the domains under Ma's entertainment class; the second-class prediction model Mc is trained on the corpora of the domains under Ma's education class.
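The relabeling in the training stage can be sketched as follows. This only builds the three labeled training sets; the training algorithm itself is not shown. The function name, the dictionary layout of the corpora, and the sample utterances in the usage note are assumptions made for illustration.

```python
def build_training_sets(corpora, first_domains, second_domains):
    """Build labeled training sets for Ma, Mb and Mc.

    corpora: dict mapping a domain name to its list of training utterances.
    Returns three lists of (text, label) pairs:
      ma_data uses the coarse labels "entertainment" / "education",
      mb_data and mc_data use the domain name as the label.
    """
    ma_data, mb_data, mc_data = [], [], []
    groups = (
        ("entertainment", first_domains, mb_data),
        ("education", second_domains, mc_data),
    )
    for group_label, domains, sub_data in groups:
        for domain in domains:
            for text in corpora.get(domain, []):
                ma_data.append((text, group_label))  # coarse label for Ma
                sub_data.append((text, domain))      # fine label for Mb or Mc
    return ma_data, mb_data, mc_data
```

For instance, with `corpora = {"music": ["play a song"], "poem": ["recite a poem"]}`, calling `build_training_sets(corpora, ["music"], ["poem"])` places "play a song" in the Ma set with the label "entertainment" and in the Mb set with the label "music".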
Model usage stage:
The text produced by speech recognition is first given a binary classification prediction with Ma, which predicts whether it belongs to the entertainment class or the education class. If it belongs to the entertainment class, Mb is used for prediction, and if the Mb score is greater than the threshold, that prediction result is returned. If it belongs to the education class, Mc is used for prediction, and if the Mc score is greater than the threshold, that prediction result is returned. If it belongs to the entertainment class and the Mb score is less than the threshold, Mc is used for prediction; if the Mc score is greater than the threshold, the Mc prediction result is returned. If it belongs to the education class and the Mc score is less than the threshold, Mb is used for prediction; if the Mb score is greater than the threshold, the Mb prediction result is returned. If it belongs to the entertainment class and the Mb score and then the Mc score are both less than the threshold, whichever of the Mb and Mc predictions has the higher score is chosen and returned. If it belongs to the education class and the Mc score and then the Mb score are both less than the threshold, whichever of the Mb and Mc predictions has the higher score is chosen and returned.
Although the present invention has been described in detail above with general descriptions and specific embodiments, some modifications or improvements can be made on the basis of the present invention, as will be apparent to those skilled in the art. Therefore, such modifications or improvements made without departing from the spirit of the present invention fall within the scope claimed by the present invention.

Claims (8)

1. A text classification method for a human-machine dialogue system, characterized in that the classification method comprises two parts, model training and model prediction:
Model training: in a human-machine dialogue system containing databases of tens to hundreds of different domains, a binary classification prediction model (Ma) is trained using all of the databases; the databases of the different domains are divided into two major classes, and a prediction model is trained over the classes within each major class, yielding the first-class prediction model (Mb) for the classes in the first major class and the second-class prediction model (Mc) for the classes in the second major class;
Model prediction: the text obtained by recognizing the user's speech is predicted with the binary classification prediction model (Ma) to obtain a prediction result; if the result belongs to the first major class, handled by the first-class prediction model (Mb), the first-class prediction model (Mb) is used for prediction and the score of its prediction result is compared with a threshold; if the score is greater than the threshold, the result predicted by the first-class prediction model (Mb) is used; otherwise the second-class prediction model (Mc) is used for prediction; if its score is greater than the threshold, the result predicted by the second-class prediction model (Mc) is used; otherwise whichever of the results of the first-class prediction model (Mb) and the second-class prediction model (Mc) has the higher score is taken as the prediction result;
Model prediction: the text obtained by recognizing the user's speech is predicted with the binary classification prediction model (Ma) to obtain a prediction result; if the result belongs to the second major class, handled by the second-class prediction model (Mc), the second-class prediction model (Mc) is used for prediction and the score of its prediction result is compared with the threshold; if the score is greater than the threshold, the result predicted by the second-class prediction model (Mc) is used; otherwise the first-class prediction model (Mb) is used for prediction; if its score is greater than the threshold, the result predicted by the first-class prediction model (Mb) is used; otherwise whichever of the results of the first-class prediction model (Mb) and the second-class prediction model (Mc) has the higher score is taken as the prediction result.
2. The text classification method for a human-machine dialogue system according to claim 1, characterized in that the threshold is an empirical value, obtained by the designer through repeated testing in the actual product.
3. The text classification method for a human-machine dialogue system according to claim 1, characterized in that, in the model training, the databases of the different domains are divided into two major classes as follows: the databases are first numbered consecutively and the median of the numbers is taken; the databases from the first through the median are assigned to the first class, and those after the median through the last are assigned to the second class.
4. The text classification method for a human-machine dialogue system according to claim 1, characterized in that, in the model training, the databases of the different domains are divided into two major classes as follows: the databases are first numbered consecutively; those with even numbers are assigned to the first class, and those with odd numbers are assigned to the second class.
5. The text classification method for a human-machine dialogue system according to claim 1, characterized in that the binary classification prediction model (Ma), the first-class prediction model (Mb) and the second-class prediction model (Mc) are obtained with the same training method.
6. The text classification method for a human-machine dialogue system according to claim 1, characterized in that the binary classification prediction model (Ma), the first-class prediction model (Mb) and the second-class prediction model (Mc) use the same model prediction method.
7. The text classification method for a human-machine dialogue system according to claim 1, characterized in that the text classification method runs on a vector machine.
8. A support vector machine, comprising at least a memory and a processor, a computer program being stored on the memory, characterized in that the processor implements the method steps of claim 1 when executing the computer program on the memory.
CN201910802162.XA 2019-08-28 2019-08-28 A text classification method for a human-machine dialogue system Pending CN110516041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910802162.XA CN110516041A (en) 2019-08-28 2019-08-28 A text classification method for a human-machine dialogue system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910802162.XA CN110516041A (en) 2019-08-28 2019-08-28 A text classification method for a human-machine dialogue system

Publications (1)

Publication Number Publication Date
CN110516041A true CN110516041A (en) 2019-11-29

Family

ID=68627611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910802162.XA Pending CN110516041A (en) 2019-08-28 2019-08-28 A text classification method for a human-machine dialogue system

Country Status (1)

Country Link
CN (1) CN110516041A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184316A (en) * 2015-08-28 2015-12-23 国网智能电网研究院 Support vector machine power grid business classification method based on feature weight learning
CN105310696A (en) * 2015-11-06 2016-02-10 中国科学院计算技术研究所 Fall detection model construction method as well as corresponding fall detection method and apparatus
CN106021461A (en) * 2016-05-17 2016-10-12 深圳市中润四方信息技术有限公司 Text classification method and text classification system
CN106326914A (en) * 2016-08-08 2017-01-11 诸暨市奇剑智能科技有限公司 SVM-based pearl multi-classification method
CN107704853A (en) * 2017-11-24 2018-02-16 重庆邮电大学 A kind of recognition methods of the traffic lights based on multi-categorizer
CN108062561A (en) * 2017-12-05 2018-05-22 华南理工大学 A kind of short time data stream Forecasting Methodology based on long memory network model in short-term
CN109471938A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of file classification method and terminal
CN110162633A (en) * 2019-05-21 2019-08-23 深圳市珍爱云信息技术有限公司 Voice data is intended to determine method, apparatus, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110168495B (en) Searchable database of trained artificial intelligence objects
CN110442718B (en) Statement processing method and device, server and storage medium
CN111339255B (en) Target emotion analysis method, model training method, medium, and device
CN110309305A (en) Machine based on multitask joint training reads understanding method and computer storage medium
CN107577662A (en) Towards the semantic understanding system and method for Chinese text
CN105632251A (en) 3D virtual teacher system having voice function and method thereof
CN110427629A (en) Semi-supervised text simplified model training method and system
CN109359290B (en) Knowledge point determining method of test question text, electronic equipment and storage medium
CN109918501A (en) Method, apparatus, equipment and the storage medium of news article classification
CN103123633A (en) Generation method of evaluation parameters and information searching method based on evaluation parameters
CN111694937A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN113688245B (en) Processing method, device and equipment of pre-training language model based on artificial intelligence
CN111400473A (en) Method and device for training intention recognition model, storage medium and electronic equipment
CN114281957A (en) Natural language data query method and device, electronic equipment and storage medium
CN109614480A (en) A kind of generation method and device of the autoabstract based on production confrontation network
Zhu An educational approach to machine learning with mobile applications
CN113392640A (en) Title determining method, device, equipment and storage medium
CN117218482A (en) Model training method, video processing device and electronic equipment
CN111062216B (en) Named entity identification method, device, terminal and readable medium
Zylich et al. Linguistic skill modeling for second language acquisition
Gregg Perceptual structures and semantic relations
CN116956902A (en) Text rewriting method, device, equipment and computer readable storage medium
CN117216544A (en) Model training method, natural language processing method, device and storage medium
CN110516041A (en) A text classification method for a human-machine dialogue system
CN115658885A (en) Intelligent text labeling method and system, intelligent terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191129
