CN108959275A - Man-machine sparring system based on online language translation - Google Patents
Man-machine sparring system based on online language translation
- Publication number
- CN108959275A (application number CN201810751506.4A)
- Authority
- CN
- China
- Prior art keywords
- ginfo
- sentence
- information
- data
- iinfo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
To improve the intelligence and accuracy of interaction, the present invention provides an intelligent learning method for a text customer-service robot based on big data, comprising: a translation module for converting voice information into text; a classification module for classifying, by context, the text-form dialogue data O used for training; and a training module for detecting likability information Ginfo, similar-sentence repetition information Iinfo, and dialogue duration information Linfo, and training on the data O. The calculation process of the present invention is fast, and self-learning efficiency is greatly improved after SVM training.
Description
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to a man-machine sparring system based on online language translation.
Background
In current human-machine dialogue question-answering systems, identifying the intent behind the question a user enters is the core of the whole system. If the intent is identified correctly but the accuracy is too low, too many candidate answers are returned to the user later and the best one cannot be selected. If the intent is identified incorrectly, the system cannot understand what the user means, and it either gives the user an unwanted answer or cannot give an answer at all. Existing question-answering systems are mainly implemented through computer algorithm logic, and the basic process comprises three steps: question analysis, information retrieval, and answer extraction. An error in any one of these steps prevents the user from obtaining a correct result. More importantly, such systems adapt poorly: they cannot use users' questions to become more intelligent, so when a user enters the same question again, the same logic applies and the user still cannot obtain a correct result unless the algorithm logic of the question-answering system is modified. The adaptability of a question-answering system has therefore become a key factor affecting its accuracy and timeliness.
Existing intent recognition methods all rely on manually annotating a large corpus for training and prediction. The large amount of manual annotation introduces many uncontrollable factors. For example, annotators understand the corpus differently and so produce different annotation results, the same question may be annotated repeatedly, and the same utterance may be annotated under different categories. When a new intent category needs to be added, the relevant personnel must discuss and decide on it and retrain the annotators before annotation work can begin; the machine cannot add new categories automatically. The whole model-training process consumes a great deal of manpower and material resources, and the many uncontrollable factors affect the speed and progress of training.
Summary of the invention
To improve the intelligence and accuracy of interaction, the present invention provides a man-machine sparring system based on online language translation, comprising:
A translation module for converting voice information into text;
A classification module for classifying, by context, the text-form dialogue data O used for training;
A training module for detecting likability information Ginfo, similar-sentence repetition information Iinfo, and dialogue duration information Linfo, and training on the data O.
Further, the context is one of three kinds: pre-sale, in-sale, and after-sale. Each of the three contexts has its own predetermined weight, and the weights differ from one another.
Further, the likability information Ginfo comprises the number of courtesy terms used, Ginfo_wordnum; the word content of those terms, Ginfo_wordcontent; the number of emoticons used, Ginfo_facenum; and the ASCII codes corresponding to the emoticons, Ginfo_facecontent.
Further, the similar-sentence repetition information Iinfo comprises the number of repeated sentences, Iinfo_num, and their word content, Iinfo_content.
Further, training on the data O comprises:
Splitting the text-form dialogue data used for training into separate words according to semantics;
For the g-th sentence and the (g+1)-th sentence, performing a similarity convolution on the words corresponding to their different semantics, and defining the word with the largest convolution value as the maximum word and the word with the smallest convolution value as the minimum word, where g = 1, 2, ..., Num1 and Num1 is the number of sentences in the text-form dialogue data used for training;
For the (g+1)-th sentence, deleting the minimum word. That is, for each sentence in the text-form dialogue data used for training, the minimum word is deleted from the sentence that immediately follows it, while the first sentence is retained in full. The sentences obtained after deletion and the first sentence are then merged in chronological order to give the intermediate dialogue data R;
Letting the sample training set be TRAIN = {R, Ginfo_wordcontent, Ginfo_facecontent, Iinfo_content}, representing each element of TRAIN by its number of occurrences as a substitute identifier, and filling vacant positions with the remainder when the arithmetic mean of Ginfo_wordnum, Ginfo_facenum, and Iinfo_num is divided by 4, to form matrix A1; likewise representing each element of TRAIN by its number of occurrences as a substitute identifier, and filling vacant positions with the remainder when the geometric mean of Ginfo_wordnum, Ginfo_facenum, and Iinfo_num is divided by 4, to form matrix A2;
Computing the eigenvalue CH1 of matrix A1 and the eigenvalue CH2 of matrix A2, and multiplying CH1 and CH2 by the predetermined weight of the applicable context (pre-sale, in-sale, or after-sale); setting the number of iterations Iter to the geometric mean of (CH1+CH2) rounded up to an integer; iterating the expression (L_{i-1}*CH1 + L_{i+1}*CH2) / (L_{i-1}*CH2 + L_{i+1}*CH1) over the range of the data O, with the maximum word as the initial solution; rounding the final iterative value m up to the integer M; and performing SVM training on the data O M times, where i = 1, ..., N.
The calculation process of the present invention is simple, and self-learning efficiency is greatly improved after SVM training.
Specific embodiment
The present invention provides a man-machine sparring system based on online language translation, comprising:
A translation module for converting voice information into text;
A classification module for classifying, by context, the text-form dialogue data O used for training;
A training module for detecting likability information Ginfo, similar-sentence repetition information Iinfo, and dialogue duration information Linfo, and training on the data O.
Preferably, the context is one of three kinds: pre-sale, in-sale, and after-sale. Each of the three contexts has its own predetermined weight, and the weights differ from one another.
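The context weights can be pictured as a small lookup table. The sketch below is illustrative only: the source states that the three weights are predetermined and mutually different but gives no values, so the numbers 0.8, 1.0, and 1.2 and the names `Context`, `CONTEXT_WEIGHTS`, `validate_weights`, and `apply_context_weight` are assumptions. The helper scales the two eigenvalues CH1 and CH2 that the training step multiplies by the weight.

```python
from enum import Enum


class Context(Enum):
    PRE_SALE = "pre-sale"
    IN_SALE = "in-sale"
    AFTER_SALE = "after-sale"


# Hypothetical values: the source only says the weights are predetermined
# and different from each other.
CONTEXT_WEIGHTS = {
    Context.PRE_SALE: 0.8,
    Context.IN_SALE: 1.0,
    Context.AFTER_SALE: 1.2,
}


def validate_weights(weights):
    """Check that every context has a weight and that no two weights coincide."""
    missing = [c for c in Context if c not in weights]
    if missing:
        raise ValueError(f"missing weights for: {missing}")
    if len(set(weights.values())) != len(weights):
        raise ValueError("context weights must be pairwise different")


def apply_context_weight(ch1, ch2, context, weights=CONTEXT_WEIGHTS):
    """Scale both eigenvalues by the weight of the dialogue's context."""
    w = weights[context]
    return ch1 * w, ch2 * w
```

Calling `validate_weights` once at start-up is one way to enforce the "different from one another" requirement before any training run.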
Preferably, the likability information Ginfo comprises the number of courtesy terms used, Ginfo_wordnum; the word content of those terms, Ginfo_wordcontent; the number of emoticons used, Ginfo_facenum; and the ASCII codes corresponding to the emoticons, Ginfo_facecontent.
Preferably, the similar-sentence repetition information Iinfo comprises the number of repeated sentences, Iinfo_num, and their word content, Iinfo_content.
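The fields of Ginfo and Iinfo map naturally onto two small records. The following is a minimal sketch under stated assumptions: the vocabularies `COURTESY_TERMS` and `EMOTICONS` are hypothetical (the source lists neither), "repeated sentence" is simplified to an exact match after lower-casing and trimming, and Iinfo_num is taken to count occurrences beyond the first. The function names are illustrative and not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical vocabularies; the source does not list courtesy terms or emoticons.
COURTESY_TERMS = ("please", "thank you", "sorry")
EMOTICONS = (":)", ":(", ":D")


@dataclass
class Ginfo:
    wordnum: int = 0                                  # Ginfo_wordnum
    wordcontent: list = field(default_factory=list)   # Ginfo_wordcontent
    facenum: int = 0                                  # Ginfo_facenum
    facecontent: list = field(default_factory=list)   # Ginfo_facecontent


@dataclass
class Iinfo:
    num: int = 0                                      # Iinfo_num
    content: list = field(default_factory=list)       # Iinfo_content


def extract_ginfo(sentences):
    """Count courtesy terms and emoticons; store emoticons as ASCII code lists."""
    g = Ginfo()
    for s in sentences:
        lowered = s.lower()
        for term in COURTESY_TERMS:
            n = lowered.count(term)
            g.wordnum += n
            g.wordcontent.extend([term] * n)
        for emo in EMOTICONS:
            n = s.count(emo)
            g.facenum += n
            g.facecontent.extend([[ord(ch) for ch in emo]] * n)
    return g


def extract_iinfo(sentences):
    """Treat sentences that match exactly after normalisation as repeats."""
    counts = Counter(s.strip().lower() for s in sentences)
    repeated = {s: c for s, c in counts.items() if c > 1}
    return Iinfo(num=sum(c - 1 for c in repeated.values()), content=list(repeated))
```

A production system would likely replace the exact-match test with a real sentence-similarity measure, since the source speaks of similar rather than identical sentences.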
Preferably, training on the data O comprises:
Splitting the text-form dialogue data used for training into separate words according to semantics;
For the g-th sentence and the (g+1)-th sentence, performing a similarity convolution on the words corresponding to their different semantics, and defining the word with the largest convolution value as the maximum word and the word with the smallest convolution value as the minimum word, where g = 1, 2, ..., Num1 and Num1 is the number of sentences in the text-form dialogue data used for training;
For the (g+1)-th sentence, deleting the minimum word. That is, for each sentence in the text-form dialogue data used for training, the minimum word is deleted from the sentence that immediately follows it, while the first sentence is retained in full. The sentences obtained after deletion and the first sentence are then merged in chronological order to give the intermediate dialogue data R;
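The pruning step above can be sketched as follows. The source does not define the "similarity convolution", so this sketch swaps in a Jaccard similarity over character bigrams as a stand-in, splits on whitespace instead of semantic segmentation, and scores each word of the later sentence by its best match in the earlier sentence. All function names are hypothetical, and sentences are assumed to be non-empty.

```python
def bigrams(word):
    """Character bigrams of a word; a single character is its own bigram."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)} or {w}


def similarity(a, b):
    """Jaccard similarity of bigram sets (stand-in for the similarity convolution)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def score_words(prev_words, next_words):
    """Score each word of the later sentence by its best match in the earlier one."""
    return [(max(similarity(w, p) for p in prev_words), w) for w in next_words]


def prune_dialogue(sentences):
    """Return (R, maximum_words).

    R keeps the first sentence whole and drops the minimum word from every
    later sentence; scores are computed on the original, unpruned sentences.
    """
    tokenised = [s.split() for s in sentences]
    result = [list(tokenised[0])]
    maximum_words = []
    for prev_words, next_words in zip(tokenised, tokenised[1:]):
        scores = score_words(prev_words, next_words)
        min_word = min(scores, key=lambda t: t[0])[1]
        max_word = max(scores, key=lambda t: t[0])[1]
        maximum_words.append(max_word)
        pruned = list(next_words)
        pruned.remove(min_word)  # delete the first occurrence only
        result.append(pruned)
    return [" ".join(ws) for ws in result], maximum_words
```

Because only adjacent pairs are compared, the loop runs over Num1 - 1 pairs for Num1 sentences, and ties are broken in favour of the earliest word.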
Letting the sample training set be TRAIN = {R, Ginfo_wordcontent, Ginfo_facecontent, Iinfo_content}, representing each element of TRAIN by its number of occurrences as a substitute identifier, and filling vacant positions with the remainder when the arithmetic mean of Ginfo_wordnum, Ginfo_facenum, and Iinfo_num is divided by 4, to form matrix A1; likewise representing each element of TRAIN by its number of occurrences as a substitute identifier, and filling vacant positions with the remainder when the geometric mean of Ginfo_wordnum, Ginfo_facenum, and Iinfo_num is divided by 4, to form matrix A2;
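The two fill values differ only in the mean that is used. The sketch below assumes that "vacant positions" arise because the rows of occurrence counts have unequal lengths, and that the fill value is the mean reduced modulo 4; the source does not describe the row and column layout of A1 and A2, so `pad_rows` and the ragged-row representation are hypothetical.

```python
def arithmetic_fill(wordnum, facenum, inum):
    """Arithmetic mean of the three counts, reduced modulo 4 (fill value for A1)."""
    return ((wordnum + facenum + inum) / 3) % 4


def geometric_fill(wordnum, facenum, inum):
    """Geometric mean of the three counts, reduced modulo 4 (fill value for A2)."""
    return ((wordnum * facenum * inum) ** (1 / 3)) % 4


def pad_rows(rows, fill):
    """Right-pad ragged rows of occurrence counts into a rectangular matrix."""
    width = max(len(r) for r in rows)
    return [list(r) + [fill] * (width - len(r)) for r in rows]
```

For example, with Ginfo_wordnum = 3, Ginfo_facenum = 6, and Iinfo_num = 12, the arithmetic mean is 7 and the geometric mean is 6, so A1 would be padded with 3 and A2 with approximately 2 (up to floating-point rounding).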
Computing the eigenvalue CH1 of matrix A1 and the eigenvalue CH2 of matrix A2, and multiplying CH1 and CH2 by the predetermined weight of the applicable context (pre-sale, in-sale, or after-sale); setting the number of iterations Iter to the geometric mean of (CH1+CH2) rounded up to an integer; iterating the expression (L_{i-1}*CH1 + L_{i+1}*CH2) / (L_{i-1}*CH2 + L_{i+1}*CH1) over the range of the data O, with the maximum word as the initial solution; rounding the final iterative value m up to the integer M; and performing SVM training on the data O M times, where i = 1, ..., N.
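The expression in this step can be evaluated directly once its inputs are fixed. The sketch below makes two assumptions the source does not confirm: that L_{i-1} and L_{i+1} are neighbouring entries of the dialogue duration information Linfo, and that the list of durations holds N + 2 values so both neighbours exist for every i = 1, ..., N. It does not reproduce the iteration itself, whose update rule the source leaves unspecified; it only evaluates the ratio and the rounding from m to M. All function names are hypothetical.

```python
import math


def duration_ratio(l_prev, l_next, ch1, ch2):
    """(L_{i-1}*CH1 + L_{i+1}*CH2) / (L_{i-1}*CH2 + L_{i+1}*CH1)."""
    return (l_prev * ch1 + l_next * ch2) / (l_prev * ch2 + l_next * ch1)


def ratios(durations, ch1, ch2):
    """Evaluate the ratio for i = 1..N, where durations holds L_0..L_{N+1}."""
    return [
        duration_ratio(durations[i - 1], durations[i + 1], ch1, ch2)
        for i in range(1, len(durations) - 1)
    ]


def svm_rounds(m):
    """M is the final iterative value m rounded up to an integer."""
    return math.ceil(m)
```

One consequence of this reading is worth noting: multiplying CH1 and CH2 by the same context weight leaves the ratio unchanged, because the weight cancels between numerator and denominator. Under this interpretation the weight would therefore influence only quantities such as Iter that depend on the magnitudes of CH1 and CH2.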
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. A man-machine sparring system based on online language translation, comprising:
A translation module for converting voice information into text;
A classification module for classifying, by context, the text-form dialogue data O used for training;
A training module for detecting likability information Ginfo, similar-sentence repetition information Iinfo, and dialogue duration information Linfo, and training on the data O.
2. The method according to claim 1, characterized in that the context is one of three kinds, namely pre-sale, in-sale, and after-sale, and each of the three contexts has its own predetermined weight, the weights differing from one another.
3. The method according to claim 2, characterized in that the likability information Ginfo comprises the number of courtesy terms used, Ginfo_wordnum; the word content of those terms, Ginfo_wordcontent; the number of emoticons used, Ginfo_facenum; and the ASCII codes corresponding to the emoticons, Ginfo_facecontent.
4. The method according to claim 3, characterized in that the similar-sentence repetition information Iinfo comprises the number of repeated sentences, Iinfo_num, and their word content, Iinfo_content.
5. The method according to claim 4, characterized in that training on the data O comprises:
Splitting the text-form dialogue data used for training into separate words according to semantics;
For the g-th sentence and the (g+1)-th sentence, performing a similarity convolution on the words corresponding to their different semantics, and defining the word with the largest convolution value as the maximum word and the word with the smallest convolution value as the minimum word, where g = 1, 2, ..., Num1 and Num1 is the number of sentences in the text-form dialogue data used for training;
For the (g+1)-th sentence, deleting the minimum word, such that for each sentence in the text-form dialogue data used for training the minimum word is deleted from the sentence that immediately follows it, while the first sentence is retained in full, and merging the sentences obtained after deletion and the first sentence in chronological order to give the intermediate dialogue data R;
Letting the sample training set be TRAIN = {R, Ginfo_wordcontent, Ginfo_facecontent, Iinfo_content}, representing each element of TRAIN by its number of occurrences as a substitute identifier, and filling vacant positions with the remainder when the arithmetic mean of Ginfo_wordnum, Ginfo_facenum, and Iinfo_num is divided by 4, to form matrix A1; likewise representing each element of TRAIN by its number of occurrences as a substitute identifier, and filling vacant positions with the remainder when the geometric mean of Ginfo_wordnum, Ginfo_facenum, and Iinfo_num is divided by 4, to form matrix A2;
Computing the eigenvalue CH1 of matrix A1 and the eigenvalue CH2 of matrix A2, and multiplying CH1 and CH2 by the predetermined weight of the applicable context (pre-sale, in-sale, or after-sale); setting the number of iterations Iter to the geometric mean of (CH1+CH2) rounded up to an integer; iterating the expression (L_{i-1}*CH1 + L_{i+1}*CH2) / (L_{i-1}*CH2 + L_{i+1}*CH1) over the range of the data O, with the maximum word as the initial solution; rounding the final iterative value m up to the integer M; and performing SVM training on the data O M times, where i = 1, ..., N.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810751506.4A CN108959275B (en) | 2018-07-10 | 2018-07-10 | Man-machine sparring system based on online language translation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810751506.4A CN108959275B (en) | 2018-07-10 | 2018-07-10 | Man-machine sparring system based on online language translation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108959275A true CN108959275A (en) | 2018-12-07 |
CN108959275B CN108959275B (en) | 2022-05-17 |
Family
ID=64482593
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810751506.4A Active CN108959275B (en) | 2018-07-10 | 2018-07-10 | Man-machine sparring system based on online language translation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108959275B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103279528A (en) * | 2013-05-31 | 2013-09-04 | 俞志晨 | Question-answering system and question-answering method based on man-machine integration |
CN104301554A (en) * | 2013-07-18 | 2015-01-21 | 中兴通讯股份有限公司 | Device and method used for detecting service quality of customer service staff |
US20150220511A1 (en) * | 2014-02-04 | 2015-08-06 | Maluuba Inc. | Method and system for generating natural language training data |
US20150363388A1 (en) * | 2014-06-11 | 2015-12-17 | Facebook, Inc. | Classifying languages for objects and entities |
CN107870896A (en) * | 2016-09-23 | 2018-04-03 | 苏宁云商集团股份有限公司 | Dialogue analysis method and device |
CN108170848A (en) * | 2018-01-18 | 2018-06-15 | 重庆邮电大学 | Dialogue scene classification method for China Mobile intelligent customer service |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108897888A (en) * | 2018-07-10 | 2018-11-27 | 四川淘金你我信息技术有限公司 | Man-machine sparring method under voice customer service training scene |
CN108897888B (en) * | 2018-07-10 | 2021-08-24 | 四川淘金你我信息技术有限公司 | Man-machine sparring method under voice customer service training scene |
Also Published As
Publication number | Publication date |
---|---|
CN108959275B (en) | 2022-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106776538A (en) | Information extraction method for enterprise non-standard-format documents | |
CN109308323A (en) | Construction method, device, and equipment for a causality knowledge base | |
CN105868179A (en) | Intelligent asking-answering method and device | |
CN109783637A (en) | Electric power overhaul text mining method based on deep neural network | |
CN115858758A (en) | Intelligent customer service knowledge graph system with multiple unstructured data identification | |
CN111143571B (en) | Entity labeling model training method, entity labeling method and device | |
KR20200139008A (en) | User intention-analysis based contract recommendation and autocomplete service using deep learning | |
Aljedaani et al. | Learning sentiment analysis for accessibility user reviews | |
CN110851593A (en) | Complex value word vector construction method based on position and semantics | |
CN113157860A (en) | Electric power equipment maintenance knowledge graph construction method based on small-scale data | |
CN116010581A (en) | Knowledge graph question-answering method and system based on power grid hidden trouble shooting scene | |
CN110287495A (en) | Power marketing professional word recognition method and system | |
CN112036179B (en) | Electric power plan information extraction method based on text classification and semantic frame | |
CN108959275A (en) | Man-machine sparring system based on online language translation | |
CN108897888A (en) | Man-machine sparring method under voice customer service training scene | |
CN108959588A (en) | Intelligent learning method for text customer service robot based on big data | |
CN106776568A (en) | Recommendation reason generation method based on user evaluations | |
CN115952282A (en) | Intelligent bank customer complaint diversion handling method and system based on NLP technology | |
CN112232681B (en) | Intelligent examination paper marking method for computational analysis type non-choice questions | |
CN115238093A (en) | Model training method and device, electronic equipment and storage medium | |
CN112988704A (en) | AI consultation database cluster building method and system | |
CN112579666A (en) | Intelligent question-answering system and method and related equipment | |
Xu et al. | SQL-DP: A Novel Difficulty Prediction Framework for SQL Programming Problems. | |
CN111209729A (en) | Method and device for identifying financial subject calculation relationship based on sequence labeling | |
CN106776533A (en) | Method and system for analyzing a piece of text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A human-machine training system based on online language translation; Granted publication date: 20220517; Pledgee: Chengdu Rural Commercial Bank Co.,Ltd. Tianfu New Area Branch; Pledgor: SICHUAN TAOJIN NIWO INFORMATION TECHNOLOGY Co.,Ltd.; Registration number: Y2024980013623 |