CN110197188A - Method, system, equipment and the storage medium of business scenario prediction, classification - Google Patents
- Publication number
- CN110197188A CN201810160035.XA
- Authority
- CN
- China
- Prior art keywords
- business scenario
- history
- input information
- business
- component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method, system, device, and storage medium for predicting and classifying business scenarios. The business scenario prediction method includes: presetting a custom dictionary containing N words, where N is a positive integer; obtaining the historical input information of all users; adding label data to each piece of historical input information, the label data covering multiple business scenarios; performing word segmentation on each piece of historical input information; representing each piece of historical input information as a feature vector with N components, where the components correspond one-to-one to the words in the custom dictionary and the value of each component indicates the frequency with which its word occurs in the segmented historical input information; and inputting training data, consisting of the feature vectors and the label data, into a support vector machine to train a prediction model, which is used to predict the target business scenario from user input information.
Description
Technical field
The present invention relates to the field of machine learning, and in particular to a method, system, device, and storage medium for business scenario prediction and classification.
Background technique
With the rapid development of artificial intelligence, machine learning algorithms have made great progress in the field of Internet technology. In a human-computer interaction interface, predicting from the user's input the business scenario the user most wants to browse, and directing the user to it, is a trend in the future development of Internet technology.
Current technical solutions for business scenario classification use the Stanford CoreNLP toolkit. They first perform basic low-level language analysis such as word segmentation and part-of-speech tagging, then write regular-expression matching templates, and use these templates to extract the specific business scenario from a specific phrasing. This way of extracting business scenarios is rigid: a specific business scenario can be extracted from a specific phrasing only if a corresponding regular-expression matching template exists. As the set of business scenarios expands, more and more templates have to be written, which wastes manpower and program resources and leaves the approach insufficiently flexible in application.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the inflexibility of prior-art methods for business scenario classification by providing a method, system, device, and storage medium for business scenario prediction and classification.
The present invention solves the above technical problem through the following technical solutions:
A business scenario prediction method, characterized in that the prediction method includes:
presetting a custom dictionary, the custom dictionary including N words, where N is a positive integer;
obtaining the historical input information of all users;
adding label data to each piece of historical input information, the label data including multiple business scenarios;
performing word segmentation on each piece of historical input information;
representing each piece of historical input information as a feature vector, the feature vector including N components that correspond respectively to the words in the custom dictionary, the value of each component indicating the frequency with which the corresponding word occurs in the segmented historical input information;
inputting training data into a support vector machine, the training data including the feature vectors and the label data, and training to obtain a prediction model, the prediction model being used to predict a target business scenario from user input information.
Preferably, the step of obtaining the historical input information of all users specifically includes:
obtaining the input logs of all users and cleaning them according to preset rules to obtain the historical input information.
Preferably, the input logs include voice input logs.
Preferably, the business scenarios include at least one of the following:
a specific-object query scenario, an order query scenario, a fuzzy discount query scenario, a specific discount query scenario, an after-sales service scenario, a site-wide shortcut scenario, and an unknown scenario.
Preferably, the feature vector further includes an (N+1)th component: if the values of all N components are 0, the value of the (N+1)th component is 1; otherwise, the value of the (N+1)th component is 0.
An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements any of the above business scenario prediction methods when executing the computer program.
A computer-readable storage medium storing a computer program, characterized in that the computer program implements any of the above business scenario prediction methods when executed by a processor.
A business scenario classification method, characterized in that the business scenario classification method includes:
obtaining a prediction model using any of the above business scenario prediction methods;
obtaining user voice input information;
using the prediction model to predict a target business scenario from the user voice input information.
An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the above business scenario classification method when executing the computer program.
A computer-readable storage medium storing a computer program, characterized in that the computer program implements the above business scenario classification method when executed by a processor.
A business scenario prediction system, characterized in that the prediction system includes:
a dictionary presetting module, configured to preset a custom dictionary, the custom dictionary including N words, where N is a positive integer;
a historical information acquisition module, configured to obtain the historical input information of all users;
a labeling module, configured to add label data to each piece of historical input information, the label data including multiple business scenarios;
a word segmentation module, configured to perform word segmentation on each piece of historical input information;
a feature vector representation module, configured to represent each piece of historical input information as a feature vector, the feature vector including N components that correspond respectively to the words in the custom dictionary, the value of each component indicating the frequency with which the corresponding word occurs in the segmented historical input information;
a training module, configured to input training data into a support vector machine, the training data including the feature vectors and the label data, and to train to obtain a prediction model, the prediction model being used to predict a target business scenario from user input information.
Preferably, the historical information acquisition module is further configured to obtain the input logs of all users and clean them according to preset rules to obtain the historical input information.
Preferably, the input logs include voice input logs.
Preferably, the business scenarios include at least one of the following:
a specific-object query scenario, an order query scenario, a fuzzy discount query scenario, a specific discount query scenario, an after-sales service scenario, a site-wide shortcut scenario, and an unknown scenario.
Preferably, the feature vector further includes an (N+1)th component: if the values of all N components are 0, the value of the (N+1)th component is 1; otherwise, the value of the (N+1)th component is 0.
A business scenario classification system, characterized in that the business scenario classification system includes a voice information input module and any of the above business scenario prediction systems;
the voice information input module is configured to obtain user voice input information;
the prediction model is used to predict a target business scenario from the user voice input information.
The positive effect of the present invention is as follows: based on a support vector machine model trained on the input feature vectors and label data, the invention obtains a prediction model that predicts the target business scenario from user input. Compared with traditional Stanford regular-expression matching templates, it is more flexible in application and has wider coverage.
Brief description of the drawings
Fig. 1 is a flow chart of the business scenario prediction method according to Embodiment 1 of the present invention.
Fig. 2 is a partial flow chart of the business scenario prediction method according to Embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of the hardware structure of the electronic device according to Embodiment 2 of the present invention.
Fig. 4 is a flow chart of the business scenario classification method according to Embodiment 4 of the present invention.
Fig. 5 is a schematic structural diagram of the business scenario prediction system according to Embodiment 7 of the present invention.
Fig. 6 is a schematic structural diagram of the business scenario classification system according to Embodiment 8 of the present invention.
Detailed description of embodiments
The present invention is further illustrated below by way of embodiments, but is not thereby limited to the scope of those embodiments.
Embodiment 1
This embodiment provides a business scenario prediction method; Fig. 1 shows its flow chart. As shown in Fig. 1, the business scenario prediction method of this embodiment includes the following steps:
Step 101: preset a custom dictionary;
Step 102: obtain the historical input information of all users;
Step 103: add label data to each piece of historical input information;
Step 104: perform word segmentation on each piece of historical input information;
Step 105: represent each piece of historical input information as a feature vector;
Step 106: input the feature vectors and label data into a support vector machine and train to obtain a prediction model.
Specifically, in step 101, the custom dictionary includes N words (N is a positive integer). It should be understood that the custom dictionary can be configured according to actual needs; for example, it may include a dictionary of common Chinese words together with current commodity product-word and brand-word dictionaries, so as to cover the scenarios of users' daily life.
As shown in Fig. 2, step 102 may further include the following steps:
Step 1021: obtain the input logs of all users;
Step 1022: clean the input logs of all users according to preset rules.
In step 1021, the input logs obtained may include both text input logs, generated when users type their input, and voice input logs, generated when users speak their input, so that users' needs are fully understood. In step 1022, the preset rules can be used to remove meaningless input content such as " " and ".", yielding valuable historical input information for further processing.
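The cleaning in step 1022 can be sketched as follows. This is only an illustration under an assumed rule: the source does not specify its preset rules, so the sketch treats an entry as meaningless when it contains no letter, digit, or CJK character. The name `clean_input_log` is hypothetical.

```python
import re

# Assumed rule (not from the source): an entry is meaningful only if it
# contains at least one ASCII letter, digit, or CJK unified ideograph.
_MEANINGFUL = re.compile(r"[0-9A-Za-z\u4e00-\u9fff]")


def clean_input_log(entries):
    """Return the entries that survive the assumed cleaning rule, stripped."""
    cleaned = []
    for entry in entries:
        text = entry.strip()
        # Drop blank entries and entries made up only of punctuation.
        if text and _MEANINGFUL.search(text):
            cleaned.append(text)
    return cleaned


raw_log = [" ", ".", "。", "  shopping cart ", "How do I claim a coupon"]
history_inputs = clean_input_log(raw_log)
```

A production rule set would likely be richer (for example, removing duplicated or truncated speech-recognition output), but the shape of the step stays the same: filter the raw log, keep what carries meaning.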
In step 103, label data is added to each piece of historical input information by manual annotation to identify the business scenario to which that piece belongs; the label data in this embodiment therefore includes multiple business scenarios.
Specifically, the label data ACT_COMMODITY can identify the specific-object query scenario, which covers a user's intention to buy or search for a commodity. Historical input information for this scenario may be, for example: "I want to buy a flat jigsaw puzzle for my child"; "Hello, do you have salt without added iodine?"; "floral dress".
The label data ACT_ORDER can identify the order query scenario, which relates to orders or logistics. Historical input information for this scenario may be, for example: "Where is the item I bought?"; "Which courier is delivering the soy milk powder I bought?"; "Where are all our items?".
The label data ACT_DISCOUNT can identify the fuzzy discount query scenario, which covers queries about promotional activities or coupon information. Historical input information for this scenario may be, for example: "Why can't I claim the coupon for 300 off purchases of 3000 or more on digital products?"; "How do I claim a coupon?"; "What discounts are there?".
The label data ACT_SPECIFY_DISCOUNT can identify the specific discount query scenario, which covers discount queries about a specific object. Historical input information for this scenario may be, for example: "I want to buy a cheap, discounted Xiaomi phone"; "Please recommend an eye-protection desk lamp that is currently on promotion".
The label data ACT_AFTER_SALES can identify the after-sales service scenario, which relates to after-sales services such as returns, exchanges, and repair requests. Historical input information for this scenario may be, for example: "How much does it cost to fix a broken screen on a Huawei Changwan 5?"; "I want to return an item"; "I want to exchange an item".
The label data ACT_SHORT_CUT can identify the site-wide shortcut scenario, which refers to a specific service module. Historical input information for this scenario may be, for example: "shopping cart"; "customer service".
The label data UN_KNOWN can identify the unknown scenario, which means that the historical input information does not belong to any of the above business scenarios. Such historical input information may be, for example: "Guess what I want to ask you"; "How do group purchases work?".
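The seven labels above can be collected in one enumeration so that manually annotated data uses a closed set of values. The sketch below is illustrative only: the label names come from the text, while the class name `BusinessScenario`, the English descriptions, and the sample utterances (paraphrased from the examples above) are assumptions.

```python
from enum import Enum


class BusinessScenario(Enum):
    """Label names as given in the text; descriptions are paraphrases."""
    ACT_COMMODITY = "specific-object query"
    ACT_ORDER = "order query"
    ACT_DISCOUNT = "fuzzy discount query"
    ACT_SPECIFY_DISCOUNT = "specific discount query"
    ACT_AFTER_SALES = "after-sales service"
    ACT_SHORT_CUT = "site-wide shortcut"
    UN_KNOWN = "unknown"


# Hypothetical manually labelled history: (utterance, label) pairs.
labelled_history = [
    ("floral dress", BusinessScenario.ACT_COMMODITY),
    ("Where is the item I bought?", BusinessScenario.ACT_ORDER),
    ("How do I claim a coupon?", BusinessScenario.ACT_DISCOUNT),
    ("shopping cart", BusinessScenario.ACT_SHORT_CUT),
    ("Guess what I want to ask you", BusinessScenario.UN_KNOWN),
]

label_names = [label.name for _, label in labelled_history]
```

Using an enumeration rather than free-form strings makes a mistyped label fail at annotation time instead of silently creating an eighth class.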
In step 104, each piece of historical input information is segmented according to the custom dictionary preset in step 101, and in step 105 each piece of historical input information is then represented as a feature vector. Specifically, in step 105 the feature vector includes N components that correspond respectively to the words in the custom dictionary, and the value of each component indicates the frequency with which the corresponding word occurs in the segmented historical input information. For example, if the custom dictionary is {Xiaomi, white, shipping, Midea, air conditioner, e-cigarette}, containing 6 words, and the historical input information is "the white Midea air conditioner I received", then the feature vector of that historical input information can be expressed as [0,1,0,1,1,0].
Further, because the number of words in the custom dictionary is limited, a piece of historical input information may contain none of the words in the custom dictionary. To handle this case, an (N+1)th component can be added to the feature vector: if the values of all N components are 0, the value of the (N+1)th component is 1; otherwise, the value of the (N+1)th component is 0. For example, with the custom dictionary {Xiaomi, white, shipping, Midea, air conditioner, e-cigarette}, the historical input information "the white Midea air conditioner I received" can be expressed as the feature vector [0,1,0,1,1,0,0], while the historical input information "Hello, do you have salt without added iodine?" can be expressed as the feature vector [0,0,0,0,0,0,1].
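The feature-vector construction of step 105, including the optional (N+1)th component, can be sketched as follows. The sketch assumes the input has already been segmented into tokens, takes "frequency" to mean a raw occurrence count, and uses English renderings of the example dictionary; the function name `to_feature_vector` and the token lists are hypothetical.

```python
from collections import Counter


def to_feature_vector(tokens, dictionary, add_unknown_flag=True):
    """Count how often each dictionary word occurs in the segmented tokens.

    If add_unknown_flag is set, append an (N+1)th component that is 1 when
    no dictionary word occurs at all, and 0 otherwise.
    """
    counts = Counter(tokens)
    vector = [counts[word] for word in dictionary]  # missing words count as 0
    if add_unknown_flag:
        vector.append(0 if any(vector) else 1)
    return vector


# English rendering of the six-word example dictionary (an assumption).
DICTIONARY = ["Xiaomi", "white", "shipping", "Midea", "air conditioner", "e-cigarette"]

# Hypothetical segmentation results for the two example inputs.
tokens_air_conditioner = ["I", "received", "white", "Midea", "air conditioner"]
tokens_salt = ["hello", "salt", "without", "iodine"]

vec_a = to_feature_vector(tokens_air_conditioner, DICTIONARY)
vec_b = to_feature_vector(tokens_salt, DICTIONARY)
```

The extra component gives inputs with no dictionary hits a non-zero representation, so the classifier can learn to associate such inputs with a scenario (for instance the unknown scenario) instead of receiving an all-zero vector.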
In step 106, the label data obtained in step 103 and the feature vectors obtained in step 105 are input into the support vector machine as training data. Training the support vector machine yields a prediction model, which is used to predict the target business scenario from user input information.
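The training step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it trains a linear soft-margin SVM by stochastic subgradient descent on the hinge loss rather than by SMO with a kernel, it handles several business scenarios with a one-vs-rest scheme (the source does not say how multi-class prediction is done), and it uses made-up three-component feature vectors. All function names and hyperparameters are hypothetical.

```python
def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM: minimise lam/2*||w||^2 + hinge loss.

    X is a list of feature vectors, y a list of +1 / -1 labels.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient of the L2 regulariser: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per label (assumed multi-class strategy)."""
    models = {}
    for label in sorted(set(labels)):
        y = [1 if current == label else -1 for current in labels]
        models[label] = train_binary_svm(X, y)
    return models


def predict(models, x):
    """Return the label whose binary SVM gives the highest score."""
    def score(label):
        w, b = models[label]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Made-up toy training data: three-component feature vectors and labels.
X_TRAIN = [[2, 0, 0], [1, 0, 0], [0, 1, 0], [0, 2, 0], [0, 0, 1]]
Y_TRAIN = ["ACT_COMMODITY", "ACT_COMMODITY", "ACT_ORDER", "ACT_ORDER", "UN_KNOWN"]

models = train_one_vs_rest(X_TRAIN, Y_TRAIN)
predicted = [predict(models, x) for x in X_TRAIN]
```

In practice a library SVM with kernel support would normally replace the hand-written trainer; the sketch only shows how feature vectors and label data together form the training input and how the resulting model maps a new vector to a business scenario.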
Specifically, for the support vector machine in this embodiment, given a training data set {(X^(1), y^(1)), (X^(2), y^(2)), …, (X^(n), y^(n))} (where X^(i) denotes a feature vector and y^(i) denotes label data) and a separating hyperplane, the functional margin of the separating hyperplane with respect to a sample point (X^(i), y^(i)) is defined as γ_i = y^(i)(W·X^(i) + b), and the functional margin of the separating hyperplane with respect to the training data set is defined as the minimum of its functional margins over all sample points in the set. The functional margin indicates the correctness and confidence of a classification prediction. However, if the parameters W and b of the separating hyperplane are both doubled, the hyperplane itself does not change, yet the functional margin doubles. The geometric margin is introduced to solve this problem.
To make the margin a definite value, a constraint such as normalization can be imposed on the parameter W of the separating hyperplane. When W·X^(i) + b and y^(i) have the same sign, the prediction is correct, and the geometric margin S between the sample and the separating hyperplane can be expressed as S = y^(i)(W·X^(i) + b) / ||W||.
For the separating hyperplane with the maximum geometric margin, every sample must lie at least that margin away from the hyperplane.
In this formulation, the value of the functional margin does not affect the solution of the optimization problem.
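The displayed formulas for this passage did not survive in the text. As a hedged reconstruction, the standard textbook form of the hard-margin problem is given below in the symbols used above (W, b, X^(i), y^(i)), with γ denoting the functional margin of the hyperplane with respect to the training set. That the patent's own formulas take exactly this form is an assumption.

```latex
\begin{align*}
  % Geometric margin of sample i: the functional margin divided by ||W||
  S_i &= \frac{y^{(i)}\left(W \cdot X^{(i)} + b\right)}{\lVert W \rVert} \\[4pt]
  % Maximum-margin hyperplane: every sample has functional margin >= gamma
  \max_{W,\,b}\ \frac{\gamma}{\lVert W \rVert}
    \quad &\text{s.t.}\quad y^{(i)}\left(W \cdot X^{(i)} + b\right) \ge \gamma,
    \qquad i = 1, \dots, n \\[4pt]
  % Rescaling (W, b) rescales gamma without moving the hyperplane, so set
  % gamma = 1; maximising 1/||W|| is then the same as minimising ||W||^2 / 2
  \min_{W,\,b}\ \frac{1}{2}\lVert W \rVert^{2}
    \quad &\text{s.t.}\quad y^{(i)}\left(W \cdot X^{(i)} + b\right) \ge 1,
    \qquad i = 1, \dots, n
\end{align*}
```

The last step is why the value of the functional margin does not affect the solution: any positive γ can be absorbed into W and b, so fixing γ = 1 loses nothing.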
The margin-maximization formulation above imposes a strict condition: all samples must be linearly separable. In practice, a data set rarely satisfies this condition. A data set may contain some outliers, yet once these outliers are removed, the set formed by the remaining majority of samples is linearly separable.
Finding the separating hyperplane can be converted into solving a constrained minimization problem. In this embodiment, the constrained optimization problem is converted into an unconstrained optimization problem using the method of Lagrange multipliers.
For a problem that is not linearly separable, a kernel function (such as the Gaussian kernel) can be used to convert the nonlinear problem into a linear one.
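As an illustration of the kernel mentioned above, the Gaussian (RBF) kernel in its common form K(x, z) = exp(-||x - z||^2 / (2σ^2)) can be written as follows. The bandwidth parameter `sigma` and its default value are assumptions; the source does not state which parameterization it uses.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Identical vectors give 1.0; the value decays towards 0 as they move apart.
same = gaussian_kernel([0, 1, 0, 1], [0, 1, 0, 1])
apart = gaussian_kernel([0, 0], [1, 1])
```

The kernel replaces the inner product between two feature vectors, which lets a linear separating hyperplane in the implicit feature space correspond to a nonlinear decision boundary in the original space.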
Using Lagrangian duality, this embodiment converts the original constrained optimization problem into its dual problem, solves the dual problem to obtain its optimal solution, and from it finally obtains the optimal solution of the primal problem. The idea of the sequential minimal optimization (SMO) algorithm is to split one large problem into a series of small subproblems and, by solving these subproblems, to solve the dual problem.
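For reference, the dual problem that SMO solves is usually written as below. This is the standard soft-margin textbook form, given as an assumption about what the embodiment uses: α_i are the Lagrange multipliers, K is the kernel function, and C is a penalty parameter that tolerates outliers; neither α_i nor C is named in the source.

```latex
\begin{align*}
  \max_{\alpha}\quad
    & \sum_{i=1}^{n} \alpha_i
      - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
        \alpha_i \alpha_j\, y^{(i)} y^{(j)}\, K\!\left(X^{(i)}, X^{(j)}\right) \\
  \text{s.t.}\quad
    & \sum_{i=1}^{n} \alpha_i\, y^{(i)} = 0,
      \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, n \\[6pt]
  % Decision function recovered from the optimal multipliers
  f(X) &= \operatorname{sign}\!\left(
      \sum_{i=1}^{n} \alpha_i\, y^{(i)}\, K\!\left(X^{(i)}, X\right) + b
    \right)
\end{align*}
```

SMO repeatedly picks two multipliers, optimizes them analytically while holding the others fixed (two are needed so that the equality constraint stays satisfied), and stops once the optimality conditions hold within a tolerance.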
On the basis of existing users' input logs, this embodiment obtains historical input information, adds label data identifying the business scenario to each piece of historical input information, and represents each piece as a feature vector. The feature vector and label data of each piece of historical input information form one group of training data; after the resulting groups of training data are input into the support vector machine for training, a prediction model that predicts the target business scenario from user input is obtained. Faced with ever-expanding business scenarios, there is then no need to write more and more regular-expression matching templates. Compared with traditional Stanford regular-expression matching templates, this embodiment is more flexible in application and has wider coverage.
Embodiment 2
This embodiment provides an electronic device, which may take the form of a computing device (for example, a server device), including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the business scenario prediction method provided by Embodiment 1 when executing the computer program.
Fig. 3 shows a schematic diagram of the hardware structure of this embodiment. As shown in Fig. 3, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 connecting the different system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
The memory 92 includes volatile memory, such as random access memory (RAM) 921 and/or cache memory 922, and may further include read-only memory (ROM) 923.
The memory 92 further includes a program/utility 925 having a set of (at least one) program modules 924. Such program modules 924 include but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
By running the computer program stored in the memory 92, the processor 91 executes various functional applications and data processing, such as the business scenario prediction method provided by Embodiment 1 of the present invention.
The electronic device 9 may also communicate with one or more external devices 94 (such as a keyboard or pointing device). Such communication can take place through an input/output (I/O) interface 95. The electronic device 9 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 9 through the bus 93. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or subunits/submodules of the electronic device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.
Embodiment 3
This embodiment provides a computer-readable storage medium storing a computer program, the program implementing the business scenario prediction method provided by Embodiment 1 when executed by a processor.
More specific examples of the readable storage medium include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a possible implementation, the present invention may also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps of the business scenario prediction method of Embodiment 1.
The program code for carrying out the present invention may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on a remote device.
Embodiment 4
This embodiment provides a business scenario classification method; Fig. 4 shows its flow chart. As shown in Fig. 4, the business scenario classification method of this embodiment includes the following steps:
Step 201: obtain a prediction model using the business scenario prediction method provided by Embodiment 1;
Step 202: obtain user voice input information;
Step 203: use the prediction model to predict the target business scenario from the user voice input information.
Specifically, in a human-computer interaction interface, the user can communicate with the terminal device by voice to express his or her needs. Traversing users' input logs shows that a single sentence spoken to the terminal device usually expresses the user's need in concentrated form. In this embodiment, therefore, one sentence from the user's interaction with the terminal device can be extracted as the input of the prediction model, and the prediction model can then output the predicted target business scenario that the user wants to browse. The business scenario classification method provided by this embodiment can thus match users' needs and improve the user experience.
Embodiment 5
This embodiment provides an electronic device, which may take the form of a computing device (for example, a server device), including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the business scenario classification method provided by Embodiment 4 when executing the computer program.
Embodiment 6
This embodiment provides a computer-readable storage medium storing a computer program, the program implementing the business scenario classification method provided by Embodiment 4 when executed by a processor.
Embodiment 7
Embodiment 7 provides a kind of forecasting system of business scenario, and Fig. 5 shows the structural schematic diagram of the present embodiment.Such as Fig. 5
Shown, the forecasting system 10 of the business scenario of the present embodiment specifically includes: dictionary presetting module 1, historical information obtain module 2,
Labeling module 3, word segmentation module 4, feature vector representation module 5 and training module 6.
Specifically, for dictionary presetting module 1 for presetting customized dictionary, which includes that (N's N number of word is positive
Integer).It should be appreciated that customized dictionary can be configured according to actual needs, such as may include the common dictionary of Chinese, with
And current commercial product word dictionary and brand word dictionary, to which all living scenes of user can be covered.
Historical information obtains the history input information that module 2 is used to obtain all users, and specifically, historical information obtains mould
Block 2 can be used for obtaining and cleaning the input journal of all users according to preset rules, and then obtain history input information.Its
In, the input journal of acquisition both may include the text input journal that user is generated by written form input, also can wrap
The voice input journal that user is generated by speech form input is included, to fully understand the demand of user.In addition, by pre-
If regular, can will such as " ", "." such skimble-skamble input content clears out, and then obtains into one
The valuable history of tool for walking processing inputs information.
The labeling module 3 is used to add label data to each piece of history input information. Specifically, the label data can be added by manual annotation to identify the business scenario to which each piece of history input information belongs, so the label data in this embodiment covers multiple business scenarios.
Specifically, the label data ACT_COMMODITY can identify the specific object query business scenario, which refers to the user's intention to buy or search for a product. History input information for this scenario may be: "I want to buy a jigsaw puzzle for a child"; "Hello, do you have salt without added iodine"; "Floral one-piece dress"; and so on.
The label data ACT_ORDER can identify the order query business scenario, which relates to orders or logistics. History input information for this scenario may be: "Where is the thing I bought"; "Which courier is delivering the soymilk powder I bought"; "Where are all our things"; and so on.
The label data ACT_DISCOUNT can identify the fuzzy discount query business scenario, which refers to queries about promotions or coupon information. History input information for this scenario may be: "Why can't I claim the digital-products coupon for 300 off orders over 3000"; "How do I claim a coupon"; "What discounts are there"; and so on.
The label data ACT_SPECIFY_DISCOUNT can identify the specific discount query business scenario, which refers to discount queries about a specific object. History input information for this scenario may be: "I want to buy a discounted Xiaomi phone"; "Please recommend an eye-protection desk lamp that is currently on promotion"; and so on.
The label data ACT_AFTER_SALES can identify the after-sales service business scenario, which relates to after-sales services such as returns, exchanges, and repair requests. History input information for this scenario may be: "How much does a broken screen on a Huawei Play 5 cost"; "I want to return goods"; "I want to exchange goods".
The label data ACT_SHORT_CUT can identify the site-wide direct-access business scenario, which refers to a specific service module. History input information for this scenario may be: "shopping cart"; "customer service"; and so on.
The label data UN_KNOWN can identify the unknown business scenario, which means that the history input information belongs to none of the business scenarios above. Such history input information may be: "Guess what I want to ask you"; "How do I make a group purchase"; and so on.
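The seven labels can be collected in one place as follows. This is a sketch only: the label names come from the text, while the `SceneLabel` enum, its English descriptions, and the `labeled_history` list are illustrative assumptions.

```python
from enum import Enum

class SceneLabel(Enum):
    """Label data used to tag each piece of history input information."""
    ACT_COMMODITY = "specific object query"
    ACT_ORDER = "order query"
    ACT_DISCOUNT = "fuzzy discount query"
    ACT_SPECIFY_DISCOUNT = "specific discount query"
    ACT_AFTER_SALES = "after-sales service"
    ACT_SHORT_CUT = "site-wide direct access"
    UN_KNOWN = "unknown"

# Manually annotated samples: (history input information, label).
labeled_history = [
    ("shopping cart", SceneLabel.ACT_SHORT_CUT),
    ("customer service", SceneLabel.ACT_SHORT_CUT),
    ("Where is the thing I bought", SceneLabel.ACT_ORDER),
]
```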
The word segmentation module 4 is used to segment each piece of history input information. Specifically, it segments each piece according to the custom dictionary preset by the dictionary presetting module 1, after which the feature vector representation module 5 represents each piece of history input information as a feature vector. The feature vector has N components, one for each word in the custom dictionary, and the value of each component is the number of times the corresponding word occurs in the segmented history input information. For example, if the custom dictionary is {Xiaomi, white, delivery, Midea, air conditioner, electronic cigarette}, containing 6 words, and the history input information is "Send me a white Midea air conditioner with free shipping", then the feature vector of this history input information can be expressed as [0,1,0,1,1,0].
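A minimal Python sketch of dictionary-based segmentation followed by frequency counting is given here. The text does not name a segmentation algorithm, so the greedy longest-match approach is an assumption, as are the function names. The dictionary mirrors the six-word example, with its two brand words rendered as Xiaomi and Midea and the sentence rendered in English.

```python
DICTIONARY = ["Xiaomi", "white", "delivery", "Midea", "air conditioner", "electronic cigarette"]

def segment(text, dictionary):
    """Greedy longest-match segmentation over whitespace-separated words."""
    entries = {tuple(e.lower().split()): e for e in dictionary}
    max_len = max(len(key) for key in entries)
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate first and shrink until a dictionary entry matches.
        for span in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + span])
            if key in entries:
                tokens.append(entries[key])
                i += span
                break
        else:
            tokens.append(words[i])  # word outside the dictionary, kept as-is
            i += 1
    return tokens

def to_feature_vector(tokens, dictionary):
    """One component per dictionary word, holding that word's occurrence count."""
    return [tokens.count(word) for word in dictionary]

sentence = "Send me a white Midea air conditioner with free shipping"
vector = to_feature_vector(segment(sentence, DICTIONARY), DICTIONARY)
print(vector)  # [0, 1, 0, 1, 1, 0]
```

For Chinese input, which has no spaces between words, the same longest-match idea would run over characters rather than whitespace-separated words.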
Further, because the number of words in the custom dictionary is limited, a piece of history input information may contain none of the words in the custom dictionary. To handle this case, an (N+1)-th component can be added to the feature vector: if the values of the first N components are all 0, the value of the (N+1)-th component is 1; otherwise, the value of the (N+1)-th component is 0. For example, with the custom dictionary {Xiaomi, white, delivery, Midea, air conditioner, electronic cigarette}, the history input information "Send me a white Midea air conditioner with free shipping" has the feature vector [0,1,0,1,1,0,0], whereas the history input information "Hello, do you have salt without added iodine" has the feature vector [0,0,0,0,0,0,1].
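The rule for the extra component is small enough to state directly in code. The helper name `add_oov_component` is an assumption introduced for illustration.

```python
def add_oov_component(counts):
    """Append the (N+1)-th component: 1 if all N counts are zero, otherwise 0."""
    return counts + [1 if all(c == 0 for c in counts) else 0]

print(add_oov_component([0, 1, 0, 1, 1, 0]))  # [0, 1, 0, 1, 1, 0, 0]
print(add_oov_component([0, 0, 0, 0, 0, 0]))  # [0, 0, 0, 0, 0, 0, 1]
```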
The training module 6 is used to input training data into a support vector machine. Specifically, the training module 6 takes the label data added by the labeling module 3 and the feature vectors produced by the feature vector representation module 5 as training data and inputs them into the support vector machine. Training the support vector machine yields a prediction model, which is used to predict the target business scenario from user input information.
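The sketch below illustrates how feature vectors and label data might be turned into a multi-scenario prediction model. It swaps in a deliberately simple technique, a linear support vector machine trained by sub-gradient descent on the hinge loss and combined one-vs-rest, rather than the Lagrangian dual solution the embodiment describes. The toy data, hyperparameters, and function names are all assumptions.

```python
def train_binary_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Fit w, b by sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors; y holds +1 / -1 targets.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # Shrink w (regularization), then correct it if the margin is violated.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if yi * score < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def train_one_vs_rest(X, labels):
    """Train one binary classifier per business scenario label."""
    model = {}
    for label in sorted(set(labels)):
        y = [1 if current == label else -1 for current in labels]
        model[label] = train_binary_svm(X, y)
    return model

def predict(model, x):
    """Return the label whose classifier gives the highest score."""
    def score(params):
        w, b = params
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(model, key=lambda label: score(model[label]))

# Toy training data: three hypothetical dictionary words plus the (N+1)-th component.
X_train = [[1, 0, 0, 0], [2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y_train = ["ACT_COMMODITY", "ACT_COMMODITY", "ACT_ORDER", "ACT_DISCOUNT", "UN_KNOWN"]
model = train_one_vs_rest(X_train, y_train)
```

One-vs-rest is only one way to extend a binary support vector machine to several scenarios; the text does not say which multi-class strategy is used.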
Specifically, for the support vector machine in this embodiment, given a training data set $\{(X^{(1)},y^{(1)}),(X^{(2)},y^{(2)}),\dots,(X^{(n)},y^{(n)})\}$ (where $X^{(i)}$ can denote a feature vector and $y^{(i)}$ can denote label data) and a separating hyperplane, the functional margin of the separating hyperplane with respect to a sample point $(X^{(i)},y^{(i)})$ is defined as $\gamma_i = y^{(i)}(W \cdot X^{(i)} + b)$, and the functional margin of the separating hyperplane with respect to the training data set is defined as the minimum of its functional margins over all sample points in the set. The functional margin indicates both the correctness and the confidence of a classification prediction. However, if the parameters W and b of the separating hyperplane are both doubled, the hyperplane itself does not change at all, yet the functional margin doubles. To solve this problem, the geometric margin can be introduced.
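The scaling issue can be checked numerically. In the following sketch the sample points and the parameters `w` and `b` are made-up values chosen for illustration; the functions simply evaluate the definitions above.

```python
def functional_margin(w, b, x, y):
    """y * (W . X + b) for a single sample point."""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)

def dataset_functional_margin(w, b, samples):
    """Minimum functional margin over all sample points."""
    return min(functional_margin(w, b, x, y) for x, y in samples)

samples = [([2.0, 0.0], 1), ([0.0, 2.0], -1)]  # illustrative data
w, b = [1.0, -1.0], 0.5

margin = dataset_functional_margin(w, b, samples)
doubled = dataset_functional_margin([2 * v for v in w], 2 * b, samples)
print(margin, doubled)  # 1.5 3.0
```

Doubling W and b leaves the hyperplane W·X + b = 0 where it was, yet the reported margin doubles, which is why a normalized quantity is needed.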
To make the margin a well-defined value, a constraint such as normalization can be imposed on the parameter W of the separating hyperplane. When $W \cdot X^{(i)} + b$ and $y^{(i)}$ have the same sign, the prediction is correct, and the geometric margin S from the sample to the separating hyperplane can be expressed as:
For the separating hyperplane with the maximum geometric margin, each sample must satisfy:
The particular value of the functional margin does not affect the solution of the optimization problem.
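Under the standard support vector machine formulation, which is assumed here, the geometric margin of a correctly classified sample and the condition for the maximum-margin hyperplane can be written as below. The norm $\lVert W \rVert$ is the Euclidean norm of W, a symbol introduced for this illustration.

```latex
S^{(i)} = \frac{y^{(i)}\left(W \cdot X^{(i)} + b\right)}{\lVert W \rVert}
\qquad\text{and}\qquad
\max_{W,\,b}\; S
\quad \text{s.t.} \quad
\frac{y^{(i)}\left(W \cdot X^{(i)} + b\right)}{\lVert W \rVert} \ge S,
\quad i = 1, \dots, n.
```

Because rescaling W and b together leaves the hyperplane unchanged, the functional margin can be fixed at 1 without loss of generality. The problem is then commonly restated as minimizing $\tfrac{1}{2}\lVert W \rVert^{2}$ subject to $y^{(i)}(W \cdot X^{(i)} + b) \ge 1$ for every sample.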
The margin-maximization formulation above imposes a strict condition: it requires all samples to be linearly separable. In practice, a data set rarely satisfies this condition. A data set may contain some outliers, yet once these outliers are removed, the set formed by the remaining majority of samples is linearly separable.
Solving for the separating hyperplane can be converted into solving a constrained minimization problem. In this embodiment, the constrained optimization problem is converted into an unconstrained optimization problem using the method of Lagrange multipliers and then solved.
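As an illustration, and assuming the constrained problem takes the standard hard-margin form (the text does not state it explicitly), the Lagrange multiplier method attaches one multiplier $\alpha_i \ge 0$ to each constraint:

```latex
\begin{aligned}
&\min_{W,\,b}\; \tfrac{1}{2}\lVert W \rVert^{2}
\quad \text{s.t.} \quad y^{(i)}\left(W \cdot X^{(i)} + b\right) \ge 1,\quad i = 1,\dots,n, \\
&L(W, b, \alpha) = \tfrac{1}{2}\lVert W \rVert^{2}
 - \sum_{i=1}^{n} \alpha_i \left[\, y^{(i)}\left(W \cdot X^{(i)} + b\right) - 1 \right],
\qquad \alpha_i \ge 0 .
\end{aligned}
```

The original problem is then equivalent to $\min_{W,b} \max_{\alpha \ge 0} L(W, b, \alpha)$, in which the constraints no longer appear separately.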
For a nonlinearly separable problem, a kernel function (such as the Gaussian kernel function) can be used to convert the nonlinear problem into a linear one.
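A Gaussian kernel can be sketched in a few lines of Python. The bandwidth parameter `sigma` and its default value are assumptions, since the text gives no kernel parameters.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

same = gaussian_kernel([0, 1, 0], [0, 1, 0])   # identical vectors give 1.0
far = gaussian_kernel([0, 0], [1, 1])          # exp(-1), about 0.368
```

The kernel replaces the inner product between feature vectors, so a classifier that is linear in the implicit feature space can separate data that is not linearly separable in the original space.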
Using Lagrangian duality, this embodiment converts the original constrained optimization problem into its dual problem, solves the dual problem to obtain its optimal solution, and from it finally obtains the optimal solution of the original problem. The idea of the sequential minimal optimization (SMO) algorithm is to split one large problem into a series of small subproblems and to solve the dual problem by solving these subproblems.
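The following Python sketch shows the subproblem idea with a simplified SMO variant, not necessarily the procedure used in the embodiment. It assumes the usual dual form, maximizing $\sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y^{(i)} y^{(j)} K(X^{(i)},X^{(j)})$ subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y^{(i)} = 0$. The penalty `C`, the tolerances, and the pair-selection rule are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smo_train(X, y, C=1.0, tol=1e-3, max_quiet_passes=3, max_sweeps=1000, kernel=dot):
    """Simplified SMO: repeatedly optimize one pair (alpha_i, alpha_j) analytically."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def error(i):
        # Prediction error of the current model on sample i.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    quiet_passes = sweeps = 0
    while quiet_passes < max_quiet_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            Ei = error(i)
            violates_kkt = (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0)
            if not violates_kkt:
                continue
            for j in range(n):
                if j == i:
                    continue
                Ej = error(j)
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds that keep sum(alpha * y) = 0 and 0 <= alpha <= C.
                if y[i] != y[j]:
                    low, high = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    low, high = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
                if low == high or eta >= 0:
                    continue
                aj = min(high, max(low, aj_old - y[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-5:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
                b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
                b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2.0
                alpha[i], alpha[j] = ai, aj
                changed += 1
                break
        quiet_passes = quiet_passes + 1 if changed == 0 else 0
    return alpha, b

# Two-point example: the optimal hyperplane is x1 = 1, i.e. W = [1, 0], b = -1.
X_demo, y_demo = [[2.0, 0.0], [0.0, 0.0]], [1, -1]
alpha, b = smo_train(X_demo, y_demo)
W = [sum(alpha[k] * y_demo[k] * X_demo[k][d] for k in range(2)) for d in range(2)]
```

Each inner step solves a two-variable subproblem in closed form, so no general-purpose quadratic programming solver is needed. The weight vector of the original problem is recovered from the dual solution as the sum of $\alpha_i y^{(i)} X^{(i)}$.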
This embodiment obtains history input information from the existing input logs of users, adds label data identifying the business scenario to each piece of history input information, and represents each piece of history input information as a feature vector. The feature vector and label data of each piece of history input information form one group of training data, and after the resulting groups of training data are input into the support vector machine for training, a prediction model that predicts the target business scenario from user input is obtained. As business scenarios keep expanding, there is then no need to write more and more regular-expression matching templates. Compared with traditional Stanford regular-expression matching templates, this embodiment is more flexible to use and has wider coverage.
Embodiment 8
This embodiment provides a business scenario classification system; Fig. 6 shows a structural schematic diagram of this embodiment. As shown in Fig. 6, the business scenario classification system of this embodiment specifically includes: a voice information input module 7 and the business scenario prediction system 10 provided by embodiment 7. The voice information input module 7 is used to obtain user voice input information, and the prediction model obtained by the prediction system 10 is used to predict the target business scenario from the user voice input information.
Specifically, in a human-computer interaction interface, the user can communicate with the terminal device by voice to express his or her needs. Traversing users' input logs shows that a single sentence spoken to the terminal device usually expresses the user's need in concentrated form. In this embodiment, one sentence of the user's interaction with the terminal device can therefore be extracted as the input of the prediction model, and the prediction model can then output, according to this input, the predicted target business scenario that the user wants to browse. The business scenario classification system provided by this embodiment can thus match users' needs and improve the user experience.
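The flow from the voice information input module to the prediction model can be sketched as a small pipeline. Everything injected here is hypothetical: the text does not describe how speech is converted to text, and the three stand-in callables below exist only to make the example runnable.

```python
def classify_utterance(audio, speech_to_text, vectorize, predict):
    """Voice information input module 7 feeding the prediction model of system 10."""
    sentence = speech_to_text(audio)  # hypothetical speech recognizer
    features = vectorize(sentence)    # same segmentation and counting as in training
    return predict(features)          # trained prediction model

# Stand-in components for illustration only.
scene = classify_utterance(
    b"<audio bytes>",
    speech_to_text=lambda audio: "shopping cart",
    vectorize=lambda sentence: [sentence.count("cart"), sentence.count("order")],
    predict=lambda vector: "ACT_SHORT_CUT" if vector[0] else "UN_KNOWN",
)
print(scene)  # ACT_SHORT_CUT
```

Passing the components in as arguments keeps the sketch independent of any particular recognizer or model and makes explicit that prediction must reuse the feature representation from training.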
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that these are only examples, and the protection scope of the present invention is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and substance of the present invention, and all such changes and modifications fall within the protection scope of the present invention.
Claims (16)
1. A business scenario prediction method, characterized in that the prediction method comprises:
presetting a custom dictionary, the custom dictionary comprising N words, where N is a positive integer;
obtaining history input information of all users;
adding label data to each piece of history input information, the label data comprising multiple business scenarios;
segmenting each piece of history input information;
representing each piece of history input information as a feature vector, the feature vector comprising N components, the N components corresponding respectively to the words in the custom dictionary, and the values of the N components respectively indicating the number of times each word occurs in the segmented history input information;
inputting training data into a support vector machine, the training data comprising the feature vectors and the label data, and training to obtain a prediction model, the prediction model being used to predict a target business scenario according to user input information.
2. The business scenario prediction method according to claim 1, characterized in that the step of obtaining the history input information of all users specifically comprises:
obtaining the input logs of all users and cleaning them according to preset rules to obtain the history input information.
3. The business scenario prediction method according to claim 2, characterized in that the input logs comprise voice input logs.
4. The business scenario prediction method according to claim 1, characterized in that the business scenarios comprise at least one of the following:
a specific object query business scenario, an order query business scenario, a fuzzy discount query business scenario, a specific discount query business scenario, an after-sales service business scenario, a site-wide direct-access business scenario, and an unknown business scenario.
5. The business scenario prediction method according to claim 1, characterized in that the feature vector further comprises an (N+1)-th component; if the values of the N components are all 0, the value of the (N+1)-th component is 1; otherwise, the value of the (N+1)-th component is 0.
6. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the business scenario prediction method according to any one of claims 1 to 5.
7. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the business scenario prediction method according to any one of claims 1 to 5.
8. A business scenario classification method, characterized in that the business scenario classification method comprises:
obtaining a prediction model using the business scenario prediction method according to any one of claims 1 to 5;
obtaining user voice input information;
predicting a target business scenario from the user voice input information using the prediction model.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the business scenario classification method according to claim 8.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the business scenario classification method according to claim 8.
11. A business scenario prediction system, characterized in that the prediction system comprises:
a dictionary presetting module, used to preset a custom dictionary, the custom dictionary comprising N words, where N is a positive integer;
a historical information acquisition module, used to obtain history input information of all users;
a labeling module, used to add label data to each piece of history input information, the label data comprising multiple business scenarios;
a word segmentation module, used to segment each piece of history input information;
a feature vector representation module, used to represent each piece of history input information as a feature vector, the feature vector comprising N components, the N components corresponding respectively to the words in the custom dictionary, and the values of the N components respectively indicating the number of times each word occurs in the segmented history input information;
a training module, used to input training data into a support vector machine, the training data comprising the feature vectors and the label data, and to train to obtain a prediction model, the prediction model being used to predict a target business scenario according to user input information.
12. The business scenario prediction system according to claim 11, characterized in that the historical information acquisition module is further used to obtain the input logs of all users and clean them according to preset rules to obtain the history input information.
13. The business scenario prediction system according to claim 12, characterized in that the input logs comprise voice input logs.
14. The business scenario prediction system according to claim 11, characterized in that the business scenarios comprise at least one of the following:
a specific object query business scenario, an order query business scenario, a fuzzy discount query business scenario, a specific discount query business scenario, an after-sales service business scenario, a site-wide direct-access business scenario, and an unknown business scenario.
15. The business scenario prediction system according to claim 11, characterized in that the feature vector further comprises an (N+1)-th component; if the values of the N components are all 0, the value of the (N+1)-th component is 1; otherwise, the value of the (N+1)-th component is 0.
16. A business scenario classification system, characterized in that the business scenario classification system comprises a voice information input module and the business scenario prediction system according to any one of claims 11 to 15;
the voice information input module is used to obtain user voice input information;
the prediction model is used to predict a target business scenario from the user voice input information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810160035.XA CN110197188A (en) | 2018-02-26 | 2018-02-26 | Method, system, equipment and the storage medium of business scenario prediction, classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110197188A true CN110197188A (en) | 2019-09-03 |
Family
ID=67750774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810160035.XA Pending CN110197188A (en) | 2018-02-26 | 2018-02-26 | Method, system, equipment and the storage medium of business scenario prediction, classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110197188A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111445139A (en) * | 2020-03-26 | 2020-07-24 | 平安普惠企业管理有限公司 | Business process simulation method and device, storage medium and electronic equipment |
CN111613212A (en) * | 2020-05-13 | 2020-09-01 | 携程旅游信息技术(上海)有限公司 | Speech recognition method, system, electronic device and storage medium |
CN111882224A (en) * | 2020-07-30 | 2020-11-03 | 上加下信息技术成都有限公司 | Method and device for classifying consumption scenes |
CN112749079A (en) * | 2019-10-31 | 2021-05-04 | 中国移动通信集团浙江有限公司 | Defect classification method and device for software test and computing equipment |
CN113362124A (en) * | 2020-03-06 | 2021-09-07 | 北京沃东天骏信息技术有限公司 | Order processing method, device, equipment and computer readable storage medium |
CN113781062A (en) * | 2020-08-03 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | User label display method and device |
CN115102871A (en) * | 2022-05-20 | 2022-09-23 | 浙江大学 | Energy internet control terminal service processing method based on service feature vector |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104111933A (en) * | 2013-04-17 | 2014-10-22 | 阿里巴巴集团控股有限公司 | Method and device for acquiring business object label and building training model |
CN105786782A (en) * | 2016-03-25 | 2016-07-20 | 北京搜狗科技发展有限公司 | Word vector training method and device |
US20170124071A1 (en) * | 2015-10-30 | 2017-05-04 | Alibaba Group Holding Limited | Method and system for statistics-based machine translation |
CN106997341A (en) * | 2017-03-22 | 2017-08-01 | 山东大学 | A kind of innovation scheme matching process, device, server and system |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112749079A (en) * | 2019-10-31 | 2021-05-04 | 中国移动通信集团浙江有限公司 | Defect classification method and device for software test and computing equipment |
CN112749079B (en) * | 2019-10-31 | 2023-12-26 | 中国移动通信集团浙江有限公司 | Defect classification method and device for software test and computing equipment |
CN113362124A (en) * | 2020-03-06 | 2021-09-07 | 北京沃东天骏信息技术有限公司 | Order processing method, device, equipment and computer readable storage medium |
CN111445139A (en) * | 2020-03-26 | 2020-07-24 | 平安普惠企业管理有限公司 | Business process simulation method and device, storage medium and electronic equipment |
CN111613212A (en) * | 2020-05-13 | 2020-09-01 | 携程旅游信息技术(上海)有限公司 | Speech recognition method, system, electronic device and storage medium |
CN111613212B (en) * | 2020-05-13 | 2023-10-31 | 携程旅游信息技术(上海)有限公司 | Speech recognition method, system, electronic device and storage medium |
CN111882224A (en) * | 2020-07-30 | 2020-11-03 | 上加下信息技术成都有限公司 | Method and device for classifying consumption scenes |
CN113781062A (en) * | 2020-08-03 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | User label display method and device |
CN115102871A (en) * | 2022-05-20 | 2022-09-23 | 浙江大学 | Energy internet control terminal service processing method based on service feature vector |
CN115102871B (en) * | 2022-05-20 | 2023-10-03 | 浙江大学 | Service feature vector-based energy internet control terminal service processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||