CN110197188A - Method, system, device and storage medium for business scenario prediction and classification - Google Patents


Info

Publication number
CN110197188A
Authority
CN
China
Prior art keywords
business scenario
history
input information
business
component
Prior art date
Legal status
Pending
Application number
CN201810160035.XA
Other languages
Chinese (zh)
Inventor
王颖帅
李晓霞
苗诗雨
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810160035.XA
Publication of CN110197188A
Legal status: Pending (current)


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/3331 Query processing
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
              • G06F16/35 Clustering; Classification
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, system, device and storage medium for business scenario prediction and classification. The business scenario prediction method comprises: presetting a custom dictionary containing N words, where N is a positive integer; obtaining the historical input information of all users; adding label data to each piece of historical input information, the label data covering multiple business scenarios; segmenting each piece of historical input information into words; representing each piece of historical input information by a feature vector containing N components, where the N components correspond respectively to the words in the custom dictionary and the value of each component indicates the frequency with which the corresponding word occurs in the segmented historical input information; and inputting training data, comprising the feature vectors and the label data, into a support vector machine and training it to obtain a prediction model, which is used to predict a target business scenario from user input information.

Description

Method, system, device and storage medium for business scenario prediction and classification
Technical field
The present invention relates to the field of machine learning, and in particular to a method, system, device and storage medium for business scenario prediction and classification.
Background art
With the rapid development of artificial intelligence, machine learning algorithms have made great progress in the field of Internet technology. In a human-computer interaction interface, predicting from the user's input the business scenario the user most wants to browse, and directing the user to it, is a trend in the development of Internet technology.
Current technical solutions for business scenario classification use the Stanford CoreNLP toolkit. Basic low-level linguistic analysis such as word segmentation and part-of-speech tagging is performed first; regular-expression matching templates are then written to extract a specific business scenario from a specific phrasing. This way of extracting business scenarios is rigid: a specific business scenario can be extracted from a specific phrasing only if a corresponding regex template exists. As business scenarios expand, more and more regex templates must be written to cover the specific phrasings, which wastes manpower and program resources, and the approach is still not flexible in application.
Summary of the invention
The technical problem to be solved by the present invention is the rigidity of prior-art methods for business scenario classification. To overcome it, the invention provides a method, system, device and storage medium for business scenario prediction and classification.
The present invention solves the above technical problem through the following technical solutions:
A business scenario prediction method, characterized in that the prediction method comprises:
Presetting a custom dictionary, the custom dictionary comprising N words, where N is a positive integer;
Obtaining the historical input information of all users;
Adding label data to each piece of historical input information, the label data covering multiple business scenarios;
Segmenting each piece of historical input information into words;
Representing each piece of historical input information by a feature vector, the feature vector comprising N components that correspond respectively to the words in the custom dictionary, the value of each component indicating the frequency with which the corresponding word occurs in the segmented historical input information;
Inputting training data into a support vector machine, the training data comprising the feature vectors and the label data, and training to obtain a prediction model, the prediction model being used to predict a target business scenario from user input information.
Preferably, the step of obtaining the historical input information of all users specifically comprises:
Obtaining the input logs of all users and cleaning them according to preset rules to obtain the historical input information.
Preferably, the input logs include voice input logs.
Preferably, the business scenarios include at least one of the following:
A specific-product query business scenario, an order query business scenario, a vague discount query business scenario, a specific discount query business scenario, an after-sales service business scenario, a site-wide direct-access business scenario, and an unknown business scenario.
Preferably, the feature vector further comprises an (N+1)-th component: if the values of all N components are 0, the value of the (N+1)-th component is 1; otherwise, the value of the (N+1)-th component is 0.
An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements any one of the above business scenario prediction methods.
A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements any one of the above business scenario prediction methods.
A business scenario classification method, characterized in that the business scenario classification method comprises:
Obtaining a prediction model using any one of the above business scenario prediction methods;
Obtaining user voice input information;
Predicting a target business scenario from the user voice input information using the prediction model.
An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the above business scenario classification method.
A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the above business scenario classification method.
A business scenario prediction system, characterized in that the prediction system comprises:
A dictionary presetting module, configured to preset a custom dictionary comprising N words, where N is a positive integer;
A historical information obtaining module, configured to obtain the historical input information of all users;
A labeling module, configured to add label data to each piece of historical input information, the label data covering multiple business scenarios;
A word segmentation module, configured to segment each piece of historical input information;
A feature vector representation module, configured to represent each piece of historical input information by a feature vector comprising N components that correspond respectively to the words in the custom dictionary, the value of each component indicating the frequency with which the corresponding word occurs in the segmented historical input information;
A training module, configured to input training data, comprising the feature vectors and the label data, into a support vector machine and train it to obtain a prediction model, the prediction model being used to predict a target business scenario from user input information.
Preferably, the historical information obtaining module is further configured to obtain the input logs of all users and clean them according to preset rules to obtain the historical input information.
Preferably, the input logs include voice input logs.
Preferably, the business scenarios include at least one of the following:
A specific-product query business scenario, an order query business scenario, a vague discount query business scenario, a specific discount query business scenario, an after-sales service business scenario, a site-wide direct-access business scenario, and an unknown business scenario.
Preferably, the feature vector further comprises an (N+1)-th component: if the values of all N components are 0, the value of the (N+1)-th component is 1; otherwise, the value of the (N+1)-th component is 0.
A business scenario classification system, characterized in that the business scenario classification system comprises a voice information input module and any one of the above business scenario prediction systems;
The voice information input module is configured to obtain user voice input information;
The prediction model is configured to predict a target business scenario from the user voice input information.
The positive effect of the present invention is as follows: by training a support vector machine model on feature vector inputs and label data, the present invention obtains a prediction model that predicts the target business scenario from user input. Compared with traditional Stanford-based regex matching templates, it is flexible in application and has wide coverage.
Description of the drawings
Fig. 1 is a flowchart of the business scenario prediction method according to Embodiment 1 of the present invention.
Fig. 2 is a partial flowchart of the business scenario prediction method according to Embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of the hardware structure of the electronic device according to Embodiment 2 of the present invention.
Fig. 4 is a flowchart of the business scenario classification method according to Embodiment 4 of the present invention.
Fig. 5 is a schematic structural diagram of the business scenario prediction system according to Embodiment 7 of the present invention.
Fig. 6 is a schematic structural diagram of the business scenario classification system according to Embodiment 8 of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below by way of embodiments, but the present invention is not thereby limited to the scope of the embodiments.
Embodiment 1
This embodiment provides a business scenario prediction method, and Fig. 1 shows its flowchart. As shown in Fig. 1, the business scenario prediction method of this embodiment comprises the following steps:
Step 101: preset a custom dictionary;
Step 102: obtain the historical input information of all users;
Step 103: add label data to each piece of historical input information;
Step 104: segment each piece of historical input information;
Step 105: represent each piece of historical input information by a feature vector;
Step 106: input the feature vectors and label data into a support vector machine and train it to obtain a prediction model.
Specifically, in step 101, the custom dictionary comprises N words (N is a positive integer). It should be understood that the custom dictionary can be configured according to actual needs. For example, it may include a common Chinese dictionary together with dictionaries of current commercial product words and brand words, so as to cover all of the user's everyday scenarios.
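The patent leaves the construction of the custom dictionary open. As a minimal sketch, assume the common-word, product-word and brand-word dictionaries are available as plain word lists (the list names and contents below are hypothetical). They can then be merged into one ordered dictionary. The order matters because position k in the dictionary determines which feature-vector component counts that word.

```python
def build_custom_dictionary(*word_lists):
    """Merge several word lists into one ordered custom dictionary.

    Duplicates are dropped while first-seen order is kept, so every word
    maps to exactly one feature-vector component.
    """
    seen = set()
    dictionary = []
    for words in word_lists:
        for word in words:
            word = word.strip()
            if word and word not in seen:
                seen.add(word)
                dictionary.append(word)
    return dictionary


# Hypothetical sample lists; a real deployment would load much larger ones.
brand_words = ["Xiaomi"]
common_words = ["white", "express delivery", "beautiful"]
product_words = ["air conditioner", "e-cigarette", "Xiaomi"]  # "Xiaomi" repeats and is kept once

custom_dictionary = build_custom_dictionary(brand_words, common_words, product_words)
print(custom_dictionary)
```

With duplicates removed, N is simply `len(custom_dictionary)`.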
As shown in Fig. 2, step 102 may further comprise the following steps:
Step 1021: obtain the input logs of all users;
Step 1022: clean the input logs of all users according to preset rules.
In step 1021, the acquired input logs may include both text input logs, generated when users type their input, and voice input logs, generated when users speak their input, so that user needs are fully understood. In step 1022, the preset rules can remove meaningless input content, such as inputs consisting only of punctuation marks like ".". This yields valuable historical input information for further processing.
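The preset cleaning rules are not specified beyond the punctuation example. The sketch below shows one plausible rule set, which is an assumption: trim surrounding whitespace and drop any entry that contains nothing but whitespace and punctuation. Both `is_meaningful` and `clean_input_logs` are hypothetical names.

```python
import unicodedata


def is_meaningful(text):
    """True if text contains at least one character that is neither whitespace nor punctuation."""
    for ch in text:
        if ch.isspace():
            continue
        # Unicode general categories starting with "P" are punctuation;
        # this covers ASCII "." as well as full-width marks such as "。" and "，".
        if unicodedata.category(ch).startswith("P"):
            continue
        return True
    return False


def clean_input_logs(logs):
    """Hypothetical preset rule set: trim whitespace and drop punctuation-only entries."""
    return [entry.strip() for entry in logs if is_meaningful(entry)]


raw_logs = ["  where is the thing I bought  ", ".", "。", "，", "   ", "shopping cart"]
history_inputs = clean_input_logs(raw_logs)
print(history_inputs)
```

Classifying characters by Unicode category handles both ASCII and full-width Chinese punctuation without a hand-maintained character list.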
In step 103, label data is added to each piece of historical input information by manual annotation, identifying the business scenario to which each piece belongs. The label data in this embodiment therefore covers multiple business scenarios.
Specifically, the label ACT_COMMODITY can identify the specific-product query business scenario, which refers to the user's intention to buy or search for a product. Historical input information for this scenario may be, for example: "I want to buy a jigsaw puzzle for my child"; "Hello, do you have non-iodized salt?"; "floral dress", etc.
The label ACT_ORDER can identify the order query business scenario, which relates to orders or logistics. Historical input information for this scenario may be, for example: "Where is the thing I bought?"; "Which courier is delivering the soy milk powder I bought?"; "Where are all our things?", etc.
The label ACT_DISCOUNT can identify the vague discount query business scenario, which refers to queries about promotional activities or coupon information. Historical input information for this scenario may be, for example: "Why can't I claim the coupon for 300 off digital-product orders over 3000?"; "How do I claim a coupon?"; "What discounts are there?", etc.
The label ACT_SPECIFY_DISCOUNT can identify the specific discount query business scenario, which refers to discount queries about a specific product. Historical input information for this scenario may be, for example: "I want to buy a Xiaomi phone at a discount"; "Please recommend an eye-protection desk lamp that is currently on promotion", etc.
The label ACT_AFTER_SALES can identify the after-sales service business scenario, which relates to after-sales services such as returns, exchanges and repairs. Historical input information for this scenario may be, for example: "How much does it cost to fix the broken screen of a Huawei Changwan 5?"; "I want to return the goods"; "I want to exchange the goods".
The label ACT_SHORT_CUT can identify the site-wide direct-access business scenario, which refers to a specific service module. Historical input information for this scenario may be, for example: "shopping cart"; "customer service", etc.
The label UN_KNOWN can identify the unknown business scenario, meaning that the historical input information does not belong to any of the above business scenarios. Such historical input information may be, for example: "Guess what I want to ask you"; "How does group buying work?", etc.
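The seven labels above form a closed set, so annotated data can be checked against that set before training. The sketch below is an assumption rather than anything in the patent: the labels become a Python `Enum` (the class and helper names are hypothetical, and the descriptions paraphrase the scenarios above), and a mistyped label fails loudly.

```python
from enum import Enum


class BusinessScenario(Enum):
    """Label values listed in Embodiment 1, with a short description of each scenario."""
    ACT_COMMODITY = "specific-product query"
    ACT_ORDER = "order query"
    ACT_DISCOUNT = "vague discount query"
    ACT_SPECIFY_DISCOUNT = "specific discount query"
    ACT_AFTER_SALES = "after-sales service"
    ACT_SHORT_CUT = "site-wide direct access"
    UN_KNOWN = "unknown"


def parse_label(raw):
    """Turn an annotator's label string into a BusinessScenario, rejecting typos."""
    try:
        return BusinessScenario[raw.strip()]
    except KeyError:
        raise ValueError(f"unknown business scenario label: {raw!r}") from None


# Hypothetical annotated samples (text, label) produced by manual labeling.
annotated = [("I want to return the goods", "ACT_AFTER_SALES"), ("shopping cart", "ACT_SHORT_CUT")]
labeled = [(text, parse_label(label)) for text, label in annotated]
print(labeled)
```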
In step 104, each piece of historical input information is segmented according to the custom dictionary preset in step 101, and in step 105 each piece is represented by a feature vector. Specifically, in step 105 the feature vector comprises N components, which correspond respectively to the words in the custom dictionary. The value of each component indicates the frequency (number of occurrences) of that word in the segmented historical input information. For example, suppose the custom dictionary is {Xiaomi, white, express delivery, beautiful, air conditioner, e-cigarette}, containing 6 words, and the historical input information is "the beautiful white air conditioner that was sent to me". Its feature vector can then be expressed as [0,1,0,1,1,0].
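The patent does not say which segmentation algorithm uses the custom dictionary. Forward maximum matching is one common dictionary-based choice and is shown here as an assumption: at each position, take the longest dictionary word that matches; otherwise emit a single character. The sketch uses assumed Chinese forms of the example dictionary (小米 Xiaomi, 白色 white, 快递 express delivery, 漂亮 beautiful, 空调 air conditioner, 电子烟 e-cigarette). The input 漂亮的白色空调 ("beautiful white air conditioner") is illustrative and is not the patent's original sentence. The resulting term-frequency vector is [0, 1, 0, 1, 1, 0].

```python
from collections import Counter


def forward_max_match(text, dictionary):
    """Segment text greedily, taking the longest dictionary word at each position.

    Characters not covered by any dictionary word become single-character tokens.
    """
    words = set(dictionary)
    max_len = max(map(len, words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in words:
                tokens.append(candidate)
                i += length
                break
    return tokens


def term_frequency_vector(tokens, dictionary):
    """Component k counts how often dictionary[k] occurs among the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in dictionary]


# Assumed Chinese forms of: Xiaomi, white, express delivery, beautiful, air conditioner, e-cigarette.
dictionary = ["小米", "白色", "快递", "漂亮", "空调", "电子烟"]
tokens = forward_max_match("漂亮的白色空调", dictionary)  # "beautiful white air conditioner"
vector = term_frequency_vector(tokens, dictionary)
print(tokens, vector)
```

Non-dictionary characters such as 的 still become tokens, but they have no component, so they do not affect the vector.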
Further, because the custom dictionary contains a limited number of words, some historical input information may contain no word from the dictionary at all. In that case an (N+1)-th component can be added to the feature vector: if the values of all N components are 0, the (N+1)-th component is 1; otherwise, it is 0. For example, with the custom dictionary {Xiaomi, white, express delivery, beautiful, air conditioner, e-cigarette}, the historical input "the beautiful white air conditioner that was sent to me" has the feature vector [0,1,0,1,1,0,0]. The historical input "Hello, do you have non-iodized salt?" has the feature vector [0,0,0,0,0,0,1].
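The (N+1)-th "no dictionary word" component can be sketched as below. The tokens are assumed to come from the segmentation step; here they are hypothetical English tokens so the example reproduces both vectors from the text.

```python
def feature_vector_with_oov(tokens, dictionary):
    """Term-frequency vector over the N dictionary words plus an (N+1)-th indicator.

    The extra component is 1 exactly when none of the N dictionary words occurs.
    """
    vector = [tokens.count(word) for word in dictionary]
    vector.append(1 if all(v == 0 for v in vector) else 0)
    return vector


dictionary = ["Xiaomi", "white", "express delivery", "beautiful", "air conditioner", "e-cigarette"]
# Hypothetical segmenter output for the two example inputs.
ac_tokens = ["beautiful", "white", "air conditioner", "sent", "to", "me"]
salt_tokens = ["hello", "do", "you", "have", "non-iodized", "salt"]
print(feature_vector_with_oov(ac_tokens, dictionary))    # [0, 1, 0, 1, 1, 0, 0]
print(feature_vector_with_oov(salt_tokens, dictionary))  # [0, 0, 0, 0, 0, 0, 1]
```

Without the extra component, every out-of-dictionary input would map to the all-zero vector, which the SVM would see as a point sitting at the origin.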
In step 106, the label data obtained in step 103 and the feature vectors obtained in step 105 are input into the support vector machine as training data. Training the support vector machine yields a prediction model, which is used to predict the target business scenario from user input information.
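A standard SVM separates two classes labelled +1 and -1, but the label data here covers several business scenarios. The patent does not state how the multi-class case is handled. One standard option, assumed here, is one-vs-rest: train one binary SVM per scenario, with that scenario as +1 and every other scenario as -1. The sketch only regroups the training pairs; the sample vectors and labels are hypothetical.

```python
def one_vs_rest_targets(labels, scenario):
    """Map multi-scenario labels to the +1/-1 targets of one binary SVM."""
    return [1 if label == scenario else -1 for label in labels]


def build_binary_datasets(features, labels):
    """One (X, y) training set per scenario, all sharing the same feature vectors."""
    return {s: (features, one_vs_rest_targets(labels, s)) for s in sorted(set(labels))}


# Hypothetical training data: feature vectors from step 105, labels from step 103.
features = [[0, 1, 0, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0, 0]]
labels = ["ACT_COMMODITY", "ACT_COMMODITY", "ACT_SPECIFY_DISCOUNT"]
datasets = build_binary_datasets(features, labels)
for scenario, (X, y) in datasets.items():
    print(scenario, y)
```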
Specifically, the support vector machine in this embodiment works as follows. Consider a training data set $\{(X^{(1)},y^{(1)}),(X^{(2)},y^{(2)}),\ldots,(X^{(n)},y^{(n)})\}$, where $X^{(i)}$ denotes a feature vector and $y^{(i)}$ denotes label data, and a separating hyperplane $W\cdot X+b=0$. The functional margin of the separating hyperplane with respect to the sample point $(X^{(i)},y^{(i)})$ is defined as $\gamma_i=y^{(i)}(W\cdot X^{(i)}+b)$. The functional margin of the separating hyperplane with respect to the training data set is defined as the minimum over all sample points in the set, $\gamma=\min_i\gamma_i$. The functional margin indicates the correctness and confidence of a classification prediction. However, if the parameters $W$ and $b$ of the separating hyperplane are both doubled, the hyperplane itself does not change at all, yet the functional margin doubles. To solve this problem, the geometric margin can be introduced.
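A small numeric check of the scaling argument, using hypothetical 2-D points: the data-set functional margin is the minimum of $\gamma_i$, and doubling both $W$ and $b$ doubles it even though the hyperplane $W\cdot X+b=0$ does not move.

```python
def dot(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


def functional_margins(w, b, X, y):
    """gamma_i = y_i * (w . x_i + b) for every sample."""
    return [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]


def dataset_functional_margin(w, b, X, y):
    """Minimum functional margin over the training set."""
    return min(functional_margins(w, b, X, y))


# Hypothetical separable data with labels +1 / -1.
X = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 1.0], -2.0
g1 = dataset_functional_margin(w, b, X, y)
g2 = dataset_functional_margin([2 * wk for wk in w], 2 * b, X, y)  # same hyperplane, doubled parameters
print(g1, g2)  # 2.0 4.0
```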
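To give the margin a definite value, a constraint such as normalization (for example $\|W\|=1$) can be imposed on the hyperplane parameter $W$. When $W\cdot X^{(i)}+b$ and $y^{(i)}$ have the same sign, the prediction is correct, and the geometric margin $S_i$ between the sample and the separating hyperplane can be expressed as:

$$S_i=\frac{y^{(i)}\left(W\cdot X^{(i)}+b\right)}{\|W\|}=\frac{\gamma_i}{\|W\|}$$

Finding the separating hyperplane with the maximum geometric margin $S$ is the problem below, in which each sample must satisfy the constraint:

$$\max_{W,b}\ S\quad\text{s.t.}\quad\frac{y^{(i)}\left(W\cdot X^{(i)}+b\right)}{\|W\|}\ge S,\quad i=1,2,\ldots,n$$

Writing $S=\gamma/\|W\|$, where $\gamma$ is the functional margin of the training data set, the problem becomes maximizing $\gamma/\|W\|$ subject to $y^{(i)}(W\cdot X^{(i)}+b)\ge\gamma$. The value of the functional margin $\gamma$ does not affect the solution of this optimization problem, because rescaling $W$ and $b$ changes $\gamma$ without moving the hyperplane. It can therefore be set to $\gamma=1$, which gives the equivalent problem:

$$\min_{W,b}\ \frac{1}{2}\|W\|^2\quad\text{s.t.}\quad y^{(i)}\left(W\cdot X^{(i)}+b\right)\ge 1,\quad i=1,2,\ldots,n$$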
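The same hypothetical points illustrate why the geometric margin fixes the scaling problem. Dividing by $\|W\|$ makes the margin identical for $(W, b)$ and $(2W, 2b)$; here both equal $\sqrt{2}$.

```python
import math


def geometric_margin(w, b, X, y):
    """Data-set geometric margin S = min_i y_i (w . x_i + b) / ||w||."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    return min(yi * (sum(wk * xk for wk, xk in zip(w, x)) + b) / norm
               for x, yi in zip(X, y))


X = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]
s1 = geometric_margin([1.0, 1.0], -2.0, X, y)
s2 = geometric_margin([2.0, 2.0], -4.0, X, y)  # same hyperplane, parameters doubled
print(s1, s2)
```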
The margin-maximization formulation above imposes a harsh condition: it requires all samples to be linearly separable. In practice, data sets rarely satisfy this condition. A data set typically contains some outliers, and only after these outliers are removed does the remaining majority of samples become linearly separable.
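The patent does not spell out how outliers are accommodated. The usual remedy, assumed here, is the soft-margin SVM. It gives each sample a slack $\xi_i\ge 0$, relaxes the constraint to $y^{(i)}(W\cdot X^{(i)}+b)\ge 1-\xi_i$, and adds $C\sum_i\xi_i$ to the objective, where the penalty $C$ is an assumed parameter. The smallest feasible slack is the hinge loss, as the sketch shows on hypothetical data with one outlier.

```python
def dot(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


def slack_variables(w, b, X, y):
    """Smallest xi_i >= 0 with y_i (w . x_i + b) >= 1 - xi_i, i.e. the hinge loss."""
    return [max(0.0, 1.0 - yi * (dot(w, x) + b)) for x, yi in zip(X, y)]


def soft_margin_objective(w, b, X, y, C=1.0):
    """(1/2)||w||^2 + C * sum(xi_i); C is an assumed trade-off parameter."""
    return 0.5 * dot(w, w) + C * sum(slack_variables(w, b, X, y))


# Hypothetical data: the last point is an outlier labelled -1 on the +1 side.
X = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [2.5, 2.5]]
y = [1, 1, -1, -1]
w, b = [1.0, 1.0], -2.0
slacks = slack_variables(w, b, X, y)
objective = soft_margin_objective(w, b, X, y)
print(slacks, objective)
```

Only the outlier pays a penalty, so one bad point no longer makes the whole problem infeasible.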
Solving for the separating hyperplane can be converted into solving a constrained minimization problem. In this embodiment, the method of Lagrange multipliers is used to convert this constrained optimization problem into an unconstrained one.
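For reference, the textbook Lagrangian for the hard-margin problem $\min_{W,b}\frac12\|W\|^2$ subject to $y^{(i)}(W\cdot X^{(i)}+b)\ge 1$ is shown below, with multipliers $\alpha_i\ge 0$. The patent names the method but does not print these formulas, so this is the standard derivation rather than a quotation. Setting the derivatives with respect to $W$ and $b$ to zero gives $W=\sum_i\alpha_i y^{(i)}X^{(i)}$ and $\sum_i\alpha_i y^{(i)}=0$; substituting these back yields the dual problem.

$$L(W,b,\alpha)=\frac{1}{2}\|W\|^2-\sum_{i=1}^{n}\alpha_i\left[y^{(i)}\left(W\cdot X^{(i)}+b\right)-1\right]$$

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y^{(i)}y^{(j)}\left(X^{(i)}\cdot X^{(j)}\right)\quad\text{s.t.}\quad\sum_{i=1}^{n}\alpha_i y^{(i)}=0,\ \ \alpha_i\ge 0$$

In the usual soft-margin variant (an assumption here, with penalty parameter $C$), the only change to the dual is the box constraint $0\le\alpha_i\le C$. Replacing $X^{(i)}\cdot X^{(j)}$ with a kernel $K(X^{(i)},X^{(j)})$ gives the kernelized dual.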
For a problem that is not linearly separable, a kernel function (such as the Gaussian kernel) can be used to convert the nonlinear problem into a linear one.
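A minimal sketch of the Gaussian kernel $K(x,z)=\exp\left(-\|x-z\|^2/(2\sigma^2)\right)$ and the kernel (Gram) matrix a kernel SVM works with. The bandwidth `sigma` and the XOR-style sample points are assumptions for illustration.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = sum((xk - zk) ** 2 for xk, zk in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(X, kernel=gaussian_kernel):
    """Matrix of pairwise kernel values K[i][j] = kernel(X[i], X[j])."""
    return [[kernel(a, c) for c in X] for a in X]


# XOR-style points: no straight line in 2-D separates labels +1 from -1.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [1, 1, -1, -1]
K = gram_matrix(X)
for row in K:
    print([round(v, 4) for v in row])
```

The dual problem only ever touches pairwise values $X^{(i)}\cdot X^{(j)}$. Replacing them with this matrix lets the same linear machinery draw a nonlinear boundary in the original feature space.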
Using Lagrange duality, this embodiment converts the original constrained optimization problem into its dual problem. Solving the dual problem gives its optimal solution, from which the optimal solution of the primal problem is finally obtained. The idea of the sequential minimal optimization (SMO) algorithm is to divide one large problem into a series of small subproblems and to solve the dual problem by solving these subproblems.
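The SMO idea can be sketched in pure Python. This is the simplified variant often used for teaching, which picks the second multiplier at random instead of using Platt's selection heuristics. Treat it as an illustration of the technique, not as the implementation used in the patent. It solves the soft-margin dual with box constraint $0\le\alpha_i\le C$ two multipliers at a time, always keeping $\sum_i\alpha_i y^{(i)}=0$. The parameters `C`, `tol`, `max_passes`, `seed` and `max_iter`, and the toy data, are assumptions. Passing a different `kernel` function (for example a Gaussian kernel) gives the kernelized version.

```python
import random


def linear_kernel(x, z):
    return sum(xk * zk for xk, zk in zip(x, z))


def smo_train(X, y, C=1.0, tol=1e-3, max_passes=10, kernel=linear_kernel, seed=0, max_iter=10000):
    """Simplified SMO: optimise the dual two Lagrange multipliers at a time.

    y must contain +1/-1. Returns the multipliers alpha and the bias b.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0
    rng = random.Random(seed)

    def error(i):
        # E_i = f(x_i) - y_i, with f(x) = sum_k alpha_k y_k K(x_k, x) + b
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            Ei = error(i)
            # Only update alpha_i if it violates the KKT conditions (within tol).
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1  # choose j != i uniformly at random
                Ej = error(j)
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds [L, H] keep 0 <= alpha <= C and sum_k alpha_k y_k = 0.
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if L == H or eta >= 0:
                    continue
                aj = min(H, max(L, aj_old - y[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-5:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
                b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
                alpha[i], alpha[j] = ai, aj
                if 0 < ai < C:
                    b = b1
                elif 0 < aj < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def decision_function(x, X, y, alpha, b, kernel=linear_kernel):
    return sum(a * yk * kernel(xk, x) for a, yk, xk in zip(alpha, y, X)) + b


# Hypothetical linearly separable toy data.
X = [[2, 2], [3, 3], [2, 3], [0, 0], [-1, 0], [0, -1]]
y = [1, 1, 1, -1, -1, -1]
alpha, b = smo_train(X, y)
predictions = [1 if decision_function(x, X, y, alpha, b) > 0 else -1 for x in X]
print(predictions)
```

Under a one-vs-rest scheme (an assumption), one such binary model would be trained per business scenario.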
In this embodiment, historical input information is obtained from users' existing input logs. Label data identifying the business scenario is added to each piece of historical input information, and each piece is represented by a feature vector. The feature vector and label data of each piece of historical input information form one group of training data. After the resulting groups of training data are input into the support vector machine for training, a prediction model is obtained that predicts the target business scenario from user input. As business scenarios keep expanding, there is therefore no need to write ever more regex matching templates. Compared with traditional Stanford-based regex matching templates, this embodiment is more flexible and has wider coverage.
Embodiment 2
This embodiment provides an electronic device, which may take the form of a computing device (for example, a server device). It comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the business scenario prediction method provided in Embodiment 1 can be implemented.
Fig. 3 shows the hardware structure of this embodiment. As shown in Fig. 3, electronic device 9 specifically comprises:
at least one processor 91, at least one memory 92, and a bus 93 connecting the different system components (including processor 91 and memory 92), wherein:
Bus 93 includes a data bus, an address bus and a control bus.
Memory 92 includes volatile memory, such as random access memory (RAM) 921 and/or cache memory 922, and may further include read-only memory (ROM) 923.
Memory 92 further includes a program/utility 925 having a set of (at least one) program modules 924. Such program modules 924 include, but are not limited to, an operating system, one or more application programs, other program modules and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
Processor 91 runs the computer program stored in memory 92 to execute various functional applications and data processing, such as the business scenario prediction method provided in Embodiment 1 of the present invention.
Electronic device 9 may also communicate with one or more external devices 94 (such as a keyboard or a pointing device) through an input/output (I/O) interface 95. Electronic device 9 can also communicate through network adapter 96 with one or more networks, such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet. Network adapter 96 communicates with the other modules of electronic device 9 through bus 93. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with electronic device 9. These include, but are not limited to, microcode, device drivers, redundant processors, external disk drive arrays, RAID (redundant array of independent disks) systems, tape drives and data backup storage systems.
It should be noted that although several units/modules or sub-units/sub-modules of the electronic device are mentioned in the above detailed description, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present application, the features and functions of two or more units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.
Embodiment 3
This embodiment provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the business scenario prediction method provided in Embodiment 1.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a possible implementation, the present invention may also be implemented in the form of a program product comprising program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the business scenario prediction method in Embodiment 1.
The program code for carrying out the present invention may be written in any combination of one or more programming languages. The program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on a remote device.
Embodiment 4
This embodiment provides a business scenario classification method, and Fig. 4 shows its flowchart. As shown in Fig. 4, the business scenario classification method of this embodiment comprises the following steps:
Step 201: obtain a prediction model using the business scenario prediction method provided in Embodiment 1;
Step 202: obtain user voice input information;
Step 203: predict the target business scenario from the user voice input information using the prediction model.
Specifically, in a human-computer interaction interface, a user can communicate with the terminal device by voice to express their needs. Traversing users' input logs shows that a single sentence the user speaks to the terminal device often expresses the user's need in concentrated form. In this embodiment, therefore, one sentence from the user's interaction with the terminal device can be extracted as the input to the prediction model. From this input, the prediction model can output the predicted target business scenario that the user wants to browse. The business scenario classification method of this embodiment can thus meet user needs and improve the user experience.
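At prediction time, assume the voice input has already been transcribed and turned into a feature vector as in steps 104 and 105. Assume also one-vs-rest models, with one decision function per business scenario. The target scenario can then be taken as the one with the largest decision value. The weights below are hypothetical stand-ins for trained models over a made-up three-word dictionary ["order", "coupon", "return"] plus the (N+1)-th out-of-dictionary component.

```python
def linear_scorer(w, b):
    """Decision function x -> w . x + b of one (hypothetical) trained linear SVM."""
    return lambda x: sum(wk * xk for wk, xk in zip(w, x)) + b


def predict_scenario(feature_vector, scorers):
    """Return the scenario whose one-vs-rest decision value is largest."""
    return max(scorers, key=lambda scenario: scorers[scenario](feature_vector))


# Hypothetical weights over ["order", "coupon", "return"] + the out-of-dictionary component.
scorers = {
    "ACT_ORDER": linear_scorer([2.0, -1.0, -1.0, -1.0], -0.5),
    "ACT_DISCOUNT": linear_scorer([-1.0, 2.0, -1.0, -1.0], -0.5),
    "ACT_AFTER_SALES": linear_scorer([-1.0, -1.0, 2.0, -1.0], -0.5),
    "UN_KNOWN": linear_scorer([-1.0, -1.0, -1.0, 1.0], -0.5),
}
print(predict_scenario([0, 1, 0, 0], scorers))  # ACT_DISCOUNT
print(predict_scenario([0, 0, 0, 1], scorers))  # UN_KNOWN
```

Taking the argmax over decision values, rather than requiring exactly one positive score, guarantees that every input receives some scenario.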
Embodiment 5
This embodiment provides an electronic device, which may take the form of a computing device (for example, a server device). It comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the business scenario classification method provided in Embodiment 4 can be implemented.
Embodiment 6
This embodiment provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the business scenario classification method provided in Embodiment 4.
Embodiment 7
Embodiment 7 provides a business scenario prediction system, and Fig. 5 shows its structure. As shown in Fig. 5, the business scenario prediction system 10 of this embodiment specifically comprises: a dictionary presetting module 1, a historical information obtaining module 2, a labeling module 3, a word segmentation module 4, a feature vector representation module 5 and a training module 6.
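Modules 1 to 6 correspond one-to-one to steps 101 to 106 of Embodiment 1. The sketch below shows how system 10 could wire them together, with each module modeled as a plain callable. The class and field names are hypothetical, and the toy module implementations in the example are placeholders, not the patent's.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PredictionSystem:
    """Hypothetical wiring of modules 1-6 of system 10 as plain callables."""
    preset_dictionary: Callable[[], List[str]]                     # module 1
    obtain_history: Callable[[], List[str]]                        # module 2
    label: Callable[[str], str]                                    # module 3
    segment: Callable[[str, List[str]], List[str]]                 # module 4
    to_vector: Callable[[List[str], List[str]], List[int]]         # module 5
    train: Callable[[List[List[int]], List[str]], object]          # module 6

    def run(self):
        dictionary = self.preset_dictionary()
        history = self.obtain_history()
        labels = [self.label(text) for text in history]
        vectors = [self.to_vector(self.segment(text, dictionary), dictionary) for text in history]
        return self.train(vectors, labels)


# Placeholder modules; "train" just returns its inputs so the wiring is visible.
system = PredictionSystem(
    preset_dictionary=lambda: ["coupon", "order"],
    obtain_history=lambda: ["where is my order", "coupon please"],
    label=lambda text: "ACT_ORDER" if "order" in text else "ACT_DISCOUNT",
    segment=lambda text, dictionary: text.split(),
    to_vector=lambda tokens, dictionary: [tokens.count(w) for w in dictionary],
    train=lambda vectors, labels: (vectors, labels),
)
model = system.run()
print(model)
```

Keeping each module behind a callable mirrors the patent's remark that modules may be merged or further divided without changing the overall flow.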
Specifically, for dictionary presetting module 1 for presetting customized dictionary, which includes that (N's N number of word is positive Integer).It should be appreciated that customized dictionary can be configured according to actual needs, such as may include the common dictionary of Chinese, with And current commercial product word dictionary and brand word dictionary, to which all living scenes of user can be covered.
The history information acquisition module 2 obtains the history input information of all users. Specifically, it obtains the input logs of all users and cleans them according to preset rules to obtain the history input information. The acquired input logs may include text input logs, generated when users type, as well as voice input logs, generated when users speak, so that users' needs are captured fully. The preset rules remove meaningless inputs such as " " or "." and leave history input information that is worth processing further.
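The cleaning step can be sketched as follows. The patent does not list its preset rules, so the single rule here is an assumption: drop entries that are empty or consist only of whitespace and punctuation, like the examples above. The helper name `clean_input_logs` is hypothetical.

```python
import string

# Hypothetical rule set: characters that carry no user intent on their own.
# Covers ASCII punctuation plus common full-width Chinese punctuation.
PUNCTUATION = set(string.punctuation) | set("，。！？、；：“”‘’（）…")


def clean_input_logs(raw_logs):
    """Turn raw input logs (typed text or transcribed voice) into history input information.

    Assumed preset rule: trim surrounding whitespace, then drop entries that are
    empty or made up only of punctuation and whitespace.
    """
    cleaned = []
    for entry in raw_logs:
        text = entry.strip()
        if not text:
            continue
        if all(ch in PUNCTUATION or ch.isspace() for ch in text):
            continue  # e.g. "?" or "。" carries no intent
        cleaned.append(text)
    return cleaned
```

For example, `clean_input_logs(["I want to return an item", " ? ", "。", ""])` keeps only the first entry.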
The labeling module 3 adds label data to each piece of history input information, for example by manual annotation, to identify the business scenario to which that piece belongs. The label data in this embodiment therefore covers multiple business scenarios.
Specifically, the label ACT_COMMODITY identifies the specific product inquiry business scenario, which covers a user's intention to buy or search for a product. Example history inputs: "I want to buy a jigsaw puzzle for my child"; "Hello, do you have non-iodized salt?"; "floral-print dress".
The label ACT_ORDER identifies the order inquiry business scenario, which relates to orders or logistics. Example history inputs: "Where is the item I bought?"; "Which courier is delivering the soy milk powder I bought?"; "Where are all our items?".
The label ACT_DISCOUNT identifies the general promotion inquiry business scenario, which covers inquiries about promotional activities or coupons. Example history inputs: "Why can't I claim the coupon for 300 off orders over 3000?"; "How do I claim a coupon?"; "What promotions are there?".
The label ACT_SPECIFY_DISCOUNT identifies the specific promotion inquiry business scenario, which covers promotion inquiries about a particular product. Example history inputs: "I want to buy a Xiaomi phone at a discount"; "Please recommend an eye-protecting desk lamp that is currently on promotion".
The label ACT_AFTER_SALES identifies the after-sales service business scenario, which relates to after-sales services such as returns, exchanges, and repairs. Example history inputs: "How much does it cost to fix the broken screen of a Huawei Changwan 5?"; "I want to return an item"; "I want to exchange an item".
The label ACT_SHORT_CUT identifies the site-wide shortcut business scenario, in which the user wants to go directly to a specific service module. Example history inputs: "shopping cart"; "customer service".
The label UN_KNOWN identifies the unknown business scenario, meaning that the history input does not belong to any of the scenarios above. Example history inputs: "Guess what I want to ask you"; "How does group buying work?".
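One way to hold this label set in code is a Python `Enum`, sketched below. The label names come from the text. The English descriptions and the `LABELED_EXAMPLES` pairs are illustrative translations of the examples above, not data from the patent.

```python
from enum import Enum


class BusinessScenario(Enum):
    """Label data for the seven business scenarios described in the text."""
    ACT_COMMODITY = "specific product inquiry"
    ACT_ORDER = "order inquiry"
    ACT_DISCOUNT = "general promotion inquiry"
    ACT_SPECIFY_DISCOUNT = "specific promotion inquiry"
    ACT_AFTER_SALES = "after-sales service"
    ACT_SHORT_CUT = "site-wide shortcut"
    UN_KNOWN = "unknown"


# Manually labeled (history input, label) pairs, as produced by the labeling step.
LABELED_EXAMPLES = [
    ("Where is the item I bought?", BusinessScenario.ACT_ORDER),
    ("How do I claim a coupon?", BusinessScenario.ACT_DISCOUNT),
    ("I want to return an item", BusinessScenario.ACT_AFTER_SALES),
    ("shopping cart", BusinessScenario.ACT_SHORT_CUT),
]
```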
The word segmentation module 4 segments each piece of history input information using the custom dictionary preset by the dictionary presetting module 1. The feature vector representation module 5 then represents each piece of history input information as a feature vector with N components, one for each word in the custom dictionary. The value of each component is the number of times the corresponding word occurs in the segmented history input. For example, suppose the custom dictionary is {Xiaomi, white, delivery, Midea, air conditioner, e-cigarette}, which contains 6 words. The history input "Get me a white Midea air conditioner with free shipping" is then represented by the feature vector [0, 1, 0, 1, 1, 0].
Further, because the custom dictionary contains only a limited number of words, a history input may contain none of them. In that case, an (N+1)-th component can be added to the feature vector. If all N components are 0, the (N+1)-th component is 1; otherwise, it is 0. For example, with the custom dictionary {Xiaomi, white, delivery, Midea, air conditioner, e-cigarette}, the history input "Get me a white Midea air conditioner with free shipping" is represented as [0, 1, 0, 1, 1, 0, 0], and the history input "Hello, do you have non-iodized salt?" is represented as [0, 0, 0, 0, 0, 0, 1].
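The representation above can be sketched in a few lines of code. This sketch assumes segmentation has already produced a list of tokens. The token lists are hypothetical English renderings of the two example inputs, not real segmenter output, and the names `to_feature_vector` and `add_oov_flag` are illustrative.

```python
def to_feature_vector(tokens, dictionary, add_oov_flag=False):
    """Represent one segmented history input as term counts over the custom dictionary.

    Component k counts how often dictionary[k] occurs in tokens. With
    add_oov_flag=True an (N+1)-th component is appended: 1 if none of the
    N dictionary words occurs in the input, otherwise 0.
    """
    vector = [tokens.count(word) for word in dictionary]
    if add_oov_flag:
        vector.append(1 if all(v == 0 for v in vector) else 0)
    return vector


# Example dictionary from the text (N = 6).
DICTIONARY = ["Xiaomi", "white", "delivery", "Midea", "air conditioner", "e-cigarette"]

# Hypothetical segmentation results for the two example inputs.
tokens_1 = ["get", "me", "white", "Midea", "air conditioner", "free shipping"]
tokens_2 = ["hello", "do", "you", "have", "non-iodized", "salt"]
```

`list.count` keeps the sketch short. For a large dictionary, counting the tokens once with `collections.Counter` and then reading off each dictionary word would be cheaper.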
The training module 6 inputs training data into a support vector machine. Specifically, it uses the label data added by the labeling module 3 and the feature vectors produced by the feature vector representation module 5 as training data. Training the support vector machine yields a prediction model that predicts the target business scenario from a user's input information.
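The training step can be made concrete with the minimal pure-Python sketch below. It is not the embodiment's solver. The text later describes solving the dual problem, whereas this sketch swaps in Pegasos-style stochastic subgradient descent on the primal hinge loss to keep it short. It also assumes one-vs-rest handling of the multiple business scenarios, since the source does not say how scenarios are combined. The names `lam` (regularization strength) and `epochs` are illustrative.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Binary linear SVM trained by Pegasos-style subgradient descent on the hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}. A constant 1 is appended
    to each vector so the last weight plays the role of the bias b (it is
    regularized along with W, a common simplification).
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            xb = list(x) + [1.0]
            margin = label * sum(wi * xi for wi, xi in zip(w, xb))
            # Gradient step for the (lam / 2) * ||w||^2 term.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:  # hinge loss is active: move toward label * x
                w = [wi + eta * label * xi for wi, xi in zip(w, xb)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per business scenario (this scenario vs. all others)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Predict the target business scenario as the one whose SVM scores highest."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wi * xi for wi, xi in zip(models[c], xb)))
```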
Specifically, for the support vector machine in this embodiment, consider a training set $\{(X^{(1)},y^{(1)}),(X^{(2)},y^{(2)}),\dots,(X^{(n)},y^{(n)})\}$, where $X^{(i)}$ is a feature vector and $y^{(i)}$ its label data, together with a separating hyperplane $W\cdot X+b=0$. The functional margin of the separating hyperplane with respect to the sample $(X^{(i)},y^{(i)})$ is defined as $\gamma_i=y^{(i)}(W\cdot X^{(i)}+b)$. The functional margin of the separating hyperplane with respect to the training set is the minimum of $\gamma_i$ over all samples in the training set. The functional margin indicates whether a classification is correct and how confident it is. However, if the parameters $W$ and $b$ are both doubled, the separating hyperplane does not change at all, yet the functional margin doubles. The geometric margin is introduced to resolve this problem.
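The scaling problem can be checked numerically with the minimal sketch below. It uses a made-up two-dimensional dataset chosen only for illustration. The geometric margin is taken as the functional margin divided by $\lVert W\rVert$, which is the standard definition.

```python
import math


def functional_margin(W, b, X, y):
    """Functional margin of the hyperplane W.x + b = 0 w.r.t. a dataset:
    the minimum over samples of y_i * (W.x_i + b)."""
    return min(yi * (sum(w * x for w, x in zip(W, xi)) + b) for xi, yi in zip(X, y))


def geometric_margin(W, b, X, y):
    """Geometric margin: the functional margin divided by ||W||, i.e. the signed
    Euclidean distance from the hyperplane to the closest sample."""
    norm = math.sqrt(sum(w * w for w in W))
    return functional_margin(W, b, X, y) / norm


# Illustrative, linearly separable data.
X = [[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
W, b = [1.0, 1.0], 0.0
W2, b2 = [2 * w for w in W], 2 * b  # same hyperplane, parameters doubled
```

Here the functional margin rises from 3 to 6 when the parameters are doubled. The geometric margin stays at $3/\sqrt{2}$ in both cases.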
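To give the margin a definite value, a constraint such as normalization ($\lVert W\rVert=1$) can be imposed on the parameter $W$ of the separating hyperplane. When $W\cdot X^{(i)}+b$ and $y^{(i)}$ have the same sign, the prediction is correct. The geometric margin $S_i$ between the sample and the separating hyperplane can then be expressed as:

$$S_i=\frac{y^{(i)}\,(W\cdot X^{(i)}+b)}{\lVert W\rVert}=\frac{\gamma_i}{\lVert W\rVert}$$

The geometric margin $S$ of the separating hyperplane with respect to the training set is the minimum of $S_i$ over all samples.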
For the separating hyperplane with the maximum geometric margin $S$, every sample must satisfy:

$$\frac{y^{(i)}\,(W\cdot X^{(i)}+b)}{\lVert W\rVert}\ge S,\qquad i=1,\dots,n$$
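Rescaling $W$ and $b$ changes the functional margin without moving the hyperplane. For this reason, the particular value of the functional margin does not affect the solution of this optimization problem.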
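Spelled out with the symbols used here, the derivation runs as shown below. This is a reconstruction following the standard hard-margin derivation, because the source's own formulas are not reproduced. Write $\hat\gamma$ for the functional margin of the training set, so that $S=\hat\gamma/\lVert W\rVert$. Fixing $\hat\gamma$ at 1 turns margin maximization into a quadratic program:

```latex
\begin{aligned}
&\max_{W,b}\; S
  \quad\text{s.t.}\quad \frac{y^{(i)}\,(W\cdot X^{(i)}+b)}{\lVert W\rVert}\ge S,
  \qquad i=1,\dots,n \\
&\iff \max_{W,b}\; \frac{\hat\gamma}{\lVert W\rVert}
  \quad\text{s.t.}\quad y^{(i)}\,(W\cdot X^{(i)}+b)\ge \hat\gamma,
  \qquad i=1,\dots,n \\
&\overset{\hat\gamma=1}{\iff} \min_{W,b}\; \tfrac12\lVert W\rVert^{2}
  \quad\text{s.t.}\quad y^{(i)}\,(W\cdot X^{(i)}+b)\ge 1,
  \qquad i=1,\dots,n
\end{aligned}
```

Maximizing $1/\lVert W\rVert$ and minimizing $\tfrac12\lVert W\rVert^2$ have the same minimizer. The squared form is preferred because it is convex and differentiable.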
The margin-maximization formulation above imposes a strict condition: all samples must be linearly separable. In practice, data sets rarely satisfy this condition. A data set often contains some outliers, and only after these outliers are removed are the remaining samples linearly separable.
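The text does not give the relaxed formulation. The usual remedy, assumed here, is the soft-margin SVM. It gives each sample a slack variable $\xi_i\ge 0$ and charges a penalty $C>0$ per unit of slack, so a few outliers can violate the margin:

```latex
\min_{W,b,\xi}\; \tfrac12\lVert W\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y^{(i)}\,(W\cdot X^{(i)}+b)\ge 1-\xi_i,\qquad \xi_i\ge 0,\qquad i=1,\dots,n
```

A larger $C$ tolerates fewer margin violations. As $C\to\infty$, the problem approaches the hard-margin form.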
Finding the separating hyperplane thus reduces to solving a constrained minimization problem. In this embodiment, the method of Lagrange multipliers converts this constrained optimization problem into an unconstrained one.
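For the hard-margin problem, the Lagrange-multiplier step can be written as follows. The notation is standard; the multipliers $\alpha_i$ are not named in the source.

```latex
\begin{aligned}
L(W,b,\alpha) &= \tfrac12\lVert W\rVert^{2}
  - \sum_{i=1}^{n}\alpha_i\bigl[y^{(i)}(W\cdot X^{(i)}+b)-1\bigr],
  \qquad \alpha_i\ge 0 \\
\text{primal:}\quad &\min_{W,b}\;\max_{\alpha\ge 0}\; L(W,b,\alpha)
\end{aligned}
```

If any constraint is violated, the inner maximum over $\alpha$ is $+\infty$. If all constraints hold, the inner maximum equals $\tfrac12\lVert W\rVert^2$, attained at $\alpha=0$. The unconstrained min-max problem therefore has the same solution as the constrained one.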
A problem that is not linearly separable can be handled with a kernel function (for example, the Gaussian kernel), which converts the nonlinear problem into a linear one.
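The Gaussian kernel mentioned here can be sketched as follows. Parametrizing it by a width `sigma` is one common convention; another writes `gamma = 1 / (2 * sigma**2)`. The helper `kernel_matrix` is illustrative.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)).

    It equals an inner product in an implicit, infinite-dimensional feature
    space, so a dual-form SVM can use it wherever it would use the dot product x . z.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(X, kernel=gaussian_kernel):
    """Gram matrix K[i][j] = kernel(X[i], X[j]), as used by dual solvers."""
    return [[kernel(xi, xj) for xj in X] for xi in X]
```

Identical inputs give 1.0, and the value decays toward 0 as the inputs move apart. `sigma` controls how quickly the value decays.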
Using Lagrange duality, this embodiment converts the original constrained optimization problem into its dual problem. It then solves the dual problem to obtain the dual optimal solution and, from that, the optimal solution of the primal problem. The dual problem can be solved with the sequential minimal optimization (SMO) algorithm. SMO splits one large problem into a series of small subproblems and solves the dual problem by solving these subproblems in turn.
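As an illustration, the sketch below implements a compact simplified SMO for the standard soft-margin dual, $\max_\alpha \sum_i\alpha_i-\tfrac12\sum_i\sum_j\alpha_i\alpha_j y^{(i)}y^{(j)}K(X^{(i)},X^{(j)})$ subject to $0\le\alpha_i\le C$ and $\sum_i\alpha_i y^{(i)}=0$. It assumes labels in {+1, -1}. Each step optimizes two multipliers analytically, which is the smallest subproblem that preserves $\sum_i\alpha_i y^{(i)}=0$. Choosing the second multiplier at random is a widely taught simplification of Platt's selection heuristics, not necessarily what the embodiment uses. The names `simplified_smo`, `tol`, and `max_passes` are illustrative.

```python
import random


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, kernel=linear_kernel, seed=0):
    """Solve the SVM dual two multipliers at a time; returns (alpha, b)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0
    rng = random.Random(seed)

    def f(i):  # current decision value for training sample i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # Only optimize alpha_i if it violates the KKT conditions (within tol).
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1  # pick any index other than i
                E_j = f(j) - y[j]
                a_i_old, a_j_old = alpha[i], alpha[j]
                # Bounds keeping 0 <= alpha <= C with sum(alpha * y) unchanged.
                if y[i] != y[j]:
                    L, H = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
                else:
                    L, H = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if L == H or eta >= 0:
                    continue
                a_j = min(H, max(L, a_j_old - y[j] * (E_i - E_j) / eta))
                if abs(a_j - a_j_old) < 1e-5:
                    continue
                a_i = a_i_old + y[i] * y[j] * (a_j_old - a_j)
                # Recompute the threshold b from a multiplier strictly inside (0, C).
                b1 = b - E_i - y[i] * (a_i - a_i_old) * K[i][i] - y[j] * (a_j - a_j_old) * K[i][j]
                b2 = b - E_j - y[i] * (a_i - a_i_old) * K[i][j] - y[j] * (a_j - a_j_old) * K[j][j]
                alpha[i], alpha[j] = a_i, a_j
                if 0 < a_i < C:
                    b = b1
                elif 0 < a_j < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def decision(X, y, alpha, b, x, kernel=linear_kernel):
    """sum_i alpha_i * y_i * K(X_i, x) + b; its sign is the predicted class."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
```

With the linear kernel, $W=\sum_i\alpha_i y^{(i)}X^{(i)}$ can be recovered explicitly. Passing another kernel function with the same `(x, z)` signature gives a nonlinear classifier without changing the solver.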
This embodiment obtains history input information from existing user input logs. It adds label data identifying the business scenario to each piece of history input information and represents each piece as a feature vector. The feature vector and label data of each piece then form one group of training data. Training a support vector machine on the resulting groups yields a prediction model that predicts the target business scenario from user input. As business scenarios keep expanding, there is no need to write ever more regular-expression matching templates. Compared with traditional Stanford regular-expression matching templates, this embodiment is more flexible to use and has wider coverage.
Embodiment 8
This embodiment provides a business scenario classification system, and Fig. 6 is a schematic structural diagram of this embodiment. As shown in Fig. 6, the business scenario classification system of this embodiment includes a voice information input module 7 and the business scenario prediction system 10 provided in Embodiment 7. The voice information input module 7 obtains the user's voice input information. The prediction model obtained by the prediction system 10 predicts the target business scenario from the user's voice input information.
Specifically, in a human-computer interaction interface, users can communicate their needs to the terminal device by voice. User input logs show that a single utterance addressed to the terminal device usually expresses one concentrated need. This embodiment therefore takes each such utterance as the input of the prediction model. From that input, the prediction model outputs the predicted target business scenario that the user wants to browse. The business scenario classification system provided in this embodiment can thus meet users' needs and improve the user experience.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that they are merely examples and that the scope of protection of the present invention is defined by the appended claims. Those skilled in the art may make many changes and modifications to these embodiments without departing from the principle and substance of the present invention. All such changes and modifications fall within the scope of protection of the present invention.

Claims (16)

1. A business scenario prediction method, characterized in that the prediction method comprises:
presetting a custom dictionary, the custom dictionary comprising N words, where N is a positive integer;
obtaining history input information of all users;
adding label data to each piece of history input information, the label data comprising a plurality of business scenarios;
segmenting each piece of history input information;
representing each piece of history input information by a feature vector, the feature vector comprising N components that correspond respectively to the words in the custom dictionary, the values of the N components respectively indicating the number of times each word occurs in the segmented history input information;
inputting training data into a support vector machine, the training data comprising the feature vectors and the label data, and training to obtain a prediction model, the prediction model being used to predict a target business scenario from a user's input information.
2. The business scenario prediction method according to claim 1, characterized in that the step of obtaining history input information of all users specifically comprises:
obtaining input logs of all users and cleaning them according to preset rules to obtain the history input information.
3. The business scenario prediction method according to claim 2, characterized in that the input logs comprise voice input logs.
4. The business scenario prediction method according to claim 1, characterized in that the business scenarios comprise at least one of the following:
a specific product inquiry business scenario, an order inquiry business scenario, a general promotion inquiry business scenario, a specific promotion inquiry business scenario, an after-sales service business scenario, a site-wide shortcut business scenario, and an unknown business scenario.
5. The business scenario prediction method according to claim 1, characterized in that the feature vector further comprises an (N+1)-th component; if the values of all N components are 0, the value of the (N+1)-th component is 1; otherwise, the value of the (N+1)-th component is 0.
6. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the business scenario prediction method according to any one of claims 1 to 5.
7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the business scenario prediction method according to any one of claims 1 to 5.
8. A business scenario classification method, characterized in that the business scenario classification method comprises:
obtaining a prediction model using the business scenario prediction method according to any one of claims 1 to 5;
obtaining user voice input information;
predicting a target business scenario from the user voice input information using the prediction model.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the business scenario classification method according to claim 8.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the business scenario classification method according to claim 8.
11. A business scenario prediction system, characterized in that the prediction system comprises:
a dictionary presetting module, configured to preset a custom dictionary, the custom dictionary comprising N words, where N is a positive integer;
a history information acquisition module, configured to obtain history input information of all users;
a labeling module, configured to add label data to each piece of history input information, the label data comprising a plurality of business scenarios;
a word segmentation module, configured to segment each piece of history input information;
a feature vector representation module, configured to represent each piece of history input information by a feature vector, the feature vector comprising N components that correspond respectively to the words in the custom dictionary, the values of the N components respectively indicating the number of times each word occurs in the segmented history input information;
a training module, configured to input training data into a support vector machine, the training data comprising the feature vectors and the label data, and to train to obtain a prediction model, the prediction model being used to predict a target business scenario from a user's input information.
12. The business scenario prediction system according to claim 11, characterized in that the history information acquisition module is further configured to obtain input logs of all users and clean them according to preset rules to obtain the history input information.
13. The business scenario prediction system according to claim 12, characterized in that the input logs comprise voice input logs.
14. The business scenario prediction system according to claim 11, characterized in that the business scenarios comprise at least one of the following:
a specific product inquiry business scenario, an order inquiry business scenario, a general promotion inquiry business scenario, a specific promotion inquiry business scenario, an after-sales service business scenario, a site-wide shortcut business scenario, and an unknown business scenario.
15. The business scenario prediction system according to claim 11, characterized in that the feature vector further comprises an (N+1)-th component; if the values of all N components are 0, the value of the (N+1)-th component is 1; otherwise, the value of the (N+1)-th component is 0.
16. A business scenario classification system, characterized in that the business scenario classification system comprises a voice information input module and the business scenario prediction system according to any one of claims 11 to 15;
the voice information input module is configured to obtain user voice input information;
the prediction model is configured to predict a target business scenario from the user voice input information.
CN201810160035.XA 2018-02-26 2018-02-26 Method, system, equipment and the storage medium of business scenario prediction, classification Pending CN110197188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810160035.XA CN110197188A (en) 2018-02-26 2018-02-26 Method, system, equipment and the storage medium of business scenario prediction, classification

Publications (1)

Publication Number Publication Date
CN110197188A true CN110197188A (en) 2019-09-03

Family

ID=67750774

Country Status (1)

Country Link
CN (1) CN110197188A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111933A (en) * 2013-04-17 2014-10-22 阿里巴巴集团控股有限公司 Method and device for acquiring business object label and building training model
CN105786782A (en) * 2016-03-25 2016-07-20 北京搜狗科技发展有限公司 Word vector training method and device
US20170124071A1 (en) * 2015-10-30 2017-05-04 Alibaba Group Holding Limited Method and system for statistics-based machine translation
CN106997341A (en) * 2017-03-22 2017-08-01 山东大学 A kind of innovation scheme matching process, device, server and system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749079A (en) * 2019-10-31 2021-05-04 中国移动通信集团浙江有限公司 Defect classification method and device for software test and computing equipment
CN112749079B (en) * 2019-10-31 2023-12-26 中国移动通信集团浙江有限公司 Defect classification method and device for software test and computing equipment
CN113362124A (en) * 2020-03-06 2021-09-07 北京沃东天骏信息技术有限公司 Order processing method, device, equipment and computer readable storage medium
CN111445139A (en) * 2020-03-26 2020-07-24 平安普惠企业管理有限公司 Business process simulation method and device, storage medium and electronic equipment
CN111613212A (en) * 2020-05-13 2020-09-01 携程旅游信息技术(上海)有限公司 Speech recognition method, system, electronic device and storage medium
CN111613212B (en) * 2020-05-13 2023-10-31 携程旅游信息技术(上海)有限公司 Speech recognition method, system, electronic device and storage medium
CN111882224A (en) * 2020-07-30 2020-11-03 上加下信息技术成都有限公司 Method and device for classifying consumption scenes
CN113781062A (en) * 2020-08-03 2021-12-10 北京沃东天骏信息技术有限公司 User label display method and device
CN115102871A (en) * 2022-05-20 2022-09-23 浙江大学 Energy internet control terminal service processing method based on service feature vector
CN115102871B (en) * 2022-05-20 2023-10-03 浙江大学 Service feature vector-based energy internet control terminal service processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination