CN110347840A - Method, system, device, and storage medium for predicting complaint text categories - Google Patents

Info

Publication number
CN110347840A
Authority
CN
China
Prior art keywords
history
text data
data
classification
complaint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910650261.0A
Other languages
Chinese (zh)
Other versions
CN110347840B (en)
Inventor
杨森
罗超
胡泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd
Priority to CN201910650261.0A
Publication of CN110347840A
Application granted
Publication of CN110347840B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies

Abstract

The invention discloses a method, system, device, and storage medium for predicting the category of complaint texts on an OTA platform. The prediction method includes: obtaining historical complaint text data of the OTA platform; clustering and labeling the historical complaint text data to obtain the complaint category of each piece of historical complaint text data; obtaining historical dimension data and historical entity data; building a prediction model for predicting the complaint category to which complaint text data belongs; obtaining target complaint text data; inputting the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category; and determining, according to the probabilities, the target complaint category to which the target complaint text data belongs. The invention improves the accuracy of text classification and categorizes customer complaints automatically, so that the responsible staff can promptly handle the complaint categories they are in charge of, which improves the user experience while saving considerable manpower.

Description

Method, system, device, and storage medium for predicting complaint text categories
Technical field
The present invention relates to the technical field of data processing, and in particular to a method, system, device, and storage medium for predicting the category of complaint texts on an OTA platform.
Background
On an OTA (Online Travel Agency) platform, complaint texts need to be classified to determine their complaint categories, so that a different remedy can be applied to each complaint category and the user experience improved.
At present, text classification mostly uses RNN (recurrent neural network) algorithms or CNN (convolutional neural network) algorithms based on word embeddings. RNN-based text classification can effectively model textual context and capture contextual semantics, but each time step depends on the result computed at the previous time step, so the computation cannot be parallelized and training usually takes a long time. CNN algorithms based on word embeddings often overfit because of OOV (out-of-vocabulary) words and sparse features. Although CNN-based text classification solves the parallelization problem, it can only recognize local text information, which limits its accuracy.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defects of prior-art algorithms for classifying complaint texts, namely that they cannot be run in parallel, take a long time to train, or fail to meet accuracy requirements. To that end, the invention provides a method, system, device, and storage medium for predicting the category of complaint texts on an OTA platform.
The present invention solves the above technical problem through the following technical solutions:
The present invention provides a method for predicting the category of complaint texts on an OTA platform, the prediction method comprising:
obtaining historical complaint text data of the OTA platform for a set historical time period;
labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
obtaining, from the OTA platform, historical dimension data and historical entity data corresponding to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders, and/or hotels;
and the historical entity data is data characterizing proper nouns in the hotel domain;
taking the historical complaint text data, the historical dimension data, and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, building a prediction model for predicting the complaint category to which complaint text data belongs;
obtaining target complaint text data;
inputting the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category; and
determining, according to the probabilities, the target complaint category to which the target complaint text data belongs.
Preferably, after the step of obtaining historical complaint text data of the OTA platform for a set historical time period and before the step of labeling the historical complaint text data, the method further comprises:
clustering the historical complaint text data using a clustering algorithm;
and the step of labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data comprises:
labeling the historical complaint text data that belongs to the same cluster with the same complaint category.
Preferably, before the step of building the prediction model with the historical complaint text data, the historical dimension data, and the historical entity data as input and the corresponding historical complaint category as output, the method further comprises:
preprocessing the labeled historical complaint text data.
Preferably, the step of determining, according to the probabilities, the target complaint category to which the target complaint text data belongs comprises:
determining the complaint category with the highest probability to be the target complaint category to which the target complaint text data belongs.
Preferably, before the step of obtaining historical complaint text data of the OTA platform for a set historical time period, the method further comprises:
pre-training on the historical complaint text data using the BERT algorithm (a natural language processing algorithm) to obtain a language model;
and the step of building the prediction model with the historical complaint text data, the historical dimension data, and the historical entity data as input and the corresponding historical complaint category as output comprises:
using the BERT algorithm, with the historical complaint text data, the historical dimension data, and the historical entity data as input and the corresponding historical complaint category as output, building the prediction model on the basis of the language model by randomly masking part of the entity data during training.
The present invention also provides a system for predicting the category of complaint texts on an OTA platform, the prediction system comprising a historical text data acquisition module, a labeling module, a dimension and entity data acquisition module, a prediction model building module, a target text data acquisition module, a probability acquisition module, and a target complaint category acquisition module;
the historical text data acquisition module is configured to obtain historical complaint text data of the OTA platform for a set historical time period;
the labeling module is configured to label the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
the dimension and entity data acquisition module is configured to obtain, from the OTA platform, historical dimension data and historical entity data corresponding to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders, and/or hotels;
and the historical entity data is data characterizing proper nouns in the hotel domain;
the prediction model building module is configured to build, with the historical complaint text data, the historical dimension data, and the historical entity data as input and the corresponding historical complaint category as output, a prediction model for predicting the complaint category to which complaint text data belongs;
the target text data acquisition module is configured to obtain target complaint text data;
the probability acquisition module is configured to input the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category;
the target complaint category acquisition module is configured to determine, according to the probabilities, the target complaint category to which the target complaint text data belongs.
Preferably, the prediction system further comprises a clustering module;
the clustering module is configured to cluster the historical complaint text data using a clustering algorithm;
the labeling module is configured to label the historical complaint text data that belongs to the same cluster with the same complaint category.
Preferably, the prediction system further comprises a preprocessing module;
the preprocessing module is configured to preprocess the labeled historical complaint text data.
Preferably, the target complaint category acquisition module is configured to determine the complaint category with the highest probability to be the target complaint category to which the target complaint text data belongs.
Preferably, the prediction system further comprises a language model acquisition module;
the language model acquisition module is configured to pre-train on the historical complaint text data using the BERT algorithm to obtain a language model;
the prediction model building module is configured to use the BERT algorithm, with the historical complaint text data, the historical dimension data, and the historical entity data as input and the corresponding historical complaint category as output, to build the prediction model on the basis of the language model by randomly masking part of the entity data during training.
The present invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the above method for predicting the category of complaint texts on an OTA platform.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for predicting the category of complaint texts on an OTA platform.
The beneficial effects of the present invention are as follows:
In the present invention, a language model is obtained by pre-training, and an improved BERT algorithm is then used to build a prediction model on the basis of that language model, with the historical complaint text data, historical dimension data, and historical entity data as input and the historical complaint category as output. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the category with the highest probability is selected as the target complaint category. This improves the accuracy of the prediction model and of text classification, and categorizes customer complaints automatically, so that the responsible staff can handle the complaint categories they are in charge of as soon as possible. The user experience improves and considerable manpower is saved, which raises overall work efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the method for predicting complaint text categories on an OTA platform according to Embodiment 1 of the present invention.
Fig. 2 is a flowchart of the method for predicting complaint text categories on an OTA platform according to Embodiment 2 of the present invention.
Fig. 3 is a module diagram of the system for predicting complaint text categories on an OTA platform according to Embodiment 3 of the present invention.
Fig. 4 is a module diagram of the system for predicting complaint text categories on an OTA platform according to Embodiment 4 of the present invention.
Fig. 5 is a structural diagram of the electronic device that implements the method for predicting complaint text categories on an OTA platform according to Embodiment 5 of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below by way of embodiments, but is not thereby limited to the scope of the described embodiments.
Embodiment 1
As shown in Fig. 1, the method for predicting complaint text categories on an OTA platform according to this embodiment includes:
S101: obtaining historical complaint text data of the OTA platform for a set historical time period;
S102: labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
S103: obtaining, from the OTA platform, historical dimension data and historical entity data corresponding to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders, and/or hotels;
and the historical entity data is data characterizing proper nouns in the hotel domain;
Specifically, the historical dimension data includes order information, hotel information, user information, and so on. Order information includes but is not limited to the payment method, transaction status, and order type of the order; hotel information includes but is not limited to the names of hotel facilities and equipment, room information, and so on; user information includes but is not limited to user name, gender, and so on.
The historical entity data includes proper nouns in the hotel domain, such as prepayment, king bed, and flash stay. The historical entity data is stored in dictionary form.
S104: taking the historical complaint text data, the historical dimension data, and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, building a prediction model for predicting the complaint category to which complaint text data belongs;
S105: obtaining target complaint text data;
S106: inputting the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category;
S107: determining, according to the probabilities, the target complaint category to which the target complaint text data belongs.
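The inputs gathered in step S103 can be pictured with a small data structure. This is a minimal sketch only: the field names, the example dictionary entries, and the `find_entities` helper are assumptions made for illustration, since the text states only that dimension data covers order, hotel, and user information and that entity data is kept in dictionary form.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical hotel-domain entity dictionary (illustrative entries only).
ENTITY_DICT = {"prepayment", "king bed"}


@dataclass
class ComplaintSample:
    """One training sample: a complaint text plus its dimension data."""
    text: str
    order_info: dict = field(default_factory=dict)  # e.g. payment method, order type
    hotel_info: dict = field(default_factory=dict)  # e.g. facility names
    user_info: dict = field(default_factory=dict)   # e.g. user name, gender
    category: Optional[str] = None                  # label assigned in step S102


def find_entities(text: str, entity_dict: set[str]) -> list[str]:
    """Return the dictionary entries that occur in the text, sorted by name."""
    lowered = text.lower()
    return sorted(entry for entry in entity_dict if entry in lowered)


sample = ComplaintSample(
    text="The king bed was not ready despite prepayment",
    order_info={"payment_method": "prepaid"},
)
print(find_entities(sample.text, ENTITY_DICT))
```

Substring matching is used here for brevity; Chinese complaint text would more likely be matched after word segmentation.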
In this embodiment, data is always selected by random sampling so that the selected subsets follow the same distribution.
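As an illustration of random sampling, the sketch below shuffles the samples uniformly and splits them into a training set and a validation set. The 80/20 ratio, the fixed seed, and the function name `random_split` are assumptions for the example; the text specifies only that sampling is random.

```python
import random


def random_split(samples, validation_ratio=0.2, seed=42):
    """Shuffle samples uniformly at random, then split into train/validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = int(len(shuffled) * validation_ratio)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


train, validation = random_split(range(10))
print(len(train), len(validation))
```

Because every sample is equally likely to land in either subset, both subsets are drawn from the same underlying distribution.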
In this embodiment, a prediction model is built with the historical complaint text data, historical dimension data, and historical entity data as input and the historical complaint category as output. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the category with the highest probability is selected as the target complaint category. This improves the accuracy of text classification and categorizes customer complaints automatically, so that the responsible staff can handle the complaint categories they are in charge of as soon as possible, improving the user experience while saving considerable manpower.
Embodiment 2
As shown in Fig. 2, the method for predicting complaint text categories on an OTA platform according to this embodiment is a further improvement on Embodiment 1. Specifically:
After step S101 and before step S102, the method further includes:
S1020: clustering the historical complaint text data using a clustering algorithm;
The clustering algorithm includes but is not limited to the K-MEANS clustering algorithm (k-means), the DBSCAN clustering algorithm (a density-based clustering algorithm), the mean shift clustering algorithm, hierarchical clustering algorithms, and combined clustering.
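The clustering step S1020 can be sketched with a small pure-Python k-means variant. To keep the example self-contained it uses bag-of-words vectors and cosine similarity (often called spherical k-means) with a deterministic initialization; these choices, the sample texts, and all function names are assumptions for illustration, and a production system would more likely use a library implementation over richer text features.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Whitespace-tokenized term counts (an assumed, simplistic featurization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, iterations=10):
    """Cluster vectors by cosine similarity; returns one cluster id per vector."""
    # Deterministic init: first vector, then the vector least similar to the
    # centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                total = Counter()
                for member in members:
                    total.update(member)  # the sum suffices: cosine ignores scale
                centroids[j] = total
    return labels


texts = [
    "refund not received after cancellation",
    "refund delayed after cancellation",
    "room dirty and noisy",
    "room noisy and smelly",
]
cluster_ids = kmeans([bag_of_words(t) for t in texts], k=2)
print(cluster_ids)
```

Each resulting cluster can then be reviewed by staff and given a single complaint category, which is the labeling shortcut the clustering step is meant to provide.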
Step S102 includes:
S1021: labeling the historical complaint text data that belongs to the same cluster with the same complaint category.
Specifically, during labeling, the relevant staff label the historical complaint text data based on business needs and the clustering results.
After step S103 and before step S104, the method further includes:
S1040: preprocessing the labeled historical complaint text data.
Specifically, preprocessing includes but is not limited to converting full-width characters to half-width, converting traditional Chinese characters to simplified, converting uppercase to lowercase, removing stop words and low-frequency words, filtering out null values, and filtering out sensitive words.
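Most of the preprocessing operations listed for step S1040 can be sketched with the Python standard library. In this hedged example, Unicode NFKC normalization stands in for full-width to half-width conversion, tokens are split on whitespace, and the stop-word list, sensitive-word list, and `min_count` threshold are placeholder assumptions. Traditional-to-simplified conversion is omitted because it requires a character mapping table that the standard library does not provide.

```python
import unicodedata
from collections import Counter

STOP_WORDS = {"the", "a", "is"}   # placeholder list, not from the patent
SENSITIVE_WORDS = {"badword"}     # placeholder list, not from the patent


def normalize(text):
    """Convert full-width characters to half-width (via NFKC) and lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def preprocess(texts, min_count=2):
    """Return cleaned token lists; null and empty texts are dropped."""
    tokenized = [normalize(t).split() for t in texts if t and t.strip()]
    counts = Counter(token for tokens in tokenized for token in tokens)
    cleaned = []
    for tokens in tokenized:
        kept = [
            token for token in tokens
            if token not in STOP_WORDS
            and token not in SENSITIVE_WORDS
            and counts[token] >= min_count  # drop low-frequency words
        ]
        if kept:
            cleaned.append(kept)
    return cleaned


print(normalize("ＲＥＦＵＮＤ１２３"))
print(preprocess(["The refund is late", "", "refund is LATE again", None]))
```

Whitespace tokenization is used only to keep the sketch short; Chinese text would need a word segmenter before stop-word and frequency filtering.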
In addition, before step S101, the method further includes:
S1010: pre-training on the historical complaint text data using the BERT algorithm to obtain a language model.
Step S104 includes:
S1041: using the BERT algorithm, with the historical complaint text data, the historical dimension data, and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, building, on the basis of the language model, the prediction model for predicting the complaint category to which complaint text data belongs, by randomly masking part of the entity data during training.
That is, fine-tuning on top of the pre-trained language model converges faster than training from scratch, and the classification layer can reach relatively better classification accuracy and performance with a smaller amount of labeled data, dimension data, and entity data. Specifically, during masking (random masking), the masking unit is changed from individual words to the corresponding entity data by matching against the entity dictionary, which prevents label leakage and allows a more accurate prediction model to be built.
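The entity-level masking idea can be sketched as follows: tokens are matched against the entity dictionary with a longest-match scan, and each matched entity is masked as a whole rather than token by token. The `[MASK]` token string, the 15% default rate (the rate used in the original BERT paper), the tuple-based dictionary, and the function names are assumptions for illustration; the text does not give these details.

```python
import random

MASK = "[MASK]"


def find_entity_spans(tokens, entities, max_len=3):
    """Longest-match scan; returns (start, end) spans of dictionary entities."""
    spans = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in entities:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no entity starts at this position
    return spans


def mask_entities(tokens, entities, mask_prob=0.15, rng=None):
    """Randomly replace whole entity spans with mask tokens."""
    rng = rng or random.Random()
    masked = list(tokens)
    for start, end in find_entity_spans(tokens, entities):
        if rng.random() < mask_prob:
            masked[start:end] = [MASK] * (end - start)
    return masked


entities = {("king", "bed"), ("prepayment",)}  # hypothetical dictionary entries
tokens = "the king bed was not ready despite prepayment".split()
print(mask_entities(tokens, entities, mask_prob=1.0))
```

Masking the whole span forces the model to predict an entity from its context instead of from the entity's own remaining tokens.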
Because manually labeled data inevitably contains some errors, the prediction model can be used to predict the historical complaint text data. The complaint categories of the historical complaint texts whose maximum probability falls in the interval 0.5-0.7 are then manually relabeled and the model is retrained. This iteration stops once the maximum probability exceeds 0.7, which ensures the accuracy of the prediction model.
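The relabel-and-retrain loop can be sketched as below. The `train`, `predict`, and `relabel` callables are hypothetical placeholders for the real training, inference, and human review steps, and `max_rounds` is an added safety limit. The stopping rule is read here as "no sample's maximum probability remains in the 0.5-0.7 band", which is one interpretation of the description; samples whose maximum probability is below 0.5 are not addressed by the text and are left untouched.

```python
def select_for_relabel(predictions, low=0.5, high=0.7):
    """Indices of samples whose maximum class probability lies in [low, high]."""
    return [i for i, probs in enumerate(predictions) if low <= max(probs) <= high]


def refine_labels(texts, labels, train, predict, relabel, max_rounds=5):
    """Iteratively relabel uncertain samples and retrain the model."""
    labels = list(labels)
    model = train(texts, labels)
    for _ in range(max_rounds):
        predictions = [predict(model, text) for text in texts]
        uncertain = select_for_relabel(predictions)
        if not uncertain:
            break  # no prediction is left in the uncertain band
        for i in uncertain:
            labels[i] = relabel(texts[i], labels[i])  # manual review step
        model = train(texts, labels)
    return model, labels


print(select_for_relabel([[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]]))
```

Passing the three steps in as callables keeps the loop independent of any particular model or labeling tool.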
Step S107 specifically includes:
S1071: determining the complaint category with the highest probability to be the target complaint category to which the target complaint text data belongs.
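Selecting the category in step S1071 amounts to an argmax over the per-category probabilities. The sketch below assumes the model emits raw scores (logits) that are turned into probabilities with a softmax; the category names and scores are made up for the example.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_category(categories, probabilities):
    """Return the category with the highest probability and that probability."""
    best = max(range(len(categories)), key=lambda i: probabilities[i])
    return categories[best], probabilities[best]


categories = ["refund", "room condition", "service"]  # hypothetical categories
probabilities = softmax([2.0, 0.5, 0.1])
print(pick_category(categories, probabilities))
```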
In this embodiment, a language model is obtained by pre-training, and an improved BERT algorithm is then used to build a prediction model on the basis of that language model, with the historical complaint text data, historical dimension data, and historical entity data as input and the historical complaint category as output. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the category with the highest probability is selected as the target complaint category. This improves the accuracy of text classification and categorizes customer complaints automatically, so that the responsible staff can handle the complaint categories they are in charge of as soon as possible, which improves the user experience, saves considerable manpower, and raises processing efficiency.
Embodiment 3
As shown in Fig. 3, the system for predicting complaint text categories on an OTA platform according to this embodiment includes a historical text data acquisition module 1, a labeling module 2, a dimension and entity data acquisition module 3, a prediction model building module 4, a target text data acquisition module 5, a probability acquisition module 6, and a target complaint category acquisition module 7.
The historical text data acquisition module 1 is configured to obtain historical complaint text data of the OTA platform for a set historical time period;
The labeling module 2 is configured to label the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
The dimension and entity data acquisition module 3 is configured to obtain, from the OTA platform, historical dimension data and historical entity data corresponding to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders, and/or hotels;
and the historical entity data is data characterizing proper nouns in the hotel domain;
Specifically, the historical dimension data includes order information, hotel information, user information, and so on. Order information includes but is not limited to the payment method, transaction status, and order type of the order; hotel information includes but is not limited to the names of hotel facilities and equipment, room information, and so on; user information includes but is not limited to user name, gender, and so on.
The historical entity data includes proper nouns in the hotel domain, such as prepayment, king bed, and flash stay. The historical entity data is stored in dictionary form.
The prediction model building module 4 is configured to build, with the historical complaint text data, the historical dimension data, and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, a prediction model for predicting the complaint category to which complaint text data belongs;
The target text data acquisition module 5 is configured to obtain target complaint text data;
The probability acquisition module 6 is configured to input the target complaint text data into the prediction model to obtain the probability that the target complaint text data belongs to each complaint category;
The target complaint category acquisition module 7 is configured to determine, according to the probabilities, the target complaint category to which the target complaint text data belongs.
In this embodiment, data is always selected by random sampling so that the selected subsets follow the same distribution.
In this embodiment, a prediction model is built with the historical complaint text data, historical dimension data, and historical entity data as input and the historical complaint category as output. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the category with the highest probability is selected as the target complaint category. This improves the accuracy of text classification and categorizes customer complaints automatically, so that the responsible staff can handle the complaint categories they are in charge of as soon as possible, improving the user experience while saving considerable manpower.
Embodiment 4
As shown in Fig. 4, the system for predicting complaint text categories on an OTA platform according to this embodiment is a further improvement on Embodiment 3. Specifically:
The prediction system further includes a clustering module 8;
The clustering module 8 is configured to cluster the historical complaint text data using a clustering algorithm;
The clustering algorithm includes but is not limited to the K-MEANS clustering algorithm, the DBSCAN clustering algorithm, the mean shift clustering algorithm, hierarchical clustering algorithms, and combined clustering.
The labeling module 2 is configured to label the historical complaint text data that belongs to the same cluster with the same complaint category.
Specifically, during labeling, the relevant staff label the historical complaint text data based on business needs and the clustering results.
The prediction system further includes a preprocessing module 9;
The preprocessing module 9 is configured to preprocess the labeled historical complaint text data.
Specifically, preprocessing includes but is not limited to converting full-width characters to half-width, converting traditional Chinese characters to simplified, converting uppercase to lowercase, removing stop words and low-frequency words, filtering out null values, and filtering out sensitive words.
Specifically, the prediction system further includes a language model acquisition module 10;
The language model acquisition module 10 is configured to pre-train on the historical complaint text data using the BERT algorithm to obtain a language model;
The prediction model building module 4 is configured to use the BERT algorithm, with the historical complaint text data, the historical dimension data, and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, to build, on the basis of the language model, the prediction model for predicting the complaint category to which complaint text data belongs, by randomly masking part of the entity data during training.
That is, fine-tuning on top of the pre-trained language model converges faster than training from scratch, and the classification layer can reach relatively better classification accuracy and performance with a smaller amount of labeled data, dimension data, and entity data. Specifically, during masking, the masking unit is changed from individual words to the corresponding entity data by matching against the entity dictionary, which prevents label leakage and allows a more accurate prediction model to be built.
Because manually labeled data inevitably contains some errors, the prediction model can be used to predict the historical complaint text data. The complaint categories of the historical complaint texts whose maximum probability falls in the interval 0.5-0.7 are then manually relabeled and the model is retrained. This iteration stops once the maximum probability exceeds 0.7, which ensures the accuracy of the prediction model.
The target complaint category acquisition module 7 is configured to determine the complaint category with the highest probability to be the target complaint category to which the target complaint text data belongs.
In this embodiment, a language model is obtained by pre-training, and an improved BERT algorithm is then used to build a prediction model on the basis of that language model, with the historical complaint text data, historical dimension data, and historical entity data as input and the historical complaint category as output. The prediction model yields the probability that the target complaint text data belongs to each complaint category, and the category with the highest probability is selected as the target complaint category. This improves the accuracy of text classification and categorizes customer complaints automatically, so that the responsible staff can handle the complaint categories they are in charge of as soon as possible, improving the user experience while saving considerable manpower.
Embodiment 5
Fig. 5 is a schematic structural diagram of an electronic device provided by Embodiment 5 of the present invention. The electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2. The electronic device 30 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the electronic device 30 may take the form of a general-purpose computing device, for example a server device. The components of the electronic device 30 may include, but are not limited to: the at least one processor 31, the at least one memory 32, and a bus 33 connecting the different system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus and a control bus.
The memory 32 may include volatile memory, such as random access memory (RAM) 321 and/or cache memory 322, and may further include read-only memory (ROM) 323.
The memory 32 may also include a program/utility 325 having a set of (at least one) program modules 324. Such program modules 324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
By running the computer program stored in the memory 32, the processor 31 executes various functional applications and data processing, for example the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2 of the present invention.
The electronic device 30 may also communicate with one or more external devices 34 (such as a keyboard, a pointing device, etc.). This communication may take place through an input/output (I/O) interface 35. In addition, the model-generating device 30 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 36. As shown in Fig. 5, the network adapter 36 communicates with the other modules of the model-generating device 30 through the bus 33. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the model-generating device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.
Embodiment 6
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the steps of the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a possible implementation, the present invention may also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps of the method for predicting complaint text categories of an OTA platform according to either of Embodiments 1 and 2.
The program code for carrying out the present invention may be written in any combination of one or more programming languages. The program code may execute entirely on a user device, partly on a user device, as a stand-alone software package, partly on a user device and partly on a remote device, or entirely on a remote device.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that these are merely illustrative, and that the protection scope of the present invention is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and substance of the present invention, and all such changes and modifications fall within the protection scope of the present invention.

Claims (12)

1. A method for predicting complaint text categories of an OTA platform, characterized in that the prediction method comprises:
obtaining historical complaint text data of the OTA platform within a set historical time period;
labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
obtaining, from the OTA platform, historical dimension data and historical entity data corresponding to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel domain;
establishing, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, a prediction model for predicting the complaint category to which complaint text data belongs;
obtaining target complaint text data;
inputting the target complaint text data into the prediction model to obtain the probability value that the target complaint text data belongs to each complaint category;
determining, according to the probability values, the target complaint category to which the target complaint text data belongs.
2. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that, after the step of obtaining historical complaint text data of the OTA platform within a set historical time period and before the step of labeling the historical complaint text data, the method further comprises:
clustering the historical complaint text data using a clustering algorithm;
the step of labeling the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data comprises:
labeling the historical complaint text data belonging to the same clustering result with the same complaint category.
3. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that, before the step of establishing, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, a prediction model for predicting the complaint category to which complaint text data belongs, the method further comprises:
preprocessing the labeled historical complaint text data.
4. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that the step of determining, according to the probability values, the target complaint category to which the target complaint text data belongs comprises:
determining the complaint category corresponding to the maximum probability value as the target complaint category to which the target complaint text data belongs.
5. The method for predicting complaint text categories of an OTA platform according to claim 1, characterized in that, before the step of obtaining historical complaint text data of the OTA platform within a set historical time period, the method further comprises:
pre-training on the historical complaint text data using a BERT algorithm to obtain a language model;
the step of establishing, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, a prediction model for predicting the complaint category to which complaint text data belongs comprises:
using the BERT algorithm, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, and based on the language model, establishing the prediction model for predicting the complaint category to which complaint text data belongs by randomly masking part of the entity data during training.
6. A system for predicting complaint text categories of an OTA platform, characterized in that the prediction system comprises a historical text data obtaining module, a labeling module, a dimension and entity data obtaining module, a prediction model establishing module, a target text data obtaining module, a probability value obtaining module and a target complaint category obtaining module;
the historical text data obtaining module is configured to obtain historical complaint text data of the OTA platform within a set historical time period;
the labeling module is configured to label the historical complaint text data to obtain the complaint category corresponding to each piece of historical complaint text data;
the dimension and entity data obtaining module is configured to obtain, from the OTA platform, historical dimension data and historical entity data corresponding to the historical complaint text data;
wherein the historical dimension data is multi-dimensional data characterizing users, orders and/or hotels;
the historical entity data is data characterizing proper nouns in the hotel domain;
the prediction model establishing module is configured to establish, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, a prediction model for predicting the complaint category to which complaint text data belongs;
the target text data obtaining module is configured to obtain target complaint text data;
the probability value obtaining module is configured to input the target complaint text data into the prediction model to obtain the probability value that the target complaint text data belongs to each complaint category;
the target complaint category obtaining module is configured to determine, according to the probability values, the target complaint category to which the target complaint text data belongs.
7. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the prediction system further comprises a clustering module;
the clustering module is configured to cluster the historical complaint text data using a clustering algorithm;
the labeling module is configured to label the historical complaint text data belonging to the same clustering result with the same complaint category.
8. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the prediction system further comprises a preprocessing module;
the preprocessing module is configured to preprocess the labeled historical complaint text data.
9. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the target complaint category obtaining module is configured to determine the complaint category corresponding to the maximum probability value as the target complaint category to which the target complaint text data belongs.
10. The system for predicting complaint text categories of an OTA platform according to claim 6, characterized in that the prediction system further comprises a language model obtaining module;
the language model obtaining module is configured to pre-train on the historical complaint text data using a BERT algorithm to obtain a language model;
the prediction model establishing module is configured to use the BERT algorithm, with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, and based on the language model, to establish the prediction model for predicting the complaint category to which complaint text data belongs by randomly masking part of the entity data during training.
11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method for predicting complaint text categories of an OTA platform according to any one of claims 1-5.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method for predicting complaint text categories of an OTA platform according to any one of claims 1-5.
CN201910650261.0A 2019-07-18 2019-07-18 Prediction method, system, equipment and storage medium for complaint text category Active CN110347840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910650261.0A CN110347840B (en) 2019-07-18 2019-07-18 Prediction method, system, equipment and storage medium for complaint text category

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910650261.0A CN110347840B (en) 2019-07-18 2019-07-18 Prediction method, system, equipment and storage medium for complaint text category

Publications (2)

Publication Number Publication Date
CN110347840A true CN110347840A (en) 2019-10-18
CN110347840B CN110347840B (en) 2023-06-13

Family

ID=68178920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910650261.0A Active CN110347840B (en) 2019-07-18 2019-07-18 Prediction method, system, equipment and storage medium for complaint text category

Country Status (1)

Country Link
CN (1) CN110347840B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930022A (en) * 2019-11-20 2020-03-27 携程计算机技术(上海)有限公司 Hotel static information detection method and system, electronic equipment and storage medium
CN111192160A (en) * 2019-12-17 2020-05-22 山大地纬软件股份有限公司 Power public opinion monitoring method and system based on multi-fractal optimization
CN111553817A (en) * 2020-04-24 2020-08-18 北京北大软件工程股份有限公司 Analysis method and system for goodness of fit of complaint reporting case and treatment department
CN112288446A (en) * 2020-10-28 2021-01-29 中国联合网络通信集团有限公司 Method and device for calculating complaint and claim
CN112925911A (en) * 2021-02-25 2021-06-08 平安普惠企业管理有限公司 Complaint classification method based on multi-modal data and related equipment thereof
CN113704407A (en) * 2021-08-30 2021-11-26 平安银行股份有限公司 Complaint amount analysis method, device, equipment and storage medium based on category analysis
CN113810212A (en) * 2020-06-15 2021-12-17 中国移动通信集团浙江有限公司 Root cause positioning method and device for 5G slice user complaints

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309948A (en) * 2013-05-20 2013-09-18 携程计算机技术(上海)有限公司 System and method for public opinion monitoring analysis and intelligent distribution processing of coordination center
CN107844559A (en) * 2017-10-31 2018-03-27 国信优易数据有限公司 A kind of file classifying method, device and electronic equipment
US20180203754A1 (en) * 2017-01-17 2018-07-19 Bank Of America Corporation Individualized Channel Error Detection and Resolution
CN108573031A (en) * 2018-03-26 2018-09-25 上海万行信息科技有限公司 A kind of complaint sorting technique and system based on content
CN109492091A (en) * 2018-09-28 2019-03-19 科大国创软件股份有限公司 A kind of complaint work order intelligent method for classifying based on convolutional neural networks
CN109670843A (en) * 2018-11-12 2019-04-23 平安科技(深圳)有限公司 Data processing method, device, computer equipment and the storage medium of complaint business
CN109684475A (en) * 2018-11-21 2019-04-26 斑马网络技术有限公司 Processing method, device, equipment and the storage medium of complaint
CN109726290A (en) * 2018-12-29 2019-05-07 咪咕数字传媒有限公司 Complain determination method and device, the computer readable storage medium of disaggregated model
CN109816399A (en) * 2019-01-07 2019-05-28 平安科技(深圳)有限公司 Complain management method, device, computer equipment and the storage medium of part
CN109858702A (en) * 2019-02-14 2019-06-07 中国联合网络通信集团有限公司 Client upgrades prediction technique, device, equipment and the readable storage medium storing program for executing complained
CN109918501A (en) * 2019-01-18 2019-06-21 平安科技(深圳)有限公司 Method, apparatus, equipment and the storage medium of news article classification
CN109982367A (en) * 2017-12-28 2019-07-05 中国移动通信集团四川有限公司 Mobile terminal Internet access customer complaint prediction technique, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WENJING DUAN ET AL.: "Mining Online User-Generated Content: Using Sentiment Analysis Technique to Study Hotel Service Quality" *
唐雪薇: "旅游网络口碑信息特征对出游意向的影响" *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930022A (en) * 2019-11-20 2020-03-27 携程计算机技术(上海)有限公司 Hotel static information detection method and system, electronic equipment and storage medium
CN111192160A (en) * 2019-12-17 2020-05-22 山大地纬软件股份有限公司 Power public opinion monitoring method and system based on multi-fractal optimization
CN111553817A (en) * 2020-04-24 2020-08-18 北京北大软件工程股份有限公司 Analysis method and system for goodness of fit of complaint reporting case and treatment department
CN113810212A (en) * 2020-06-15 2021-12-17 中国移动通信集团浙江有限公司 Root cause positioning method and device for 5G slice user complaints
CN112288446A (en) * 2020-10-28 2021-01-29 中国联合网络通信集团有限公司 Method and device for calculating complaint and claim
CN112288446B (en) * 2020-10-28 2023-06-06 中国联合网络通信集团有限公司 Calculation method and device for complaint and claim payment
CN112925911A (en) * 2021-02-25 2021-06-08 平安普惠企业管理有限公司 Complaint classification method based on multi-modal data and related equipment thereof
CN112925911B (en) * 2021-02-25 2022-08-12 平安普惠企业管理有限公司 Complaint classification method based on multi-modal data and related equipment thereof
CN113704407A (en) * 2021-08-30 2021-11-26 平安银行股份有限公司 Complaint amount analysis method, device, equipment and storage medium based on category analysis
CN113704407B (en) * 2021-08-30 2023-08-25 平安银行股份有限公司 Complaint volume analysis method, device, equipment and storage medium based on category analysis

Also Published As

Publication number Publication date
CN110347840B (en) 2023-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant