CN110347840B - Prediction method, system, equipment and storage medium for complaint text category - Google Patents

Publication number
CN110347840B
Authority
CN
China
Prior art keywords
complaint
historical
text data
data
complaint text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910650261.0A
Other languages
Chinese (zh)
Other versions
CN110347840A (en)
Inventor
杨森
罗超
胡泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd
Priority to CN201910650261.0A
Publication of CN110347840A
Application granted
Publication of CN110347840B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method, a system, equipment and a storage medium for predicting the complaint text category of an OTA platform. The prediction method comprises: obtaining historical complaint text data of the OTA platform; clustering and labeling the historical complaint text data to obtain the complaint category of each piece of the historical complaint text data; acquiring historical dimension data and historical entity data; establishing a prediction model for predicting the complaint category to which complaint text data belongs; acquiring target complaint text data; inputting the target complaint text data into the prediction model to obtain a probability value for each complaint category to which the target complaint text data may belong; and determining the target complaint category to which the target complaint text data belongs according to the probability values. The invention improves the precision of text classification and classifies user complaint content automatically, so that the responsible personnel can promptly handle the complaint categories they are responsible for, which saves considerable manpower while improving the user experience.

Description

Prediction method, system, equipment and storage medium for complaint text category
Technical Field
The invention relates to the technical field of data processing, in particular to a method, a system, equipment and a storage medium for predicting complaint text categories of an OTA platform.
Background
On an OTA (Online Travel Agency) platform, complaint texts need to be classified to determine the corresponding complaint category, so that a different remedy can be applied to each complaint category and the user experience improved.
Currently, text classification mostly relies on word-embedding-based RNN (recurrent neural network) or CNN (convolutional neural network) algorithms. An RNN-based text classification algorithm can effectively model the text context and capture contextual semantics, but each time step depends on the result computed at the previous time step, so the computation cannot be parallelized and training often takes a long time. A word-embedding-based CNN algorithm often overfits because of OOV (out-of-vocabulary) words and sparse features; although it solves the parallelization problem, it can only identify local text information, so its precision is affected to some extent.
Disclosure of Invention
The invention aims to overcome the defects of prior-art complaint text classification algorithms, namely that they cannot be processed in parallel and take a long time to train, or that their precision is unsatisfactory, and provides a method, a system, equipment and a storage medium for predicting the complaint text category of an OTA platform.
The invention solves the technical problems by the following technical scheme:
the invention provides a prediction method of complaint text category of an OTA platform, which comprises the following steps:
acquiring historical complaint text data corresponding to the OTA platform in a historical set time period;
labeling the historical complaint text data to obtain complaint categories corresponding to each piece of the historical complaint text data;
acquiring historical dimension data and historical entity data corresponding to the historical complaint text data in the OTA platform;
the historical dimension data are multidimensional data used for representing users, orders and/or hotels;
the historical entity data are data used for representing proper nouns in the hotel field;
taking the historical complaint text data, the historical dimension data and the historical entity data as input, taking the historical complaint category corresponding to the historical complaint text data as output, and establishing a prediction model for predicting the complaint category to which the complaint text data belongs;
acquiring target complaint text data;
inputting the target complaint text data into the prediction model, and obtaining probability values of each complaint category to which the target complaint text data belong;
and determining the target complaint category to which the target complaint text data belongs according to the probability value.
Preferably, after the step of obtaining the historical complaint text data corresponding to the OTA platform in the historical set time period, before the step of labeling the historical complaint text data, the method further includes:
clustering the historical complaint text data by adopting a clustering algorithm;
the step of labeling the historical complaint text data to obtain complaint categories corresponding to each piece of the historical complaint text data comprises the following steps:
and marking the historical complaint text data belonging to the same clustering result as the same complaint category.
Preferably, before the step of establishing a prediction model for predicting the complaint category to which the complaint text data belongs, with the historical complaint text data, the historical dimension data and the historical entity data as inputs and the historical complaint category corresponding to the historical complaint text data as output, the method further includes:
preprocessing the historical complaint text data after labeling.
Preferably, the step of determining the target complaint category to which the target complaint text data belongs according to the probability value includes:
and determining that the corresponding complaint category when the probability value is maximum is the target complaint category to which the target complaint text data belongs.
Preferably, before the step of obtaining the historical complaint text data corresponding to the OTA platform in the historical set time period, the method further includes:
pre-training on the historical complaint text data by adopting a BERT (Bidirectional Encoder Representations from Transformers) algorithm to obtain a language model;
the step of establishing a prediction model for predicting a complaint category to which the complaint text data belongs, with the history complaint text data, the history dimension data, and the history entity data as inputs and the history complaint category corresponding to the history complaint text data as outputs, includes:
and establishing a prediction model for predicting the complaint category to which the complaint text data belongs by randomly masking part of the entity data during training based on the language model by adopting a BERT algorithm and taking the historical complaint text data, the historical dimension data and the historical entity data as inputs and the historical complaint category corresponding to the historical complaint text data as output.
The invention also provides a prediction system of the complaint text category of the OTA platform, which comprises a historical text data acquisition module, a labeling processing module, a dimension and entity data acquisition module, a prediction model building module, a target text data acquisition module, a probability value acquisition module and a target complaint category acquisition module;
the historical text data acquisition module is used for acquiring historical complaint text data corresponding to the OTA platform in a historical set time period;
the marking processing module is used for marking the historical complaint text data and obtaining complaint categories corresponding to each piece of the historical complaint text data;
the dimension and entity data acquisition module is used for acquiring historical dimension data and historical entity data corresponding to the historical complaint text data in the OTA platform;
the historical dimension data are multidimensional data used for representing users, orders and/or hotels;
the historical entity data are data used for representing proper nouns in the hotel field;
the prediction model building module is used for taking the historical complaint text data, the historical dimension data and the historical entity data as input, taking the historical complaint category corresponding to the historical complaint text data as output, and building a prediction model for predicting the complaint category to which the complaint text data belongs;
the target text data acquisition module is used for acquiring target complaint text data;
the probability value acquisition module is used for inputting the target complaint text data into the prediction model and acquiring probability values of each complaint category to which the target complaint text data belong;
the target complaint category obtaining module is used for determining the target complaint category to which the target complaint text data belong according to the probability value.
Preferably, the prediction system further comprises a clustering module;
the clustering module is used for clustering the historical complaint text data by adopting a clustering algorithm;
the labeling processing module is used for labeling the historical complaint text data belonging to the same clustering result as the same complaint category.
Preferably, the prediction system further comprises a preprocessing module;
the preprocessing module is used for preprocessing the historical complaint text data after the labeling processing.
Preferably, the target complaint category obtaining module is configured to determine that the complaint category corresponding to the maximum probability value is the target complaint category to which the target complaint text data belongs.
Preferably, the prediction system further comprises a language model acquisition module;
the language model acquisition module is used for pre-training the historical complaint text data by adopting a BERT algorithm to acquire a language model;
the prediction model building module is used for building the prediction model for predicting the complaint category to which the complaint text data belongs by randomly masking part of the entity data during training based on the language model by taking the historical complaint text data, the historical dimension data and the historical entity data as inputs and the historical complaint category corresponding to the historical complaint text data as output by adopting a BERT algorithm.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is executed by the processor to realize the method for predicting the complaint text category of the OTA platform.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for predicting a complaint text class of an OTA platform described above.
The invention has the positive progress effects that:
In the method, a language model is first obtained through pre-training; an improved BERT algorithm then takes the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category as output, and a prediction model is built on the basis of the language model. The prediction model yields a probability value for each complaint category of the target complaint text data, and the complaint category with the highest probability value is selected as the target complaint category. This improves the accuracy of the prediction model and of text classification and classifies user complaint content automatically, so that the responsible personnel can handle the complaint categories they are responsible for at the first opportunity, which saves considerable manpower, improves the user experience and raises overall working efficiency.
Drawings
Fig. 1 is a flowchart of a method for predicting complaint text category of OTA platform according to embodiment 1 of the invention.
Fig. 2 is a flowchart of a method for predicting complaint text category of the OTA platform according to embodiment 2 of the invention.
Fig. 3 is a schematic block diagram of a complaint text class prediction system of an OTA platform according to embodiment 3 of the invention.
Fig. 4 is a schematic block diagram of a complaint text class prediction system of an OTA platform according to embodiment 4 of the invention.
Fig. 5 is a schematic structural diagram of an electronic device for implementing a method for predicting a complaint text class of an OTA platform in embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the method for predicting complaint text categories of the OTA platform of the embodiment includes:
S101, acquiring historical complaint text data corresponding to the OTA platform in a historical set time period;
S102, labeling the historical complaint text data to obtain the complaint category corresponding to each piece of the historical complaint text data;
S103, acquiring historical dimension data and historical entity data corresponding to the historical complaint text data in the OTA platform;
the historical dimension data is multi-dimensional data used for representing users, orders and/or hotels;
the historical entity data is data used for representing proper nouns in the hotel field;
Specifically, the historical dimension data comprises order information, hotel information, user information and the like. Order information includes, but is not limited to, the payment mode, transaction state and order type of an order; hotel information includes, but is not limited to, hotel facility and equipment names, room source information, etc.; user information includes, but is not limited to, user name, gender, etc.
Historical entity data includes proper nouns in the hotel field, such as pre-paid, big double bed, flash, etc. The historical entity data is stored in a dictionary format.
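The three kinds of input named in S101 to S103 can be pictured as one record per complaint plus a shared entity dictionary. The sketch below is only an illustration under stated assumptions: the class name `ComplaintRecord`, its field names, the entity type labels and the helper `find_entities` are all hypothetical and are not defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical entity dictionary: hotel-field proper noun -> illustrative type label.
ENTITY_DICT: Dict[str, str] = {
    "pre-paid": "PAYMENT_TERM",
    "big double bed": "ROOM_TYPE",
}


@dataclass
class ComplaintRecord:
    """One historical complaint together with its dimension data (assumed layout)."""
    text: str
    order_info: Dict[str, str] = field(default_factory=dict)  # e.g. payment mode, order type
    hotel_info: Dict[str, str] = field(default_factory=dict)  # e.g. facility names
    user_info: Dict[str, str] = field(default_factory=dict)   # e.g. user name, gender
    category: Optional[str] = None                            # filled in by labeling


def find_entities(text: str, entity_dict: Dict[str, str]) -> List[str]:
    """Return the dictionary entries that occur in the text (simple substring match)."""
    lowered = text.lower()
    return [term for term in entity_dict if term in lowered]


record = ComplaintRecord(
    text="The pre-paid order promised a big double bed.",
    order_info={"payment_mode": "pre-paid"},
)
entities = find_entities(record.text, ENTITY_DICT)
```

Substring matching is used here only to keep the sketch short; a production system would more likely match on segmented tokens.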
S104, taking the historical complaint text data, the historical dimension data and the historical entity data as input, taking the historical complaint category corresponding to the historical complaint text data as output, and establishing a prediction model for predicting the complaint category to which the complaint text data belongs;
S105, acquiring target complaint text data;
S106, inputting the target complaint text data into the prediction model, and obtaining a probability value for each complaint category to which the target complaint text data may belong;
S107, determining the target complaint category to which the target complaint text data belongs according to the probability values.
In this embodiment, data are selected by random sampling so that the sampled data follow the same distribution.
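A minimal sketch of such random sampling is shown below. The function name `sample_complaints` and the fixed seed are assumptions made for reproducibility of the example; the patent does not specify how the sampling is implemented.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_complaints(records: Sequence[T], k: int, seed: int = 0) -> List[T]:
    """Draw up to k records uniformly at random, without replacement."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    return rng.sample(list(records), min(k, len(records)))


population = list(range(100))  # stand-in for 100 historical complaint records
subset = sample_complaints(population, 10)
```

Uniform sampling without replacement gives every record the same chance of selection, so the sample is expected to reflect the distribution of the full data set.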
In this embodiment, a prediction model is established with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category as output. The prediction model yields a probability value for each complaint category of the target complaint text data, and the complaint category with the highest probability value is selected as the target complaint category. This improves text classification precision and classifies user complaint content automatically, so that the responsible personnel can handle the complaint categories they are responsible for at the first opportunity, which saves considerable manpower while improving the user experience.
Example 2
As shown in fig. 2, the method for predicting complaint text category of the OTA platform of the present embodiment is a further improvement of embodiment 1, specifically:
after step S101, before step S102, the method further comprises:
S1020, clustering the historical complaint text data by adopting a clustering algorithm;
the clustering algorithms include, but are not limited to, the K-means clustering algorithm, the DBSCAN clustering algorithm (a density-based clustering algorithm), the mean shift clustering algorithm, hierarchical clustering algorithms, and synthetic clustering.
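As an illustration of the clustering step, the following is a small K-means sketch over bag-of-words vectors. It is a toy under stated assumptions: the texts are whitespace-tokenized English (real complaint texts would need a word segmenter), the first k vectors serve as initial centroids so the run is deterministic, and all function names are hypothetical.

```python
from collections import Counter
from typing import List


def bag_of_words(text: str, vocab: List[str]) -> List[float]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Plain K-means; returns the cluster index assigned to each vector."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic initialization (assumption)
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


texts = [
    "refund not received",
    "room was dirty",
    "refund delayed again",
    "dirty room and bathroom",
]
vocab = sorted({word for t in texts for word in t.lower().split()})
cluster_ids = kmeans([bag_of_words(t, vocab) for t in texts], k=2)
```

On this toy input the two refund complaints end up in one cluster and the two cleanliness complaints in the other.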
Step S102 includes:
S1021, labeling the historical complaint text data belonging to the same clustering result as the same complaint category.
Specifically, in the labeling process, related staff are selected to label the historical complaint text data by combining the business requirements and the clustering results.
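The cluster-based labeling can be sketched as two small helpers: one groups texts by cluster so that staff can review each cluster as a whole, and one applies the category chosen for a cluster to all of its members. The function names, the cluster ids and the category names below are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_cluster(texts: List[str], cluster_ids: List[int]) -> Dict[int, List[str]]:
    """Collect the texts of each cluster so an annotator can review them together."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for text, cluster_id in zip(texts, cluster_ids):
        groups[cluster_id].append(text)
    return dict(groups)


def label_by_cluster(cluster_ids: List[int], cluster_to_category: Dict[int, str]) -> List[str]:
    """Give every text the complaint category chosen for its cluster."""
    return [cluster_to_category[cluster_id] for cluster_id in cluster_ids]


sample_texts = ["refund not received", "room was dirty", "refund delayed again"]
sample_clusters = [0, 1, 0]                      # assumed output of a clustering step
groups = group_by_cluster(sample_texts, sample_clusters)
staff_decision = {0: "refund", 1: "cleanliness"}  # hypothetical staff-assigned categories
labels = label_by_cluster(sample_clusters, staff_decision)
```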
After step S103, before step S104, the method further comprises:
S1040, preprocessing the labeled historical complaint text data.
Specifically, preprocessing includes, but is not limited to, converting full-width characters to half-width, converting Traditional Chinese to Simplified Chinese, converting uppercase to lowercase, removing stop words and low-frequency words, filtering null values, and filtering sensitive words.
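Several of these preprocessing operations can be sketched with the Python standard library alone. The stop-word and sensitive-word sets below are placeholders, whitespace tokenization is an assumption, and Traditional-to-Simplified conversion is left out because it needs a conversion table or a dedicated library.

```python
import unicodedata
from collections import Counter
from typing import Iterable, List, Optional, Set

STOP_WORDS: Set[str] = {"the", "a", "is"}   # placeholder stop-word list
SENSITIVE_WORDS: Set[str] = {"badword"}     # placeholder sensitive-word list


def normalize(text: str) -> str:
    """NFKC folds full-width letters, digits and punctuation to half-width; then lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def preprocess(texts: Iterable[Optional[str]], min_freq: int = 2) -> List[List[str]]:
    # Filter null values: drop None and blank strings.
    cleaned = [normalize(t) for t in texts if t and t.strip()]
    tokenized = [t.split() for t in cleaned]
    # Corpus-wide token counts, used to drop low-frequency words.
    freq = Counter(token for tokens in tokenized for token in tokens)
    return [
        [
            token
            for token in tokens
            if token not in STOP_WORDS
            and token not in SENSITIVE_WORDS
            and freq[token] >= min_freq
        ]
        for tokens in tokenized
    ]


tokens_per_text = preprocess(["The ROOM is dirty", None, "  ", "the room smells"])
```

In this example only "room" survives in each text, because the other words are either stop words or occur just once in the corpus.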
In addition, before step S101, the method further includes:
S1010, pre-training on the historical complaint text data by adopting a BERT algorithm to obtain a language model.
Step S104 includes:
S1041, adopting a BERT algorithm, on the basis of the language model, to take the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category corresponding to the historical complaint text data as output, and establishing the prediction model for predicting the complaint category to which the complaint text data belongs, wherein part of the entity data is randomly masked during training.
Fine-tuning on the basis of the pre-trained language model converges faster than training from scratch, and at the same time relatively good classification precision can be achieved in the classification layer with a smaller amount of labeled data, dimension data and entity data. Specifically, when random masking (mask) is performed, words are matched against the entity dictionary and replaced by the corresponding entity data, which prevents label leakage and allows a prediction model with higher precision to be established.
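The text does not spell out the masking procedure, so the following is only one possible reading: tokens that match the entity dictionary are treated as whole units and each is replaced by a mask token with some probability. The `[MASK]` string, the `mask_prob` parameter and the assumption that multi-word entities arrive as single pre-segmented tokens are all illustrative choices, not details from the patent.

```python
import random
from typing import Dict, List, Optional

MASK_TOKEN = "[MASK]"


def mask_entities(
    tokens: List[str],
    entity_dict: Dict[str, str],
    mask_prob: float = 0.5,
    seed: Optional[int] = None,
) -> List[str]:
    """Replace dictionary-matched entity tokens with the mask token at random."""
    rng = random.Random(seed)
    return [
        MASK_TOKEN if token in entity_dict and rng.random() < mask_prob else token
        for token in tokens
    ]


entity_dict = {"pre-paid": "PAYMENT_TERM", "big double bed": "ROOM_TYPE"}
segmented = ["the", "pre-paid", "order", "had", "no", "big double bed"]
all_masked = mask_entities(segmented, entity_dict, mask_prob=1.0)
none_masked = mask_entities(segmented, entity_dict, mask_prob=0.0)
```

Masking an entity as a whole, rather than one sub-word at a time, keeps the model from recovering the entity from its own fragments.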
Considering that manually labeled data contains certain errors, the obtained prediction model can be used to predict the historical complaint text data; the complaint categories of the historical complaint text data whose maximum probability value lies between 0.5 and 0.7 are then manually re-labeled and the model is retrained. This iterative training continues until the maximum probability value is greater than 0.7, so as to ensure the accuracy of the prediction model.
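The selection of samples for manual re-labeling can be sketched as a filter on the maximum predicted probability. The thresholds 0.5 and 0.7 come from the text; the function names, and the reading that training stops once every sample exceeds 0.7, are assumptions.

```python
from typing import List, Sequence


def select_for_relabeling(
    probabilities: Sequence[Sequence[float]], low: float = 0.5, high: float = 0.7
) -> List[int]:
    """Indices of samples whose highest class probability lies in [low, high]."""
    return [i for i, probs in enumerate(probabilities) if low <= max(probs) <= high]


def all_confident(probabilities: Sequence[Sequence[float]], threshold: float = 0.7) -> bool:
    """Assumed stopping rule: every sample's highest probability exceeds the threshold."""
    return all(max(probs) > threshold for probs in probabilities)


predicted = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]]  # made-up model outputs
to_review = select_for_relabeling(predicted)
done = all_confident(predicted)
```

Here the second and third samples would be sent back for manual review, and another training round would follow.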
The step S107 specifically includes:
S1071, determining that the complaint category with the maximum probability value is the target complaint category to which the target complaint text data belongs.
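Choosing the category with the maximum probability amounts to an argmax over the model output. The sketch below assumes the model emits raw scores that a softmax turns into probabilities; the patent only states that probability values are obtained, and the category names and scores here are invented.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1 (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits: Sequence[float], categories: Sequence[str]) -> Tuple[str, float]:
    """Return the category with the highest probability, together with that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]


categories = ["refund", "cleanliness", "booking"]  # hypothetical complaint categories
category, confidence = predict_category([2.0, 0.5, 0.1], categories)
```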
In this embodiment, a language model is first obtained through pre-training; an improved BERT algorithm then takes the historical complaint text data, historical dimension data and historical entity data as input and the historical complaint category as output, and a prediction model is built on the basis of the language model. The prediction model yields a probability value for each complaint category of the target complaint text data, and the complaint category with the highest probability value is selected as the target complaint category. This improves text classification precision and classifies user complaint content automatically, so that the responsible personnel can handle the complaint categories they are responsible for at the first opportunity, which saves considerable manpower while improving the user experience and the processing efficiency.
Example 3
As shown in fig. 3, the prediction system of complaint text category of the OTA platform of the present embodiment includes a historical text data acquisition module 1, a labeling processing module 2, a dimension and entity data acquisition module 3, a prediction model building module 4, a target text data acquisition module 5, a probability value acquisition module 6, and a target complaint category acquisition module 7.
The historical text data acquisition module 1 is used for acquiring historical complaint text data corresponding to the OTA platform in a historical set time period;
the marking processing module 2 is used for marking the historical complaint text data to obtain complaint categories corresponding to each piece of the historical complaint text data;
the dimension and entity data acquisition module 3 is used for acquiring historical dimension data and historical entity data corresponding to the historical complaint text data in the OTA platform;
the historical dimension data is multi-dimension data used for representing users, orders and/or hotels;
the historical entity data is data used for representing proper nouns in the hotel field;
Specifically, the historical dimension data comprises order information, hotel information, user information and the like. Order information includes, but is not limited to, the payment mode, transaction state and order type of an order; hotel information includes, but is not limited to, hotel facility and equipment names, room source information, etc.; user information includes, but is not limited to, user name, gender, etc.
Historical entity data includes proper nouns in the hotel field, such as pre-paid, big double bed, flash, etc. The historical entity data is stored in a dictionary format.
The prediction model building module 4 is used for taking the historical complaint text data, the historical dimension data and the historical entity data as input, taking the historical complaint category corresponding to the historical complaint text data as output, and building a prediction model for predicting the complaint category to which the complaint text data belongs;
the target text data acquisition module 5 is used for acquiring target complaint text data;
the probability value acquisition module 6 is used for inputting the target complaint text data into the prediction model to acquire probability values of each complaint category to which the target complaint text data belong;
the target complaint category obtaining module 7 is configured to determine, according to the probability value, a target complaint category to which the target complaint text data belongs.
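The inference-side modules (5, 6 and 7) can be pictured as methods of one object that receives the trained prediction model from outside. This is a structural sketch only: the class, the method names and the keyword-based stand-in model are hypothetical, and the stand-in takes the place of the real trained model purely so the example runs.

```python
from typing import Callable, Dict


class ComplaintCategoryPredictor:
    """Stand-ins for the target text, probability value and target category modules."""

    def __init__(self, prediction_model: Callable[[str], Dict[str, float]]):
        self.prediction_model = prediction_model

    def acquire_target_text(self, raw_text: str) -> str:
        # Module 5: obtain the target complaint text.
        return raw_text.strip()

    def acquire_probabilities(self, text: str) -> Dict[str, float]:
        # Module 6: feed the text to the prediction model.
        return self.prediction_model(text)

    def acquire_target_category(self, probabilities: Dict[str, float]) -> str:
        # Module 7: pick the category with the largest probability value.
        return max(probabilities, key=probabilities.get)

    def predict(self, raw_text: str) -> str:
        text = self.acquire_target_text(raw_text)
        return self.acquire_target_category(self.acquire_probabilities(text))


def keyword_stub_model(text: str) -> Dict[str, float]:
    """Toy replacement for a trained model; the probabilities are made up."""
    if "refund" in text.lower():
        return {"refund": 0.8, "cleanliness": 0.2}
    return {"refund": 0.2, "cleanliness": 0.8}


predictor = ComplaintCategoryPredictor(keyword_stub_model)
result = predictor.predict("  Refund has not arrived ")
```

Passing the model in as a callable keeps the module wiring independent of how the prediction model is trained.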
In this embodiment, data are selected by random sampling so that the sampled data follow the same distribution.
In this embodiment, a prediction model is established with the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category as output. The prediction model yields a probability value for each complaint category of the target complaint text data, and the complaint category with the highest probability value is selected as the target complaint category. This improves text classification precision and classifies user complaint content automatically, so that the responsible personnel can handle the complaint categories they are responsible for at the first opportunity, which saves considerable manpower while improving the user experience.
Example 4
As shown in fig. 4, the complaint text class prediction system of the OTA platform of the present embodiment is a further improvement of embodiment 3, specifically:
the prediction system further comprises a clustering module 8;
the clustering module 8 is used for clustering the historical complaint text data by adopting a clustering algorithm;
the clustering algorithms include, but are not limited to, the K-means clustering algorithm, the DBSCAN clustering algorithm, the mean shift clustering algorithm, hierarchical clustering algorithms, and synthetic clustering.
The labeling processing module 2 is used for labeling the historical complaint text data belonging to the same clustering result as the same complaint category.
Specifically, in the labeling process, related staff are selected to label the historical complaint text data by combining the business requirements and the clustering results.
The prediction system further comprises a preprocessing module 9;
the preprocessing module 9 is used for preprocessing the labeled historical complaint text data.
Specifically, preprocessing includes, but is not limited to, converting full-width characters to half-width, converting Traditional Chinese to Simplified Chinese, converting uppercase to lowercase, removing stop words and low-frequency words, filtering null values, and filtering sensitive words.
Specifically, the prediction system further includes a language model acquisition module 10;
the language model obtaining module 10 is used for pre-training the historical complaint text data by adopting a BERT algorithm to obtain a language model;
the prediction model building module 4 is configured to use the BERT algorithm to take the historical complaint text data, the historical dimension data and the historical entity data as input, take the historical complaint category corresponding to the historical complaint text data as output, and build a prediction model for predicting the complaint category to which the complaint text data belongs by randomly masking part of the entity data during training based on the language model.
Fine-tuning on the basis of the pre-trained language model converges faster than training from scratch, and at the same time relatively good classification precision can be achieved in the classification layer with a smaller amount of labeled data, dimension data and entity data. Specifically, when masking is performed, words are matched against the entity dictionary and replaced by the corresponding entity data, which prevents label leakage and allows a prediction model with higher precision to be established.
Considering that manually labeled data contains certain errors, the obtained prediction model can be used to predict the historical complaint text data; the complaint categories of the historical complaint text data whose maximum probability value lies between 0.5 and 0.7 are then manually re-labeled and the model is retrained. This iterative training continues until the maximum probability value is greater than 0.7, so as to ensure the accuracy of the prediction model.
The target complaint category obtaining module 7 is configured to determine that the complaint category corresponding to the maximum probability value is the target complaint category to which the target complaint text data belongs.
In this embodiment, a language model is obtained through pre-training; an improved BERT algorithm is then adopted, taking the historical complaint text data, the historical dimension data and the historical entity data as input and the historical complaint category as output, and a prediction model is built based on the language model. The prediction model is used to obtain the probability value of each complaint category for the target complaint text data, and the complaint category with the highest probability value is selected as the target complaint category of the target complaint text data. This improves text classification accuracy and classifies user complaint content automatically, so that the responsible personnel can promptly handle the complaint categories they are responsible for, which saves a great deal of manpower while improving the user experience.
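The final selection step, choosing the complaint category with the maximum probability value, can be sketched as below. The category names and the use of a softmax over raw classifier scores are assumptions for illustration; the text only states that the prediction model yields a probability value per complaint category.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stabilized)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_target_category(categories, logits):
    """Return the category with the highest probability, together with that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]
```

With hypothetical categories `["refund", "cleanliness", "noise"]` and scores `[0.1, 2.0, -1.0]`, the function returns `"cleanliness"` as the target complaint category.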
Example 5
Fig. 5 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention. The electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the computer program, the processor implements the method for predicting complaint text categories of an OTA platform of either embodiment 1 or embodiment 2. The electronic device 30 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 5, the electronic device 30 may be embodied in the form of a general purpose computing device, which may be a server device, for example. Components of the electronic device 30 may include, but are not limited to: at least one processor 31, at least one memory 32, and a bus 33 connecting the different system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus, and a control bus.
Memory 32 may include volatile memory such as Random Access Memory (RAM) 321 and/or cache memory 322, and may further include Read Only Memory (ROM) 323.
Memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 31 executes various functional applications and data processing, such as the complaint text class prediction method of the OTA platform in any one of the embodiments 1 or 2 of the present invention, by running a computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface 35. The electronic device 30 may also communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet, via network adapter 36. As shown in fig. 5, network adapter 36 communicates with the other modules of the electronic device 30 via bus 33. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with the electronic device 30, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the method for predicting complaint text categories for an OTA platform in either of embodiments 1 or 2.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, a random access memory, a read only memory, an erasable programmable read only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present invention may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps in a method for predicting a complaint text class of an OTA platform in any one of embodiments 1 or 2 when the program product is run on the terminal device.
The program code for carrying out the invention may be written in any combination of one or more programming languages, and the program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, and all such changes and modifications fall within the scope of the invention.

Claims (10)

1. A method for predicting a complaint text class of an OTA platform, characterized by comprising the following steps:
acquiring historical complaint text data corresponding to the OTA platform in a historical set time period;
screening the historical complaint text data by random sampling so that the historical complaint text data are identically distributed;
labeling the historical complaint text data to obtain complaint categories corresponding to each piece of the historical complaint text data;
acquiring historical dimension data and historical entity data corresponding to the historical complaint text data in the OTA platform;
the historical dimension data are multidimensional data used for representing users, orders and/or hotels;
the historical entity data are data used for representing proper nouns in the hotel field;
taking the historical complaint text data, the historical dimension data and the historical entity data as input, and taking the historical complaint category corresponding to the historical complaint text data as output, and establishing a prediction model for predicting the complaint category to which the complaint text data belongs;
acquiring target complaint text data;
inputting the target complaint text data into the prediction model, and obtaining probability values of each complaint category to which the target complaint text data belong;
determining a target complaint category to which the target complaint text data belongs according to the probability value;
the step of obtaining the historical complaint text data corresponding to the OTA platform in the historical set time period further comprises the following steps:
pre-training the historical complaint text data by adopting a BERT algorithm to obtain a language model;
the step of establishing a prediction model for predicting a complaint category to which the complaint text data belongs includes the steps of:
adopting a BERT algorithm and, based on the language model, taking the historical complaint text data, the historical dimension data and the historical entity data as inputs and the historical complaint category corresponding to the historical complaint text data as output, and establishing, by randomly masking part of the entity data during training, a prediction model for predicting the complaint category to which the complaint text data belongs.
2. The method for predicting a complaint text class of an OTA platform according to claim 1, wherein, after the step of acquiring the historical complaint text data corresponding to the OTA platform in a historical set time period and before the step of labeling the historical complaint text data, the method further comprises:
clustering the historical complaint text data by adopting a clustering algorithm;
the step of labeling the historical complaint text data to obtain complaint categories corresponding to each piece of the historical complaint text data comprises the following steps:
and marking the historical complaint text data belonging to the same clustering result as the same complaint category.
3. The method for predicting a complaint text class of an OTA platform according to claim 1, wherein, before the step of taking the historical complaint text data, the historical dimension data and the historical entity data as inputs and the historical complaint category corresponding to the historical complaint text data as output, and establishing a prediction model for predicting the complaint category to which the complaint text data belongs, the method further comprises:
preprocessing the historical complaint text data after labeling.
4. The method for predicting a complaint text class of an OTA platform according to claim 1 wherein the step of determining a target complaint class to which the target complaint text data belongs according to the probability value comprises:
and determining that the corresponding complaint category when the probability value is maximum is the target complaint category to which the target complaint text data belongs.
5. A system for predicting a complaint text class of an OTA platform, characterized by comprising a historical text data acquisition module, a labeling processing module, a dimension and entity data acquisition module, a prediction model establishment module, a target text data acquisition module, a probability value acquisition module and a target complaint class acquisition module;
the historical text data acquisition module is used for acquiring historical complaint text data corresponding to the OTA platform in a historical set time period;
wherein the historical complaint text data are screened by random sampling so that the historical complaint text data are identically distributed;
the labeling processing module is used for labeling the historical complaint text data and obtaining complaint categories corresponding to each piece of the historical complaint text data;
the dimension and entity data acquisition module is used for acquiring historical dimension data and historical entity data corresponding to the historical complaint text data in the OTA platform;
the historical dimension data are multidimensional data used for representing users, orders and/or hotels;
the historical entity data are data used for representing proper nouns in the hotel field;
the prediction model building module is used for taking the historical complaint text data, the historical dimension data and the historical entity data as input, taking the historical complaint category corresponding to the historical complaint text data as output, and building a prediction model for predicting the complaint category to which the complaint text data belongs;
the target text data acquisition module is used for acquiring target complaint text data;
the probability value acquisition module is used for inputting the target complaint text data into the prediction model and acquiring probability values of each complaint category to which the target complaint text data belong;
the target complaint category acquisition module is used for determining a target complaint category to which the target complaint text data belong according to the probability value;
the prediction system further comprises a language model acquisition module;
the language model acquisition module is used for pre-training the historical complaint text data by adopting a BERT algorithm to acquire a language model;
the prediction model building module is used for building the prediction model for predicting the complaint category to which the complaint text data belongs by randomly masking part of the entity data during training based on the language model by taking the historical complaint text data, the historical dimension data and the historical entity data as inputs and the historical complaint category corresponding to the historical complaint text data as output by adopting a BERT algorithm.
6. The OTA platform complaint text class prediction system of claim 5 further comprising a clustering module;
the clustering module is used for clustering the historical complaint text data by adopting a clustering algorithm;
the labeling processing module is used for labeling the historical complaint text data belonging to the same clustering result as the same complaint category.
7. The OTA platform complaint text class prediction system of claim 5 wherein the prediction system further comprises a preprocessing module;
the preprocessing module is used for preprocessing the historical complaint text data after the labeling processing.
8. The OTA platform complaint text class prediction system of claim 5, wherein the target complaint category acquisition module is used for determining that the complaint category corresponding to the maximum probability value is the target complaint category to which the target complaint text data belongs.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of predicting complaint text categories for an OTA platform of any one of claims 1-4 when the computer program is executed by the processor.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the method of predicting a complaint text class of an OTA platform as claimed in any one of claims 1 to 4.
CN201910650261.0A 2019-07-18 2019-07-18 Prediction method, system, equipment and storage medium for complaint text category Active CN110347840B (en)

Publications (2)

Publication Number Publication Date
CN110347840A CN110347840A (en) 2019-10-18
CN110347840B CN110347840B (en) 2023-06-13

Family

ID=68178920

Country Status (1)

Country Link
CN (1) CN110347840B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant