CN116662546A - Complaint text labeling method, device, equipment and medium - Google Patents

Complaint text labeling method, device, equipment and medium

Publication number
CN116662546A
Authority
CN
China
Prior art keywords
complaint
text
fuzzy rule
fuzzy
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310632374.4A
Other languages
Chinese (zh)
Inventor
史册
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310632374.4A
Publication of CN116662546A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/24765 Rule-based classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/048 Fuzzy inferencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/02 Computing arrangements based on specific mathematical models using fuzzy logic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Automation & Control Theory (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a complaint text labeling method, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a complaint text of a user, wherein the complaint text comprises content fed back by the user through a complaint channel; classifying the complaint text by using at least one fuzzy rule in a fuzzy rule base to obtain a first classification result; and labeling the complaint text according to the first classification result; wherein the at least one fuzzy rule is obtained in advance by: dividing M historical complaint texts into N data partitions, wherein each data partition comprises at least one historical complaint text; and running N distributed nodes of a distributed computing platform in parallel to process the N data partitions in one-to-one correspondence, and extracting the at least one fuzzy rule. The disclosure also provides a complaint text labeling device, a storage medium and a program product.

Description

Complaint text labeling method, device, equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to a complaint text labeling method, apparatus, device, medium, and program product.
Background
To provide users with a better business transaction experience, some enterprises take measures to find problems in their own operations and improve service quality, for example by providing complaint channels to receive complaint feedback from users. Service problems are identified from complaint work order data so that they can be resolved.
Complaint labeling is a basic way of structuring complaint text: according to the content of the user's complaint, the word labels most closely related to the problem described by the user are selected and applied. As the data volume grows, the workload of manual labeling keeps increasing, with problems such as low efficiency, long processing time, and high cost.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a complaint text labeling method, apparatus, device, medium and program product.
In one aspect of the disclosed embodiments, a complaint text labeling method is provided, including: acquiring a complaint text of a user, wherein the complaint text comprises content fed back by the user through a complaint channel; classifying the complaint text by using at least one fuzzy rule in a fuzzy rule base to obtain a first classification result; and labeling the complaint text according to the first classification result; wherein the at least one fuzzy rule is obtained in advance by: dividing M historical complaint texts into N data partitions, wherein each data partition comprises at least one historical complaint text, N and M are integers greater than or equal to 2, and N is less than or equal to M; and running N distributed nodes of a distributed computing platform in parallel to process the N data partitions in one-to-one correspondence, and extracting the at least one fuzzy rule.
According to an embodiment of the present disclosure, dividing the M historical complaint texts into N data partitions includes: classifying each of the M historical complaint texts to obtain a second classification result for each historical complaint text, wherein the second classification result is any one of N preset classification results; and dividing the M historical complaint texts into the N data partitions based on the second classification result of each historical complaint text.
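The class-based partitioning described above can be sketched as a simple group-by. This is only an illustration: the function names, the keyword classifier, and the three class names are hypothetical stand-ins, and the disclosure does not specify how the second classification result is computed.

```python
def partition_by_class(texts, classify, preset_classes):
    """Group texts into one data partition per preset classification result."""
    partitions = {c: [] for c in preset_classes}
    for text in texts:
        label = classify(text)  # the "second classification result"
        partitions[label].append(text)
    return partitions


def keyword_classifier(text):
    """Hypothetical classifier; a real embodiment might use a trained model."""
    if "card" in text:
        return "credit_card"
    if "loan" in text:
        return "loan"
    return "other"


history = [
    "card fee too high",
    "loan approval slow",
    "long queue at branch",
    "card declined abroad",
]
parts = partition_by_class(history, keyword_classifier,
                           ["credit_card", "loan", "other"])
```

Note that the method requires every partition to hold at least one text, so a real implementation would need to handle preset classes that receive no texts.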
According to an embodiment of the present disclosure, a first distributed node is any one of the N distributed nodes, a first data partition is any one of the N data partitions, and processing the N data partitions with the N distributed nodes of the distributed computing platform in a one-to-one correspondence includes processing the first data partition with the first distributed node, specifically including: dividing K first fuzzy sets according to the feature vector of each historical complaint text in the first data partition, wherein K is an integer greater than or equal to 1; and calculating K first membership degrees of each historical complaint text based on a predetermined first membership function, wherein the K first membership degrees comprise membership degrees of the feature vector respectively belonging to the K first fuzzy sets.
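As a minimal sketch of the membership calculation above, the following assumes that a single component of the feature vector has been normalized to a range [lo, hi], that the K first fuzzy sets are evenly spaced, and that the first membership function is triangular. All three choices are assumptions made for illustration; the disclosure only requires a predetermined membership function.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def membership_degrees(x, k, lo=0.0, hi=1.0):
    """Return the K membership degrees of x in K evenly spaced fuzzy sets."""
    if k == 1:
        return [1.0]  # a single fuzzy set covers the whole range
    step = (hi - lo) / (k - 1)
    return [
        triangular(x, lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
        for i in range(k)
    ]


degrees = membership_degrees(0.25, 3)  # fuzzy sets peak at 0.0, 0.5 and 1.0
```

With this layout a value halfway between two peaks belongs equally to the two neighbouring fuzzy sets and not at all to the others.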
According to an embodiment of the present disclosure, the processing the first data partition with the first distributed node further includes: dividing N second fuzzy sets for the predicted complaint labels of each historical complaint text based on the N preset classification results; and calculating N second membership degrees of each historical complaint text based on a predetermined second membership function, wherein the N second membership degrees comprise membership degrees of the complaint labels respectively belonging to the N second fuzzy sets.
According to an embodiment of the present disclosure, processing the first data partition with the first distributed node further includes: extracting, from the feature vector and the predicted complaint label of each historical complaint text, an initial fuzzy rule corresponding to that historical complaint text, wherein the initial fuzzy rule describes a conditional outcome relationship between the feature vector and the predicted complaint label.
According to an embodiment of the present disclosure, extracting the at least one fuzzy rule includes extracting a first fuzzy rule, including: calculating the intensity of a corresponding initial fuzzy rule according to the K first membership degrees and the N second membership degrees of each historical complaint text; determining an initial fuzzy rule with maximum intensity in the first data partition; and determining the initial fuzzy rule with the maximum intensity as the first fuzzy rule.
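One way to read the intensity calculation is in the style of the Wang-Mendel algorithm: each variable is assigned to the fuzzy set in which it has the highest membership, and the rule intensity is the product of those memberships. The sketch below follows that reading. The product formula, the function name, and the sample data are assumptions, since the paragraph above does not give an explicit formula.

```python
from math import prod


def rule_from_sample(input_memberships, output_memberships):
    """Build one initial fuzzy rule (condition, result, intensity) from one text.

    input_memberships: one list of K first membership degrees per feature.
    output_memberships: the N second membership degrees of the label.
    """
    # Index of the best-matching fuzzy set for each input feature.
    condition = tuple(
        max(range(len(m)), key=m.__getitem__) for m in input_memberships
    )
    # Index of the best-matching fuzzy set for the output label.
    result = max(range(len(output_memberships)),
                 key=output_memberships.__getitem__)
    intensity = (
        prod(m[i] for m, i in zip(input_memberships, condition))
        * output_memberships[result]
    )
    return condition, result, intensity


# Two hypothetical texts in one data partition (2 features, K=3, N=2).
samples = [
    ([[0.2, 0.8, 0.0], [0.6, 0.4, 0.0]], [0.9, 0.1]),
    ([[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]], [0.3, 0.7]),
]
initial_rules = [rule_from_sample(ins, outs) for ins, outs in samples]
best = max(initial_rules, key=lambda rule: rule[2])  # maximum-intensity rule
```

Here the first text yields the stronger rule, so it would be kept as the first fuzzy rule of this partition.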
According to an embodiment of the present disclosure, each initial fuzzy rule is characterized by a conditional statement and a result statement, and extracting the at least one fuzzy rule includes: obtaining all initial fuzzy rules extracted by the N distributed nodes; dividing all the initial fuzzy rules so that initial fuzzy rules having the same conditional statement fall into the same rule partition, to obtain S rule partitions, wherein S is an integer greater than or equal to 1; and running S distributed nodes of the distributed computing platform in parallel to process the S rule partitions in one-to-one correspondence, and extracting S fuzzy rules.
According to an embodiment of the present disclosure, a second distributed node is any one of the S distributed nodes, a first rule partition is any one of the S rule partitions, and processing the S rule partitions in one-to-one correspondence includes processing the first rule partition with the second distributed node, specifically including: for each initial fuzzy rule in the first rule partition, acquiring the K first membership degrees and the N second membership degrees of the corresponding historical complaint text; calculating the intensity of each initial fuzzy rule according to the acquired K first membership degrees and N second membership degrees; and determining the initial fuzzy rule with the maximum intensity in the first rule partition, wherein the S fuzzy rules comprise the initial fuzzy rule with the maximum intensity.
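The rule-partition step amounts to resolving conflicts between rules that share a conditional statement. The sketch below runs sequentially on one machine and assumes each initial fuzzy rule is a tuple (condition, result, intensity) whose intensity has already been computed; on a distributed platform each group would be handled by its own node. The tuple layout and the function name are illustrative assumptions.

```python
from collections import defaultdict


def resolve_conflicts(initial_rules):
    """Keep, for each conditional statement, the rule with maximum intensity."""
    rule_partitions = defaultdict(list)
    for rule in initial_rules:
        condition = rule[0]
        rule_partitions[condition].append(rule)  # same condition, same partition
    # One surviving rule per rule partition, so S rules in total.
    return {
        condition: max(group, key=lambda rule: rule[2])
        for condition, group in rule_partitions.items()
    }


# Hypothetical initial rules collected from the N data partitions.
rules = [
    ((1, 0), 0, 0.432),
    ((1, 0), 1, 0.35),   # conflicts with the rule above: same condition
    ((2, 1), 1, 0.6),
]
final = resolve_conflicts(rules)
```

In this example S is 2, and the weaker of the two conflicting rules is discarded.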
According to an embodiment of the present disclosure, before dividing the M historical complaint texts into N data partitions, the method further includes: performing word segmentation on M original historical complaint texts to obtain M preprocessed texts; performing noise reduction on the M preprocessed texts to obtain the M historical complaint texts; and extracting M feature vectors based on the M historical complaint texts.
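The three preprocessing steps can be sketched as follows. The example uses English text, a regular-expression tokenizer in place of a Chinese word segmenter, a small hypothetical stop-word list as the noise reduction, and bag-of-words counts as feature vectors. None of these specific choices comes from the disclosure.

```python
import re
from collections import Counter

# Hypothetical stop-word list used as a simple form of noise reduction.
STOP_WORDS = {"the", "a", "is", "my", "to", "was"}


def segment(text):
    """Word segmentation: lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def denoise(tokens):
    """Noise reduction: drop stop words (punctuation is already removed)."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_vectors(texts):
    """Return a sorted vocabulary and one bag-of-words count vector per text."""
    cleaned = [denoise(segment(t)) for t in texts]
    vocab = sorted({tok for toks in cleaned for tok in toks})
    vectors = []
    for toks in cleaned:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


raw = ["My card was declined!", "The loan rate is high, the fee is high."]
vocab, vectors = build_vectors(raw)
```

Each of the M texts yields one vector of the same length, which is what the later membership calculations operate on.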
Another aspect of the disclosed embodiments provides a complaint text labeling device, including: a text acquisition module for acquiring a complaint text of a user, wherein the complaint text comprises content fed back by the user through a complaint channel; a text classification module for classifying the complaint text by using at least one fuzzy rule in a fuzzy rule base to obtain a first classification result; and a text labeling module for labeling the complaint text according to the first classification result; wherein the at least one fuzzy rule is obtained in advance by: dividing M historical complaint texts into N data partitions, wherein each data partition comprises at least one historical complaint text, N and M are integers greater than or equal to 1, and N is less than or equal to M; and running N distributed nodes of a distributed computing platform in parallel to process the N data partitions in one-to-one correspondence, and extracting the at least one fuzzy rule.
Another aspect of an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
Another aspect of the disclosed embodiments also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method as described above.
Another aspect of the disclosed embodiments also provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages: the M historical complaint texts are processed in parallel by N distributed nodes of the distributed computing platform, and at least one fuzzy rule is extracted to pre-establish a fuzzy rule base, so that parallel processing improves processing efficiency. After the complaint text is classified using the fuzzy rules in the fuzzy rule base, it is labeled automatically, so complaint data can be labeled more quickly and efficiently, which assists enterprises in optimizing their services. Compared with the prior art, errors caused by manual labeling are reduced, with the advantages of high efficiency, short processing time, and low cost.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a complaint text labeling method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a complaint text labeling method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of pre-establishing a fuzzy rule base in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of partitioning data according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of preprocessing according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow diagram of fuzzy set partitioning in accordance with an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow diagram of fuzzy set partitioning in accordance with another embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of extracting a first fuzzy rule in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a flow chart of extracting fuzzy rules in accordance with another embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow chart of parallel extraction of fuzzy rules in accordance with an embodiment of the present disclosure;
FIG. 11 schematically illustrates a block diagram of a complaint text labeling apparatus according to an embodiment of the present disclosure; and
FIG. 12 schematically illustrates a block diagram of an electronic device adapted to implement a complaint text labeling method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
In order to facilitate understanding of the technical solutions of the embodiments of the present disclosure, some technical terms related to the present disclosure are first described.
Fuzzy algorithm: an algorithm based on fuzzy mathematics that obtains a fuzzy rule base composed of IF-THEN rules, which are either provided by experts based on domain knowledge or extracted automatically. For example, the Wang-Mendel algorithm is executed on a distributed computing platform to extract rules directly from historical data, and the fuzzy rule base is obtained using fuzzification, product inference, and defuzzification operations.
Fuzzy rule: a statement in IF-THEN form that describes a fuzzy relationship using continuous membership functions. A fuzzy rule includes one or more input variables and an output variable, each of which has corresponding membership functions and fuzzy sets.
Membership function: a function that describes the degree of membership of an element or variable to a fuzzy set. Examples include triangular, trapezoidal, rectangular, Gaussian, sigmoid, and exponential membership functions.
Fuzzy set: a set in which the degree to which an element or variable belongs to a fuzzy concept is described mathematically by a membership function.
Conditional outcome relationship: the relationship between input and output described by an IF-THEN rule based on fuzzy logic.
Intensity of a fuzzy rule: the degree of influence of the fuzzy rule on the output result, generally determined by the membership degrees output by the membership functions. The higher the intensity, the stronger the influence of the fuzzy rule on the output result.
Distributed computing platform: a system composed of a plurality of computers that cooperate to complete a work task. Examples include the Hadoop, Spark, and Flink platforms.
Distributed node: refers to each computer in a system of computers. Each computer has its own computing and storage resources capable of receiving and completing the tasks assigned to it. The number of distributed nodes may be dynamically increased or decreased according to demand.
Data partitioning: the historical complaint text data set is divided into a plurality of parts, namely a plurality of data partitions, and the partitions are distributed to different distributed nodes for parallel processing. For example, text data is read and converted into an RDD using the textFile function provided by the RDD (Resilient Distributed Datasets) programming model. An RDD is an abstract collection of data elements. It is divided into multiple partitions, which are distributed across different distributed nodes, so that the data in the RDD can be operated on in parallel.
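To make the partition-and-process pattern concrete without requiring a cluster, the sketch below emulates it on one machine with the Python standard library. Worker threads stand in for distributed nodes, and the per-partition work (counting tokens) is a placeholder for rule extraction. This is roughly analogous to applying a function to each partition of an RDD, but it is not Spark code, and all function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n):
    """Round-robin split into n non-empty partitions (needs n <= len(records))."""
    if not 1 <= n <= len(records):
        raise ValueError("need 1 <= n <= len(records)")
    return [records[i::n] for i in range(n)]


def process_partition(partition):
    """Placeholder for per-node work: count the tokens in each text."""
    return [len(text.split()) for text in partition]


def run_parallel(records, n):
    """Process n partitions with n workers, one partition per worker."""
    partitions = split_into_partitions(records, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(process_partition, partitions))


texts = ["a b", "c", "d e f", "g h"]
results = run_parallel(texts, 2)
```

The results come back in partition order, one list per partition, which mirrors the one-to-one correspondence between nodes and data partitions.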
Taking the banking industry as an example, as competition among banks intensifies, customers pay more attention to the service they receive when transacting banking business, and their requirements for the business environment, financial products, and services of banks keep rising. To provide a better experience for customers, banks identify problems from complaint work orders and resolve them.
Conventionally, bank staff manually enter complaint labels for complaint work order data, and then manually select the processing department corresponding to the complaint business type to dispatch the complaint. However, because banks have large numbers of businesses, products, and customers, manual classification is slow and time-consuming, and the labor and time costs are high. In addition, different people may understand the semantics of the same text differently, so information omission and even labeling errors easily occur during manual classification and labeling.
Some embodiments of the present disclosure provide a complaint text labeling method that processes M historical complaint texts in parallel on N distributed nodes of a distributed computing platform and extracts at least one fuzzy rule to pre-establish a fuzzy rule base; parallel processing saves manpower and improves processing efficiency. After the complaint text is classified using the fuzzy rules in the fuzzy rule base, it is labeled automatically, so complaint data can be labeled more quickly and efficiently, which assists enterprises in optimizing their services. Compared with the prior art, errors caused by manual labeling are reduced, with the advantages of high efficiency, short processing time, and low cost.
It should be noted that although the foregoing uses bank complaint text as an example, the disclosure is not limited thereto and may be flexibly applied to various scenarios in which complaint text needs to be processed.
Fig. 1 schematically illustrates an application scenario diagram of a complaint text labeling method according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example in which embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
The server 105 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud computing, network service, and middleware service.
In some embodiments, the user may register a complaint work order for business he or she has transacted via an application client or web page on the terminal devices 101, 102, 103. The complaint work order data is received by the server 105 and converted to complaint text, on which the complaint text labeling method of the embodiments of the present disclosure is performed. In other embodiments, complaint content fed back by a user through a network, telephone, or offline channel may be collected and transmitted in the form of complaint text to the server 105 through the terminal devices 101, 102, 103, so that the complaint text labeling method of the embodiments of the present disclosure is performed.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The complaint text labeling method of the embodiment of the present disclosure will be described in detail below by fig. 2 to 10 based on the scenario described in fig. 1.
FIG. 2 schematically illustrates a flow chart of a complaint text labeling method according to an embodiment of the present disclosure. As shown in fig. 2, the complaint text labeling method of this embodiment includes:
In operation S210, a complaint text of the user is acquired, wherein the complaint text includes content fed back by the user through a complaint channel.
For example, the content fed back by the user through the complaint channel is a question manually input by the user on the user interface of the electronic device or the content input by the user through voice, etc. Of course, it is understood that the original data of complaints may be other forms of data such as pictures, videos, etc., which are not limited in this disclosure.
For example, the complaint text of the user may be text data corresponding to the voice complaint data obtained by performing voice transcription operation on the voice complaint data. Specifically, the user can feed back through a telephone complaint channel, record a complaint telephone, and perform voice transcription operation.
In operation S220, the complaint text is classified by using at least one fuzzy rule in the fuzzy rule base, and a first classification result is obtained.
In operation S230, the complaint text is labeled according to the first classification result.
In some embodiments, the first classification result and the label may be the same, i.e., after the first classification result is obtained, it is used directly as the label. Examples include credit card complaints, loan complaints, deposit complaints, transfer complaints, and financial complaints.
In other embodiments, in operation S230, the complaint text may be further classified through the fuzzy rule base on the basis of the first classification result, and the reclassification result is used as the label. Specifically, for example, the first classification result includes the complaint business, the complaint handling channel, the complaint cause, the complaint product, a complaint, and the like. For the complaint business, the reclassification result includes business labels such as credit cards, loans, and deposits. For the complaint handling channel, the reclassification result includes labels such as telephone, mail, website, or counter. For the complaint cause, the reclassification result includes service quality, interest rates, administrative fees, account management, security issues, and the like. Each first classification result can have multi-level labels, with specific subdivisions designed for banking businesses, channels, products, causes, and the like.
Therefore, fuzzy rules can be organized in multiple levels in the fuzzy rule base: a first classification result is obtained first, and classification then continues according to the fuzzy rules under the level of that first classification result to obtain a reclassification result. Classification is thus performed two or more times in sequence, and the last classification result is used as the label.
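The multi-level scheme can be illustrated with a toy two-level rule base. Everything specific in the sketch is a hypothetical assumption: the feature names (normalized term scores in [0, 1]), the two fuzzy sets "low" and "high" with linear membership functions, the labels, and the choice of the product of memberships as firing strength with the strongest rule deciding the class.

```python
# Hypothetical fuzzy sets over features normalized to [0, 1].
FUZZY_SETS = {"low": lambda x: 1.0 - x, "high": lambda x: x}

# Hypothetical two-level rule base: each rule is (condition, label).
RULE_BASE = {
    "top": [({"card_terms": "high"}, "credit_card"),
            ({"loan_terms": "high"}, "loan")],
    "credit_card": [({"fee_terms": "high"}, "annual_fee"),
                    ({"fee_terms": "low"}, "card_service")],
    "loan": [({"rate_terms": "high"}, "interest_rate"),
             ({"rate_terms": "low"}, "loan_service")],
}


def fire(condition, features):
    """Firing strength: product of the memberships named in the condition."""
    strength = 1.0
    for name, fuzzy_set in condition.items():
        strength *= FUZZY_SETS[fuzzy_set](features[name])
    return strength


def classify(rules, features):
    """Return the label of the rule with the highest firing strength."""
    return max(rules, key=lambda rule: fire(rule[0], features))[1]


def label_complaint(features):
    """Classify at the top level, then again under the first result."""
    first = classify(RULE_BASE["top"], features)
    second = classify(RULE_BASE[first], features)
    return first, second


features = {"card_terms": 0.9, "loan_terms": 0.1,
            "fee_terms": 0.8, "rate_terms": 0.0}
labels = label_complaint(features)
```

In this toy case the last classification result, `annual_fee`, would be used as the label, with `credit_card` available as its parent level.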
In some embodiments, after the complaint text is labeled, the identifier of the labeled complaint work order and its corresponding labels are added to a database and displayed visually through operations such as querying, filtering, and statistics. This supports analysis of the overall complaint situation, of complaints by management department, by key region, and by branch, statistical analysis of labels across dimensions including customers, channels, products, behaviors, and complaint causes, and analysis of the quality and effect of complaint handling. It helps management departments evaluate and manage complaint handling quality, and provides a basis for improving products, service behaviors, and other measures.
In some embodiments, a program automatically dispatches each complaint to the corresponding complaint handling department according to its label type, so that the department can handle the complaint and provide feedback.
According to the embodiment of the disclosure, M historical complaint texts are processed in parallel by N distributed nodes of the distributed computing platform, and at least one fuzzy rule is extracted to pre-establish a fuzzy rule base; parallel processing saves manpower and improves processing efficiency. After the complaint text is classified using the fuzzy rules in the fuzzy rule base, it is labeled automatically, so complaint data can be labeled more quickly and efficiently, which assists enterprises in optimizing their services. Compared with the prior art, errors caused by manual labeling are reduced, with the advantages of high efficiency, short processing time, and low cost.
Fig. 3 schematically illustrates a flow chart of pre-establishing a fuzzy rule base in accordance with an embodiment of the present disclosure. As shown in fig. 3, at least one fuzzy rule is configured to be obtained in advance by:
in operation S310, the M pieces of history complaint text are divided into N data partitions, where each data partition includes at least one piece of history complaint text, N and M are integers greater than or equal to 2, and N is less than or equal to M.
In some embodiments, M pieces of historical complaint text may be manually partitioned into N data partitions based on expert experience.
In other embodiments, M pieces of historical complaint text may be partitioned into N data partitions according to at least one of the user's region, occupation, gender, complaint channel, etc.
In other embodiments, automated classification based on semantics may be performed as shown in FIG. 4 below.
FIG. 4 schematically illustrates a flow chart of partitioning data according to an embodiment of the present disclosure. As shown in fig. 4, this embodiment is one of the embodiments of operation S310, including:
in operation S410, classifying each of the M historical complaint texts to obtain a second classification result of each of the historical complaint texts, where the second classification result of each of the historical complaint texts is any one of N preset classification results.
In operation S420, M pieces of history complaint text are divided into N data partitions based on the second classification result of each history complaint text.
For example, a feature extraction model may be used to convert the M pieces of history complaint text into multi-dimensional vectors, and the k-means clustering algorithm may then be applied with the N preset classification results as the cluster centers, so that data belonging to the same class are aggregated. It can be understood that, when the feature extraction model converts the M pieces of history complaint text into multi-dimensional vectors, texts with the same or similar sentence patterns yield the same or similar vectors. Each history complaint text can thus be converted into a multi-dimensional vector, the vectors classified, and identical or similar vectors aggregated into one class.
In operation S320, N distributed nodes running in parallel on the distributed computing platform process the N data partitions in one-to-one correspondence, and at least one fuzzy rule is extracted. Each of the N distributed nodes may execute the Wang-Mendel algorithm on its data partition.
According to the embodiment of the disclosure, the data can be partitioned systematically according to the classification result, which makes it convenient to distribute the partitions to the nodes for parallel processing.
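The clustering step can be sketched as follows. This is a minimal illustration in plain Python, assuming the N preset classification results are available as initial center vectors; the function names, the 2-dimensional vectors, and the iteration count are hypothetical and not taken from the disclosure.

```python
import math

def nearest_center(vec, centers):
    """Return the index of the center closest to vec (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(vec, centers[k]))

def kmeans_partition(vectors, centers, iterations=10):
    """Assign each vector to one of len(centers) partitions with plain k-means."""
    centers = [list(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        assignment = [nearest_center(v, centers) for v in vectors]
        for k in range(len(centers)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    partitions = [[] for _ in centers]
    for idx, a in enumerate(assignment):
        partitions[a].append(idx)  # store the index of the text, not the vector
    return partitions

# Hypothetical 2-D feature vectors for M = 6 texts and N = 2 preset centers.
vectors = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
partitions = kmeans_partition(vectors, centers=[(0.0, 0.0), (1.0, 1.0)])
print(partitions)
```

Each inner list holds the indices of the texts that would go to one data partition.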
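The one-to-one processing of partitions by parallel workers can be illustrated locally. The sketch below uses a thread pool from the Python standard library as a stand-in for a distributed computing platform; `extract_rules` is a hypothetical placeholder for the per-node rule extraction, and the sample records are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_rules(partition):
    """Placeholder for per-node rule extraction (e.g. the Wang-Mendel steps)."""
    # Hypothetical stand-in: return one (partition size, label) tuple per partition.
    return [(len(partition), partition[0]["label"])]

def process_partitions(partitions):
    """Process each partition with its own worker and collect all results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # map() returns results in the same order as the input partitions.
        results = list(pool.map(extract_rules, partitions))
    return [rule for rules in results for rule in rules]

partitions = [
    [{"text": "fee too high", "label": "complaint cause"}],
    [{"text": "counter queue", "label": "complaint channel"},
     {"text": "hotline busy", "label": "complaint channel"}],
]
rules = process_partitions(partitions)
print(rules)
```

On a real cluster the same pattern would map a partition-level function over the partitions; the thread pool only demonstrates the control flow.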
Complaint text is typically unstructured or semi-structured data and lacks the structure and organization of data in a relational database, so the original data needs to be preprocessed and converted into a more regular feature representation that reflects its content. Text preprocessing mainly includes word segmentation, denoising, deduplication, null removal, reference disambiguation, and the like.
Fig. 5 schematically illustrates a flow chart of preprocessing according to an embodiment of the present disclosure. Prior to dividing the M pieces of history complaint text into N data partitions, as shown in fig. 5, the preprocessing content of this embodiment includes:
in operation S510, the M original historical complaint texts are segmented to obtain M preprocessed texts.
Text word segmentation usually uses statistical methods and dictionary methods. Chinese word segmentation is comparatively complex, and a maximum matching method can be adopted. For example, "the user called to report that 50 yuan was previously recharged through the Shanghai Century Lianhua service but has not yet been credited" can be segmented as: "user/incoming call/report, previously/through/Shanghai/Century Lianhua/service/recharge/50 yuan, but/not credited". For example, Chinese word segmentation is performed on the M original historical complaint texts using Jieba, a third-party Python library.
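The maximum matching method mentioned above can be sketched as a forward scan that always takes the longest dictionary word at the current position. The toy dictionary and the sample sentence below are hypothetical; a production system would more likely rely on a library such as Jieba with a full lexicon.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by repeatedly taking the longest dictionary word at the cursor."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical toy dictionary; a real system would load a full lexicon.
dictionary = {"用户", "来电", "反映", "充值", "未到账"}
tokens = forward_max_match("用户来电反映充值未到账", dictionary)
print("/".join(tokens))
```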
In operation S520, noise reduction processing is performed on the M pieces of preprocessed text, so as to obtain M pieces of history complaint text.
Text noise reduction includes removing words that are not useful for text classification, such as certain character expressions and meaningless words. Stop words are also removed: Chinese grammar contains many modal particles (rendered here as "no", "bar", and the like) that interfere with text analysis and need to be removed according to the specific situation, which improves the accuracy of text preprocessing.
For example, a Chinese stop word list is introduced and meaningless function words in the corpus are removed, which improves efficiency. Very short, meaningless complaint texts are also removed so that they do not affect subsequent accuracy.
In addition, deduplication and null removal operations may follow. The original data may include special symbols or emoticons; when the data is uploaded to a server that does not support them, they may be displayed as blanks. The null removal operation in the present disclosure can therefore be understood as removing such special symbols, emoticons, and the like from the original data.
The deduplication operation can be understood as finding identical, repeated sentences in the original data and keeping only one of them. Questions entered by different users may have similar meanings, that is, the original data may contain data with similar meanings, so the present disclosure may perform a deduplication operation on the original data.
It should be noted that preprocessing of the original data is not limited to operations such as deduplication, null removal, and denoising; the present disclosure places no limitation on this.
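The denoising, null removal, and deduplication steps can be combined in a small pipeline. The sketch below is only an illustration: the stop word list, the minimum length of two tokens, and the English sample tokens are assumptions, and only exact duplicates are removed.

```python
import re

STOP_WORDS = {"the", "a", "is", "of"}  # hypothetical stop word list

def clean_tokens(tokens):
    """Drop stop words and tokens that contain no word characters (symbols, emoticons)."""
    kept = []
    for tok in tokens:
        tok = tok.strip().lower()
        if tok and tok not in STOP_WORDS and re.search(r"\w", tok):
            kept.append(tok)
    return kept

def preprocess(token_lists, min_tokens=2):
    """Clean each tokenized text, drop very short texts, and remove exact duplicates."""
    seen, result = set(), []
    for tokens in token_lists:
        cleaned = tuple(clean_tokens(tokens))
        if len(cleaned) < min_tokens or cleaned in seen:
            continue
        seen.add(cleaned)
        result.append(list(cleaned))
    return result

raw = [
    ["The", "fee", "is", "unreasonable", "!!"],
    ["the", "fee", "is", "unreasonable"],  # duplicate after cleaning
    ["ok", "☹"],                           # too short after cleaning
]
cleaned = preprocess(raw)
print(cleaned)
```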
In operation S530, M feature vectors are extracted based on M pieces of history complaint text.
In order to enable a computer to understand the semantics of words to a certain extent, the pre-processed complaint work order text is processed based on a pre-trained word2vec model, and feature vectors of each complaint work order text are obtained. The M feature vectors may be converted into an RDD data model to facilitate distributed computing platform processing.
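One common way to turn word vectors into a single text vector is to average them. The sketch below assumes that approach, which the disclosure does not specify; the 2-dimensional `embeddings` table is a hypothetical stand-in for a trained word2vec model.

```python
def text_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

# Hypothetical 2-dimensional embeddings standing in for a trained word2vec model.
embeddings = {"fee": [1.0, 0.0], "unreasonable": [0.0, 1.0]}
vec = text_vector(["fee", "unreasonable", "unknown"], embeddings, dim=2)
print(vec)
```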
The modeling method based on the Wang-Mendel algorithm is described below:
Assume a set of input-output data pairs:

$$\left(x^{(p)}, y^{(p)}\right), \quad p = 1, 2, \ldots, M$$

where $x^{(p)} \in R^n$ and $y^{(p)} \in R$ represent the input variables and the output variable, respectively, and $M$ is the number of data pairs, namely the M pieces of historical complaint text. From these M input-output data pairs, IF-THEN fuzzy rules describing the relationship between the output variable $y$ and the input variables $x$ are extracted to form a fuzzy rule base, with $1 \le n \le M$. The rules take the following form:

$$\text{IF } x_1 \text{ is } A_1^{l} \text{ and } \ldots \text{ and } x_n \text{ is } A_n^{l} \text{ THEN } y \text{ is } B^{l}, \quad l = 1, \ldots, N$$

where, for a piece of history complaint text, $x = (x_1, \ldots, x_n)^T \in R^n$ is the input variable (the feature vector of that piece of history complaint text), $x_1$ is one of its elements, $A_i^{l}$ is a fuzzy set, $y \in R$ is the output variable, namely the complaint label, and $B^{l}$ is a fuzzy set.
Each fuzzy rule represents a piece of human knowledge or experience: when the input variables satisfy the IF condition, the output $y$ is $B^{l}$. Fuzzy rules can be extracted from each piece of preprocessed history complaint text, and a large number of fuzzy rules form the fuzzy rule base. The steps of dividing fuzzy sets, extracting fuzzy rules, calculating fuzzy rule intensity, and merging fuzzy rules are explained below with reference to fig. 6 to 10.
First, the input and output spaces represented by the given data (the history complaint texts) are divided into fuzzy sets. An appropriate number of fuzzy sets can be chosen according to the actual problem; the sets may initially be uniformly distributed and then fine-tuned according to the modeling results.
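For example, fuzzy subspaces are partitioned for each variable and the corresponding membership functions are defined. On each interval $[\alpha_i, \beta_i]$ ($i = 1, 2, \ldots, n$), $N_i$ fuzzy sets $A_i^{j}$ ($j = 1, 2, \ldots, N_i$) are defined such that, for any $x_i \in [\alpha_i, \beta_i]$, there exists an $A_i^{j}$ with $\mu_{A_i^{j}}(x_i) \neq 0$.
Then, for each input-output data pair $(x^{(p)}, y^{(p)})$, the membership value of $x_i^{(p)}$ in each fuzzy set $A_i^{j}$ ($j = 1, 2, \ldots, N_i$) and the membership value of $y^{(p)}$ in each fuzzy set $B^{l}$ ($l = 1, 2, \ldots, N_y$) are determined, that is, $\mu_{A_i^{j}}(x_i^{(p)})$ and $\mu_{B^{l}}(y^{(p)})$ are calculated.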
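A uniform partition of one variable's interval into overlapping fuzzy sets, and the membership values of a sample in each set, can be sketched as follows. Triangular membership functions and evenly spaced peaks are assumptions chosen for illustration; the function names are hypothetical, and `count` must be at least 2.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support [left, right] and apex at peak."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def uniform_partition(alpha, beta, count):
    """Return `count` evenly spaced triangular fuzzy sets covering [alpha, beta]."""
    step = (beta - alpha) / (count - 1)
    peaks = [alpha + k * step for k in range(count)]
    return [(p - step, p, p + step) for p in peaks]

def memberships(x, fuzzy_sets):
    """Membership value of x in every fuzzy set of the partition."""
    return [triangular(x, *fs) for fs in fuzzy_sets]

sets = uniform_partition(0.0, 1.0, 3)  # peaks at 0.0, 0.5, 1.0
values = memberships(0.25, sets)
print(values)
```

Because neighbouring triangles overlap, every point of the interval has a non-zero membership value in at least one set.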
Fig. 6 schematically illustrates a flow chart of fuzzy set partitioning in accordance with an embodiment of the present disclosure. As shown in fig. 6, this embodiment is one of the embodiments of operation S320. The first distributed node is any one of the N distributed nodes, and the first data partition is any one of the N data partitions. Processing the first data partition with the first distributed node includes:
in operation S610, K first fuzzy sets are partitioned for feature vectors of each of the history complaint texts in the first data partition, where K is an integer greater than or equal to 1.
The feature vector of each piece of history complaint text refers to a vector for representing data, and may be composed of a plurality of features. For example, feature vectors may include text content, keyword frequencies, keyword lengths, location information, semantic emotion, and the like. These fuzzy sets can be used for classification and labeling in subsequent processing.
In operation S620, K first membership degrees of each of the historical complaint texts are calculated based on the predetermined first membership functions, the K first membership degrees including membership degrees to which the feature vector belongs to the K first fuzzy sets, respectively.
The first membership function is used for calculating membership between one or more elements in the feature vector and each fuzzy set, and represents a mapping relation. The relationship between one or more elements in the feature vector and each fuzzy set is quantified using a first membership function, and these relationships are quantitatively characterized as membership for subsequent classification and labeling.
For example, the input variable may be a numerical feature vector for each piece of historical complaint text, where each element corresponds to a numerical value of a word or phrase map. Assuming that the vector has a plurality of elements, the range of values for each element may be divided into a number of fuzzy sets, each fuzzy set having a degree of membership for indicating the degree to which the value of the element belongs to the fuzzy set.
Take the complaint text "the commission charged by your bank is unreasonable" as an example; it can be converted into a numerical feature vector. The text is represented as a vector, for example using a bag-of-words model, in which each element corresponds to a word and the value of the element is the number of times, or the weight with which, the word appears in the text. For example, the text may be represented as a vector of length N, where N is the size of the vocabulary. Assuming the vocabulary contains the four words "your", "bank", "commission", and "unreasonable", the vector may be [0, 0, 1, 1], where the third element is the number of occurrences of "commission" and the fourth element is the number of occurrences of "unreasonable". The first and second elements correspond to "your" and "bank", which are removed by denoising, so no fuzzy sets are partitioned for them.
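The bag-of-words representation in this example can be reproduced with a few lines of Python. The sketch assumes that "your" and "bank" have already been dropped by denoising, so only two tokens remain; the function name is hypothetical.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocabulary = ["your", "bank", "commission", "unreasonable"]
# Tokens remaining after denoising; "your" and "bank" are assumed to be removed.
tokens = ["commission", "unreasonable"]
vector = bag_of_words(tokens, vocabulary)
print(vector)
```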
The value range of each feature may then be divided into fuzzy sets, for example the three fuzzy sets "high", "medium", and "low" for "commission", each with a membership function describing how closely a feature value matches the set. Assume that the fuzzy sets of "commission" are divided as follows:
High (H): the commission is greater than or equal to 100 yuan, and the membership function is a triangular membership function.
Medium (M): the commission is between 50 yuan and 100 yuan, and the membership function is a trapezoidal membership function.
Low (L): the commission is less than or equal to 50 yuan, and the membership function is a triangular membership function.
Then, one or more transactions in which the user was charged a commission within a preset time period are acquired, and the first membership degree of each commission amount in each fuzzy set is calculated.
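The three commission fuzzy sets can be sketched with triangular and trapezoidal membership functions. The exact breakpoints below (shoulders at 25 and 125 yuan, a universe clamped to 0 to 150 yuan) are assumptions made only so that the example is concrete; the disclosure gives the 50 and 100 yuan thresholds but not the shapes in detail.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def triangle(x, a, b, c):
    """Triangular membership as a trapezoid whose plateau is the single point b."""
    return trapezoid(x, a, b, b, c)

def commission_memberships(fee):
    """Membership of a commission amount (yuan) in the low/medium/high sets."""
    fee = max(0.0, min(float(fee), 150.0))  # clamp to the assumed universe [0, 150]
    return {
        "low": triangle(fee, 0, 0, 50),               # peaks at 0, vanishes at 50
        "medium": trapezoid(fee, 25, 50, 100, 125),   # plateau between 50 and 100
        "high": triangle(fee, 100, 150, 150),         # rises from 100, peaks at 150
    }

m = commission_memberships(40)
print(m)
```

A commission of 40 yuan is therefore partly "low" and partly "medium" under these assumed shapes, which is the kind of graded result the first membership degrees are meant to capture.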
It can be understood that each node in the N distributed nodes may process the corresponding partition according to the descriptions of operations S610 to S620 to obtain K first membership degrees of each historical complaint text in the partition, which is not described herein again.
Fig. 7 schematically illustrates a flow chart of fuzzy set partitioning in accordance with another embodiment of the present disclosure. As shown in fig. 7, processing the first data partition with the first distributed node further includes:
In operation S710, N second fuzzy sets are partitioned for the predicted complaint label of each of the history complaint texts based on the N preset classification results.
For example, the preset classification results include complaint business, complaint business handling channel, complaint cause, complaint product, complaint appeal, and the like. The complaint label may accordingly be divided into the second fuzzy sets "complaint business", "complaint business handling channel", "complaint cause", "complaint product", "complaint appeal", and the like. Each set has a membership function describing the degree to which the text belongs to the set.
The predicted complaint label is the output variable, that is, a label result that has not yet been determined; each second fuzzy set may turn out to be the final complaint label.
In operation S720, N second membership degrees of each of the historical complaint texts are calculated based on the predetermined second membership functions, the N second membership degrees being the membership degrees of the predicted complaint label in each of the N second fuzzy sets.
Taking the complaint text "the commission charged by your bank is unreasonable" as an example, the second membership function may be a piecewise mapping function determined from expert experience, or it may be inferred with a Bayesian algorithm: the prior probability and the likelihood are computed, and Bayesian inference yields the posterior probability, that is, the probability that the sample belongs to a given class. The probability that the predicted complaint label belongs to a fuzzy set is taken as its membership degree in that fuzzy set.
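The Bayesian variant can be sketched as a direct application of Bayes' rule, with the posterior probability of each class used as the second membership degree. The priors and likelihoods below are hypothetical numbers; how they would be estimated from the historical complaint texts is not specified here, and the sketch assumes the evidence is non-zero.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: posterior[c] = prior[c] * likelihood[c] / evidence."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers: P(class) and P(text features | class) for one complaint text.
priors = {"complaint cause": 0.5, "complaint product": 0.5}
likelihoods = {"complaint cause": 0.75, "complaint product": 0.25}
second_memberships = posterior(priors, likelihoods)
print(second_memberships)
```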
In some embodiments, a plurality of third fuzzy sets may be further partitioned for the predicted complaint labels, where the second fuzzy set and the third fuzzy set have a mapping relationship, specifically as follows:
Complaint business: fuzzy sets for complaint business may include "credit card service", "loan service", "deposit service", and the like, each set having a membership function describing how similar the text is to the set.
Complaint business handling channels: fuzzy sets for complaint business handling channels may include "phone", "mail", "website", "counter", etc., each set also having a membership function to describe how similar the text is to the set.
Complaint causes: fuzzy sets for complaint reasons may include "quality of service", "interest rate", "management fee", "account management", "security problem", etc., each set also having a membership function describing how similar text is to the set.
Complaint products: fuzzy sets for complaint products may include "credit card", "loan", "savings account", "fund investment", etc., each set also having a membership function to describe how similar the text is to the set.
Complaint appeals: fuzzy sets for complaint appeals may include "compensate", "solve problem", "refund", "promote service", etc., each set also having a membership function to describe how similar the text is to the set.
Complaint channel: fuzzy sets for complaint channels may include "customer service hotline", "online customer service", "complaint mailbox", "social media", etc., each set also having a membership function to describe how similar text is to the set.
Then, the third membership degree of the predicted complaint label in each third fuzzy set is calculated and used in the subsequent fuzzy rule extraction.
According to the embodiment of the disclosure, N second fuzzy sets are partitioned for the predicted complaint label and N second membership degrees are calculated, so that the method can be adapted to complaint scenarios in different industries and the accuracy of fuzzy rule extraction is improved. Compared with manually labeling training samples, this reduces the errors introduced by manual labeling and improves labeling precision and reliability.
In some embodiments, processing the first data partition with the first distributed node further comprises: extracting an initial fuzzy rule corresponding to each historical complaint text aiming at the feature vector and the predictive complaint label of each historical complaint text, wherein the corresponding initial fuzzy rule is used for describing a conditional result relationship between the feature vector and the predictive complaint label.
For example, in complaint text classification, an initial fuzzy rule may express "if the text length is 10 characters or less, the text may belong to the service attitude problem category", or the condition-result relationship may express "if the text contains words such as 'bad review' or 'dissatisfied', the text may belong to the service attitude problem category".
According to the embodiment of the disclosure, the feature vector of the text and the predicted complaint label are structurally represented by extracting the initial fuzzy rule, and the condition result relation between the feature vector of the text and the predicted complaint label is further described by using a fuzzy logic method, so that a basis and a foundation are provided for subsequent classification and labeling.
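In the Wang-Mendel approach, one initial rule is typically generated per data pair by choosing, for every input element and for the output, the fuzzy set with the highest membership value. The sketch below assumes that procedure; the fuzzy set names and membership values are hypothetical.

```python
def extract_initial_rule(input_memberships, output_memberships):
    """Pick the best-matching fuzzy set for each input element and for the label.

    input_memberships: one dict {fuzzy set name: membership} per input element.
    output_memberships: dict {label fuzzy set name: membership}.
    Returns (antecedent, consequent), i.e. the IF part and the THEN part.
    """
    antecedent = tuple(max(m, key=m.get) for m in input_memberships)
    consequent = max(output_memberships, key=output_memberships.get)
    return antecedent, consequent

inputs = [{"low": 0.2, "medium": 0.6, "high": 0.0},  # e.g. commission amount
          {"rare": 0.1, "frequent": 0.9}]            # e.g. keyword frequency
outputs = {"complaint cause": 0.75, "complaint product": 0.25}
rule = extract_initial_rule(inputs, outputs)
print(rule)
```

The resulting tuple reads as "IF element 1 is medium and element 2 is frequent THEN the label is complaint cause".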
Fig. 8 schematically illustrates a flowchart of extracting a first fuzzy rule according to an embodiment of the present disclosure. As shown in fig. 8, this embodiment is one of the embodiments of operation S320, and extracting the first fuzzy rule includes:
in operation S810, the intensity of the corresponding initial fuzzy rule is calculated according to the K first membership degrees and the N second membership degrees of each of the history complaint texts.
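There is one intensity for each initial fuzzy rule. In some embodiments, consistent with the Wang-Mendel algorithm, the intensity $D$ of the initial fuzzy rule extracted from the data pair $(x^{(p)}, y^{(p)})$ is defined as follows:

$$D = \prod_{i=1}^{n} \mu_{A_i^{j}}\left(x_i^{(p)}\right) \cdot \mu_{B^{l}}\left(y^{(p)}\right)$$

where $\mu_{A_i^{j}}(x_i^{(p)})$ denotes the membership value of $x_i^{(p)}$ in the fuzzy set $A_i^{j}$, and $\mu_{B^{l}}(y^{(p)})$ denotes the membership value of $y^{(p)}$ in the fuzzy set $B^{l}$ ($l = 1, 2, \ldots, N_y$).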
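As a numerical illustration, the intensity can be computed as the product of the membership values of the chosen input fuzzy sets and the chosen output fuzzy set. The product form is the one commonly used in the Wang-Mendel algorithm and is assumed here; the membership values are hypothetical.

```python
import math

def rule_strength(input_values, output_value):
    """Product of the input membership values and the output membership value."""
    return math.prod(input_values) * output_value

# Hypothetical membership values of one data pair in its best-matching fuzzy sets.
strength = rule_strength([0.6, 0.9], 0.75)
print(round(strength, 3))
```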
In some embodiments, if the amount of data in the actual system is small, each of the initial fuzzy rules may be manually set to have an intensity through expert experience.
In operation S820, an initial fuzzy rule having the maximum intensity in the first data partition is determined.
In operation S830, an initial fuzzy rule having the maximum intensity is determined as a first fuzzy rule.
Through the extraction of initial fuzzy rules, each data pair has a corresponding initial fuzzy rule, so conflicting initial fuzzy rules, that is, rules with the same IF condition but different conclusions, are very likely to occur. To keep the rule base concise, overlapping rules need to be merged so that a single first fuzzy rule is finally retained to describe the characteristics of the corresponding attributes.
According to the embodiment of the disclosure, if each distributed node processes a data partition obtained by dividing the M pieces of history complaint text according to their second classification results, the initial fuzzy rules extracted within a data partition can be taken to have the same IF condition. Each initial fuzzy rule in the same data partition has an intensity; the greater the intensity of a rule, the greater its reliability and the greater its influence on the merged rule. Therefore, by selecting the maximum intensity, the overlapping rules can finally be merged into one first fuzzy rule.
In other embodiments, it is considered that the initial fuzzy rules extracted within a data partition may not all have the same IF condition. In that case, the initial fuzzy rules extracted by all the distributed nodes are re-partitioned and then assigned to distributed nodes again for parallel processing. Further description is provided below with reference to fig. 9 and 10.
Fig. 9 schematically illustrates a flow chart of extracting fuzzy rules according to another embodiment of the present disclosure. Each initial fuzzy rule is characterized by a conditional statement (IF statement) and a result statement (THEN statement). As shown in fig. 9, this embodiment is one of the embodiments of operation S320, and extracting at least one fuzzy rule includes:
in operation S910, all the initial fuzzy rules extracted by the N distributed nodes are obtained.
In operation S920, all the initial fuzzy rules are divided so that rules with the same conditional statement fall into the same partition, yielding S rule partitions, where S is greater than or equal to 1.
In operation S930, S distributed nodes running in parallel on the distributed computing platform process the S rule partitions in one-to-one correspondence, and S fuzzy rules are extracted.
According to the embodiment of the disclosure, the fuzzy rule can be extracted in parallel by a distributed computing platform and a fuzzy logic method, so that the processing efficiency and accuracy are improved, and the time and cost are saved.
Fig. 10 schematically illustrates a flow chart of parallel extraction of fuzzy rules according to an embodiment of the present disclosure. The second distributed node is any one of the S distributed nodes, the first rule partition is any one of the S rule partitions, as shown in fig. 10, and the embodiment is one of embodiments of operation S930, and processing the first rule partition with the second distributed node includes:
in operation S1010, for each initial fuzzy rule in the first rule partition, K first membership degrees and N second membership degrees corresponding to the history complaint text are obtained.
In operation S1020, the intensity of each initial fuzzy rule is calculated according to the acquired K first membership degrees and N second membership degrees.
In operation S1030, an initial fuzzy rule having the maximum intensity in the first rule section is determined, and S fuzzy rules include the initial fuzzy rule having the maximum intensity.
According to the embodiment of the disclosure, each initial fuzzy rule has an intensity; the greater the intensity of a rule, the greater its reliability and the greater its influence on the merged rule. Therefore, by selecting the maximum intensity, each distributed node can finally merge the overlapping rules into one fuzzy rule, which is one of the S fuzzy rules.
It will be appreciated that each of the S distributed nodes may process its corresponding partition as described for fig. 9 and 10, which is not repeated here. In the fuzzy rule extraction process, the data items are independent of one another, so at this stage the training data can be divided randomly into independent data partitions that are distributed to the distributed nodes for extraction of initial fuzzy rules. The resulting initial fuzzy rules, which may overlap, are then partitioned in the same way and merged, and the merged rules are added to the fuzzy rule base so that complaint information can be labeled and classified on the basis of the rule base.
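Merging by maximum intensity can be sketched as a grouping of rules by their IF part followed by a selection of the strongest rule in each group. The tuple layout and the sample rules below are hypothetical; in a distributed setting each group would correspond to one rule partition handled by one node.

```python
def merge_rules(rules):
    """Keep, for each antecedent, only the rule with the highest strength.

    rules: iterable of (antecedent, consequent, strength) tuples.
    Returns a dict mapping each antecedent to the surviving consequent.
    """
    best = {}
    for antecedent, consequent, strength in rules:
        if antecedent not in best or strength > best[antecedent][1]:
            best[antecedent] = (consequent, strength)
    return {a: c for a, (c, _) in best.items()}

rules = [
    (("medium", "frequent"), "complaint cause", 0.405),
    (("medium", "frequent"), "complaint product", 0.135),  # conflicts with the rule above
    (("high", "rare"), "complaint business", 0.3),
]
rule_base = merge_rules(rules)
print(rule_base)
```

The two conflicting rules share one IF part, so only the stronger one survives, and the number of entries in `rule_base` plays the role of S.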
Based on the complaint text labeling method, the disclosure also provides a complaint text labeling device. The device will be described in detail below with reference to fig. 11.
Fig. 11 schematically shows a block diagram of a complaint text labeling apparatus according to an embodiment of the present disclosure.
As shown in fig. 11, the complaint text labeling apparatus 1100 of this embodiment includes a text acquisition module 1110, a text classification module 1120, and a text labeling module 1130.
The text acquisition module 1110 may perform operation S210 for acquiring a complaint text of the user, wherein the complaint text includes contents fed back by the user through the complaint channel.
The text classification module 1120 may perform operation S220 for classifying the complaint text using at least one fuzzy rule in the fuzzy rule base to obtain a first classification result.
Text marking module 1130 may perform operation S230 for marking complaint text according to the first classification result.
In some embodiments, complaint text labeling apparatus 1100 may further include a fuzzy rule base creation module configured to obtain at least one fuzzy rule in advance by: dividing M pieces of history complaint texts into N data partitions, wherein each data partition comprises at least one piece of history complaint text, N and M are integers which are larger than or equal to 1, and N is smaller than or equal to M. And processing N data partitions in a one-to-one correspondence manner by using N distributed nodes which are operated in parallel by using the distributed computing platform, and extracting at least one fuzzy rule.
In some embodiments, the fuzzy rule base building module may include a text dividing unit that may perform operations S410-S420, which are not described in detail herein.
In some embodiments, the fuzzy rule base building module may include a preprocessing unit that may perform operations S510-S530, which are not described herein.
In some embodiments, the fuzzy rule base building module may include a fuzzy set dividing unit, which may perform operations S610 to S620 and operations S710 to S720, which are not described herein.
In some embodiments, the fuzzy rule base building module may include a fuzzy rule merging unit that may perform operations S810-S830, operations S910-S930, and operations S1010-S1030, which are not described herein.
Note that complaint text labeling apparatus 1100 includes modules for performing the steps of any one of the embodiments described above with respect to fig. 2-10, respectively. The implementation manner, the solved technical problems, the realized functions and the realized technical effects of each module/unit/sub-unit and the like in the apparatus part embodiment are the same as or similar to the implementation manner, the solved technical problems, the realized functions and the realized technical effects of each corresponding step in the method part embodiment, and are not repeated herein.
Any of the multiple modules in complaint text labeling apparatus 1100 may be combined into one module to be implemented, or any of the modules may be split into multiple modules, according to embodiments of the present disclosure. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module.
According to embodiments of the present disclosure, at least one of the modules of complaint text labeling apparatus 1100 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware by any other reasonable way of integrating or packaging a circuit, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the modules of complaint text labeling apparatus 1100 can be implemented at least in part as a computer program module that, when executed, performs a corresponding function.
Fig. 12 schematically illustrates a block diagram of an electronic device adapted to implement a complaint text labeling method according to an embodiment of the present disclosure.
As shown in fig. 12, an electronic device 1200 according to an embodiment of the present disclosure includes a processor 1201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The processor 1201 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 1201 may also include on-board memory for caching purposes. The processor 1201 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 1203, various programs and data required for the operation of the electronic apparatus 1200 are stored. The processor 1201, the ROM 1202, and the RAM 1203 are connected to each other through a bus 1204. The processor 1201 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1202 and/or RAM 1203. Note that the program may be stored in one or more memories other than the ROM 1202 and the RAM 1203. The processor 1201 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.
According to an embodiment of the disclosure, the electronic device 1200 may also include an input/output (I/O) interface 1205, the input/output (I/O) interface 1205 also being connected to the bus 1204. The electronic device 1200 may also include one or more of the following components connected to the I/O interface 1205: including an input section 1206 for a keyboard, mouse, etc. Including an output portion 1207 such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc. Including a storage portion 1208 of a hard disk or the like. And a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. The drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 1210 so that a computer program read out therefrom is installed into the storage section 1208 as needed.
The present disclosure also provides a computer-readable storage medium, which may be embodied in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 1202 and/or the RAM 1203 and/or one or more memories other than the ROM 1202 and the RAM 1203 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1201. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and downloaded and installed via the communication section 1209 and/or installed from the removable medium 1211. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1209, and/or installed from the removable medium 1211. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1201. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (13)

1. A complaint text labeling method comprises the following steps:
acquiring a complaint text of a user, wherein the complaint text comprises contents fed back by the user through a complaint channel;
classifying the complaint text by using at least one fuzzy rule in a fuzzy rule base to obtain a first classification result;
labeling the complaint text according to the first classification result;
wherein the at least one fuzzy rule is configured to be obtained in advance by:
dividing M pieces of history complaint texts into N data partitions, wherein each data partition comprises at least one piece of history complaint text, N and M are integers which are more than or equal to 2, and N is less than or equal to M;
and processing the N data partitions in a one-to-one correspondence manner by using N distributed nodes which are operated in parallel by using a distributed computing platform, and extracting the at least one fuzzy rule.
2. The method of claim 1, wherein the dividing the M pieces of historical complaint text into N data partitions comprises:
classifying each historical complaint text in the M historical complaint texts to obtain a second classification result of each historical complaint text, wherein the second classification result of each historical complaint text is any one of N preset classification results;
and dividing the M pieces of history complaint text into the N data partitions based on the second classification result of each history complaint text.
3. The method according to claim 2, wherein:
the first distributed node is any one of the N distributed nodes, and the first data partition is any one of the N data partitions;
the processing the N data partitions with the N distributed nodes of the distributed computing platform in a one-to-one correspondence includes processing the first data partition with the first distributed node, and specifically includes:
dividing K first fuzzy sets according to the feature vector of each historical complaint text in the first data partition, wherein K is an integer greater than or equal to 1;
and calculating K first membership degrees of each historical complaint text based on a predetermined first membership function, wherein the K first membership degrees comprise membership degrees of the feature vector respectively belonging to the K first fuzzy sets.
4. The method of claim 3, wherein the processing the first data partition with the first distributed node further comprises:
dividing N second fuzzy sets for the predicted complaint labels of each historical complaint text based on the N preset classification results;
and calculating N second membership degrees of each historical complaint text based on a predetermined second membership function, wherein the N second membership degrees comprise membership degrees of the complaint labels respectively belonging to the N second fuzzy sets.
5. The method of claim 4, wherein the processing the first data partition with the first distributed node further comprises:
for the feature vector and the predicted complaint label of each historical complaint text, extracting a corresponding initial fuzzy rule, wherein the corresponding initial fuzzy rule describes a condition-result relationship between the feature vector and the predicted complaint label.
6. The method of claim 5, wherein the extracting the at least one fuzzy rule comprises extracting a first fuzzy rule, specifically comprising:
calculating the intensity of a corresponding initial fuzzy rule according to the K first membership degrees and the N second membership degrees of each historical complaint text;
determining an initial fuzzy rule with maximum intensity in the first data partition;
and determining the initial fuzzy rule with the maximum intensity as the first fuzzy rule.
7. The method of claim 5, wherein each initial fuzzy rule is characterized in terms of a conditional statement and a result statement, the extracting the at least one fuzzy rule comprising:
obtaining all initial fuzzy rules extracted by the N distributed nodes;
dividing all the initial fuzzy rules into S rule partitions such that the initial fuzzy rules in each rule partition have the same conditional statement, wherein S is greater than or equal to 1;
and operating S distributed nodes in parallel by using the distributed computing platform to process the S rule partitions in a one-to-one correspondence manner, and extracting S fuzzy rules.
8. The method of claim 7, wherein:
the second distributed node is any one of the S distributed nodes, and the first rule partition is any one of the S rule partitions;
the processing the S rule partitions in a one-to-one correspondence manner includes processing the first rule partition with the second distributed node, and specifically includes:
for each initial fuzzy rule in the first rule partition, acquiring the K first membership degrees and the N second membership degrees of the corresponding historical complaint text;
calculating the intensity of each initial fuzzy rule according to the acquired K first membership degrees and the N second membership degrees;
and determining an initial fuzzy rule with the maximum intensity in the first rule partition, wherein the S fuzzy rules comprise the initial fuzzy rule with the maximum intensity.
9. The method of claim 1, wherein prior to the dividing the M pieces of historical complaint text into N data partitions, the method further comprises:
performing word segmentation on M original historical complaint texts to obtain M preprocessed texts;
performing noise reduction on the M preprocessed texts to obtain the M pieces of historical complaint text;
and extracting M feature vectors based on the M pieces of historical complaint text.
10. A complaint text labeling device, comprising:
a text acquisition module configured to acquire a complaint text of a user, wherein the complaint text comprises contents fed back by the user through a complaint channel;
a text classification module configured to classify the complaint text by using at least one fuzzy rule in a fuzzy rule base to obtain a first classification result;
a text labeling module configured to label the complaint text according to the first classification result;
wherein the at least one fuzzy rule is configured to be obtained in advance by:
dividing M pieces of history complaint texts into N data partitions, wherein each data partition comprises at least one piece of history complaint text, N and M are integers which are larger than or equal to 1, and N is smaller than or equal to M;
and processing the N data partitions in a one-to-one correspondence manner by using N distributed nodes which are operated in parallel by using a distributed computing platform, and extracting the at least one fuzzy rule.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-9.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 9.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
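
The two-stage rule extraction recited in claims 3 to 8, and the rule-based classification of claim 1, can be illustrated with a minimal Python sketch. It is not the applicant's implementation and rests on several assumptions that the claims do not fix: a local thread pool stands in for the distributed computing platform, features are normalized to [0, 1], the first membership function is triangular over K = 3 evenly spaced fuzzy sets, the N second membership degrees are supplied with each text (for example by an upstream classifier), and the intensity of a rule is taken as the product of the winning membership degrees. All function and variable names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from math import prod

K = 3  # assumed number of first fuzzy sets per feature (low / medium / high)

def memberships(x, k=K):
    """Triangular membership degrees of x in [0, 1] for k evenly spaced fuzzy sets."""
    if k == 1:
        return [1.0]
    return [max(0.0, 1.0 - abs(x - i / (k - 1)) * (k - 1)) for i in range(k)]

def extract_initial_rules(partition):
    """Stage 1, one worker per data partition: one initial rule per text.

    Each text is (features, label_degrees), where label_degrees holds the
    assumed N second membership degrees of its predicted complaint label.
    """
    rules = []
    for features, label_degrees in partition:
        degrees = [memberships(x) for x in features]
        # Conditional statement: the best-matching fuzzy set for each feature.
        condition = tuple(max(range(K), key=d.__getitem__) for d in degrees)
        # Result statement: the label fuzzy set with the highest degree.
        result = max(range(len(label_degrees)), key=label_degrees.__getitem__)
        # Assumed intensity: product of the winning membership degrees.
        intensity = prod(max(d) for d in degrees) * label_degrees[result]
        rules.append((condition, result, intensity))
    return rules

def strongest(rule_partition):
    """Stage 2, one worker per rule partition: keep the max-intensity rule."""
    return max(rule_partition, key=lambda rule: rule[2])

def build_rule_base(data_partitions):
    """Run both stages; a thread pool stands in for the distributed nodes."""
    with ThreadPoolExecutor() as pool:
        initial = [r for rules in pool.map(extract_initial_rules, data_partitions)
                   for r in rules]
        # Rule partitions: initial rules sharing the same conditional statement.
        by_condition = defaultdict(list)
        for rule in initial:
            by_condition[rule[0]].append(rule)
        return list(pool.map(strongest, by_condition.values()))

def classify(features, rule_base):
    """Return the result label of the rule that fires most strongly."""
    degrees = [memberships(x) for x in features]
    def firing(rule):
        return prod(d[s] for d, s in zip(degrees, rule[0]))
    return max(rule_base, key=firing)[1]
```

Calling `build_rule_base` on a list of N data partitions returns one `(condition, result, intensity)` tuple per distinct conditional statement, and `classify` then labels a new feature vector with the result of the best-matching rule. Grouping by conditional statement before taking the maximum is what resolves conflicting initial rules that share a condition but disagree on the label.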
CN202310632374.4A 2023-05-31 2023-05-31 Complaint text labeling method, device, equipment and medium Pending CN116662546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310632374.4A CN116662546A (en) 2023-05-31 2023-05-31 Complaint text labeling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310632374.4A CN116662546A (en) 2023-05-31 2023-05-31 Complaint text labeling method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116662546A true CN116662546A (en) 2023-08-29

Family

ID=87716654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310632374.4A Pending CN116662546A (en) 2023-05-31 2023-05-31 Complaint text labeling method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116662546A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726345A (en) * 2023-11-30 2024-03-19 北京领雁科技股份有限公司 Complaint data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Gupta et al. Sentiment analysis for stock price prediction
CN110909165A (en) Data processing method, device, medium and electronic equipment
CN106447066A (en) Big data feature extraction method and device
CN106445988A (en) Intelligent big data processing method and system
US11495227B2 (en) Artificial intelligence (AI) based user query intent analyzer
CN113220999B (en) User characteristic generation method and device, electronic equipment and storage medium
CN111062803A (en) Financial business query and review method and system
Fu et al. A sentiment-aware trading volume prediction model for P2P market using LSTM
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN115689717A (en) Enterprise risk early warning method, device, electronic equipment, medium and program product
US20200043019A1 (en) Intelligent identification of white space target entity
CN116662546A (en) Complaint text labeling method, device, equipment and medium
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
Choi et al. Fake review identification and utility evaluation model using machine learning
CN117911079A (en) Personalized merchant marketing intelligent recommendation method and system
CN112527969A (en) Incremental intention clustering method, device, equipment and storage medium
US20210319457A1 (en) Utilizing models to aggregate data and to identify insights from the aggregated data
Mary et al. ASFuL: Aspect based sentiment summarization using fuzzy logic
CN117033431A (en) Work order processing method, device, electronic equipment and medium
Li et al. An improved genetic-XGBoost classifier for customer consumption behavior prediction
CN116089886A (en) Information processing method, device, equipment and storage medium
CN116308602A (en) Recommended product information generation method and device, electronic equipment and medium
KR20230059364A (en) Public opinion poll system using language model and method thereof
CN114741501A (en) Public opinion early warning method and device, readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination