CN116431809A - Text labeling method, device and storage medium based on bank customer service scene - Google Patents

Publication number
CN116431809A
Authority
CN
China
Prior art keywords
dialogue
cluster
sentence
statement
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310404470.3A
Other languages
Chinese (zh)
Inventor
邬默
昝云飞
徐红
高翔
纪达麒
陈运文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Daguan Technology Beijing Co ltd
Original Assignee
Daguan Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daguan Technology Beijing Co ltd
Priority to CN202310404470.3A
Publication of CN116431809A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02: Banking, e.g. interest calculation or account maintenance
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text labeling method, a text labeling device and a storage medium based on a bank customer service scene. The method comprises the following steps: acquiring a dialogue text of a bank customer service scene, wherein the dialogue text comprises dialogue sentences; performing vector conversion on the dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors, and performing density clustering on the dialogue sentence vectors to obtain sentence vector clusters; and obtaining sentence clusters to be calibrated according to the sentence vector clusters, and labeling the sentence clusters to be calibrated according to their semantics. Because the sentence clusters to be calibrated are obtained by clustering a large number of dialogue sentences in the dialogue text and are labeled according to their semantics, the workload of manual labeling and sentence analysis is reduced, and the efficiency and accuracy of text labeling are improved.

Description

Text labeling method, device and storage medium based on bank customer service scene
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for labeling text based on a customer service scene of a bank, and a storage medium.
Background
In current banking practice, a large amount of text data from conversations between customer service agents and customers has accumulated in bank databases. However, because the volume of data is so large, there are not enough human resources to organize it into valid, usable structured data, so a large amount of data remains underutilized. The common current approach is to label each text with the category to which it belongs and then perform supervised model training with different classification algorithms, so that the model learns the characteristics of each category.
However, large-scale dialogue text transcribed from speech cannot be effectively classified by manual labeling because of the excessive volume of data. In addition, business personnel cannot clearly specify the classification categories in advance, because it is not known which points of interest and categories a batch of data will cover until the data has been examined.
Disclosure of Invention
The invention provides a text labeling method, a device and a storage medium based on a bank customer service scene, which are used for realizing efficient and accurate labeling of texts based on the bank customer service scene.
According to a first aspect of the present invention, there is provided a text labeling method based on a customer service scene of a bank, including:
acquiring a dialogue text of a customer service scene of a bank, wherein the dialogue text comprises dialogue sentences;
performing vector conversion on dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors, and performing density clustering on the dialogue sentence vectors to obtain sentence vector clusters;
and acquiring a statement cluster to be calibrated according to the statement vector cluster, and labeling the statement cluster to be calibrated according to the semantics of the statement cluster to be calibrated.
According to another aspect of the present invention, there is provided a text labeling device based on a customer service scene of a bank, including: the dialogue text acquisition module is used for acquiring dialogue texts of customer service scenes of banks, wherein the dialogue texts comprise dialogue sentences;
the sentence vector cluster acquisition module is used for carrying out vector conversion on dialogue sentences in the dialogue text to acquire corresponding dialogue sentence vectors, and carrying out density clustering on each dialogue sentence vector to acquire a sentence vector cluster;
the label labeling module is used for obtaining the statement cluster to be calibrated according to the statement vector cluster, and labeling the statement cluster to be calibrated according to the semantics of the statement cluster to be calibrated.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present invention.
According to another aspect of the invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to perform the method according to any of the embodiments of the invention.
According to the technical scheme, a large number of dialogue sentences in the dialogue text are clustered to obtain the sentence clusters to be calibrated, and the sentence clusters to be calibrated are labeled according to the semantics of the obtained sentence clusters to be calibrated, so that the workload of manual labeling and sentence analysis is saved, and the efficiency and accuracy of text labeling are improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a text labeling method based on a bank customer service scene according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a text labeling method based on a bank customer service scene according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a text labeling device based on a bank customer service scene according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a text labeling method based on a customer service scene of a bank, where the method may be performed by a text labeling device based on the customer service scene of the bank, and the device may be implemented in hardware and/or software. As shown in fig. 1, the method includes:
step S101, acquiring a dialogue text of a customer service scene of a bank.
Optionally, acquiring the dialogue text of the bank customer service scene includes: acquiring dialogue data of a specified type in the bank customer service scene, wherein the specified type comprises a pre-loan marketing class, a mid-loan review class, a post-loan collection class or a satisfaction survey class; preprocessing the dialogue data, and splitting the preprocessed dialogue data according to the identities of customer and customer service to obtain a customer text and a customer service text, wherein the customer text comprises customer dialogue sentences and the customer service text comprises customer service dialogue sentences; and taking the customer text or the customer service text as the dialogue text of the bank customer service scene.
More specifically, in this embodiment, dialogue data of the pre-loan marketing class, the mid-loan review class, the post-loan collection class or the satisfaction survey class is obtained from a bank database. This is merely illustrative: any data related to a bank customer service scene falls within the scope of the present application, and this embodiment is not limited in this respect. The dialogue data of each class contains the dialogue content between a customer and customer service. The dialogue data is preprocessed; the preprocessing may, for example, remove interfering words such as stop words and perform word segmentation, and this embodiment does not limit the specific preprocessing operations. After preprocessing, the dialogue data is split according to the identities of customer and customer service: all dialogue sentences spoken by customers are gathered into one file to obtain a first text (the customer text), and all dialogue sentences spoken by customer service are gathered into another file to obtain a second text (the customer service text). Each line of a dialogue text is one dialogue sentence. Because the dialogue data retrieved from the bank database is very large, a dialogue text usually contains millions of dialogue sentences.
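The preprocessing and role-based split described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `STOP_WORDS` set, the `(role, sentence)` input format, and the whitespace tokenizer are all assumptions (Chinese text would need a real word segmenter).

```python
from collections import defaultdict

# Hypothetical stop-word list; a real deployment would load a domain-specific one.
STOP_WORDS = {"um", "uh", "er"}

def preprocess(sentence: str) -> str:
    """Lower-case, split on whitespace, and drop stop words."""
    tokens = [t for t in sentence.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)

def split_by_role(turns):
    """Group preprocessed sentences by speaker role, one sentence per entry."""
    texts = defaultdict(list)
    for role, sentence in turns:
        cleaned = preprocess(sentence)
        if cleaned:  # skip turns that contained only stop words
            texts[role].append(cleaned)
    return dict(texts)

# Assumed input: a list of (role, sentence) pairs from transcribed calls.
turns = [
    ("customer", "Um my card was declined"),
    ("agent", "Let me check your account"),
    ("customer", "uh"),
    ("customer", "The app is slow"),
]
texts = split_by_role(turns)
customer_text = texts["customer"]  # plays the role of the customer text
agent_text = texts["agent"]        # plays the role of the customer service text
```

Either list can then be used on its own as the dialogue text for the later steps.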
Step S102, carrying out vector conversion on dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors, and carrying out density clustering on the dialogue sentence vectors to obtain sentence vector clusters.
Optionally, performing vector conversion on the dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors includes: obtaining a pre-trained language model, wherein the pre-trained language model comprises a BERT model; and performing vector conversion on the dialogue sentences with the pre-trained language model to obtain the corresponding dialogue sentence vectors.
Optionally, performing density clustering on the dialogue sentence vectors to obtain sentence vector clusters includes: segmenting the dialogue sentence vectors to obtain a plurality of dialogue sentence vector sets, wherein each dialogue sentence vector set contains the same number of dialogue sentence vectors; and clustering each dialogue sentence vector set according to a first cluster parameter with a density clustering algorithm to obtain a plurality of sentence vector clusters, wherein each sentence vector cluster comprises dialogue sentence vectors whose vector distances are within a specified range.
Specifically, in this embodiment, after the dialogue text is obtained, the dialogue sentences in the dialogue text are vectorized: a BERT pre-trained language model converts each dialogue sentence into a dialogue sentence vector. Because the number of dialogue sentences is very large, the number of dialogue sentence vectors is also very large. In this embodiment, the dialogue sentence vectors are therefore segmented. For example, when the customer text corresponds to 200,000 dialogue sentence vectors, they are split into groups of 100,000 to obtain two dialogue sentence vector sets: customer dialogue sentence vector set A and customer dialogue sentence vector set B. The customer text is used here only as an example; the dialogue sentence vectors corresponding to the customer service text can be split in the same way to obtain customer service dialogue sentence vector set C and customer service dialogue sentence vector set D. This embodiment is merely illustrative, and the number of dialogue sentence vector sets obtained from each dialogue text is not limited. Splitting the dialogue sentence vectors corresponding to each dialogue text avoids excessive machine resource consumption and uncontrollable clustering results caused by an excessive amount of text.
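The segmentation step can be illustrated with a short sketch. The function name `split_into_sets` and the toy two-dimensional vectors are hypothetical stand-ins; in the described method the vectors would come from a BERT model, and the example is scaled down from 200,000 vectors in groups of 100,000 to 200 vectors in groups of 100.

```python
from typing import List, Sequence

def split_into_sets(vectors: Sequence, set_size: int) -> List[list]:
    """Split a sequence of sentence vectors into consecutive sets of set_size.

    If the total is not a multiple of set_size, the last set is smaller.
    """
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    return [list(vectors[i:i + set_size]) for i in range(0, len(vectors), set_size)]

# Stand-in vectors (assumption): real vectors would be BERT sentence embeddings.
vectors = [[float(i), float(i) * 0.5] for i in range(200)]
sets = split_into_sets(vectors, set_size=100)
set_a, set_b = sets  # analogous to vector sets A and B in the text
```

Each resulting set is clustered independently, which bounds the memory and time needed per clustering run.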
After the dialogue sentence vector sets are obtained, each set is clustered with a density clustering algorithm according to a first cluster parameter, for example 10 clusters, so that at least 10 sentence vector clusters are obtained from each dialogue sentence vector set. For example, clustering yields 10 sentence vector clusters from customer dialogue sentence vector set A, 11 from customer dialogue sentence vector set B, 11 from customer service dialogue sentence vector set C, and 12 from customer service dialogue sentence vector set D.
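The text does not name a specific density clustering algorithm or say how the first cluster parameter maps onto it. As one possible illustration, the sketch below uses a small pure-Python DBSCAN; the choice of DBSCAN and the `eps` and `min_samples` values are assumptions made for this example only.

```python
import math

def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (including idx)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one label per point: 0..k-1 for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is a core point: expand
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

# Toy 2-D "sentence vectors": two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

With these toy points, the two dense groups become clusters 0 and 1 and the isolated point is marked as noise. The brute-force neighbor search is quadratic, which is one reason splitting the vectors into smaller sets first is useful.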
Step S103, obtaining a statement cluster to be calibrated according to the statement vector cluster, and labeling the statement cluster to be calibrated according to the semantics of the statement cluster to be calibrated.
Optionally, obtaining the sentence clusters to be calibrated according to the sentence vector clusters includes: converting the dialogue sentence vectors in each sentence vector cluster back into dialogue sentences to obtain sentence clusters; obtaining a first sentence cluster set and a second sentence cluster set according to the semantic state of each sentence cluster, wherein the first sentence cluster set comprises sentence clusters with definite semantics and the second sentence cluster set comprises sentence clusters with ambiguous semantics; and obtaining the sentence clusters to be calibrated according to the first sentence cluster set and the second sentence cluster set.
Optionally, obtaining the sentence clusters to be calibrated according to the first sentence cluster set and the second sentence cluster set includes: performing vector conversion on each dialogue sentence in the second sentence cluster set to obtain a corresponding dialogue sentence vector; iteratively clustering the dialogue sentence vectors converted from the second sentence cluster set with a density clustering algorithm according to a specified cluster parameter to obtain newly added sentence vector clusters; converting the dialogue sentence vectors in each newly added sentence vector cluster into dialogue sentences to obtain a third sentence cluster set, wherein the third sentence cluster set comprises sentence clusters with definite semantics; and taking the sentence clusters contained in the first sentence cluster set and the third sentence cluster set as the sentence clusters to be calibrated.
Specifically, in this embodiment, after the sentence vector clusters are obtained by the density clustering algorithm, the dialogue sentence vectors in each sentence vector cluster are converted back into dialogue sentences to obtain sentence clusters, each of which contains the converted dialogue sentences. The user can then review the first 100 lines of each sentence cluster and judge whether the semantics of the cluster can be determined from those 100 lines. Sentence clusters with definite semantics form the first sentence cluster set, and sentence clusters with ambiguous semantics form the second sentence cluster set.
It should be noted that, in this embodiment, the sentence clusters with definite semantics in the first sentence cluster set are used directly as sentence clusters to be calibrated. The second sentence cluster set, by contrast, contains sentence clusters with ambiguous semantics and therefore requires finer-grained clustering. Each dialogue sentence in the second sentence cluster set is converted into a corresponding dialogue sentence vector, and a second round of clustering is performed with the density clustering algorithm using a specified cluster parameter. Because finer granularity is needed, the specified cluster parameter used in the second round is larger than the first cluster parameter, for example 20 clusters. Iteratively clustering the dialogue sentence vectors converted from the second sentence cluster set yields newly added sentence vector clusters. If, after two rounds of clustering, some of the sentence clusters corresponding to the newly added sentence vector clusters still have ambiguous semantics, further rounds of clustering can be performed. The dialogue sentence vectors in the finally obtained newly added sentence vector clusters are converted into dialogue sentences to obtain a third sentence cluster set, which likewise comprises sentence clusters with definite semantics, and all sentence clusters in the third sentence cluster set obtained through the multiple rounds of clustering are also used as sentence clusters to be calibrated.
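The multi-round refinement can be sketched as a loop that keeps clear clusters and re-clusters the pooled sentences of ambiguous ones. Everything named here is hypothetical: `is_clear` stands in for the human reviewer's judgement, `recluster` stands in for vectorization plus density clustering with a larger cluster parameter, and `max_rounds` is an assumed safety limit that the text does not specify.

```python
def refine_clusters(clusters, is_clear, recluster, max_rounds=5):
    """Iteratively re-cluster ambiguous sentence clusters.

    Returns (to_calibrate, still_ambiguous).
    """
    to_calibrate, pending = [], list(clusters)
    for round_no in range(max_rounds + 1):
        ambiguous = []
        for cluster in pending:
            (to_calibrate if is_clear(cluster) else ambiguous).append(cluster)
        if not ambiguous or round_no == max_rounds:
            return to_calibrate, ambiguous
        # Pool the ambiguous sentences and cluster them again at finer granularity.
        pooled = [s for cluster in ambiguous for s in cluster]
        pending = recluster(pooled, round_no + 1)
    return to_calibrate, []

# Toy stand-ins: a cluster is "clear" if all sentences share their first word,
# and re-clustering simply groups sentences by first word.
def is_clear(cluster):
    return len({s.split()[0] for s in cluster}) == 1

def recluster(sentences, round_no):
    groups = {}
    for s in sentences:
        groups.setdefault(s.split()[0], []).append(s)
    return list(groups.values())

first_round = [
    ["card lost", "card stolen"],
    ["loan rate", "app crash", "loan term", "app frozen"],
]
done, leftover = refine_clusters(first_round, is_clear, recluster)
```

The `round_no` argument lets a real `recluster` implementation raise its cluster parameter on each round, in the spirit of moving from 10 to 20 clusters in the second round.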
Optionally, labeling the sentence clusters to be calibrated according to their semantics includes: displaying each sentence cluster to be calibrated and receiving the semantics of that cluster as determined by a user; and determining a label according to the semantics and attaching the label to the sentence cluster to be calibrated.
Specifically, in this embodiment, after the sentence clusters to be calibrated are obtained, they are displayed so that a user can determine the specific semantics of each cluster. The user no longer needs to analyze individual dialogue sentences and only needs to summarize each cluster. The label of each sentence cluster is determined directly from its semantics and attached to the sentence cluster to be calibrated, so that many dialogue sentences are labeled at once and the cost of manual labeling is significantly reduced. Because clustering produces sentence clusters with definite semantics, and the sentence clusters to be calibrated are labeled according to those definite semantics, the accuracy of labeling can also be significantly improved.
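Labeling a whole cluster at once amounts to propagating one user-supplied label to every sentence in that cluster. The sketch below assumes the labels have already been collected from the reviewer; the function name and sample data are hypothetical.

```python
def label_sentences(clusters, cluster_labels):
    """Attach each cluster's label to every sentence in that cluster."""
    if len(clusters) != len(cluster_labels):
        raise ValueError("need exactly one label per cluster")
    return [
        (sentence, label)
        for cluster, label in zip(clusters, cluster_labels)
        for sentence in cluster
    ]

clusters = [
    ["the app keeps closing", "app shuts down on login"],
    ["pages load very slowly"],
]
# Labels as a reviewer might assign them after reading each cluster.
cluster_labels = ["application crash", "slow system"]
labeled = label_sentences(clusters, cluster_labels)
```

The reviewer makes one decision per cluster, while the output still carries one label per sentence, which is the form a supervised classifier would need as training data.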
According to the embodiment of the invention, the sentence clusters to be calibrated are obtained by clustering a large number of dialogue sentences in the dialogue text, and the sentence clusters to be calibrated are labeled according to the semantics of the obtained sentence clusters to be calibrated, so that the workload of manual labeling and sentence analysis is saved, and the efficiency and accuracy of text labeling are improved.
Example 2
Fig. 2 is a flowchart of a text labeling method based on a customer service scene of a bank according to a second embodiment of the present invention, where the embodiment is based on the above embodiment, and after labeling a sentence cluster to be calibrated according to semantics, the method further includes: comparing the marked tag with the history tag to obtain a new tag, and determining a new event aiming at the dialogue text of the customer service scene of the bank according to the new tag.
As shown in fig. 2, the method includes:
step S201, acquiring a dialogue text of a customer service scene of a bank.
Optionally, acquiring the dialogue text of the bank customer service scene includes: acquiring dialogue data of a specified type in the bank customer service scene, wherein the specified type comprises a pre-loan marketing class, a mid-loan review class, a post-loan collection class or a satisfaction survey class; preprocessing the dialogue data, and splitting the preprocessed dialogue data according to the identities of customer and customer service to obtain a customer text and a customer service text, wherein the customer text comprises customer dialogue sentences and the customer service text comprises customer service dialogue sentences; and taking the customer text or the customer service text as the dialogue text of the bank customer service scene.
Step S202, carrying out vector conversion on dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors, and carrying out density clustering on the dialogue sentence vectors to obtain sentence vector clusters.
Optionally, performing vector conversion on the dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors includes: obtaining a pre-trained language model, wherein the pre-trained language model comprises a BERT model; and performing vector conversion on the dialogue sentences with the pre-trained language model to obtain the corresponding dialogue sentence vectors.
Optionally, performing density clustering on the dialogue sentence vectors to obtain sentence vector clusters includes: segmenting the dialogue sentence vectors to obtain a plurality of dialogue sentence vector sets, wherein each dialogue sentence vector set contains the same number of dialogue sentence vectors; and clustering each dialogue sentence vector set according to a first cluster parameter with a density clustering algorithm to obtain a plurality of sentence vector clusters, wherein each sentence vector cluster comprises dialogue sentence vectors whose vector distances are within a specified range.
Step S203, a statement cluster to be calibrated is obtained according to the statement vector cluster, and the statement cluster to be calibrated is labeled according to the semantics of the statement cluster to be calibrated.
Optionally, obtaining the sentence clusters to be calibrated according to the sentence vector clusters includes: converting the dialogue sentence vectors in each sentence vector cluster back into dialogue sentences to obtain sentence clusters; obtaining a first sentence cluster set and a second sentence cluster set according to the semantic state of each sentence cluster, wherein the first sentence cluster set comprises sentence clusters with definite semantics and the second sentence cluster set comprises sentence clusters with ambiguous semantics; and obtaining the sentence clusters to be calibrated according to the first sentence cluster set and the second sentence cluster set.
Optionally, obtaining the sentence clusters to be calibrated according to the first sentence cluster set and the second sentence cluster set includes: performing vector conversion on each dialogue sentence in the second sentence cluster set to obtain a corresponding dialogue sentence vector; iteratively clustering the dialogue sentence vectors converted from the second sentence cluster set with a density clustering algorithm according to a specified cluster parameter to obtain newly added sentence vector clusters; converting the dialogue sentence vectors in each newly added sentence vector cluster into dialogue sentences to obtain a third sentence cluster set, wherein the third sentence cluster set comprises sentence clusters with definite semantics; and taking the sentence clusters contained in the first sentence cluster set and the third sentence cluster set as the sentence clusters to be calibrated.
Step S204, comparing the marked label with the history label to obtain a new label, and determining a new event aiming at the dialogue text of the customer service scene of the bank according to the new label.
Specifically, in this embodiment, after labels have been attached to the sentence clusters, all labels obtained in the current labeling round are collected. The history labels obtained in previous labeling rounds are also retrieved, and the current labels are compared with the history labels to obtain the new labels. For example, suppose the history labels are frequent card applications, aggressive collection and slow system, and the labels obtained in the current round are frequent card applications, aggressive collection, slow system and application crash. Application crash is then the new label of the current round, and from this new label it can be determined that the new event in the dialogue text of the bank customer service scene is the occurrence of application crashes.
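The comparison with history labels reduces to a set difference. In this minimal sketch the function name is hypothetical, and the label strings are English paraphrases of the example labels in the paragraph above.

```python
def find_new_labels(current_labels, history_labels):
    """Return labels present in the current round but absent from history, sorted."""
    return sorted(set(current_labels) - set(history_labels))

history = ["frequent card applications", "aggressive collection", "slow system"]
current = [
    "frequent card applications",
    "aggressive collection",
    "slow system",
    "application crash",
]
new_labels = find_new_labels(current, history)
```

Sorting the result is only for stable display; the comparison itself does not depend on label order.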
After the new event is obtained, it can be displayed to the user, and different alarm levels can be set for different event types. For example, the alarm level corresponding to an application crash is set to level one, and the user is notified in the alarm mode corresponding to a level-one alarm. The user can thus learn the current operating state of the bank customer service scene in time and take timely measures to handle emergencies in that operating state.
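One simple way to attach alarm levels to new events is a lookup table with a fallback. Only the level-one setting for application crashes comes from the text; the other table entry, the `DEFAULT_LEVEL` value, and the convention that a lower number is more urgent are assumptions for illustration.

```python
# Hypothetical mapping from event label to alarm level (1 = most urgent).
ALARM_LEVELS = {"application crash": 1, "slow system": 2}
DEFAULT_LEVEL = 3  # assumed fallback for labels with no configured level

def alarms_for(new_labels):
    """Return (level, label) pairs for the new labels, most urgent first."""
    return sorted((ALARM_LEVELS.get(label, DEFAULT_LEVEL), label) for label in new_labels)

alarms = alarms_for({"application crash", "long hold time"})
```

A notification layer could then choose the alarm mode (for example, an immediate page versus a daily digest) from the level in each pair.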
According to the embodiment of the invention, the sentence clusters to be calibrated are obtained by clustering a large number of dialogue sentences in the dialogue text, and the sentence clusters to be calibrated are labeled according to the semantics of the obtained sentence clusters to be calibrated, so that the workload of manual labeling and sentence analysis is saved, and the efficiency and accuracy of text labeling are improved. Comparing the marked tag with the history tag to obtain a new tag, and determining a new event aiming at the dialogue text of the customer service scene of the bank according to the new tag, so that a user can acquire the current running state based on the customer service scene of the bank in time, and the emergency event in the running state can be adjusted by adopting corresponding measures in time.
Example 3
Fig. 3 is a schematic structural diagram of a text labeling device based on a customer service scene of a bank according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes: a dialogue text acquisition module 310, a sentence vector cluster acquisition module 320 and a label labeling module 330.
The dialogue text obtaining module 310 is configured to obtain a dialogue text of a customer service scene of a bank, where the dialogue text includes dialogue sentences;
the sentence vector cluster obtaining module 320 is configured to perform vector conversion on dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors, and perform density clustering on each dialogue sentence vector to obtain a sentence vector cluster;
the labeling module 330 is configured to obtain a statement cluster to be calibrated according to the statement vector cluster, and label the statement cluster to be calibrated according to the semantics of the statement cluster to be calibrated.
Optionally, the dialogue text acquisition module is configured to acquire dialogue data of a specified type in the bank customer service scene, where the specified type includes a pre-loan marketing class, an in-loan review class, a post-loan collection class or a satisfaction survey class;
preprocess the dialogue data, and split the preprocessed dialogue data according to the identities of customer and customer service agent to obtain a customer text and a customer service text, where the customer text includes customer dialogue sentences and the customer service text includes customer service dialogue sentences;
and take the customer text or the customer service text as the bank customer service scene dialogue text.
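The split by speaker identity can be sketched as follows. The description does not say what the preprocessing consists of or how speakers are marked, so the whitespace normalization, the `"customer"` / `"agent"` role strings, and the function names are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def preprocess(utterance: str) -> str:
    """Assumed preprocessing: collapse runs of whitespace and trim the ends."""
    return " ".join(utterance.split())


def split_by_identity(
    turns: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Split (speaker, utterance) turns into customer and customer service texts.

    The speaker values "customer" and "agent" are hypothetical role markers.
    Turns that are empty after preprocessing, or that carry an unknown
    speaker, are dropped.
    """
    customer_text: List[str] = []
    service_text: List[str] = []
    for speaker, utterance in turns:
        cleaned = preprocess(utterance)
        if not cleaned:
            continue
        if speaker == "customer":
            customer_text.append(cleaned)
        elif speaker == "agent":
            service_text.append(cleaned)
    return customer_text, service_text
```

Either returned list can then serve as the dialogue text that is passed on to vector conversion.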
Optionally, the sentence vector cluster acquisition module includes a dialogue sentence vector acquisition unit, configured to obtain a pre-trained language model, where the pre-trained language model includes a BERT model;
and perform vector conversion on the dialogue sentences using the pre-trained language model to obtain the corresponding dialogue sentence vectors.
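The description states only that a pre-trained language model such as BERT converts each dialogue sentence into a vector; it does not say how the per-token outputs of the model are reduced to a single sentence vector. One common choice, assumed here and not taken from the source, is masked mean pooling. The sketch below works on plain lists so that it runs without a model; `token_vectors` and `attention_mask` are hypothetical stand-ins for an encoder's output and its padding mask.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is non-zero.

    Padding positions (mask == 0) are ignored so that they do not dilute
    the resulting sentence vector.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
```

Other reductions, such as taking the vector of a dedicated classification token, would fit the same interface.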
Optionally, the sentence vector cluster acquisition module includes a sentence vector cluster acquisition unit, configured to perform data segmentation on the dialogue sentence vectors to obtain a plurality of dialogue sentence vector sets, where each dialogue sentence vector set contains the same number of dialogue sentence vectors;
and cluster each dialogue sentence vector set according to a first cluster parameter using a density clustering algorithm to obtain a plurality of sentence vector clusters, where each sentence vector cluster contains dialogue sentence vectors whose vector distances are within a specified range.
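The segmentation and per-set density clustering can be sketched as below. The description does not name the density clustering algorithm or the distance measure, so this sketch assumes DBSCAN with Euclidean distance; `eps`, `min_pts`, `batch_size` and all function names are hypothetical. The last set may be smaller than the others when the number of vectors is not a multiple of `batch_size`.

```python
import math
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for vectors that belong to no cluster
Vector = Sequence[float]


def split_into_sets(vectors: List[Vector], batch_size: int) -> List[List[Vector]]:
    """Cut the vector list into consecutive sets of batch_size vectors."""
    return [vectors[i:i + batch_size] for i in range(0, len(vectors), batch_size)]


def dbscan(points: List[Vector], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


def cluster_in_batches(vectors: List[Vector], batch_size: int,
                       eps: float, min_pts: int) -> List[List[Vector]]:
    """Cluster each set independently and collect all resulting clusters."""
    clusters: List[List[Vector]] = []
    for batch in split_into_sets(vectors, batch_size):
        grouped: Dict[int, List[Vector]] = {}
        for vec, label in zip(batch, dbscan(batch, eps, min_pts)):
            if label != NOISE:
                grouped.setdefault(label, []).append(vec)
        clusters.extend(grouped.values())
    return clusters
```

Clusters found in different sets are kept separate here; whether and how they are merged afterwards is not specified in the description.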
Optionally, the label labeling module includes a to-be-calibrated statement cluster acquisition unit, configured to convert the dialogue sentence vectors in each sentence vector cluster back into dialogue sentences to obtain statement clusters;
acquire a first statement cluster set and a second statement cluster set according to the semantic state of each statement cluster, where the first statement cluster set contains statement clusters with definite semantics and the second statement cluster set contains statement clusters with ambiguous semantics;
and acquire the statement cluster to be calibrated according to the first statement cluster set and the second statement cluster set.
Optionally, the to-be-calibrated statement cluster acquisition unit is further configured to perform vector conversion on each dialogue sentence in the second statement cluster set to obtain corresponding dialogue sentence vectors;
perform iterative clustering on the dialogue sentence vectors converted from the second statement cluster set according to the specified cluster parameter using a density clustering algorithm to obtain newly added sentence vector clusters;
convert the dialogue sentence vectors in each newly added sentence vector cluster into dialogue sentences to obtain a third statement cluster set, where the third statement cluster set contains statement clusters with definite semantics;
and take the statement clusters contained in the first statement cluster set and the third statement cluster set as the statement clusters to be calibrated.
Optionally, the specified cluster parameter is greater than the first cluster parameter.
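The control flow of this refinement step can be sketched independently of any particular clustering algorithm. In the sketch below the clustering routine and the judgment of whether a cluster has definite semantics are passed in as callables, because the description specifies neither the meaning of the cluster parameter nor how semantic clarity is decided; `recluster`, `is_clear`, `max_rounds` and the function name are hypothetical.

```python
from typing import Callable, List

Cluster = List[str]


def select_clusters_to_calibrate(
    clusters: List[Cluster],
    is_clear: Callable[[Cluster], bool],
    recluster: Callable[[List[str], float], List[Cluster]],
    first_param: float,
    specified_param: float,
    max_rounds: int = 3,
) -> List[Cluster]:
    """Keep clear clusters and iteratively re-cluster the ambiguous ones."""
    if specified_param <= first_param:
        raise ValueError("specified_param must be greater than first_param")

    first_set = [c for c in clusters if is_clear(c)]
    # Sentences of the ambiguous (second) set are pooled for re-clustering.
    pending = [s for c in clusters if not is_clear(c) for s in c]

    third_set: List[Cluster] = []
    for _ in range(max_rounds):
        if not pending:
            break
        new_clusters = recluster(pending, specified_param)
        clear_now = [c for c in new_clusters if is_clear(c)]
        if not clear_now:
            break  # no progress in this round, stop iterating
        third_set.extend(clear_now)
        pending = [s for c in new_clusters if not is_clear(c) for s in c]

    return first_set + third_set
```

In practice `recluster` would wrap vector conversion followed by density clustering, and `is_clear` might be a human judgment; both are left open here on purpose.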
Optionally, the label labeling module is configured to display the statement cluster to be calibrated so as to receive the semantics of the statement cluster to be calibrated as determined by a user;
and determine a label according to the semantics, and attach the label to the statement cluster to be calibrated.
Optionally, the device further includes a new event acquisition module, configured to compare the labeled tags with the history tags to obtain a new tag;
and determine a new event for the bank customer service scene dialogue text according to the new tag.
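Comparing labeled tags against history tags amounts to a set difference. The sketch below assumes that tags are plain strings and that exact string equality decides whether a tag is already known; the function name is hypothetical, and the mapping from a new tag to an event is left out because the description does not detail it.

```python
from typing import Iterable, List


def find_new_tags(labeled_tags: Iterable[str], history_tags: Iterable[str]) -> List[str]:
    """Return tags that appear in labeled_tags but not in history_tags.

    The order of first appearance is preserved and duplicates are reported once.
    """
    known = set(history_tags)
    seen = set()
    new_tags: List[str] = []
    for tag in labeled_tags:
        if tag not in known and tag not in seen:
            seen.add(tag)
            new_tags.append(tag)
    return new_tags
```

Each returned tag is a candidate for a new event that can then be shown to the user.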
The text labeling device based on the bank customer service scene provided by the embodiment of the invention can execute the text labeling method based on the bank customer service scene provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example IV
Fig. 4 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, for example the text labeling method based on a bank customer service scene.
In some embodiments, the text labeling method based on a bank customer service scene may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the text labeling method based on a bank customer service scene described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the text labeling method based on a bank customer service scene in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (12)

1. A text labeling method based on a bank customer service scene, characterized by comprising:
acquiring a dialogue text of a customer service scene of a bank, wherein the dialogue text comprises dialogue sentences;
performing vector conversion on dialogue sentences in the dialogue text to obtain corresponding dialogue sentence vectors, and performing density clustering on the dialogue sentence vectors to obtain sentence vector clusters;
and acquiring a statement cluster to be calibrated according to the statement vector cluster, and labeling the statement cluster to be calibrated according to the semantics of the statement cluster to be calibrated.
2. The method of claim 1, wherein the acquiring of the bank customer service scene dialogue text comprises:
acquiring dialogue data of a specified type in the bank customer service scene, wherein the specified type comprises a pre-loan marketing class, an in-loan review class, a post-loan collection class or a satisfaction survey class;
preprocessing the dialogue data, splitting the preprocessed dialogue data according to the identities of clients and customer service to obtain client texts and customer service texts, wherein the client texts comprise client dialogue sentences, and the customer service texts comprise customer service dialogue sentences;
and taking the client text or the customer service text as the bank customer service scene dialogue text.
3. The method of claim 1, wherein the performing vector conversion on the dialogue sentence in the dialogue text to obtain the corresponding dialogue sentence vector comprises:
obtaining a pre-trained language model, wherein the pre-trained language model comprises a BERT model;
and performing vector conversion on the dialogue sentences using the pre-trained language model to obtain the corresponding dialogue sentence vectors.
4. The method of claim 1, wherein the performing of density clustering on the dialogue sentence vectors to obtain sentence vector clusters comprises:
performing data segmentation on the dialogue sentence vectors to obtain a plurality of dialogue sentence vector sets, wherein each dialogue sentence vector set contains the same number of dialogue sentence vectors;
clustering each dialogue sentence vector set according to a first cluster parameter by adopting a density clustering algorithm to obtain a plurality of sentence vector clusters, wherein each sentence vector cluster comprises dialogue sentence vectors with vector distances within a specified range.
5. The method of claim 4, wherein the obtaining the statement cluster to be calibrated from the statement vector cluster comprises:
converting the dialogue sentence vectors in each sentence vector cluster back into dialogue sentences to obtain statement clusters;
acquiring a first statement cluster set and a second statement cluster set according to the semantic state of each statement cluster, wherein the first statement cluster set comprises statement clusters with definite semantics, and the second statement cluster set comprises statement clusters with ambiguous semantics;
and acquiring the statement cluster to be calibrated according to the first statement cluster set and the second statement cluster set.
6. The method of claim 5, wherein the obtaining the statement cluster to be calibrated from the first set of statement clusters and the second set of statement clusters comprises:
performing vector conversion on each dialogue sentence in the second sentence cluster set to obtain a corresponding dialogue sentence vector;
performing iterative clustering on the dialogue sentence vectors converted from the second statement cluster set according to the specified cluster parameter using a density clustering algorithm to obtain newly added sentence vector clusters;
converting dialogue sentence vectors in each newly added sentence vector cluster into dialogue sentences to obtain a third sentence cluster set, wherein the third sentence cluster set comprises sentence clusters with definite semantics;
and taking the statement clusters contained in the first statement cluster set and the third statement cluster set as the statement clusters to be calibrated.
7. The method of claim 6, wherein the specified cluster parameter is greater than the first cluster parameter.
8. The method according to claim 1, wherein the labeling of the statement cluster to be calibrated according to the semantics of the statement cluster to be calibrated comprises:
displaying the statement cluster to be calibrated to receive the semantics of the statement cluster to be calibrated, which are determined by a user;
and determining a label according to the semantics, and labeling the label into the statement cluster to be calibrated.
9. The method according to claim 1, wherein, after the labeling of the statement cluster to be calibrated according to the semantics, the method further comprises:
comparing the labeled tag with the history tag to obtain a newly added tag;
and determining a new event for the bank customer service scene dialogue text according to the newly added tag.
10. A text labeling device based on a bank customer service scene, characterized by comprising:
the dialogue text acquisition module is used for acquiring dialogue texts of customer service scenes of banks, wherein the dialogue texts comprise dialogue sentences;
the sentence vector cluster acquisition module is used for carrying out vector conversion on dialogue sentences in the dialogue text to acquire corresponding dialogue sentence vectors, and carrying out density clustering on each dialogue sentence vector to acquire a sentence vector cluster;
the label labeling module is used for obtaining the statement cluster to be calibrated according to the statement vector cluster, and labeling the statement cluster to be calibrated according to the semantics of the statement cluster to be calibrated.
11. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
12. A computer readable storage medium, characterized in that it stores computer instructions for causing a processor to implement the method of any one of claims 1-9 when executed.
CN202310404470.3A 2023-04-17 2023-04-17 Text labeling method, device and storage medium based on bank customer service scene Pending CN116431809A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310404470.3A CN116431809A (en) 2023-04-17 2023-04-17 Text labeling method, device and storage medium based on bank customer service scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310404470.3A CN116431809A (en) 2023-04-17 2023-04-17 Text labeling method, device and storage medium based on bank customer service scene

Publications (1)

Publication Number Publication Date
CN116431809A true CN116431809A (en) 2023-07-14

Family

ID=87081129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310404470.3A Pending CN116431809A (en) 2023-04-17 2023-04-17 Text labeling method, device and storage medium based on bank customer service scene

Country Status (1)

Country Link
CN (1) CN116431809A (en)

Similar Documents

Publication Publication Date Title
CN111861596A (en) Text classification method and device
CN114970540A (en) Method and device for training text audit model
CN116340831B (en) Information classification method and device, electronic equipment and storage medium
CN112560480A (en) Task community discovery method, device, equipment and storage medium
CN115048352B (en) Log field extraction method, device, equipment and storage medium
CN114461665B (en) Method, apparatus and computer program product for generating a statement transformation model
CN114444514B (en) Semantic matching model training method, semantic matching method and related device
CN115952258A (en) Generation method of government affair label library, and label determination method and device of government affair text
CN115510212A (en) Text event extraction method, device, equipment and storage medium
CN115600607A (en) Log detection method and device, electronic equipment and medium
CN114647727A (en) Model training method, device and equipment applied to entity information recognition
CN116431809A (en) Text labeling method, device and storage medium based on bank customer service scene
CN114254650A (en) Information processing method, device, equipment and medium
CN117574146B (en) Text classification labeling method, device, electronic equipment and storage medium
CN117271373B (en) Automatic construction method and device for test cases, electronic equipment and storage medium
CN112989797B (en) Model training and text expansion methods, devices, equipment and storage medium
CN114491040B (en) Information mining method and device
CN114898374A (en) Image semantic recognition method, device, equipment and storage medium
CN113360602A (en) Method, apparatus, device and storage medium for outputting information
CN117290758A (en) Classification and classification method, device, equipment and medium for unstructured document
CN117851599A (en) Method, device, equipment and medium for extracting text of other elements of investment supervision
CN116257639A (en) Logistics knowledge graph generation method, device, equipment and storage medium
CN116524905A (en) Training method, device, equipment and storage medium of voice recognition model
CN117574146A (en) Text classification labeling method, device, electronic equipment and storage medium
CN115827835A (en) Method, device and equipment for extracting labels of open texts and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination