CN117057333A - Method, device, equipment and medium for mining conversation template of financial business - Google Patents

Info

Publication number
CN117057333A
Authority
CN
China
Prior art keywords
word
target
entity
template
dialogue data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310928524.6A
Other languages
Chinese (zh)
Inventor
孙梓淇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202310928524.6A
Publication of CN117057333A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The invention relates to the field of finance, and in particular to a method, device, equipment and medium for mining a conversation template of a financial business. The method acquires N pieces of historical dialogue data corresponding to the financial business and a target business type, and screens the N pieces of historical dialogue data to obtain target dialogue data corresponding to the target business type; performs entity recognition to obtain the named entity of each sentence in the target dialogue data, and removes the named entities from the target dialogue data to obtain an initial conversation template; and performs word segmentation on the target dialogue data to obtain the corresponding words, performs cluster analysis on all the words to obtain a clustering result, and, if a named entity exists in the clustering result, removes the corresponding words in the clustering result from the initial conversation template to obtain a target conversation template. As a result, the outbound data used in outbound tasks is more accurate, and the standardization and efficiency of outbound calls for the financial business are improved.

Description

Method, device, equipment and medium for mining conversation template of financial business
Technical Field
The invention relates to the field of finance, in particular to a method, a device, equipment and a medium for mining a conversation template of a financial business.
Background
Intelligent outbound call systems are now widely applied, covering public departments, operators, logistics, finance, real estate and other fields. Built on an efficient intelligent outbound robot, such a system assists or replaces human agents in communicating automatically and completes specific outbound tasks such as product promotion and customer return visits, which efficiently addresses the high labor cost, low efficiency and non-standard conversations of traditional outbound calling. For example, when a customer's insurance contract is about to expire, an insurance salesperson can use the intelligent outbound system to call the customer and remind the customer of the contract period.
Acquiring outbound data is key to an intelligent outbound system and directly affects the rationality of outbound tasks. At present, outbound data is acquired by using lexical analysis and templates to map an input text sentence into a semantic frame consisting of several semantic slots. This approach is controllable, performs well, is fast and easy to edit, and has no technical barriers, but its templates have poor diversity and generalization. As a result, the outbound data obtained for an outbound task is inaccurate, that is, the outbound content is not standardized and the outbound task deviates from its purpose. How to improve the diversity and generalization of templates so as to ensure the accuracy of outbound data is therefore an urgent problem.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a method, a device, equipment and a medium for mining a conversation template of a financial service, so as to solve the problem of low accuracy of outbound data.
In a first aspect, an embodiment of the present invention provides a method for mining a conversation template of a financial service, where the method includes:
acquiring N pieces of historical dialogue data and a target business type corresponding to the financial business in a historical time period, extracting dialogue features of the target business type, and screening the N pieces of historical dialogue data according to the dialogue features to obtain target dialogue data;
performing entity recognition on the target dialogue data to obtain the named entity of each sentence in the target dialogue data, and removing the named entities from the target dialogue data to obtain an initial conversation template;
performing word segmentation on the target dialogue data to obtain the corresponding words, performing cluster analysis on all the words to obtain a clustering result, and detecting whether a named entity exists in the clustering result;
and, if a named entity exists in the clustering result, removing the corresponding words in the clustering result from the initial conversation template to obtain a target conversation template corresponding to the target business type.
In a second aspect, an embodiment of the present invention provides a device for mining a conversation template of a financial business, the device comprising:
a data screening module, used for acquiring N pieces of historical dialogue data and a target business type corresponding to the financial business in a historical time period, extracting dialogue features of the target business type, and screening the N pieces of historical dialogue data according to the dialogue features to obtain target dialogue data;
a template acquisition module, used for performing entity recognition on the target dialogue data to obtain the named entity of each sentence in the target dialogue data, and removing the named entities from the target dialogue data to obtain an initial conversation template;
an entity detection module, used for performing word segmentation on the target dialogue data to obtain the corresponding words, performing cluster analysis on all the words to obtain a clustering result, and detecting whether a named entity exists in the clustering result;
and a template adjustment module, used for removing the corresponding words in the clustering result from the initial conversation template if a named entity exists in the clustering result, so as to obtain a target conversation template corresponding to the target business type.
In a third aspect, an embodiment of the present invention provides a computer device, where the computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor implements the conversation template mining method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the conversation template mining method according to the first aspect.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
An embodiment of the invention provides a conversation template mining method for a financial business. The method acquires N pieces of historical dialogue data and a target business type corresponding to the financial business in a historical time period, extracts dialogue features of the target business type, and screens the N pieces of historical dialogue data according to the dialogue features to obtain target dialogue data; performs entity recognition on the target dialogue data to obtain the named entity of each sentence, and removes the named entities from the target dialogue data to obtain an initial conversation template; performs word segmentation on the target dialogue data to obtain the corresponding words, performs cluster analysis on all the words to obtain a clustering result, and detects whether a named entity exists in the clustering result; and, if a named entity exists in the clustering result, removes the corresponding words in the clustering result from the initial conversation template to obtain the target conversation template corresponding to the target business type. By analyzing the historical dialogue data collected before outbound calls to obtain the target dialogue data, and then removing words from it on the basis of semantic generalization to obtain the target conversation template, the method makes the outbound data in outbound tasks more accurate. Dialogue data can be acquired periodically and the conversation template updated in real time according to the business type, which improves the standardization and efficiency of outbound calls for the financial business.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of a method for mining a conversation template of a financial business according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for mining a conversation template of a financial business according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for mining a conversation template of a financial business according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when", "once", "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determination", "in response to determination", "upon detection of [the described condition or event]" or "in response to detection of [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
It should be understood that the sequence numbers of the steps in the following embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
The conversation template mining method for a financial business provided by the embodiment of the invention can be applied in the application environment shown in FIG. 1, in which a client communicates with a server. The client includes, but is not limited to, a handheld computer, a desktop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cloud terminal device, a personal digital assistant (PDA), and other computer devices. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
The conversation template mining method is applied to a financial business scenario. The client may be a mobile phone connected to the server over a wireless network, so that dialogue data is sent to the server when a salesperson uses the mobile phone to talk with a customer; the server then analyzes and processes the dialogue data to obtain conversation templates corresponding to the business types of the various financial businesses.
Referring to FIG. 2, a flowchart of a conversation template mining method for a financial business according to an embodiment of the invention is shown. The method can be applied to the server in FIG. 1. The computer device corresponding to the server is connected to a database, so that the conversation template is mined from the historical dialogue data stored in the database; the server is also connected to the client, so as to obtain the dialogue data sent by the user from the client. As shown in FIG. 2, the conversation template mining method may include the following steps:
Step S201: acquire N pieces of historical dialogue data and a target business type corresponding to the financial business in a historical time period, extract dialogue features of the target business type, and screen the N pieces of historical dialogue data according to the dialogue features to obtain target dialogue data.
The historical time period may be one or more hours, one or more days, one or more weeks, and so on. Assuming a period of 1 day and a current date of June 13, the historical time periods are June 12, June 11, June 10, and so on. Financial business refers to commercial transactions related to financial services, which may involve banking, securities, insurance and various other forms of financial service. Historical dialogue data refers to dialogue data between a salesperson and a customer that took place in the historical time period. The target business type is the business type, among all business types, that the user needs to carry out. Notably, financial business covers many forms of financial service, and different forms correspond to different business types, for example bank loans, bank collection and insurance contract renewal. The dialogue features may include keywords corresponding to the business type; for example, the keywords for the bank collection business type include repayment path, risk, overdue and the like. The target dialogue data is thus obtained by screening the N pieces of historical dialogue data according to the dialogue features of the target business type.
In one embodiment, in a banking scenario, multiple historical dialogues between each bank salesperson and customers in the historical time period are acquired; for a given banking business type, the historical dialogues corresponding to that business type are screened out of these dialogues according to the dialogue features between the salesperson and the customer.
It should be noted that the historical dialogue data may be voice data, text data, or another type of data. When the historical dialogue data is voice data, it is converted into text data by speech recognition; when it is text data, no conversion is needed; when it is another type of data, it is converted into text data using a corresponding technique.
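The screening in step S201 can be sketched as a keyword filter. This is only an illustration under stated assumptions: the `Dialogue` class, the `DIALOGUE_FEATURES` table, the `screen_dialogues` function and the `min_hits` parameter are hypothetical names that do not come from the source, which only says that dialogue features may include keywords of the business type. The dialogues are assumed to be text already.

```python
from dataclasses import dataclass


@dataclass
class Dialogue:
    dialogue_id: str
    text: str  # assumed to be text already (e.g. after speech recognition)


# Hypothetical keyword features per business type. The source names
# "repayment path, risk, overdue" as examples for bank collection.
DIALOGUE_FEATURES = {
    "bank_collection": ["repayment", "risk", "overdue"],
}


def screen_dialogues(dialogues, business_type, min_hits=1):
    """Keep dialogues containing at least `min_hits` feature keywords."""
    keywords = DIALOGUE_FEATURES[business_type]
    selected = []
    for dialogue in dialogues:
        text = dialogue.text.lower()
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits >= min_hits:
            selected.append(dialogue)
    return selected


history = [
    Dialogue("d1", "Your loan is overdue, please choose a repayment path."),
    Dialogue("d2", "Would you like to renew your insurance contract?"),
]
target = screen_dialogues(history, "bank_collection")
print([d.dialogue_id for d in target])
```

Raising `min_hits` makes the filter stricter; a real system might weight keywords or use a classifier instead, which the source does not specify.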
Step S202: perform entity recognition on the target dialogue data to obtain the named entity of each sentence in the target dialogue data, and remove the named entities from the target dialogue data to obtain an initial conversation template.
A named entity, or entity for short, is an entity that has a specific meaning or strong referential significance and is identified by a name; each entity comprises the entity itself and its entity type. For example, entity types include person names, place names and the like. There are also domain-specific named entity types, such as insurance, tax, online banking, interest, stock and fund in the financial field. The conversation template provides the text basis for the dialogue with a customer during an intelligent outbound call. The named entities recognized in each sentence of the target dialogue data are removed from the target dialogue data to obtain the initial conversation template.
Optionally, performing entity recognition on the target dialogue data to obtain the named entity of each sentence in the target dialogue data includes:
performing speaker feature analysis on the target dialogue data to obtain the speaker of each sentence in the target dialogue data, and determining the target sentences whose speaker is the target object;
and performing entity recognition on the target sentences to obtain the named entities of the target sentences.
The dialogue data includes question data and answer data. A speaker feature is a mark or sign that characterizes the speaker. The speakers include customers and customer service agents, and the target object is the customer service agent. In the invention, the timbre of a person's voice is used as the speaker feature: speaker feature analysis is performed on the target dialogue data according to timbre to obtain the speaker of each sentence, the target sentences whose speaker is the target object are determined, and entity recognition is performed on the target sentences to obtain their named entities.
In one embodiment, in a bank collection scenario, the entire dialogue between the collector and the customer serves the purpose of collection. The collector's utterances are extracted from the dialogue data according to the timbres of the collector and the customer, and entity recognition is then performed on the collector's utterances to obtain the named entities of each sentence.
Optionally, performing entity recognition on the target sentence to obtain the named entity of the target sentence includes:
splitting the target sentence into at least one paragraph, and performing entity recognition on each paragraph to obtain the entities of the corresponding paragraph;
and, for any entity, counting the number of occurrences of the entity, determining the word frequency of the entity from the number of paragraphs and the number of occurrences, and, if the word frequency of the entity is greater than a threshold, determining the entity to be a named entity of the target sentence.
For a target sentence, paragraph identification is first performed to split the sentence into at least one paragraph, and entity recognition is then performed on each paragraph to obtain its entities. After all entities of the target sentence have been recognized, they need to be screened to ensure the uniqueness of the subsequent conversation template. The invention screens the entities using inverse document frequency (IDF), as follows: for any entity, the number of paragraphs containing the entity is counted as the number of occurrences of the entity, the total number of paragraphs is counted, and the ratio of the number of occurrences to the total number of paragraphs is taken as the word frequency of the entity. If the word frequency is greater than the threshold, the entity is confirmed as a named entity of the target sentence; otherwise it is not. For example, words such as exchange rate, amount and bank occur frequently in financial business.
In the invention, paragraph identification splits the target sentence into at least one paragraph according to the punctuation marks in the sentence: the text between the start of the sentence and the first punctuation mark forms a paragraph, as does the text between any two adjacent punctuation marks.
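The punctuation-based splitting and the frequency threshold described above can be sketched as follows. This is a minimal sketch under assumptions: the function names, the `VOCAB` set that stands in for a real entity recognizer, the set of boundary punctuation marks and the threshold value 0.5 are all hypothetical. The ratio computed here is the one the text defines (paragraphs containing the entity divided by the total number of paragraphs), which is a paragraph-level document frequency rather than a logarithmic inverse.

```python
import re
from collections import Counter

# Punctuation marks treated as paragraph boundaries (ASCII and full-width).
BOUNDARY = re.compile(r"[,.;:!?，。；：！？]")

# Hypothetical entity vocabulary, standing in for the real recognizer.
VOCAB = {"bank", "rate", "amount"}


def split_paragraphs(sentence):
    """Split a sentence at punctuation marks and drop empty pieces."""
    return [p.strip() for p in BOUNDARY.split(sentence) if p.strip()]


def paragraph_frequency(paragraph_entities):
    """Ratio of paragraphs containing each entity to the paragraph count."""
    total = len(paragraph_entities)
    if total == 0:
        return {}
    counts = Counter(e for ents in paragraph_entities for e in set(ents))
    return {entity: count / total for entity, count in counts.items()}


def select_named_entities(paragraph_entities, threshold):
    """Keep entities whose paragraph ratio is greater than the threshold."""
    freq = paragraph_frequency(paragraph_entities)
    return {entity for entity, ratio in freq.items() if ratio > threshold}


sentence = ("The bank adjusted the rate, the bank will notify you, "
            "please check the amount.")
paragraphs = split_paragraphs(sentence)
paragraph_entities = [{e for e in VOCAB if e in p.lower()} for p in paragraphs]
named = select_named_entities(paragraph_entities, threshold=0.5)
print(paragraphs)
print(named)
```

Here "bank" appears in two of three paragraphs and passes the 0.5 threshold, while "rate" and "amount" appear in one paragraph each and are dropped.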
Optionally, performing entity recognition on each paragraph to obtain the entities of the corresponding paragraph includes:
acquiring a preset entity dictionary;
for any paragraph, matching the paragraph against the preset entity dictionary to obtain the entities of the corresponding paragraph;
and traversing all paragraphs to obtain the entities of each paragraph.
In dictionary matching, an entity dictionary is set up in advance. The preset entity dictionary is acquired and, for any paragraph, the paragraph is matched against it to obtain the entities the paragraph contains. For example, if the dictionary contains the entity "house loan" and the paragraph is "under national regulation, the house loan interest rate gradually decreases", matching the paragraph against the entity dictionary yields one entity, namely "house loan". All paragraphs are traversed to obtain the entities of each paragraph.
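Dictionary matching as described can be illustrated with plain substring search. The function names and the dictionary contents other than "house loan" are assumptions made for the example; the source does not say which matching algorithm is used.

```python
def match_entities(paragraph, entity_dict):
    """Return dictionary entries found in the paragraph, in order of
    first appearance."""
    found = []
    for entity in entity_dict:
        position = paragraph.find(entity)
        if position != -1:
            found.append((position, entity))
    return [entity for _, entity in sorted(found)]


def match_all(paragraphs, entity_dict):
    """Traverse all paragraphs and collect the entities of each one."""
    return [match_entities(p, entity_dict) for p in paragraphs]


# Hypothetical preset entity dictionary.
entity_dict = ["insurance", "house loan", "fund"]
paragraphs = [
    "under national regulation",
    "the house loan interest rate gradually decreases",
]
print(match_all(paragraphs, entity_dict))
```

For a large dictionary, a trie or an Aho-Corasick automaton would avoid scanning the paragraph once per entry; the linear scan is used here only for brevity.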
In the invention, entity recognition may be performed on each paragraph by dictionary matching, by an entity recognition model, or by a combination of the two.
When an entity recognition model is used, the model is one that finds relevant entities in natural language text: each paragraph is input directly into the model, which outputs the entities the paragraph contains. The entity recognition model may be any of various existing neural network models, such as a BERT+CRF model. The embodiment of the invention does not limit which model is used; a person skilled in the art may choose one as needed, and a domain-specific entity model may be trained by existing methods.
When dictionary matching and the entity recognition model are combined, for any paragraph a first entity set is matched from the paragraph by dictionary matching and a second entity set is recognized from the paragraph by the entity recognition model; conflicting entities in the first and second entity sets are then removed, and the remaining entities in the two sets are taken as the final recognition result. Dictionary matching finds all entities that may occur and so has high coverage, while the entity recognition model can resolve conflicting entities and so has high accuracy; combining the two improves both the accuracy and the coverage of entity recognition. For example, for "purchase insurance", dictionary matching yields three entities, "purchase", "purchase insurance" and "insurance"; the entity recognition model yields two entities, "purchase" and "insurance"; and combining the two yields two entities, "purchase" and "insurance".
Step S203: perform word segmentation on the target dialogue data to obtain the corresponding words, perform cluster analysis on all the words to obtain a clustering result, and detect whether a named entity exists in the clustering result.
The word segmentation process refers to dividing continuous text into individual word elements. The word segmentation is the minimum unit of expression of the semantic meaning. The clustering result characterizes the word segmentation set with the same part of speech in all the word segments. Therefore, in the invention, the jieba is adopted to perform word segmentation processing on the target dialogue data, so as to obtain all the words corresponding to the target dialogue data. Jieba is a chinese word segmentation tool based on Python (object-oriented dynamic type language) and includes three word segmentation modes, namely an accurate mode, a full mode and a search engine mode, wherein the accurate mode can perform the most accurate segmentation on sentences, has no redundant data and is applied to text analysis. After all the words of the target dialogue data are acquired, carrying out cluster analysis on all the words to obtain a cluster result, further carrying out one-to-one matching on the named entities and the words in the cluster result aiming at any cluster result to detect whether the named entities exist in the cluster result, confirming that the named entities exist in the cluster result when the matching is successful, and confirming that the named entities do not exist in the cluster result when the matching is unsuccessful.
In one embodiment, in an insurance promotion scenario, the salesperson says "Do you buy insurance?" and the customer says "I have purchased insurance", forming the dialogue data "Do you buy insurance? I have purchased insurance". Word segmentation is then performed on the dialogue data, and the segmented words obtained include "you", "buy", "insurance", "do", "I", "have", "purchased" and "insurance". The segmented words are clustered: "you" and "I" form the clustering result corresponding to personal pronouns, "buy" and "purchased" form the clustering result corresponding to verbs, the two occurrences of "insurance" form the clustering result corresponding to nouns, and the particle "do" forms the clustering result corresponding to modal auxiliary words. If the named entity is "insurance", it is detected whether a segmented word corresponding to "insurance" exists in the clustering results.
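The one-by-one matching of named entities against the segmented words of each clustering result can be sketched as follows. This is a minimal Python illustration, assuming the clustering results are already available as a mapping from a part-of-speech label to a list of segmented words; the function name `find_entity_clusters`, the labels and the sample data are hypothetical and are not taken from the patent.

```python
def find_entity_clusters(clusters, named_entities):
    """Return {cluster label: [matched words]} for clusters that contain a named entity.

    clusters: dict mapping a (hypothetical) cluster label to its segmented words.
    named_entities: iterable of named-entity strings to look for.
    """
    entities = set(named_entities)
    hits = {}
    for label, words in clusters.items():
        # Match every segmented word of this cluster against the named entities.
        matched = [w for w in words if w in entities]
        if matched:
            hits[label] = matched
    return hits


# Sample data mirroring the insurance example (English glosses, illustration only).
clusters = {
    "pronoun": ["you", "I"],
    "verb": ["buy", "purchased"],
    "noun": ["insurance", "insurance"],
    "particle": ["do"],
}
hits = find_entity_clusters(clusters, ["insurance"])
```

An empty result means that no clustering result contains the named entity, in which case nothing further would be removed from the template.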
Optionally, performing cluster analysis on all the segmented words to obtain clustering results includes:
connecting all the segmented words in pairs to generate word pairs, calculating the weight value of each word pair from the two segmented words in the pair, and constructing a word segmentation structure graph from the word pairs and their weight values;
and performing graph partitioning on the word segmentation structure graph by means of a community discovery algorithm to obtain clustering results, each composed of all the segmented words with the same part of speech.
A community discovery algorithm is a graph-based clustering algorithm that partitions a network according to the strength of the relationships between the individuals in the network. The word segmentation structure graph is a network structure made up of vertices and edges (directed or undirected) that describes the relationships between individuals as an easily understood network topology. Part of speech refers to the grammatical property a word exhibits in a sentence. In the present invention, each segmented word is taken as a vertex, the segmented words are connected in pairs to form word pairs, the weight value of each word pair is calculated from the two segmented words in the pair, and the word pairs together with their weight values are assembled, in a co-occurrence manner, into the word segmentation structure graph. Once the word segmentation structure graph is obtained, it can be partitioned by a community discovery algorithm. In this embodiment a modularity-based community discovery algorithm (the Louvain algorithm) is used for graph partitioning: each segmented word in the graph is first treated as its own community, and each community is then merged with its neighbouring communities on the basis of the word-pair connections. Next, following a basic gradient-descent rule, the largest word-pair weight value is obtained and the corresponding segmented word is merged into the corresponding community; after multiple rounds of merging, partitioning is complete when the word-pair weight values no longer change, yielding clustering results each composed of all the segmented words with the same part of speech. Because the word segmentation structure graph reflects the differences in similarity between segmented words, it can reflect part of speech accurately and improves the accuracy of segmented-word clustering.
In one embodiment, in an insurance promotion scenario, the salesperson says "Do you buy insurance?" and the customer says "I have purchased insurance", forming the dialogue data "Do you buy insurance? I have purchased insurance". Word segmentation is then performed on the dialogue data, and the segmented words obtained include "you", "buy", "insurance", "do", "I", "have", "purchased" and "insurance". Each segmented word is taken as a vertex, and the segmented words are connected in pairs to form word pairs. Finally, graph partitioning is performed on the word segmentation structure graph by the community discovery algorithm to obtain clustering results each composed of the segmented words with the same part of speech: "you" and "I" form the clustering result corresponding to personal pronouns, "buy" and "purchased" form the clustering result corresponding to verbs, the occurrences of "insurance" form the clustering result corresponding to nouns, and the particle "do" forms the clustering result corresponding to modal auxiliary words.
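The merging of segmented words into communities can be sketched in Python as follows. This is a minimal sketch that implements only the first (local moving) phase of the Louvain method on a small weighted graph; the community aggregation phase of the full algorithm is omitted. The function name `louvain_local_moving`, the edge format and any sample weights are assumptions made for illustration, not details from the patent.

```python
from collections import defaultdict


def louvain_local_moving(edges):
    """Group vertices by greedy modularity-gain moves (Louvain phase 1 only).

    edges: dict {(u, v): weight} describing an undirected graph without self-loops.
    Returns a sorted list of sorted vertex lists, one list per community.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())      # twice the total edge weight
    community = {u: u for u in adj}   # every vertex starts as its own community
    tot = dict(degree)                # total weighted degree of each community

    moved = True
    while moved:
        moved = False
        for u in sorted(adj):
            cur = community[u]
            tot[cur] -= degree[u]     # temporarily take u out of its community
            # Weight of the links from u to each neighbouring community.
            links = defaultdict(float)
            for v, w in adj[u].items():
                links[community[v]] += w
            # Quantity proportional to the modularity gain of joining community c:
            #   k_{u,in}(c) - tot(c) * k_u / (2m)
            best = cur
            best_gain = links.get(cur, 0.0) - tot[cur] * degree[u] / two_m
            for c, k_in in sorted(links.items()):
                gain = k_in - tot[c] * degree[u] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            community[u] = best
            tot[best] += degree[u]
            if best != cur:
                moved = True

    groups = defaultdict(list)
    for u, c in community.items():
        groups[c].append(u)
    return sorted(sorted(g) for g in groups.values())
```

The function takes the word pairs and their weight values as a dict keyed by word pair and returns the groups of words that ended up in the same community. A production system would more likely rely on a maintained graph library than on a hand-written routine such as this one.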
Optionally, calculating the weight value of each word pair from the two segmented words in the pair includes:
for any word pair, acquiring the word vector of each segmented word in the pair, and calculating the similarity between the word vectors of the two segmented words as the weight value of the word pair.
The word vector of each segmented word in a word pair is obtained using a pre-trained word vector (word2vec) model, and the similarity between the two segmented words is then calculated from the word vectors and used as the weight value of the word pair. The weight values of the word pairs, each formed from two segmented words, allow the differences between the segmented words to be analysed.
In the present invention, the word vector of each segmented word in a word pair can be obtained from a pre-trained word vector model, with the similarity between the two segmented words then calculated from the word vectors and used as the weight value of the word pair. Alternatively, the word vector of each segmented word can be obtained by ID (identity document, i.e. unique identifier) encoding of each segmented word, with the similarity then calculated from the word vectors in the same way. Specifically: first, all the segmented words are stored together to form a lexicon, and each segmented word in the lexicon is ID-encoded so that it has a corresponding word ID. Then an embedding layer is used to reduce the dimensionality of each word ID and represent it densely, converting the word ID into its corresponding word vector. In this way, the word vectors of the two segmented words in a word pair can be looked up in the lexicon. The cosine of the angle between the two word vectors is calculated and taken as the similarity value between the two segmented words, and this similarity value is taken as the weight value of the word pair. The smaller the angle, the closer the cosine is to 1, the more closely the directions of the two word vectors agree, and the more similar the two segmented words are.
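The ID encoding, vector lookup and cosine weighting can be sketched as follows. This is a minimal Python sketch using only the standard library; the names `cosine_similarity` and `build_weighted_pairs` are hypothetical, and the tiny two-dimensional embedding table stands in for a trained embedding layer or word2vec model, whose real vectors would have many more dimensions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_weighted_pairs(words, embedding_table):
    """ID-encode the words, look up their vectors, and weight every word pair.

    embedding_table: list of vectors indexed by word ID (a stand-in for an
    embedding layer; the values used below are made up for illustration).
    Returns (lexicon, weights) with lexicon = {word: word ID} and
    weights = {(word_a, word_b): cosine similarity}.
    """
    lexicon = {}
    for w in words:
        # Assign the next free ID the first time a word is seen.
        lexicon.setdefault(w, len(lexicon))
    weights = {}
    for a, b in combinations(lexicon, 2):
        weights[(a, b)] = cosine_similarity(
            embedding_table[lexicon[a]], embedding_table[lexicon[b]]
        )
    return lexicon, weights


# Toy vectors: the two pronouns point the same way, the verb is orthogonal to them.
toy_table = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
lexicon, weights = build_weighted_pairs(["you", "I", "buy", "you"], toy_table)
```

The resulting `weights` dict has one entry per distinct word pair and can serve directly as the weighted edge list of the word segmentation structure graph.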
Step S204: if the named entity is detected in a clustering result, the corresponding segmented word in the clustering result is removed from the initial conversation template to obtain the target conversation template corresponding to the target service type.
The target conversation template is the conversation template the user intends to obtain. Therefore, after detecting that the named entity exists in a clustering result, the present invention takes the segmented word corresponding to the named entity in that clustering result as the target segmented word, removes the target segmented word from the initial conversation template, and takes the initial conversation template with the target segmented word removed as the target conversation template corresponding to the target service type.
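The removal step can be sketched as follows. This is a minimal Python sketch that assumes the initial template is held as a list of segmented words; that representation, the function name `remove_words` and the sample tokens are assumptions for illustration only, since the patent does not specify how the template is stored.

```python
def remove_words(template_tokens, words_to_remove):
    """Return the template tokens with every target segmented word removed.

    The relative order of the remaining tokens is preserved.
    """
    drop = set(words_to_remove)
    return [tok for tok in template_tokens if tok not in drop]


# Sample tokens (English glosses, illustration only).
initial_template = ["do", "you", "buy", "insurance"]
target_template = remove_words(initial_template, ["insurance"])
```

Because a new list is returned, the initial template is left unchanged and can be reused when further removals are applied.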
Optionally, after the corresponding segmented word in the clustering result is removed from the initial conversation template, the method includes:
identifying the segmented words in the clustering result that belong to a preset word class, and removing those segmented words from the initial conversation template.
A word class is a classification of words according to their usage and grammatical characteristics; it denotes the category to which semantically similar words belong, that is, semantically similar words are assigned to the same category and represented by the same label. For example, the words "delegate" and "replace" may be assigned to the same word class. The preset word classes include auxiliary words and pronouns and can be set according to the specific service scenario; the present invention places no limitation on this. The segmented words in the clustering result that belong to a preset word class are therefore identified and removed from the initial conversation template.
In the present invention, a word class classification model can be trained in advance on segmented words labelled with their word classes. After the segmented word corresponding to the named entity has been removed from the clustering result, the trained word class classification model is used to identify the word class of each remaining segmented word in the clustering result. All the segmented words in the clustering result that belong to a preset word class are thereby obtained and removed from the initial conversation template.
It should be noted that the segmented words belonging to a preset word class can be identified not only by the trained word class classification model but also by ID encoding of each segmented word. Specifically: first, a corresponding ID is preset for each segmented word to establish a correspondence between segmented words and word IDs; a corresponding ID is preset for each word class to establish a correspondence between word classes and word class IDs; and a correspondence between word IDs and word class IDs is preset and stored, where the word class ID corresponding to a word ID is the ID of the word class to which the segmented word represented by that word ID belongs. The word ID of each segmented word in the clustering result can then be determined from the pre-built correspondence between segmented words and word IDs. Next, the word class ID corresponding to each word ID in the clustering result is determined from the pre-built correspondence between word IDs and word class IDs. Finally, the word class corresponding to each word class ID is determined from the pre-built correspondence between word classes and word class IDs. Once the word class of each remaining segmented word in the clustering result has been determined, all the segmented words belonging to a preset word class can be obtained according to the word class of each segmented word.
In one embodiment, in an insurance promotion scenario, the salesperson says "Do you buy insurance?" and the customer says "I have purchased insurance", forming the dialogue data "Do you buy insurance? I have purchased insurance". Word segmentation is then performed on the dialogue data, and the segmented words obtained include "you", "buy", "insurance", "do", "I", "have", "purchased" and "insurance". Clustering the segmented words gives the following clustering results: "you" and "I" form the clustering result corresponding to personal pronouns, "buy" and "purchased" form the clustering result corresponding to verbs, the occurrences of "insurance" form the clustering result corresponding to nouns, "do" forms the clustering result corresponding to modal auxiliary words, and "have" forms the clustering result corresponding to auxiliary words. After the segmented words corresponding to the named entity "insurance" are removed from the clustering results, the segmented words belonging to the pronoun and auxiliary word classes are removed, that is, the pronouns "you" and "I" and the auxiliary words are removed.
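The ID-based lookup of word classes can be sketched as follows. This is a minimal Python sketch in which the three correspondence tables are plain dictionaries; the function name `words_in_preset_classes`, the numeric IDs and the class labels are invented for illustration and are not taken from the patent.

```python
def words_in_preset_classes(words, word_to_id, word_id_to_class_id,
                            class_id_to_name, preset_classes):
    """Return the words whose word class, resolved through the ID tables, is preset.

    Lookup chain: word -> word ID -> word class ID -> word class name.
    Words missing from any table are treated as not belonging to a preset class.
    """
    selected = []
    for w in words:
        word_id = word_to_id.get(w)
        class_id = word_id_to_class_id.get(word_id)
        class_name = class_id_to_name.get(class_id)
        if class_name in preset_classes:
            selected.append(w)
    return selected


# Hypothetical correspondence tables (all IDs are made up).
word_to_id = {"you": 1, "I": 2, "buy": 3, "have": 4}
word_id_to_class_id = {1: 10, 2: 10, 3: 20, 4: 30}
class_id_to_name = {10: "pronoun", 20: "verb", 30: "auxiliary"}

to_remove = words_in_preset_classes(
    ["you", "buy", "I", "have", "unknown"],
    word_to_id, word_id_to_class_id, class_id_to_name,
    preset_classes={"pronoun", "auxiliary"},
)
```

The returned words are the ones that would then be removed from the initial template; changing `preset_classes` adapts the filter to a different service scenario.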
According to the embodiment of the invention, N pieces of historical dialogue data and the target service type corresponding to the financial service within a historical time period are acquired, the dialogue features of the target service type are extracted, and target dialogue data is screened from the N pieces of historical dialogue data according to the dialogue features. Entity recognition is performed on the target dialogue data to obtain the named entity of each sentence in the target dialogue data, and the named entity is removed from the target dialogue data to obtain an initial conversation template. Word segmentation is performed on the target dialogue data to obtain the corresponding segmented words, cluster analysis is performed on all the segmented words to obtain clustering results, and whether the named entity exists in a clustering result is detected. If the named entity exists in a clustering result, the corresponding segmented word in the clustering result is removed from the initial conversation template to obtain the target conversation template corresponding to the target service type. By analysing the historical dialogue data before an outbound call to obtain the target dialogue data, and then removing content from the target dialogue data on the basis of semantic generalization to obtain the target conversation template, the outbound-call data used in outbound-call tasks becomes more accurate; dialogue data can be acquired periodically and the conversation template updated in real time according to the service type, which makes outbound calls for the financial service more standardized and efficient.
Corresponding to the conversation template mining method of the above embodiment, fig. 3 shows a block diagram of a conversation template mining device for financial services according to a second embodiment of the present invention.
Referring to fig. 3, the conversation template mining device includes:
the data screening module 31, configured to acquire N pieces of historical dialogue data and a target service type corresponding to a financial service within a historical time period, extract dialogue features of the target service type, and screen target dialogue data from the N pieces of historical dialogue data according to the dialogue features;
the template acquisition module 32, configured to perform entity recognition on the target dialogue data to obtain a named entity of each sentence in the target dialogue data, and remove the named entity from the target dialogue data to obtain an initial conversation template;
the entity detection module 33, configured to perform word segmentation on the target dialogue data to obtain corresponding segmented words, perform cluster analysis on all the segmented words to obtain a clustering result, and detect whether the named entity exists in the clustering result;
and the template adjustment module 34, configured to remove the corresponding segmented word in the clustering result from the initial conversation template if the named entity exists in the clustering result, so as to obtain a target conversation template corresponding to the target service type.
Optionally, the template acquisition module 32 includes:
the sentence extraction sub-module is used for carrying out speaker characteristic analysis on the target dialogue data to obtain a speaker of each sentence in the target dialogue data and determining a target sentence of which the speaker is a target object;
and the sentence identification sub-module is used for carrying out entity identification on the target sentence to obtain the named entity of the target sentence.
Optionally, the sentence recognition submodule includes:
the paragraph identification unit, configured to split the target sentence into at least one paragraph and perform entity recognition on each paragraph to obtain the entities of the corresponding paragraph;
the entity counting unit, configured to, for any entity, count the number of occurrences of the entity, determine the word frequency of the entity from the number of paragraphs and the number of occurrences of the entity, and determine the entity to be a named entity corresponding to the target sentence if the word frequency of the entity is greater than a threshold value.
Optionally, the paragraph identifying unit includes:
a dictionary acquisition subunit, configured to acquire a preset entity dictionary;
a dictionary matching subunit, configured to, for any paragraph, match the paragraph against the preset entity dictionary to obtain the entities of that paragraph;
and the paragraph traversing subunit is used for traversing all paragraphs to obtain the entity of each paragraph.
Optionally, the entity detection module 33 includes:
the graph construction sub-module, configured to connect all the segmented words in pairs to generate word pairs, calculate the weight value of each word pair from the two segmented words in the pair, and construct a word segmentation structure graph from the word pairs and their weight values;
and the graph partitioning sub-module, configured to perform graph partitioning on the word segmentation structure graph by means of a community discovery algorithm to obtain clustering results, each composed of all the segmented words with the same part of speech.
Optionally, the graph construction submodule includes:
the weight acquisition unit, configured to, for any word pair, acquire the word vector of each segmented word in the pair and calculate the similarity between the word vectors of the two segmented words as the weight value of the word pair.
Optionally, the template adjustment module 34 includes:
the segmented word removal sub-module, configured to, after the corresponding segmented word in the clustering result is removed from the initial conversation template, identify the segmented words in the clustering result that belong to a preset word class and remove those segmented words from the initial conversation template.
It should be noted that, because the content of the information interaction and the execution process between the modules, the sub-modules, the units, and the sub-units is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. As shown in fig. 4, the computer device of this embodiment includes at least one processor (only one is shown in fig. 4), a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, the steps of any of the conversation template mining method embodiments described above are performed.
The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 4 is merely an example of a computer device and is not intended to limit the computer device, and that a computer device may include more or fewer components than shown, or may combine certain components, or different components, such as may also include a network interface, a display screen, an input device, and the like.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory includes a readable storage medium, an internal memory, etc., where the internal memory may be the memory of the computer device, the internal memory providing an environment for the execution of an operating system and computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of a computer device, and in other embodiments may be an external storage device of the computer device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. that are provided on the computer device. Further, the memory may also include both internal storage units and external storage devices of the computer device. The memory is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs such as program codes of computer programs, and the like. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working process of the units and modules in the above device may refer to the corresponding process in the foregoing method embodiment, which is not described herein again. The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above-described embodiment, and may be implemented by a computer program to instruct related hardware, and the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of the method embodiment described above. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, executable files or in some intermediate form, etc. 
The computer-readable medium may include at least: any entity or device capable of carrying computer program code, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
The present invention may also be implemented as a computer program product for implementing all or part of the steps of the method embodiments described above, when the computer program product is run on a computer device, causing the computer device to execute the steps of the method embodiments described above.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not detailed or recorded in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A conversation template mining method for a financial service, characterized by comprising the following steps:
acquiring N pieces of historical dialogue data and a target service type corresponding to the financial service within a historical time period, extracting dialogue features of the target service type, and screening target dialogue data from the N pieces of historical dialogue data according to the dialogue features;
performing entity recognition on the target dialogue data to obtain a named entity of each sentence in the target dialogue data, and removing the named entity from the target dialogue data to obtain an initial conversation template;
performing word segmentation on the target dialogue data to obtain corresponding segmented words, performing cluster analysis on all the segmented words to obtain a clustering result, and detecting whether the named entity exists in the clustering result;
and if the named entity exists in the clustering result, removing the corresponding segmented word in the clustering result from the initial conversation template to obtain a target conversation template corresponding to the target service type.
2. The conversation template mining method according to claim 1, wherein performing entity recognition on the target dialogue data to obtain a named entity of each sentence in the target dialogue data comprises:
performing speaker feature analysis on the target dialogue data to obtain the speaker of each sentence in the target dialogue data, and determining a target sentence whose speaker is a target object;
and performing entity recognition on the target sentence to obtain a named entity of the target sentence.
3. The conversation template mining method according to claim 2, wherein performing entity recognition on the target sentence to obtain a named entity of the target sentence comprises:
splitting the target sentence into at least one paragraph, and performing entity recognition on each paragraph to obtain the entities of the corresponding paragraph;
and for any entity, counting the number of occurrences of the entity, determining the word frequency of the entity from the number of paragraphs and the number of occurrences of the entity, and, if the word frequency of the entity is greater than a threshold value, determining the entity to be a named entity corresponding to the target sentence.
4. The conversation template mining method according to claim 3, wherein performing entity recognition on each paragraph to obtain the entities of the corresponding paragraph comprises:
acquiring a preset entity dictionary;
for any paragraph, matching the paragraph against the preset entity dictionary to obtain the entities of the paragraph;
and traversing all paragraphs to obtain the entities of each paragraph.
5. The conversation template mining method according to claim 1, wherein performing cluster analysis on all the segmented words to obtain a clustering result comprises:
connecting all the segmented words in pairs to generate word pairs, calculating the weight value of each word pair from the two segmented words in the pair, and constructing a word segmentation structure graph from the word pairs and their weight values;
and performing graph partitioning on the word segmentation structure graph by means of a community discovery algorithm to obtain clustering results, each composed of all the segmented words with the same part of speech.
6. The conversation template mining method according to claim 5, wherein calculating the weight value of each word pair from the two segmented words in the pair comprises:
for any word pair, acquiring the word vector of each segmented word in the pair, and calculating the similarity between the word vectors of the two segmented words as the weight value of the word pair.
7. The conversation template mining method according to claim 1, wherein after the corresponding segmented word in the clustering result is removed from the initial conversation template, the method comprises:
identifying the segmented words in the clustering result that belong to a preset word class, and removing those segmented words from the initial conversation template.
8. A conversation template mining apparatus for financial business, the conversation template mining apparatus comprising:
a data screening module, configured to acquire N pieces of historical dialogue data and a target service type corresponding to a financial business within a historical time period, extract dialogue features of the target service type, and screen the N pieces of historical dialogue data according to the dialogue features to obtain target dialogue data;
a template acquisition module, configured to perform entity recognition on the target dialogue data to obtain the named entities of each sentence in the target dialogue data, and remove the named entities from the target dialogue data to obtain an initial conversation template;
an entity detection module, configured to perform word segmentation on the target dialogue data to obtain the corresponding word segments, perform cluster analysis on all the word segments to obtain a clustering result, and detect whether a named entity exists in the clustering result;
and a template adjustment module, configured to, if a named entity exists in the clustering result, remove the corresponding word segments in the clustering result from the initial conversation template to obtain a target conversation template corresponding to the target service type.
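The way the four modules of claim 8 hand data to one another can be sketched as a small pipeline. Everything concrete here is an assumption for illustration: whitespace tokenisation stands in for a real word segmenter, keyword matching stands in for dialogue-feature screening, a set lookup stands in for entity recognition, and `cluster_fn` is a hypothetical callable supplied by the caller.

```python
from typing import Callable, List, Set, Tuple

ClusterFn = Callable[[List[str]], List[Set[str]]]


def screen(dialogues: List[str], features: Set[str]) -> List[str]:
    """Data screening: keep dialogues containing any feature keyword."""
    return [d for d in dialogues if any(tok in features for tok in d.split())]


def acquire_template(dialogues: List[str],
                     entity_dict: Set[str]) -> Tuple[Set[str], List[List[str]]]:
    """Template acquisition: recognise entities and strip them out."""
    entities = {tok for d in dialogues for tok in d.split() if tok in entity_dict}
    templates = [[tok for tok in d.split() if tok not in entities]
                 for d in dialogues]
    return entities, templates


def detect(dialogues: List[str], entities: Set[str],
           cluster_fn: ClusterFn) -> List[Set[str]]:
    """Entity detection: segment, cluster, and keep the clusters that
    contain at least one recognised named entity."""
    words = sorted({tok for d in dialogues for tok in d.split()})
    return [cluster for cluster in cluster_fn(words) if cluster & entities]


def adjust(templates: List[List[str]],
           flagged: List[Set[str]]) -> List[List[str]]:
    """Template adjustment: remove every word of a flagged cluster."""
    drop: Set[str] = set().union(*flagged)
    return [[tok for tok in tpl if tok not in drop] for tpl in templates]


def mine(dialogues: List[str], features: Set[str], entity_dict: Set[str],
         cluster_fn: ClusterFn) -> List[List[str]]:
    """Run the four stages in the order recited in the claim."""
    target = screen(dialogues, features)
    entities, initial = acquire_template(target, entity_dict)
    flagged = detect(target, entities, cluster_fn)
    return adjust(initial, flagged)
```

The point of the last two stages is that a word clustered together with a known entity (for example "pension" alongside the dictionary entry "annuity") is treated as an entity the dictionary missed, and is removed from the template as well.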
9. A computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the speaking template mining method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the speaking template mining method of any one of claims 1 to 7.
CN202310928524.6A 2023-07-26 2023-07-26 Method, device, equipment and medium for mining conversation template of financial business Pending CN117057333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310928524.6A CN117057333A (en) 2023-07-26 2023-07-26 Method, device, equipment and medium for mining conversation template of financial business

Publications (1)

Publication Number Publication Date
CN117057333A true CN117057333A (en) 2023-11-14

Family

ID=88652661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310928524.6A Pending CN117057333A (en) 2023-07-26 2023-07-26 Method, device, equipment and medium for mining conversation template of financial business

Country Status (1)

Country Link
CN (1) CN117057333A (en)

Similar Documents

Publication Publication Date Title
AU2019419888B2 (en) System and method for information extraction with character level features
CN110929125B (en) Search recall method, device, equipment and storage medium thereof
CN110032724B (en) Method and device for recognizing user intention
CN110046648B (en) Method and device for classifying business based on at least one business classification model
CN109816231A (en) Workflow processing method, electronic device and readable storage medium storing program for executing
CN113051384B (en) User portrait extraction method based on dialogue and related device
CN114722199A (en) Risk identification method and device based on call recording, computer equipment and medium
CN114298845A (en) Method and device for processing claim settlement bills
CN113962799A (en) Training method of wind control model, risk determination method, device and equipment
CN111368066A (en) Method, device and computer readable storage medium for acquiring dialogue abstract
CN117278675A (en) Outbound method, device, equipment and medium based on intention classification
CN113705164A (en) Text processing method and device, computer equipment and readable storage medium
CN115840808B (en) Technological project consultation method, device, server and computer readable storage medium
CN112581297A (en) Information pushing method and device based on artificial intelligence and computer equipment
CN115146653B (en) Dialogue scenario construction method, device, equipment and storage medium
CN117057333A (en) Method, device, equipment and medium for mining conversation template of financial business
CN113887214A (en) Artificial intelligence based wish presumption method and related equipment thereof
CN113449506A (en) Data detection method, device and equipment and readable storage medium
CN113987202A (en) Knowledge graph-based interactive telephone calling method and device
CN112329468B (en) Method and device for constructing heterogeneous relation network, computer equipment and storage medium
WO2023233467A1 (en) Information identification device, information identification method, and program
CN113722432B (en) Method and device for associating news with stocks
CN112015888B (en) Abstract information extraction method and abstract information extraction system
CN117648440A (en) Intention classification method, device, equipment and medium based on multi-channel aggregation
CN117273451A (en) Enterprise risk information processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination