CN117577107A - Method, device and equipment for matching discouraging strategy based on voice dialogue information - Google Patents
Method, device and equipment for matching discouraging strategy based on voice dialogue information
- Publication number
- CN117577107A (application CN202311375787.5A)
- Authority
- CN
- China
- Prior art keywords
- voice
- discouraging
- matching
- content
- dialogue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G — PHYSICS
  - G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00 — Speech recognition
        - G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
          - G10L2015/225 — Feedback of the input speech
        - G10L15/26 — Speech to text systems
Abstract
The invention discloses a method, an apparatus and a device for matching dissuasion strategies based on voice dialogue information. The method comprises: establishing a voice call with the phone owner's terminal, and playing several pre-configured dissuasion scripts to the terminal; acquiring voice reply content from the owner's terminal, recognizing the voice reply content, and obtaining a semantic recognition result from the recognized content; matching several pre-configured dialogue conclusions and owner labels according to the semantic recognition result; integrating the dialogue conclusions to obtain a target dissuasion conclusion; de-duplicating the owner labels to obtain target owner portrait labels; and matching a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy. The embodiments of the invention can intelligently recognize the dialogue information collected at each node, analyze the dialogue to generate a dissuasion conclusion, and derive owner labels from the dialogue, thereby providing a reference for the effectiveness of subsequent dissuasion.
Description
Technical Field
The invention relates to the technical field of network security, and in particular to a method, an apparatus and a device for matching dissuasion strategies based on voice dialogue information.
Background
Telecommunication fraud refers to fabricating false information and setting up scams through telephone calls, the internet and short messages so as to commit remote, non-contact fraud against victims and induce them to pay money or transfer funds. Criminals achieve this by impersonating others and forging various lawful identities and documents, for example posing as staff of the public security, procuratorate or court authorities, merchant and manufacturer companies, government agencies, banks and other institutions, and by fabricating scenarios such as recruitment, order brushing, loans and mobile phone positioning.
With the development of technology, fraudsters make use of communication tools such as mobile phones, fixed-line telephones and the internet, together with other modern technical means, to commit non-contact fraud that develops and spreads rapidly and causes great losses to the public.
To reduce the losses that telecommunication fraud causes to the public, people who receive fraudulent messages or fraudulent calls need to be dissuaded in time. In the prior art, voice dissuasion requires manual telephone calls, which take a long time and consume a large amount of labor.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
In view of the above shortcomings of the prior art, the invention aims to provide a method, an apparatus and a device for matching dissuasion strategies based on voice dialogue information, so as to solve the technical problems that voice dissuasion in the prior art requires manual telephone calls, takes a long time and consumes a large amount of labor.
The technical solution of the invention is as follows:
A method for matching dissuasion strategies based on voice dialogue information, applied to a voice robot, the method comprising:
establishing a voice call with the phone owner's terminal, and playing several pre-configured dissuasion scripts to the terminal;
acquiring voice reply content from the owner's terminal, recognizing the voice reply content, and obtaining a semantic recognition result from the recognized content;
matching several pre-configured dialogue conclusions and owner labels according to the semantic recognition result;
integrating the dialogue conclusions to obtain a target dissuasion conclusion;
de-duplicating the owner labels to obtain target owner portrait labels;
and matching a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy.
Further, the establishing a voice call with the phone owner's terminal and playing several pre-configured dissuasion scripts to the terminal includes:
pre-configuring dissuasion scripts for different fraud scenarios;
and establishing a voice call with the owner's terminal, and playing, over the voice call, several dissuasion scripts pre-configured for the fraud scenario.
Further preferably, the acquiring voice reply content from the owner's terminal, recognizing the voice reply content, and obtaining a semantic recognition result from the recognized content includes:
acquiring the voice reply content from the owner's terminal, recognizing it based on an ASR service, and converting it into text content according to the recognition result;
and performing semantic analysis on the text content to obtain the semantic recognition result.
Further preferably, the integrating the dialogue conclusions to obtain a target dissuasion conclusion includes:
ranking the priorities of the dialogue conclusions in advance;
and acquiring the dialogue conclusions generated during the voice call, and taking the dialogue conclusion with the highest priority as the target dissuasion conclusion.
Preferably, the acquiring the voice reply content from the owner's terminal, recognizing it based on an ASR service, and converting it into text content according to the recognition result includes:
acquiring the voice reply content from the owner's terminal and preprocessing it;
uploading the preprocessed voice reply content to an ASR server, and sending the ASR server a recognition request containing voice parameter information;
and converting the voice reply content into text content based on the recognition result returned by the ASR server.
Further, the performing semantic analysis on the text content to obtain a semantic recognition result includes:
obtaining text data for semantic analysis, preprocessing the text data, and performing feature extraction on the preprocessed data to generate semantic analysis samples;
training a deep learning model on the semantic analysis samples to obtain a trained semantic analysis model; and inputting the text content into the semantic analysis model, and obtaining the semantic recognition result from the model's output.
Further, the matching a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy includes:
obtaining the target dissuasion conclusion, and performing voice dissuasion again if the target dissuasion conclusion indicates that the owner is at risk of being defrauded again.
Another embodiment of the present invention provides an apparatus for matching dissuasion strategies based on voice dialogue information, the apparatus comprising:
a dissuasion voice sending module, configured to establish a voice call with the phone owner's terminal and play several pre-configured dissuasion scripts to the terminal;
a semantic recognition module, configured to acquire voice reply content from the owner's terminal, recognize the voice reply content, and obtain a semantic recognition result from the recognized content;
a data matching module, configured to match several pre-configured dialogue conclusions and owner labels according to the semantic recognition result;
a data integration module, configured to integrate the dialogue conclusions to obtain a target dissuasion conclusion;
a data deduplication module, configured to de-duplicate the owner labels to obtain target owner portrait labels;
and a dissuasion strategy matching module, configured to match a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy.
Another embodiment of the present invention provides a device for matching dissuasion strategies based on voice dialogue information, the device comprising at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above method for matching dissuasion strategies based on voice dialogue information.
Another embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above method for matching dissuasion strategies based on voice dialogue information.
Beneficial effects: the embodiments of the invention can intelligently recognize the dialogue information collected at each node, analyze the dialogue to generate a dissuasion conclusion, and derive owner labels from the dialogue, thereby providing a reference for the effectiveness of subsequent dissuasion.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a preferred embodiment of the method for matching dissuasion strategies based on voice dialogue information according to the present invention;
FIG. 2 is a schematic diagram of the functional modules of a preferred embodiment of the apparatus for matching dissuasion strategies based on voice dialogue information according to the present invention;
FIG. 3 is a schematic diagram of the hardware structure of a device for matching dissuasion strategies based on voice dialogue information according to a preferred embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below in order to make its objects, technical solutions and effects clearer. It should be understood that the specific embodiments described herein are intended only to explain the invention and are not intended to limit it.
Embodiments of the present invention are described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a preferred embodiment of the method for matching dissuasion strategies based on voice dialogue information according to the invention. As shown in fig. 1, the method comprises the following steps:
Step S100, establishing a voice call with the phone owner's terminal, and playing several pre-configured dissuasion scripts to the terminal;
Step S200, acquiring voice reply content from the owner's terminal, recognizing the voice reply content, and obtaining a semantic recognition result from the recognized content;
Step S300, matching several pre-configured dialogue conclusions and owner labels according to the semantic recognition result;
Step S400, integrating the dialogue conclusions to obtain a target dissuasion conclusion;
Step S500, de-duplicating the owner labels to obtain target owner portrait labels;
and Step S600, matching a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy.
In specific implementation, the embodiment of the invention is applied to a voice robot. A dialogue script is configured for each fraud scenario; after the owner's answer is judged, the branch to take and its direction are determined, and the conversation continues according to the script of that branch.
The owner's answer is collected, the content of the answer is obtained, and the branch is determined according to the recognized intention or answer.
An ASR service is used for speech recognition to convert the speech into text; semantic analysis is then performed on the text to judge the branch, and after the branch is judged the call jumps to the corresponding node.
Dialogue conclusions and owner labels are pre-configured for the different dialogue nodes or jump nodes. For example, the owner labels include "whether money was transferred / transferred", "whether fraud was encountered / yes", "fraud / reported to police" and the like. Configuration example: in a dialogue configured for a fake-customer-service scenario, the opening script is: "Hello! This is the police anti-fraud center. We have found that fraudsters posing as customer service of a certain website have contacted you. Do you recall this?" If the owner answers in the affirmative, the dialogue conclusion is "deceived" and the owner label is "whether fraud was encountered / yes". The labels hit in each call are determined from the owner labels configured at the dialogue nodes and jump nodes, and after the call ends these labels are de-duplicated to form the final owner portrait labels.
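For illustration only, a minimal sketch of the label collection and de-duplication just described is given below; the node identifiers and the exact label strings are assumptions made for the example and are not part of the claimed configuration format.

```python
def build_owner_portrait(visited_nodes, node_labels):
    """Collect the owner labels configured for every dialogue/jump node hit
    during a call and de-duplicate them into the final owner portrait labels.

    visited_nodes: list of node ids visited during the call, in order
    node_labels:   mapping of node id -> owner labels configured for that node
    """
    portrait = []
    for node in visited_nodes:
        for label in node_labels.get(node, []):
            if label not in portrait:        # de-duplicate, keeping first-hit order
                portrait.append(label)
    return portrait


# Hypothetical configuration for a fake-customer-service dialogue tree.
node_labels = {
    "opening_affirmed": ["whether fraud was encountered / yes"],
    "transfer_affirmed": ["whether money was transferred / transferred"],
    "police_reported": ["fraud / reported to police"],
}
print(build_owner_portrait(
    ["opening_affirmed", "transfer_affirmed", "opening_affirmed"], node_labels))
# -> ['whether fraud was encountered / yes', 'whether money was transferred / transferred']
```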
Each call contains multiple rounds of dialogue and therefore produces multiple dialogue conclusions. After the end of the call is detected, the target dissuasion conclusion of the call is determined according to the priority of the dialogue conclusions, and the labels hit in the call are determined from the owner labels configured at the dialogue nodes and jump nodes.
A dissuasion strategy is then matched according to the dissuasion conclusion and the retry strategy. The dissuasion strategy includes whether to perform AI dissuasion again.
Further, the establishing a voice call with the phone owner's terminal and playing several pre-configured dissuasion scripts to the terminal includes:
pre-configuring dissuasion scripts for different fraud scenarios;
and establishing a voice call with the owner's terminal, and playing, over the voice call, several dissuasion scripts pre-configured for the fraud scenario.
In specific implementation, scripts are configured for different fraud scenarios; after the owner's answer is judged, the branch to take and its direction are determined, and the conversation continues according to the script of that branch.
Examples of dissuasion scripts pre-configured for fraud scenarios are as follows:
Impersonation of public security, procuratorate or court authorities: dissuasion is performed with the script for this scenario, for example an opening script of: "Hello, this is the police anti-fraud center. We are calling to give you an early warning: has anyone claiming to be from the public security bureau or a court contacted you, told you that your identity has been stolen and that you are suspected of a crime, and asked you to transfer money to a 'safe account' to cooperate with an investigation?" After the scenario is confirmed, the fraud process and the circumstances of this impersonation fraud are further verified;
Impersonation of customer service: dissuasion is performed with the fake-customer-service script, for example an opening script of: "Hello! This is the police anti-fraud center. We have found that fraudsters posing as customer service of a certain website have contacted you. Do you recall this?" After the scenario is confirmed, the fraud process and the circumstances of the fake-customer-service fraud are further verified.
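By way of non-limiting illustration, one possible way to represent such a per-scenario script with branches chosen from the recognized reply is sketched below; the scenario name, node identifiers and intent values are assumptions made for the example, not the patented configuration format.

```python
# Hypothetical per-scenario dialogue tree: each node has a script to play and
# branches keyed by the intent recognized from the owner's reply.
DIALOGUE_TREES = {
    "fake_customer_service": {
        "start": {
            "script": ("Hello! This is the police anti-fraud center. We found that "
                       "fraudsters posing as website customer service contacted you. "
                       "Do you recall this?"),
            "branches": {"affirm": "verify_details", "deny": "close_no_risk"},
            "conclusion": None,
            "labels": [],
        },
        "verify_details": {
            "script": "Did they ask you to transfer money or give them a verification code?",
            "branches": {"affirm": "warn_and_verify_transfer", "deny": "close_warned"},
            "conclusion": "deceived",
            "labels": ["whether fraud was encountered / yes"],
        },
        # ... further nodes omitted
    },
}


def next_node(scenario, node_id, intent):
    """Return the next node id for the recognized intent, or None if the branch ends."""
    return DIALOGUE_TREES[scenario][node_id]["branches"].get(intent)


print(next_node("fake_customer_service", "start", "affirm"))  # -> 'verify_details'
```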
Further, the acquiring voice reply content from the owner's terminal, recognizing the voice reply content, and obtaining a semantic recognition result from the recognized content includes the following steps:
acquiring the voice reply content from the owner's terminal, recognizing it based on an ASR service, and converting it into text content according to the recognition result;
and performing semantic analysis on the text content to obtain the semantic recognition result.
In specific implementation, after the owner answers the call, the owner's replies are collected and the branch to take is determined according to the recognized intention or answer; an ASR service is used for speech recognition to convert the speech into text.
Semantic analysis is then performed on the text and the branch is judged accordingly, yielding the semantic recognition result.
Further, the integrating the dialogue conclusions to obtain a target dissuasion conclusion includes:
ranking the priorities of the dialogue conclusions in advance;
and acquiring the dialogue conclusions generated during the voice call, and taking the dialogue conclusion with the highest priority as the target dissuasion conclusion.
In specific implementation, each call contains multiple rounds of dialogue and can therefore produce multiple dialogue conclusions; after the call ends, the final dissuasion conclusion of the call is determined according to the priority of the dialogue conclusions.
The dialogue conclusions are assigned priority numbers from 1 to n; the smaller the number, the higher the priority.
Among the dialogue conclusions generated during the call, the one with the highest priority is taken as the final dissuasion conclusion.
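For illustration, a minimal sketch of this priority-based selection is given below; the concrete conclusion strings and priority numbers are assumptions made for the example.

```python
# Hypothetical priority table: smaller number = higher priority.
CONCLUSION_PRIORITY = {
    "deceived, money transferred": 1,
    "deceived, no transfer yet": 2,
    "at risk of being defrauded again": 3,
    "no fraud encountered": 4,
    "call missed": 5,
}


def target_conclusion(conclusions):
    """Return the highest-priority (smallest number) conclusion produced during a call."""
    if not conclusions:
        return None
    return min(conclusions, key=lambda c: CONCLUSION_PRIORITY.get(c, float("inf")))


print(target_conclusion(["no fraud encountered", "deceived, no transfer yet"]))
# -> 'deceived, no transfer yet'
```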
Further, the acquiring the voice reply content from the owner's terminal, recognizing it based on an ASR service, and converting it into text content according to the recognition result includes:
acquiring the voice reply content from the owner's terminal and preprocessing it;
uploading the preprocessed voice reply content to an ASR server, and sending the ASR server a recognition request containing voice parameter information;
and converting the voice reply content into text content based on the recognition result returned by the ASR server.
In specific implementation, voice data collection: collecting the audio data on which speech recognition is to be performed;
audio preprocessing: preprocessing the audio, for example noise reduction and denoising, to improve recognition accuracy;
data upload: uploading the preprocessed audio data to the ASR service;
recognition request: sending a recognition request to the ASR service, the request containing the relevant parameters of the audio data, such as the sampling rate and encoding format;
recognition processing: after receiving the audio data, the ASR service performs processing steps such as acoustic feature extraction and speech-model matching, and converts the audio into text;
result return: the ASR service returns the recognition result in text form;
subsequent processing: the recognition result can be further processed as needed, for example for semantic understanding or language translation.
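For illustration only, the upload-and-request step might look like the following sketch; the ASR endpoint URL, parameter names and response field are assumptions made for the example and would depend on the ASR service actually used.

```python
import requests

ASR_URL = "https://asr.example.internal/v1/recognize"  # assumed ASR server endpoint


def recognize_reply(audio_path, sample_rate=8000, encoding="pcm16"):
    """Upload a preprocessed voice reply and request recognition.

    The request carries the voice parameter information (sampling rate and
    encoding format); the ASR server is expected to return the recognized text.
    """
    with open(audio_path, "rb") as f:
        resp = requests.post(
            ASR_URL,
            params={"sample_rate": sample_rate, "encoding": encoding, "lang": "zh-CN"},
            data=f.read(),
            headers={"Content-Type": "application/octet-stream"},
            timeout=10,
        )
    resp.raise_for_status()
    return resp.json().get("text", "")  # assumed response field


# text = recognize_reply("reply_preprocessed.wav")
```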
Further, the performing semantic analysis on the text content to obtain a semantic recognition result includes:
obtaining text data for semantic analysis, preprocessing the text data, and performing feature extraction on the preprocessed data to generate semantic analysis samples;
training a deep learning model on the semantic analysis samples to obtain a trained semantic analysis model; and inputting the text content into the semantic analysis model, and obtaining the semantic recognition result from the model's output.
In specific implementation, data preparation: collecting and organizing text data for semantic analysis, which may be labeled data, a large-scale corpus or domain-specific data;
text preprocessing: cleaning and preprocessing the raw text, including removing special characters, punctuation marks and stop words, and performing word segmentation to split the text into basic words or phrases;
feature extraction: extracting meaningful features from the preprocessed text, which may include word frequencies, TF-IDF values, word vectors, etc.; the purpose of feature extraction is to convert the text into a numerical representation that a computer can process;
model building: selecting a suitable machine learning or deep learning algorithm to build the semantic analysis model; common methods include naive Bayes, support vector machines (SVMs), decision trees, random forests, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and depending on the task a classifier, a clustering algorithm or a sequence model may be chosen;
model training: training the model on the labeled data, iteratively optimizing the model parameters so that the model learns semantic information from the input text;
model evaluation: evaluating the trained model on an independent test data set; common evaluation metrics include accuracy, recall and F1 score, which measure the performance of the model;
prediction and application: inputting new text into the trained model to obtain the corresponding semantic analysis result; these results can be used for different tasks such as text classification, sentiment analysis, intent recognition and named-entity recognition.
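As a small worked example of the above pipeline (preprocessing, TF-IDF feature extraction, training and prediction), the sketch below uses a linear SVM, one of the listed options; the training sentences and intent labels are made up for the example, and a real system would train on segmented Chinese call transcripts with a much larger labeled corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training samples: utterance -> intent label (in practice, labeled call transcripts).
texts = [
    "yes I remember that call",
    "no I have no idea what you mean",
    "I already transferred the money to them",
    "I did not transfer anything",
    "I reported it to the police",
]
labels = ["affirm", "deny", "transferred", "not_transferred", "reported"]

# Feature extraction (TF-IDF) and a linear SVM classifier in one pipeline.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)  # model training

print(model.predict(["yes, I already sent them the money"]))  # prediction on new text
```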
Further, the matching a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy includes:
obtaining the target dissuasion conclusion, and performing voice dissuasion again if the target dissuasion conclusion indicates that the owner is at risk of being defrauded again.
In specific implementation, whether to perform AI dissuasion again is determined according to the dissuasion conclusion and the retry strategy, so as to ensure the effectiveness of the dissuasion. AI dissuasion means that the AI dissuades with scripts adapted to the different scenarios; the retry strategy specifies when to dissuade again, for example when the dialogue conclusion is "call missed" or the like.
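For illustration only, the strategy-matching step might be sketched as follows; the conclusion strings, retry set, attempt limit and the escalation branch are assumptions made for the example and are not fixed by the invention.

```python
# Assumed retry policy: which target conclusions trigger another round of AI dissuasion.
RETRY_CONCLUSIONS = {"at risk of being defrauded again", "call missed"}
MAX_AI_ATTEMPTS = 3  # assumed upper bound on automatic retries


def match_dissuasion_strategy(target_conclusion, attempts_so_far):
    """Return the dissuasion strategy for a finished call."""
    if target_conclusion in RETRY_CONCLUSIONS and attempts_so_far < MAX_AI_ATTEMPTS:
        return "retry_ai_dissuasion"   # play the scenario script again via the voice robot
    if target_conclusion == "deceived, money transferred":
        return "escalate_to_human"     # assumed escalation path, not stated in the source
    return "close_case"


print(match_dissuasion_strategy("at risk of being defrauded again", attempts_so_far=1))
# -> 'retry_ai_dissuasion'
```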
According to the above method embodiment, the embodiment of the invention can collect dialogue information, analyze the dialogue content to form a dissuasion conclusion, label the phone owner according to the key information collected in the dialogue, and further determine the dissuasion arrangement according to the strategy.
It should be noted that there is not necessarily a fixed order between the above steps; those skilled in the art will understand that, in different embodiments, the steps may be performed in different orders, in parallel, or interchangeably.
Another embodiment of the present invention provides an apparatus for matching dissuasion strategies based on voice dialogue information. As shown in fig. 2, the apparatus 1 includes:
a dissuasion voice sending module 11, configured to establish a voice call with the phone owner's terminal and play several pre-configured dissuasion scripts to the terminal;
a semantic recognition module 12, configured to acquire voice reply content from the owner's terminal, recognize the voice reply content, and obtain a semantic recognition result from the recognized content;
a data matching module 13, configured to match several pre-configured dialogue conclusions and owner labels according to the semantic recognition result;
a data integration module 14, configured to integrate the dialogue conclusions to obtain a target dissuasion conclusion;
a data deduplication module 15, configured to de-duplicate the owner labels to obtain target owner portrait labels;
and a dissuasion strategy matching module 16, configured to match a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the dissuasion voice sending module 11 is specifically configured to:
pre-configure dissuasion scripts for different fraud scenarios;
and establish a voice call with the owner's terminal, and play, over the voice call, several dissuasion scripts pre-configured for the fraud scenario.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the semantic recognition module 12 is specifically configured to:
acquire the voice reply content from the owner's terminal, recognize it based on an ASR service, and convert it into text content according to the recognition result;
and perform semantic analysis on the text content to obtain the semantic recognition result.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the data integration module 14 is specifically configured to:
rank the priorities of the dialogue conclusions in advance;
and acquire the dialogue conclusions generated during the voice call, and take the dialogue conclusion with the highest priority as the target dissuasion conclusion.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the semantic recognition module 12 is further configured to:
acquire the voice reply content from the owner's terminal and preprocess it;
upload the preprocessed voice reply content to an ASR server, and send the ASR server a recognition request containing voice parameter information;
and convert the voice reply content into text content based on the recognition result returned by the ASR server.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the semantic recognition module 12 is further configured to:
obtain text data for semantic analysis, preprocess the text data, and perform feature extraction on the preprocessed data to generate semantic analysis samples;
train a deep learning model on the semantic analysis samples to obtain a trained semantic analysis model; and input the text content into the semantic analysis model, and obtain the semantic recognition result from the model's output.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the dissuasion strategy matching module 16 is specifically configured to:
obtain the target dissuasion conclusion, and perform voice dissuasion again if the target dissuasion conclusion indicates that the owner is at risk of being defrauded again.
The specific implementation is shown in the method embodiment, and will not be described herein.
Another embodiment of the present invention provides a device for matching dissuasion strategies based on voice dialogue information. As shown in fig. 3, the device 10 includes:
one or more processors 110 and a memory 120; one processor 110 is illustrated in fig. 3. The processor 110 and the memory 120 may be connected via a bus or by other means; connection via a bus is illustrated in fig. 3.
The processor 110 is used to implement the various control logic of the device 10 and may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a single-chip microcomputer, an ARM (Acorn RISC Machine) processor or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. The processor 110 may also be any conventional processor, microprocessor or state machine, or may be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The memory 120, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions corresponding to the method for matching dissuasion strategies based on voice dialogue information in the embodiments of the invention. By running the non-volatile software programs, instructions and units stored in the memory 120, the processor 110 executes the various functional applications and data processing of the device 10, i.e., implements the method for matching dissuasion strategies based on voice dialogue information in the above method embodiments.
The memory 120 may include a program storage area and a data storage area; the program storage area may store an operating system and the application program required by at least one function, and the data storage area may store data created through the use of the device 10, etc. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 120 may optionally include memory located remotely from the processor 110 and connected to the device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120 and, when executed by the one or more processors 110, perform the method for matching dissuasion strategies based on voice dialogue information in any of the above method embodiments, for example method steps S100 to S600 in fig. 1.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, e.g., to perform the method steps S100 through S600 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SynchLink DRAM (SLDRAM) and direct Rambus RAM (DRRAM). The memory of the disclosed components or of the operating environments described herein is intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of matching discouraging policies based on voice dialog information of the method embodiment described above. For example, the above-described method steps S100 to S600 in fig. 1 are performed.
The embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solution, or the part of it that contributes over the related art, may be embodied in the form of a software product stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or parts thereof.
Conditional language such as "can", "could", "might" or "may", unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that particular embodiments can include (while other embodiments do not include) particular features, elements and/or operations. Thus, such conditional language is generally not intended to imply that features, elements and/or operations are required in any way for one or more embodiments, or that one or more embodiments must include logic for deciding, with or without input or prompting, whether these features, elements and/or operations are included or are to be performed in any particular embodiment.
What has been described in this specification and drawings includes examples of methods and apparatus capable of matching dissuasion strategies based on voice dialogue information. It is of course not possible to describe every conceivable combination of components and/or methods for the purpose of describing the various features of the disclosure, but it will be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications may be made to the disclosure without departing from its scope or spirit. Further or alternative embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. The examples set forth in this specification and drawings are to be considered in all respects illustrative and not limiting. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (10)
1. A method for matching dissuasion strategies based on voice dialogue information, applied to a voice robot, the method comprising:
establishing a voice call with the phone owner's terminal, and playing several pre-configured dissuasion scripts to the terminal;
acquiring voice reply content from the owner's terminal, recognizing the voice reply content, and obtaining a semantic recognition result from the recognized content;
matching several pre-configured dialogue conclusions and owner labels according to the semantic recognition result;
integrating the dialogue conclusions to obtain a target dissuasion conclusion;
de-duplicating the owner labels to obtain target owner portrait labels;
and matching a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy.
2. The method for matching dissuasion strategies based on voice dialogue information according to claim 1, wherein the establishing a voice call with the phone owner's terminal and playing several pre-configured dissuasion scripts to the terminal includes:
pre-configuring dissuasion scripts for different fraud scenarios;
and establishing a voice call with the owner's terminal, and playing, over the voice call, several dissuasion scripts pre-configured for the fraud scenario.
3. The method for matching dissuasion strategies based on voice dialogue information according to claim 2, wherein the acquiring voice reply content from the owner's terminal, recognizing the voice reply content, and obtaining a semantic recognition result from the recognized content includes:
acquiring the voice reply content from the owner's terminal, recognizing it based on an ASR service, and converting it into text content according to the recognition result;
and performing semantic analysis on the text content to obtain the semantic recognition result.
4. The method for matching dissuasion strategies based on voice dialogue information according to claim 3, wherein the integrating the dialogue conclusions to obtain a target dissuasion conclusion includes:
ranking the priorities of the dialogue conclusions in advance;
and acquiring the dialogue conclusions generated during the voice call, and taking the dialogue conclusion with the highest priority as the target dissuasion conclusion.
5. The method for matching dissuasion strategies based on voice dialogue information according to claim 4, wherein the acquiring the voice reply content from the owner's terminal, recognizing it based on an ASR service, and converting it into text content according to the recognition result includes:
acquiring the voice reply content from the owner's terminal and preprocessing it;
uploading the preprocessed voice reply content to an ASR server, and sending the ASR server a recognition request containing voice parameter information;
and converting the voice reply content into text content based on the recognition result returned by the ASR server.
6. The method for matching dissuasion strategies based on voice dialogue information according to claim 5, wherein the performing semantic analysis on the text content to obtain the semantic recognition result includes:
obtaining text data for semantic analysis, preprocessing the text data, and performing feature extraction on the preprocessed data to generate semantic analysis samples;
training a deep learning model on the semantic analysis samples to obtain a trained semantic analysis model; and inputting the text content into the semantic analysis model, and obtaining the semantic recognition result from the model's output.
7. The method for matching dissuasion strategies based on voice dialogue information according to claim 6, wherein the matching a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy includes:
obtaining the target dissuasion conclusion, and performing voice dissuasion again if the target dissuasion conclusion indicates that the owner is at risk of being defrauded again.
8. An apparatus for matching dissuasion strategies based on voice dialogue information, the apparatus comprising:
a dissuasion voice sending module, configured to establish a voice call with the phone owner's terminal and play several pre-configured dissuasion scripts to the terminal;
a semantic recognition module, configured to acquire voice reply content from the owner's terminal, recognize the voice reply content, and obtain a semantic recognition result from the recognized content;
a data matching module, configured to match several pre-configured dialogue conclusions and owner labels according to the semantic recognition result;
a data integration module, configured to integrate the dialogue conclusions to obtain a target dissuasion conclusion;
a data deduplication module, configured to de-duplicate the owner labels to obtain target owner portrait labels;
and a dissuasion strategy matching module, configured to match a corresponding dissuasion strategy according to the target dissuasion conclusion and a preset retry strategy.
9. A device for matching dissuasion strategies based on voice dialogue information, the device comprising at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for matching dissuasion strategies based on voice dialogue information of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method for matching dissuasion strategies based on voice dialogue information of any one of claims 1-7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311375787.5A | 2023-10-20 | 2023-10-20 | Method, device and equipment for matching discouraging strategy based on voice dialogue information |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311375787.5A | 2023-10-20 | 2023-10-20 | Method, device and equipment for matching discouraging strategy based on voice dialogue information |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN117577107A | 2024-02-20 |

Family

ID=89890685

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311375787.5A | Method, device and equipment for matching discouraging strategy based on voice dialogue information | 2023-10-20 | 2023-10-20 |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN117577107A (en) |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |