CN116153319A - High-risk user detection method and system based on voiceprint recognition - Google Patents
High-risk user detection method and system based on voiceprint recognition
- Publication number
- CN116153319A (application CN202310057792.5A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- voice
- information
- user
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B31/00—Predictive alarm systems characterised by extrapolation or other computation using updated historic data
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A high-risk user detection method and system based on voiceprint recognition. The method includes: collecting the voice signals of users handling business at business hall counters, performing front-end processing on the voice signals to convert them into a voice feature vector set, inputting the voice feature vector set, and iteratively training a voice signal model with a GMM algorithm; establishing a user voiceprint library with the GMM model as the voiceprint model, synchronizing the high-risk user group information in the customer view module of the power marketing system, and performing information matching after synchronization to realize one-to-one binding between customer voiceprint models and high-risk users; and collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, comparing it with the voiceprint identifications VID in the voiceprint library, and giving an early warning if a match is found. When a high-risk user is found, the business hall team leader is arranged to handle the customer's business personally, which reduces the incidence of customer complaints in the business hall and improves the user service experience.
Description
Technical Field
The invention belongs to the technical field of electric power operation, and particularly relates to a high-risk user detection method and system based on voiceprint recognition.
Background
As the front line of electric power customer service, the business hall concerns not only the reputation and profits of the enterprise but also the personal interests of its customers. To prevent disputes from escalating and affecting other users, the daily business-handling records of the business hall are collected and analyzed, user emotion is judged through semantic understanding and emotion recognition, and a voiceprint library of risk users is established. A high-risk user detection method and system based on voiceprint recognition is therefore provided, so as to greatly improve the service quality and service level of the business hall.
Prior art document 1 (CN105989267A) discloses a security protection method and device based on voiceprint recognition. The method comprises the following steps: collecting voice data of a current user of a terminal, and extracting voiceprint characteristic information from the voice data; matching the extracted voiceprint characteristic information of the current user of the terminal with a prestored voiceprint model of the terminal owner, and judging whether the current user of the terminal is the terminal owner; and when the current user of the terminal is not the terminal owner, carrying out security protection processing on the terminal. The disadvantage of prior art document 1 is that it cannot determine whether the user is an emotionally abnormal client. The present invention can judge whether the current user's emotion is abnormal (anger, complaints, etc.) while identifying the user's identity.
Prior art document 2 (CN109769099B) discloses a method and apparatus for detecting an abnormality of a call participant. The method comprises the following steps: when a call starts, the terminal equipment acquires real audio and video data of each call object requiring abnormality detection and a corresponding pre-trained multi-stage neural network detection model; during the call, the terminal equipment collects call data according to a preset data collection strategy; for each call object, the currently collected call data and the real audio/video data of the call object are input into the model of that call object, and whether the call object is abnormal is determined according to the detection result output by the model; the call data comprise image data and/or voice data, and the recognition modes adopted by the model comprise face recognition, voiceprint recognition, limb action recognition and/or lip reading recognition. The defects of prior art document 2 are that abnormal call behaviour cannot be identified through dimensions such as semantics and emotion, it relies on a preliminary judgment before abnormality detection is re-verified, and abnormality detection cannot be carried out on large-scale data. The present invention can identify abnormal events for each customer in the hall through dimensions such as semantics and emotion.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a high-risk user detection method based on voiceprint recognition, which establishes a voiceprint library of high-risk electric power customers, realizes early warning immediately after a high-risk user enters the hall through voiceprint comparison, reduces the risk of customer complaints, and improves the customer service experience.
The invention adopts the following technical scheme.
A high risk user detection method based on voiceprint recognition comprises the following steps:
step 1, collecting voice signals of business users handled by business hall counters, and performing front-end processing on the voice signals to convert the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
step 2, inputting the voice characteristic vector set obtained in the step 1, and training a GMM model;
step 3, establishing a user voiceprint library according to the GMM model obtained in the step 2 as a voiceprint model, synchronizing high-risk user crowd information in a client view module of the power marketing system, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
and step 4, collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, extracting the voiceprint identification VID through step 1 and step 2, comparing it with the voiceprint identifications VID in the voiceprint library, and giving an early warning if a matching VID is found.
Preferably, in step 1, the collection of voice signals from users handling business at the business hall counter is completed through an intelligent pickup device arranged on the counter.
Preferably, in step 1, the voice signal preprocessing uses interference subtraction to perform noise spectrum filtering on the collected voice signal.
In step 1, extracting the characteristic parameters of the voice signal by using MFCC specifically includes: simulating the nonlinear relation between the pitch perceived by the human ear and the actual frequency based on the logarithmic relation between the Mel scale and frequency, extracting the time-domain voiceprint signal through Mel cepstrum coefficients, and converting it into the voice feature vector set representing the characteristics of the speaker.
Preferably, step 2 specifically includes:
step 2.1, inputting the voice feature vector set obtained in the step 1, and training by using a global background model to obtain an initialized GMM recognition model;
step 2.2, clustering the GMM model by using a fuzzy K-means method;
and 2.3, performing iterative optimization on the clustering center by using an EM algorithm to obtain a final voice signal model.
Preferably, in step 3, the user voiceprint information is stored in the user voiceprint storage, and the voiceprint information includes: voiceprint identification VID, voiceprint model, voiceprint parameters.
Preferably, a high-risk user refers to a customer who has quarreled with an agent in a business hall or has complained about an agent more than twice within a year.
Preferably, in step 3, when the customer handles business in the hall, the user swipes an identity card on the intelligent pickup device; the pickup device reads the user identity information and transmits it to the intelligent voice analysis system of the business hall, which binds the voiceprint to the user identity.
The energy Internet marketing service system is connected with the 360 customer view (customer portrait) module; by recording the customer information whose customer portrait feature fields are marked as high-risk, the voiceprint library information is compared with the high-risk user information to obtain the high-risk voiceprint information.
Preferably, step 4 specifically includes:
step 4.1, collecting the voice of clients in the hall through the hall manager's intelligent work badge, extracting the voiceprint features through the method of step 1, and comparing the extracted voiceprint features with the voiceprint information in the voiceprint library;
and step 4.2, when the extracted voiceprint features match the high-risk voiceprint information in the voiceprint library, judging that the user is a high-risk person, and the intelligent voice analysis system of the business hall actively initiating a high-risk early warning.
A voiceprint recognition based high risk user detection system comprising: the device comprises a collection and preprocessing module, a modeling module, a binding module and a detection module, wherein:
the collection and preprocessing module is used for collecting voice signals of business users handled by a business hall counter, performing front-end processing on the voice signals, and converting the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
the modeling module is used for inputting a voice feature vector set and training a GMM model;
the binding module is used for using the GMM model as a voiceprint model, establishing a user voiceprint library, synchronizing high-risk user crowd information in the power marketing system client view module, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
the detection module collects voice information through the intelligent work badge at the first moment after a customer enters the hall, extracts the voiceprint identification VID, compares it with the voiceprint identifications VID in the voiceprint library, and gives an early warning if a matching VID is found.
A terminal comprising a processor and a storage medium; wherein:
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform the steps of a high risk user detection method based on voiceprint recognition.
A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements the steps of a high risk user detection method based on voiceprint recognition.
Compared with the prior art, the method has the advantages that a voiceprint library of power customers is built, user characteristics are located according to the marketing customer portraits, and high-risk customers are identified and captured through voiceprints for precise service. This further reduces the probability of quarrels and complaints in the business hall, strengthens the operation and management role of intelligent voice technology in on-site business hall service, and greatly improves the intelligence level of marketing service operation and management.
Drawings
FIG. 1 is a flow chart of a high risk user detection method based on voiceprint recognition of the present invention;
FIG. 2 is a front end processing module flow diagram;
FIG. 3 is a model training flow diagram;
FIG. 4 is an MFCC parameter extraction flow chart;
FIG. 5 is a reasonable combination of front-end characteristic parameters;
fig. 6 is a GMM model parameter estimation flow chart.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. The embodiments described herein are merely some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are within the scope of the present invention.
Example 1.
As shown in fig. 1, embodiment 1 of the present invention provides a high risk user detection method based on voiceprint recognition, which in a preferred but non-limiting embodiment of the present invention comprises the steps of:
step 1, collecting voice signals of users handling business at the business hall counter, and realizing preliminary processing of the voice signals through a front-end processing module to obtain a voice feature vector set, wherein the processing flow comprises voice signal preprocessing and feature parameter extraction; as shown in fig. 2, the method specifically includes:
step 1.1, voice signal acquisition: the intelligent pickup equipment arranged on the counter is used for completing voice signal collection of business handling of business hall counter users;
step 1.2, preprocessing voice signals: preprocessing the collected voice signals to avoid frequency domain aliasing distortion of the voice signals and facilitate subsequent signal processing;
step 1.3, noise elimination: noise spectrum filtering is carried out on the collected voice signals through interference subtraction, and invalid voice signal components are suppressed;
step 1.4, extracting characteristic parameters: the characteristic parameters of the time-domain voiceprint signal are collected through Mel cepstrum coefficients (Mel-Frequency Cepstrum Coefficient, MFCC); based on the logarithmic relation between the Mel scale and frequency, the nonlinear relation between the pitch perceived by the human ear and the actual frequency is simulated, and the time-domain voiceprint signal is extracted through Mel cepstrum coefficients and converted into a parameter vector set representing the characteristics of the speaker.
the Mel frequency versus actual frequency can be approximated by:
Mel(f)=2595log(1+f/700)
or alternatively
Mel(f)=1127ln(1+f/700)
Where f represents the actual frequency of the speech signal distribution.
The voice signal is generally distributed between 50 Hz and 4 kHz, so f is generally taken as 4 kHz in speaker recognition applications.
The extraction process of MFCC parameters is shown in fig. 4.
Characteristic parameter combination: in speaker recognition, the MFCC parameters of orders 0-12, or orders 0-18, are taken. After MFCC feature extraction is completed, a speech feature vector set is output for feature processing, as shown in fig. 5.
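For illustration only, the front-end processing of steps 1.1-1.4 can be sketched in Python as below. This sketch is not part of the patented method: librosa is assumed to be available, and the sample rate, frame sizes and the first-frames noise estimate used for the simple spectral-subtraction step are assumptions.

```python
# Illustrative sketch only (not part of the patent): front-end processing of one
# utterance -- pre-emphasis, a simple spectral-subtraction denoising step, and
# MFCC extraction into a feature vector set. Sample rate, frame sizes and the
# "first 10 frames are noise" heuristic are assumptions.
import numpy as np
import librosa


def preprocess_and_extract_mfcc(path, sr=16000, n_mfcc=13):
    """Return an (n_frames, n_mfcc) MFCC feature matrix for one utterance."""
    y, sr = librosa.load(path, sr=sr)

    # Pre-emphasis to flatten the spectrum before further processing.
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])

    # Simple spectral subtraction: estimate the noise magnitude spectrum from
    # the first few frames (assumed non-speech) and subtract it everywhere.
    stft = librosa.stft(y, n_fft=512, hop_length=160)
    mag, phase = np.abs(stft), np.angle(stft)
    noise_mag = mag[:, :10].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, 0.01 * mag)   # spectral floor
    y_clean = librosa.istft(clean_mag * np.exp(1j * phase), hop_length=160)

    # MFCCs (orders 0..n_mfcc-1) form the speaker feature vector set.
    mfcc = librosa.feature.mfcc(y=y_clean, sr=sr, n_mfcc=n_mfcc,
                                n_fft=512, hop_length=160)
    return mfcc.T  # one feature vector per frame
```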
Step 2: through the above processing, the time-domain voice signal has been converted into a parameter vector set representing the characteristics of the speaker. A recognition model then needs to be trained on these speech feature vector sets.
In this embodiment, the recognition model is preferably a Gaussian Mixture Model (GMM).
The parameters of the GMM comprise the mean vectors, the covariance matrices and the mixture weights.
Step 2.1, training a global background model: as shown in fig. 3, the global background model (universal background model, UBM) is trained using the speech data of all speakers to obtain a Gaussian mixture model with a large number of mixture components. The global background model is generally trained first, and each speaker's recognition model is then obtained by adaptation, which improves algorithm efficiency; the UBM model captures the commonalities of all speakers.
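As an illustration of this UBM-then-adaptation idea (not the patent's exact procedure), the following sketch trains a diagonal-covariance UBM with scikit-learn and MAP-adapts its component means toward one speaker's features; the component count and the relevance factor r are assumptions.

```python
# Illustrative sketch of the UBM-then-adaptation idea (not the patent's exact
# procedure). The UBM is trained on pooled features of all speakers with
# scikit-learn, then each speaker's model is obtained by MAP-adapting the UBM
# component means; n_components=64 and the relevance factor r=16 are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_ubm(all_speaker_features, n_components=64):
    """Train a universal background model on features pooled from all speakers."""
    pooled = np.vstack(all_speaker_features)              # (N_total, D)
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", max_iter=100)
    ubm.fit(pooled)
    return ubm


def adapt_speaker_means(ubm, speaker_features, r=16.0):
    """MAP-adapt the UBM component means toward one speaker's data."""
    post = ubm.predict_proba(speaker_features)            # (T, M) responsibilities
    n_j = post.sum(axis=0)                                # soft counts per component
    e_j = post.T @ speaker_features / np.maximum(n_j[:, None], 1e-8)
    alpha = (n_j / (n_j + r))[:, None]                    # adaptation coefficients
    return alpha * e_j + (1.0 - alpha) * ubm.means_       # adapted mean vectors
```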
Step 2.2, initializing the clustering centers: the clustering centers are initialized with the fuzzy K-means method. This method is best described starting from Vector Quantization (VQ), which belongs to the template matching models; the LBG algorithm from VQ is used in initializing the GMM model.
Step 2.3, fast convergence to the optimal clustering centers: the initial clustering centers obtained by the fuzzy K-means algorithm are, to a large extent, likely to be only local optima, so the clustering centers are adjusted with the Expectation-Maximization (EM) algorithm to make the resulting GMM model stable and accurate. After the model is generated, the system automatically forms the corresponding voiceprint identification VID, voiceprint model, voiceprint parameters, and the like.
The likelihood of the GMM with parameter set λ generating the feature vector set X = {x_1, x_2, …, x_T} is:

P(X|λ) = ∏_{t=1}^{T} p(x_t|λ)

where:

λ = {ω_j, μ_j, Σ_j}, j = 1, 2, …, M, is the parameter set of the GMM, in which ω_j is the weight of the j-th mixture component, μ_j is its mean vector, Σ_j is its covariance matrix, and M is the number of mixture components of the GMM;

x_t is the Mel cepstrum coefficient vector of the t-th voice frame, with 1 ≤ t ≤ T.

Each frame likelihood is obtained by substituting the Gaussian probability density function into the likelihood:

p(x_t|λ) = Σ_{j=1}^{M} ω_j b_j(x_t),  with  b_j(x_t) = (2π)^{-D/2} |Σ_j|^{-1/2} exp(-(1/2)(x_t − μ_j)^T Σ_j^{-1} (x_t − μ_j)),

where D is the dimension of the characteristic parameter; when Σ_j is a diagonal matrix, the density factorizes over the D dimensions.

The parameter estimation process of the Gaussian mixture model can be described as finding a new parameter set λ̄ such that P(X|λ̄) ≥ P(X|λ), then taking λ̄ as the current model parameters and continuing the iteration until the convergence condition is satisfied. First, the auxiliary function Q(λ, λ̄) is defined; according to Jensen's inequality, the parameter estimation can be converted into a process of maximizing Q(λ, λ̄):

Q(λ, λ̄) = Σ_{t=1}^{T} Σ_{j=1}^{M} P(j|x_t, λ) · log[ ω̄_j b̄_j(x_t) ],

since P(x_t, j|λ) = ω_j b_j(x_t).

Next, the update formulas of the model parameters ω_j, μ_j, Σ_j are determined by setting the partial derivatives of Q(λ, λ̄) to zero, which maximizes the function. The EM algorithm is embodied in this process. The E step calculates the intermediate statistic, i.e. the posterior probability that frame x_t is generated by the j-th component:

P(j|x_t, λ) = ω_j b_j(x_t) / Σ_{k=1}^{M} ω_k b_k(x_t)

The M step finds the λ̄ satisfying max Q(λ, λ̄); setting the derivatives with respect to the three parameters {ω_j, μ_j, Σ_j} to zero yields the update formulas:

ω̄_j = (1/T) Σ_{t=1}^{T} P(j|x_t, λ)

μ̄_j = Σ_{t=1}^{T} P(j|x_t, λ) x_t / Σ_{t=1}^{T} P(j|x_t, λ)

Σ̄_j = Σ_{t=1}^{T} P(j|x_t, λ)(x_t − μ̄_j)(x_t − μ̄_j)^T / Σ_{t=1}^{T} P(j|x_t, λ)
After 5-10 iterations of EM parameter estimation, the model parameters can basically converge; the flow chart for training the GMM model is shown in fig. 6.
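For reference, the EM updates written out above can be implemented compactly as in the following sketch, which uses diagonal covariances and a random start instead of the fuzzy K-means/LBG initialization of steps 2.2-2.3; it illustrates the update formulas rather than the patented training flow.

```python
# Compact EM sketch for a diagonal-covariance GMM, mirroring the update
# formulas above. Initialization is random here for brevity (the patent uses
# fuzzy K-means / LBG initialization); 5-10 iterations are usually enough.
import numpy as np


def log_gaussian_diag(X, mu, var):
    """log N(x | mu, diag(var)) for every row of X."""
    d = X.shape[1]
    return -0.5 * (d * np.log(2 * np.pi) + np.log(var).sum()
                   + ((X - mu) ** 2 / var).sum(axis=1))


def train_gmm_em(X, M=8, n_iter=10, seed=0):
    """Estimate weights, means and diagonal covariances of an M-component GMM."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    w = np.full(M, 1.0 / M)                                # mixture weights omega_j
    mu = X[rng.choice(T, M, replace=False)]                # mean vectors mu_j
    var = np.tile(X.var(axis=0), (M, 1))                   # diagonal covariances

    for _ in range(n_iter):
        # E step: responsibilities P(j | x_t, lambda) for every frame.
        log_p = np.stack([np.log(w[j]) + log_gaussian_diag(X, mu[j], var[j])
                          for j in range(M)], axis=1)
        log_p -= log_p.max(axis=1, keepdims=True)           # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M step: re-estimate omega_j, mu_j, Sigma_j from the responsibilities.
        n_j = gamma.sum(axis=0)
        w = n_j / T
        mu = gamma.T @ X / n_j[:, None]
        var = gamma.T @ (X ** 2) / n_j[:, None] - mu ** 2
        var = np.maximum(var, 1e-6)                          # variance floor
    return w, mu, var
```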
Step 3, voiceprint binding: when a user transacts business in the business hall, the intelligent pickup device synchronizes the user identity information to the intelligent voice analysis system of the business hall; the system establishes a user voiceprint library based on the identity information and voiceprint information, synchronizes the high-risk user group information in the customer view module of the electric power energy Internet marketing service system, and can flag high-risk personnel after synchronization;
Step 3.1, voiceprint library building: a user voiceprint store is established to store user voiceprint information.
The voiceprint information includes: voiceprint identification VID, voiceprint model, voiceprint parameters;
wherein the voiceprint identification VID is the unique identifier of a voiceprint feature; the voiceprint parameters of a client can be quickly retrieved through the VID.
Step 3.2, identity binding: when a user transacts business in the business hall, the user swipes an identity card on the intelligent pickup equipment (out-of-counter interaction terminal); the pickup equipment reads the user identity information and transmits it to the intelligent voice analysis system of the business hall, and the system binds the voiceprint to the user identity;
High-risk personnel information synchronization: the energy Internet marketing service system is connected with the 360 customer view (customer portrait) module, and the customer information (name, identity card, mobile phone number, account number and the like) whose portrait feature fields are marked as high-risk is recorded, so that the voiceprint library comparison is completed and the high-risk voiceprint information is obtained.
High-risk users refer to customers who have quarreled with agents in the business hall or have complained about agents more than twice within a year.
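A hypothetical data-structure sketch of the voiceprint store and identity binding described in steps 3.1-3.2 is given below; the field names, the identity field and the high-risk flag are illustrative assumptions rather than the patent's actual schema.

```python
# Hypothetical layout of a voiceprint-store record keyed by VID. Field names,
# the identity field and the high-risk flag are illustrative assumptions, not
# the patent's actual schema.
from dataclasses import dataclass
import numpy as np


@dataclass
class VoiceprintRecord:
    vid: str                     # voiceprint identification (VID)
    model_weights: np.ndarray    # GMM weights
    model_means: np.ndarray      # GMM mean vectors
    model_vars: np.ndarray       # GMM diagonal covariances
    identity_id: str = ""        # identity-card number bound in step 3.2
    high_risk: bool = False      # set when matched against marketing-system data


voiceprint_store = {}  # maps VID -> VoiceprintRecord


def enroll(vid, weights, means, variances):
    """Create and store a voiceprint record for a newly modelled customer."""
    record = VoiceprintRecord(vid, weights, means, variances)
    voiceprint_store[vid] = record
    return record


def mark_high_risk(identity_id):
    """Flag every voiceprint bound to this identity as high-risk (sync step)."""
    for record in voiceprint_store.values():
        if record.identity_id == identity_id:
            record.high_risk = True
```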
Step 4, collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, extracting the voiceprint identification VID through steps 1 and 2, and giving an early warning reminder when an abnormal user is detected.
Step 4.1, voice matching: the voice of customers handling business in the hall is collected through the hall manager's intelligent work badge, the voiceprint parameters are extracted through the method of step 1, the extracted voiceprint features are compared with the voiceprint information in the voiceprint library, and a matching record is generated if they are the same;
Step 4.2, early warning reminder: when the extracted voiceprint features match the voiceprint information of a high-risk user in the voiceprint library, the user can be judged to be a high-risk person, and the system actively initiates a high-risk early warning.
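The matching and early-warning step can be sketched as follows, reusing the log_gaussian_diag helper and the VoiceprintRecord store from the sketches above; the average-log-likelihood score and the threshold value are assumptions, since the patent specifies only that the extracted voiceprint is compared with the high-risk voiceprint information in the library.

```python
# Sketch of the hall matching step, reusing log_gaussian_diag and the
# VoiceprintRecord store from the sketches above. Scoring by average
# log-likelihood and the threshold value are assumptions.
import numpy as np


def score_utterance(features, record):
    """Average log-likelihood of the utterance under one voiceprint's GMM."""
    log_p = np.stack([np.log(record.model_weights[j])
                      + log_gaussian_diag(features, record.model_means[j],
                                          record.model_vars[j])
                      for j in range(len(record.model_weights))], axis=1)
    m = log_p.max(axis=1, keepdims=True)                   # log-sum-exp trick
    frame_ll = m.squeeze(1) + np.log(np.exp(log_p - m).sum(axis=1))
    return float(frame_ll.mean())


def check_high_risk(features, store, threshold=-45.0):
    """Return the VID of the best-matching high-risk voiceprint, or None."""
    best_vid, best_score = None, -np.inf
    for vid, record in store.items():
        if not record.high_risk:
            continue
        score = score_utterance(features, record)
        if score > best_score:
            best_vid, best_score = vid, score
    return best_vid if best_score >= threshold else None
```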
Example 2.
A voiceprint recognition based high risk user detection system comprising: the device comprises a collection and preprocessing module, a modeling module, a binding module and a detection module, wherein:
the collection and preprocessing module is used for collecting voice signals of business users handled by a business hall counter, performing front-end processing on the voice signals, and converting the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
the modeling module is used for inputting a voice feature vector set and training a GMM model;
the binding module is used for using the GMM model as a voiceprint model, establishing a user voiceprint library, synchronizing high-risk user crowd information in the power marketing system client view module, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
the detection module collects voice information through the intelligent work badge at the first moment after a customer enters the hall, extracts the voiceprint identification VID, compares it with the voiceprint identifications VID in the voiceprint library, and gives an early warning if a matching VID is found.
Example 3.
Embodiment 3 of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the steps of the high-risk user detection method based on voiceprint recognition according to embodiment 1 of the present invention.
The detailed steps are the same as those of the high risk user detection method based on voiceprint recognition provided in embodiment 1, and will not be described here again.
Example 4.
The embodiment 4 of the invention provides electronic equipment.
An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in a voiceprint recognition based high risk user detection method according to embodiment 1 of the present invention when the program is executed.
The detailed steps are the same as those of the high risk user detection method based on voiceprint recognition provided in embodiment 1, and will not be described here again.
Compared with the prior art, the method has the advantages that a voiceprint library of power customers is built, user characteristics are located according to the marketing customer portraits, and high-risk customers are identified and captured through voiceprints for precise service. This further reduces the probability of quarrels and complaints in the business hall, strengthens the operation and management role of intelligent voice technology in on-site business hall service, and greatly improves the intelligence level of marketing service operation and management.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical coding devices such as punch cards or in-groove protrusion structures having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.
Claims (13)
1. The high-risk user detection method based on voiceprint recognition is characterized by comprising the following steps of:
step 1, collecting voice signals of business users handled by business hall counters, and performing front-end processing on the voice signals to convert the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
step 2, inputting the voice characteristic vector set obtained in the step 1, and training a GMM model;
step 3, establishing a user voiceprint library according to the GMM model obtained in the step 2 as a voiceprint model, synchronizing high-risk user crowd information in a client view module of the power marketing system, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
and step 4, collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, extracting the voiceprint identification VID through step 1 and step 2, comparing it with the voiceprint identifications VID in the voiceprint library, and giving an early warning if a matching VID is found.
2. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in the step 1, the voice signal collection of business handling of business hall counter users is completed through intelligent pickup equipment arranged on the counter.
3. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in step 1, noise spectrum filtering is performed on the collected voice signals by adopting interference subtraction method in voice signal preprocessing.
4. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in step 1, extracting characteristic parameters of a voice signal by using an MFCC specifically includes: based on the logarithmic relation between Mel scale and frequency to simulate the non-linear relation between the sound level and the actual frequency, the voice signal of voiceprint time domain is extracted by Mel cepstrum coefficient and converted into the voice characteristic vector set for representing the characteristic of the speaker.
5. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
the step 2 specifically comprises the following steps:
step 2.1, inputting the voice feature vector set obtained in the step 1, and training by using a global background model to obtain an initialized GMM recognition model;
step 2.2, clustering the GMM model by using a fuzzy K-means method;
and 2.3, performing iterative optimization on the clustering center by using an EM algorithm to obtain a final voice signal model.
6. A high risk user detection method based on voiceprint recognition according to claim 5, wherein,
in step 3, user voiceprint information is stored in the user voiceprint storage, and the voiceprint information comprises: voiceprint identification VID, voiceprint model, voiceprint parameters.
7. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
a high-risk user refers to a customer who has quarreled with an agent in a business hall or has complained about an agent more than twice within a year.
8. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in step 3, when the customer handles business in the business hall, the user swipes an identity card on the intelligent pickup device; the pickup device reads the user identity information and transmits it to the intelligent voice analysis system of the business hall, which binds the voiceprint to the user identity.
9. The method for high risk user detection based on voiceprint recognition of claim 8,
and the energy Internet marketing service system 360 is connected with a client portrait module, and the voiceprint library information is compared with the high-risk user information to obtain the high-risk voiceprint information by recording the client information with high risk user feature fields in the client portrait.
10. The method for high risk user detection based on voiceprint recognition of claim 9,
the step 4 specifically comprises the following steps:
step 4.1, collecting the voice of clients in the hall through the hall manager's intelligent work badge, extracting the voiceprint features through the method of step 1, and comparing the extracted voiceprint features with the voiceprint information in the voiceprint library;
and step 4.2, when the extracted voiceprint features match the high-risk voiceprint information in the voiceprint library, judging that the user is a high-risk person, and the intelligent voice analysis system of the business hall actively initiating a high-risk early warning.
11. A voiceprint recognition based high risk user detection system utilizing the method of any one of claims 1-10, comprising: the device comprises a collection and preprocessing module, a modeling module, a binding module and a detection module, and is characterized in that:
the collection and preprocessing module is used for collecting voice signals of business users handled by a business hall counter, performing front-end processing on the voice signals, and converting the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
the modeling module is used for inputting a voice feature vector set and training a GMM model;
the binding module is used for using the GMM model as a voiceprint model, establishing a user voiceprint library, synchronizing high-risk user crowd information in the power marketing system client view module, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
the detection module collects voice information through the intelligent work badge at the first moment after a customer enters the hall, extracts the voiceprint identification VID, compares it with the voiceprint identifications VID in the voiceprint library, and gives an early warning if a matching VID is found.
12. A terminal comprising a processor and a storage medium; the method is characterized in that:
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform the steps of a high risk user detection method based on voiceprint recognition according to any one of claims 1 to 10.
13. A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of a voiceprint recognition based high risk user detection method according to any one of claims 1 to 10.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310057792.5A | 2023-01-13 | 2023-01-13 | High-risk user detection method and system based on voiceprint recognition |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310057792.5A | 2023-01-13 | 2023-01-13 | High-risk user detection method and system based on voiceprint recognition |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN116153319A | 2023-05-23 |
Family

ID=86373003

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202310057792.5A (pending) | High-risk user detection method and system based on voiceprint recognition | 2023-01-13 | 2023-01-13 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN116153319A (en) |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN117456981A * | 2023-12-25 | 2024-01-26 | 北京秒信科技有限公司 | Real-time voice wind control system based on RNN voice recognition |
| CN117456981B * | 2023-12-25 | 2024-03-05 | 北京秒信科技有限公司 | Real-time voice wind control system based on RNN voice recognition |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |