CN116153319A - High-risk user detection method and system based on voiceprint recognition - Google Patents
High-risk user detection method and system based on voiceprint recognition
- Publication number
- CN116153319A (application CN202310057792.5A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- voice
- information
- user
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B31/00—Predictive alarm systems characterised by extrapolation or other computation using updated historic data
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A high-risk user detection method and system based on voiceprint recognition. The method includes: collecting the voice signals of users handling business at business hall counters, performing front-end processing on the voice signals to convert them into a voice feature vector set, inputting the voice feature vector set, and iteratively training a voice signal model with a GMM algorithm; establishing a user voiceprint library with the GMM model as the voiceprint model, synchronizing the high-risk user group information in the customer view module of the power marketing system, and performing information matching after synchronization to realize one-to-one binding between customer voiceprint models and high-risk users; and collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, comparing it with the voiceprint identifications VID in the voiceprint library, and giving an early warning if a match is found. When a high-risk user is found, the business hall team leader is arranged to handle the customer's business personally, which reduces the incidence of customer complaints in the business hall and improves the user service experience.
Description
Technical Field
The invention belongs to the technical field of electric power operation, and particularly relates to a high-risk user detection method and system based on voiceprint recognition.
Background
As the front line of electric power customer service, the business hall concerns not only the reputation and profits of the enterprise but also the personal interests of its customers. To prevent disputes from escalating and affecting other users, the daily business-handling records of the business hall are collected and analyzed, user emotion is judged through semantic understanding and emotion recognition, and a voiceprint library of risk users is established. A high-risk user detection method and system based on voiceprint recognition is therefore provided, so as to greatly improve the service quality and service level of the business hall.
Prior art document 1 (CN105989267A) discloses a security protection method and device based on voiceprint recognition. The method comprises the following steps: collecting voice data of a current user of a terminal, and extracting voiceprint characteristic information from the voice data; matching the extracted voiceprint characteristic information of the current user of the terminal with a prestored voiceprint model of the terminal owner, and judging whether the current user of the terminal is the terminal owner; and when the current user of the terminal is not the terminal owner, carrying out security protection processing on the terminal. The disadvantage of prior art document 1 is that it cannot determine whether the user is an emotionally abnormal client. The present invention can judge whether the current user's emotion is abnormal (anger, complaints, etc.) while identifying the user's identity.
Prior art document 2 (CN109769099B) discloses a method and apparatus for detecting an abnormality of a call participant. The method comprises the following steps: when a call starts, the terminal equipment acquires real audio and video data of each call object requiring abnormality detection and a corresponding pre-trained multi-stage neural network detection model; during the call, the terminal equipment collects call data according to a preset data collection strategy; for each call object, the currently collected call data and the real audio/video data of the call object are input into the model of that call object, and whether the call object is abnormal is determined according to the detection result output by the model; the call data comprise image data and/or voice data, and the recognition modes adopted by the model comprise face recognition, voiceprint recognition, limb action recognition and/or lip reading recognition. The defects of prior art document 2 are that abnormal call behaviour cannot be identified through dimensions such as semantics and emotion, it relies on a preliminary judgment before abnormality detection is re-verified, and abnormality detection cannot be carried out on large-scale data. The present invention can identify abnormal events for each customer in the hall through dimensions such as semantics and emotion.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a high-risk user detection method based on voiceprint recognition, which establishes a voiceprint library of high-risk electric power customers, realizes early warning immediately after a high-risk user enters the hall through voiceprint comparison, reduces the risk of customer complaints, and improves the customer service experience.
The invention adopts the following technical scheme.
A high risk user detection method based on voiceprint recognition comprises the following steps:
step 1, collecting voice signals of business users handled by business hall counters, and performing front-end processing on the voice signals to convert the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
step 2, inputting the voice characteristic vector set obtained in the step 1, and training a GMM model;
step 3, establishing a user voiceprint library according to the GMM model obtained in the step 2 as a voiceprint model, synchronizing high-risk user crowd information in a client view module of the power marketing system, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
and step 4, collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, extracting the voiceprint identification VID through step 1 and step 2, comparing it with the voiceprint identifications VID in the voiceprint library, and giving an early warning if a matching VID is found.
Preferably, in step 1, the collection of voice signals from users handling business at the business hall counter is completed through an intelligent pickup device arranged on the counter.
Preferably, in step 1, the voice signal preprocessing uses interference subtraction to perform noise spectrum filtering on the collected voice signal.
In step 1, extracting the characteristic parameters of the voice signal by using MFCC specifically includes: simulating the nonlinear relation between the pitch perceived by the human ear and the actual frequency based on the logarithmic relation between the Mel scale and frequency, extracting the time-domain voiceprint signal through Mel cepstrum coefficients, and converting it into the voice feature vector set representing the characteristics of the speaker.
Preferably, step 2 specifically includes:
step 2.1, inputting the voice feature vector set obtained in the step 1, and training by using a global background model to obtain an initialized GMM recognition model;
step 2.2, clustering the GMM model by using a fuzzy K-means method;
and 2.3, performing iterative optimization on the clustering center by using an EM algorithm to obtain a final voice signal model.
Preferably, in step 3, the user voiceprint information is stored in the user voiceprint storage, and the voiceprint information includes: voiceprint identification VID, voiceprint model, voiceprint parameters.
Preferably, a high-risk user refers to a customer who has quarreled with an agent in a business hall or has complained about an agent more than twice within a year.
Preferably, in step 3, when the customer handles business in the hall, the user swipes an identity card on the intelligent pickup device; the pickup device reads the user identity information and transmits it to the intelligent voice analysis system of the business hall, which binds the voiceprint to the user identity.
The energy Internet marketing service system is connected with the 360 customer view (customer portrait) module; by recording the customer information whose customer portrait feature fields are marked as high-risk, the voiceprint library information is compared with the high-risk user information to obtain the high-risk voiceprint information.
Preferably, step 4 specifically includes:
step 4.1, collecting the voice of clients in the hall through the hall manager's intelligent work badge, extracting the voiceprint features through the method of step 1, and comparing the extracted voiceprint features with the voiceprint information in the voiceprint library;
and step 4.2, when the extracted voiceprint features match the high-risk voiceprint information in the voiceprint library, judging that the user is a high-risk person, and the intelligent voice analysis system of the business hall actively initiating a high-risk early warning.
A voiceprint recognition based high risk user detection system comprising: the device comprises a collection and preprocessing module, a modeling module, a binding module and a detection module, wherein:
the collection and preprocessing module is used for collecting voice signals of business users handled by a business hall counter, performing front-end processing on the voice signals, and converting the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
the modeling module is used for inputting a voice feature vector set and training a GMM model;
the binding module is used for using the GMM model as a voiceprint model, establishing a user voiceprint library, synchronizing high-risk user crowd information in the power marketing system client view module, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
the detection module collects voice information through the intelligent work badge at the first moment after a customer enters the hall, extracts the voiceprint identification VID, compares it with the voiceprint identifications VID in the voiceprint library, and gives an early warning if a matching VID is found.
A terminal comprising a processor and a storage medium; wherein:
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform the steps of a high risk user detection method based on voiceprint recognition.
A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements the steps of a high risk user detection method based on voiceprint recognition.
Compared with the prior art, the method has the advantages that a voiceprint library of power customers is built, user characteristics are located according to the marketing customer portraits, and high-risk customers are identified and captured through voiceprints for precise service. This further reduces the probability of quarrels and complaints in the business hall, strengthens the operation and management role of intelligent voice technology in on-site business hall service, and greatly improves the intelligence level of marketing service operation and management.
Drawings
FIG. 1 is a flow chart of a high risk user detection method based on voiceprint recognition of the present invention;
FIG. 2 is a front end processing module flow diagram;
FIG. 3 is a model training flow diagram;
FIG. 4 is an MFCC parameter extraction flow chart;
FIG. 5 is a reasonable combination of front-end characteristic parameters;
fig. 6 is a GMM model parameter estimation flow chart.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. The embodiments described herein are merely some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are within the scope of the present invention.
Example 1.
As shown in fig. 1, embodiment 1 of the present invention provides a high risk user detection method based on voiceprint recognition, which in a preferred but non-limiting embodiment of the present invention comprises the steps of:
step 1, collecting voice signals of users handling business at the business hall counter, and realizing preliminary processing of the voice signals through a front-end processing module to obtain a voice feature vector set, wherein the processing flow comprises voice signal preprocessing and feature parameter extraction; as shown in fig. 2, the method specifically includes:
step 1.1, voice signal acquisition: the intelligent pickup equipment arranged on the counter is used for completing voice signal collection of business handling of business hall counter users;
step 1.2, preprocessing voice signals: preprocessing the collected voice signals to avoid frequency domain aliasing distortion of the voice signals and facilitate subsequent signal processing;
step 1.3, noise elimination: noise spectrum filtering is carried out on the collected voice signals through interference subtraction, and invalid voice signal components are suppressed;
step 1.4, extracting characteristic parameters: the characteristic parameters of the time-domain voiceprint signal are collected through Mel cepstrum coefficients (Mel-Frequency Cepstrum Coefficient, MFCC); based on the logarithmic relation between the Mel scale and frequency, the nonlinear relation between the pitch perceived by the human ear and the actual frequency is simulated, and the time-domain voiceprint signal is extracted through Mel cepstrum coefficients and converted into a parameter vector set representing the characteristics of the speaker.
the Mel frequency versus actual frequency can be approximated by:
Mel(f)=2595log(1+f/700)
or alternatively
Mel(f)=1127ln(1+f/700)
Where f represents the actual frequency of the speech signal distribution.
The voice signal is generally distributed between 50 Hz and 4 kHz, so f is generally taken as 4 kHz in speaker recognition applications.
The extraction process of MFCC parameters is shown in fig. 4.
Characteristic parameter combination: in speaker recognition, the MFCC parameters of orders 0-12, or orders 0-18, are taken. After MFCC feature extraction is completed, a speech feature vector set is output for feature processing, as shown in fig. 5.
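For illustration only, the front-end processing of steps 1.1-1.4 can be sketched in Python as below. This sketch is not part of the patented method: librosa is assumed to be available, and the sample rate, frame sizes and the first-frames noise estimate used for the simple spectral-subtraction step are assumptions.

```python
# Illustrative sketch only (not part of the patent): front-end processing of one
# utterance -- pre-emphasis, a simple spectral-subtraction denoising step, and
# MFCC extraction into a feature vector set. Sample rate, frame sizes and the
# "first 10 frames are noise" heuristic are assumptions.
import numpy as np
import librosa


def preprocess_and_extract_mfcc(path, sr=16000, n_mfcc=13):
    """Return an (n_frames, n_mfcc) MFCC feature matrix for one utterance."""
    y, sr = librosa.load(path, sr=sr)

    # Pre-emphasis to flatten the spectrum before further processing.
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])

    # Simple spectral subtraction: estimate the noise magnitude spectrum from
    # the first few frames (assumed non-speech) and subtract it everywhere.
    stft = librosa.stft(y, n_fft=512, hop_length=160)
    mag, phase = np.abs(stft), np.angle(stft)
    noise_mag = mag[:, :10].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, 0.01 * mag)   # spectral floor
    y_clean = librosa.istft(clean_mag * np.exp(1j * phase), hop_length=160)

    # MFCCs (orders 0..n_mfcc-1) form the speaker feature vector set.
    mfcc = librosa.feature.mfcc(y=y_clean, sr=sr, n_mfcc=n_mfcc,
                                n_fft=512, hop_length=160)
    return mfcc.T  # one feature vector per frame
```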
Step 2: through the above processing, the time-domain voice signal has been converted into a parameter vector set representing the characteristics of the speaker. A recognition model then needs to be trained on these speech feature vector sets.
In this embodiment, the recognition model is preferably a Gaussian Mixture Model (GMM).
The parameters of the GMM comprise the mean vectors, the covariance matrices and the mixture weights.
Step 2.1, training a global background model: as shown in fig. 3, the global background model (universal background model, UBM) is trained using the speech data of all speakers to obtain a Gaussian mixture model with a large number of mixture components. The global background model is generally trained first, and each speaker's recognition model is then obtained by adaptation, which improves algorithm efficiency; the UBM model captures the commonalities of all speakers.
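As an illustration of this UBM-then-adaptation idea (not the patent's exact procedure), the following sketch trains a diagonal-covariance UBM with scikit-learn and MAP-adapts its component means toward one speaker's features; the component count and the relevance factor r are assumptions.

```python
# Illustrative sketch of the UBM-then-adaptation idea (not the patent's exact
# procedure). The UBM is trained on pooled features of all speakers with
# scikit-learn, then each speaker's model is obtained by MAP-adapting the UBM
# component means; n_components=64 and the relevance factor r=16 are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_ubm(all_speaker_features, n_components=64):
    """Train a universal background model on features pooled from all speakers."""
    pooled = np.vstack(all_speaker_features)              # (N_total, D)
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", max_iter=100)
    ubm.fit(pooled)
    return ubm


def adapt_speaker_means(ubm, speaker_features, r=16.0):
    """MAP-adapt the UBM component means toward one speaker's data."""
    post = ubm.predict_proba(speaker_features)            # (T, M) responsibilities
    n_j = post.sum(axis=0)                                # soft counts per component
    e_j = post.T @ speaker_features / np.maximum(n_j[:, None], 1e-8)
    alpha = (n_j / (n_j + r))[:, None]                    # adaptation coefficients
    return alpha * e_j + (1.0 - alpha) * ubm.means_       # adapted mean vectors
```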
Step 2.2, initializing the clustering centers: the clustering centers are initialized with the fuzzy K-means method. This method is best described starting from Vector Quantization (VQ), which belongs to the template matching models; the LBG algorithm from VQ is used in initializing the GMM model.
Step 2.3, fast convergence to the optimal clustering centers: the initial clustering centers obtained by the fuzzy K-means algorithm are, to a large extent, likely to be only local optima, so the clustering centers are adjusted with the Expectation-Maximization (EM) algorithm to make the resulting GMM model stable and accurate. After the model is generated, the system automatically forms the corresponding voiceprint identification VID, voiceprint model, voiceprint parameters, and the like.
The likelihood of the GMM with parameter set λ generating the feature vector set X = {x_1, x_2, …, x_T} is:

P(X|λ) = ∏_{t=1}^{T} p(x_t|λ)

where:

λ = {ω_j, μ_j, Σ_j}, j = 1, 2, …, M, is the parameter set of the GMM, in which ω_j is the weight of the j-th mixture component, μ_j is its mean vector, Σ_j is its covariance matrix, and M is the number of mixture components of the GMM;

x_t is the Mel cepstrum coefficient vector of the t-th voice frame, with 1 ≤ t ≤ T.

Each frame likelihood is obtained by substituting the Gaussian probability density function into the likelihood:

p(x_t|λ) = Σ_{j=1}^{M} ω_j b_j(x_t),  with  b_j(x_t) = (2π)^{-D/2} |Σ_j|^{-1/2} exp(-(1/2)(x_t − μ_j)^T Σ_j^{-1} (x_t − μ_j)),

where D is the dimension of the characteristic parameter; when Σ_j is a diagonal matrix, the density factorizes over the D dimensions.

The parameter estimation process of the Gaussian mixture model can be described as finding a new parameter set λ̄ such that P(X|λ̄) ≥ P(X|λ), then taking λ̄ as the current model parameters and continuing the iteration until the convergence condition is satisfied. First, the auxiliary function Q(λ, λ̄) is defined; according to Jensen's inequality, the parameter estimation can be converted into a process of maximizing Q(λ, λ̄):

Q(λ, λ̄) = Σ_{t=1}^{T} Σ_{j=1}^{M} P(j|x_t, λ) · log[ ω̄_j b̄_j(x_t) ],

since P(x_t, j|λ) = ω_j b_j(x_t).

Next, the update formulas of the model parameters ω_j, μ_j, Σ_j are determined by setting the partial derivatives of Q(λ, λ̄) to zero, which maximizes the function. The EM algorithm is embodied in this process. The E step calculates the intermediate statistic, i.e. the posterior probability that frame x_t is generated by the j-th component:

P(j|x_t, λ) = ω_j b_j(x_t) / Σ_{k=1}^{M} ω_k b_k(x_t)

The M step finds the λ̄ satisfying max Q(λ, λ̄); setting the derivatives with respect to the three parameters {ω_j, μ_j, Σ_j} to zero yields the update formulas:

ω̄_j = (1/T) Σ_{t=1}^{T} P(j|x_t, λ)

μ̄_j = Σ_{t=1}^{T} P(j|x_t, λ) x_t / Σ_{t=1}^{T} P(j|x_t, λ)

Σ̄_j = Σ_{t=1}^{T} P(j|x_t, λ)(x_t − μ̄_j)(x_t − μ̄_j)^T / Σ_{t=1}^{T} P(j|x_t, λ)
After 5-10 iterations of EM parameter estimation, the model parameters can basically converge; the flow chart for training the GMM model is shown in fig. 6.
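For reference, the EM updates written out above can be implemented compactly as in the following sketch, which uses diagonal covariances and a random start instead of the fuzzy K-means/LBG initialization of steps 2.2-2.3; it illustrates the update formulas rather than the patented training flow.

```python
# Compact EM sketch for a diagonal-covariance GMM, mirroring the update
# formulas above. Initialization is random here for brevity (the patent uses
# fuzzy K-means / LBG initialization); 5-10 iterations are usually enough.
import numpy as np


def log_gaussian_diag(X, mu, var):
    """log N(x | mu, diag(var)) for every row of X."""
    d = X.shape[1]
    return -0.5 * (d * np.log(2 * np.pi) + np.log(var).sum()
                   + ((X - mu) ** 2 / var).sum(axis=1))


def train_gmm_em(X, M=8, n_iter=10, seed=0):
    """Estimate weights, means and diagonal covariances of an M-component GMM."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    w = np.full(M, 1.0 / M)                                # mixture weights omega_j
    mu = X[rng.choice(T, M, replace=False)]                # mean vectors mu_j
    var = np.tile(X.var(axis=0), (M, 1))                   # diagonal covariances

    for _ in range(n_iter):
        # E step: responsibilities P(j | x_t, lambda) for every frame.
        log_p = np.stack([np.log(w[j]) + log_gaussian_diag(X, mu[j], var[j])
                          for j in range(M)], axis=1)
        log_p -= log_p.max(axis=1, keepdims=True)           # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M step: re-estimate omega_j, mu_j, Sigma_j from the responsibilities.
        n_j = gamma.sum(axis=0)
        w = n_j / T
        mu = gamma.T @ X / n_j[:, None]
        var = gamma.T @ (X ** 2) / n_j[:, None] - mu ** 2
        var = np.maximum(var, 1e-6)                          # variance floor
    return w, mu, var
```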
Step 3, voiceprint binding: when a user transacts business in the business hall, the intelligent pickup device synchronizes the user identity information to the intelligent voice analysis system of the business hall; the system establishes a user voiceprint library based on the identity information and voiceprint information, synchronizes the high-risk user group information in the customer view module of the electric power energy Internet marketing service system, and can flag high-risk personnel after synchronization;
Step 3.1, voiceprint library building: a user voiceprint store is established to store user voiceprint information.
The voiceprint information includes: voiceprint identification VID, voiceprint model, voiceprint parameters;
wherein the voiceprint identification VID is the unique identifier of a voiceprint feature; the voiceprint parameters of a client can be quickly retrieved through the VID.
Step 3.2, identity binding: when a user transacts business in the business hall, the user swipes an identity card on the intelligent pickup equipment (out-of-counter interaction terminal); the pickup equipment reads the user identity information and transmits it to the intelligent voice analysis system of the business hall, and the system binds the voiceprint to the user identity;
High-risk personnel information synchronization: the energy Internet marketing service system is connected with the 360 customer view (customer portrait) module, and the customer information (name, identity card, mobile phone number, account number and the like) whose portrait feature fields are marked as high-risk is recorded, so that the voiceprint library comparison is completed and the high-risk voiceprint information is obtained.
High-risk users refer to customers who have quarreled with agents in the business hall or have complained about agents more than twice within a year.
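A hypothetical data-structure sketch of the voiceprint store and identity binding described in steps 3.1-3.2 is given below; the field names, the identity field and the high-risk flag are illustrative assumptions rather than the patent's actual schema.

```python
# Hypothetical layout of a voiceprint-store record keyed by VID. Field names,
# the identity field and the high-risk flag are illustrative assumptions, not
# the patent's actual schema.
from dataclasses import dataclass
import numpy as np


@dataclass
class VoiceprintRecord:
    vid: str                     # voiceprint identification (VID)
    model_weights: np.ndarray    # GMM weights
    model_means: np.ndarray      # GMM mean vectors
    model_vars: np.ndarray       # GMM diagonal covariances
    identity_id: str = ""        # identity-card number bound in step 3.2
    high_risk: bool = False      # set when matched against marketing-system data


voiceprint_store = {}  # maps VID -> VoiceprintRecord


def enroll(vid, weights, means, variances):
    """Create and store a voiceprint record for a newly modelled customer."""
    record = VoiceprintRecord(vid, weights, means, variances)
    voiceprint_store[vid] = record
    return record


def mark_high_risk(identity_id):
    """Flag every voiceprint bound to this identity as high-risk (sync step)."""
    for record in voiceprint_store.values():
        if record.identity_id == identity_id:
            record.high_risk = True
```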
Step 4, collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, extracting the voiceprint identification VID through steps 1 and 2, and giving an early warning reminder when an abnormal user is detected.
Step 4.1, voice matching: the voice of customers handling business in the hall is collected through the hall manager's intelligent work badge, the voiceprint parameters are extracted through the method of step 1, the extracted voiceprint features are compared with the voiceprint information in the voiceprint library, and a matching record is generated if they are the same;
Step 4.2, early warning reminder: when the extracted voiceprint features match the voiceprint information of a high-risk user in the voiceprint library, the user can be judged to be a high-risk person, and the system actively initiates a high-risk early warning.
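The matching and early-warning step can be sketched as follows, reusing the log_gaussian_diag helper and the VoiceprintRecord store from the sketches above; the average-log-likelihood score and the threshold value are assumptions, since the patent specifies only that the extracted voiceprint is compared with the high-risk voiceprint information in the library.

```python
# Sketch of the hall matching step, reusing log_gaussian_diag and the
# VoiceprintRecord store from the sketches above. Scoring by average
# log-likelihood and the threshold value are assumptions.
import numpy as np


def score_utterance(features, record):
    """Average log-likelihood of the utterance under one voiceprint's GMM."""
    log_p = np.stack([np.log(record.model_weights[j])
                      + log_gaussian_diag(features, record.model_means[j],
                                          record.model_vars[j])
                      for j in range(len(record.model_weights))], axis=1)
    m = log_p.max(axis=1, keepdims=True)                   # log-sum-exp trick
    frame_ll = m.squeeze(1) + np.log(np.exp(log_p - m).sum(axis=1))
    return float(frame_ll.mean())


def check_high_risk(features, store, threshold=-45.0):
    """Return the VID of the best-matching high-risk voiceprint, or None."""
    best_vid, best_score = None, -np.inf
    for vid, record in store.items():
        if not record.high_risk:
            continue
        score = score_utterance(features, record)
        if score > best_score:
            best_vid, best_score = vid, score
    return best_vid if best_score >= threshold else None
```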
Example 2.
A voiceprint recognition based high risk user detection system comprising: the device comprises a collection and preprocessing module, a modeling module, a binding module and a detection module, wherein:
the collection and preprocessing module is used for collecting voice signals of business users handled by a business hall counter, performing front-end processing on the voice signals, and converting the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
the modeling module is used for inputting a voice feature vector set and training a GMM model;
the binding module is used for using the GMM model as a voiceprint model, establishing a user voiceprint library, synchronizing high-risk user crowd information in the power marketing system client view module, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
the detection module collects voice information through the intelligent work badge at the first moment after a customer enters the hall, extracts the voiceprint identification VID, compares it with the voiceprint identifications VID in the voiceprint library, and gives an early warning if a matching VID is found.
Example 3.
Embodiment 3 of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the steps of the high-risk user detection method based on voiceprint recognition according to embodiment 1 of the present invention.
The detailed steps are the same as those of the high risk user detection method based on voiceprint recognition provided in embodiment 1, and will not be described here again.
Example 4.
The embodiment 4 of the invention provides electronic equipment.
An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in a voiceprint recognition based high risk user detection method according to embodiment 1 of the present invention when the program is executed.
The detailed steps are the same as those of the high risk user detection method based on voiceprint recognition provided in embodiment 1, and will not be described here again.
Compared with the prior art, the method has the advantages that a voiceprint library of power customers is built, user characteristics are located according to the marketing customer portraits, and high-risk customers are identified and captured through voiceprints for precise service. This further reduces the probability of quarrels and complaints in the business hall, strengthens the operation and management role of intelligent voice technology in on-site business hall service, and greatly improves the intelligence level of marketing service operation and management.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical coding devices such as punch cards or in-groove protrusion structures having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.
Claims (13)
1. The high-risk user detection method based on voiceprint recognition is characterized by comprising the following steps of:
step 1, collecting voice signals of business users handled by business hall counters, and performing front-end processing on the voice signals to convert the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
step 2, inputting the voice characteristic vector set obtained in the step 1, and training a GMM model;
step 3, establishing a user voiceprint library according to the GMM model obtained in the step 2 as a voiceprint model, synchronizing high-risk user crowd information in a client view module of the power marketing system, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
and step 4, collecting voice information through the intelligent work badge at the first moment after a customer enters the hall, extracting the voiceprint identification VID through step 1 and step 2, comparing it with the voiceprint identifications VID in the voiceprint library, and giving an early warning if a matching VID is found.
2. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in the step 1, the voice signal collection of business handling of business hall counter users is completed through intelligent pickup equipment arranged on the counter.
3. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in step 1, noise spectrum filtering is performed on the collected voice signals by adopting interference subtraction method in voice signal preprocessing.
4. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in step 1, extracting characteristic parameters of a voice signal by using an MFCC specifically includes: based on the logarithmic relation between Mel scale and frequency to simulate the non-linear relation between the sound level and the actual frequency, the voice signal of voiceprint time domain is extracted by Mel cepstrum coefficient and converted into the voice characteristic vector set for representing the characteristic of the speaker.
5. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
the step 2 specifically comprises the following steps:
step 2.1, inputting the voice feature vector set obtained in the step 1, and training by using a global background model to obtain an initialized GMM recognition model;
step 2.2, clustering the GMM model by using a fuzzy K-means method;
and 2.3, performing iterative optimization on the clustering center by using an EM algorithm to obtain a final voice signal model.
6. A high risk user detection method based on voiceprint recognition according to claim 5, wherein,
in step 3, user voiceprint information is stored in the user voiceprint storage, and the voiceprint information comprises: voiceprint identification VID, voiceprint model, voiceprint parameters.
7. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
a high-risk user refers to a customer who has quarreled with an agent in a business hall or has complained about an agent more than twice within a year.
8. A high risk user detection method based on voiceprint recognition according to claim 1, wherein,
in step 3, when the customer handles business in the business hall, the user swipes an identity card on the intelligent pickup device; the pickup device reads the user identity information and transmits it to the intelligent voice analysis system of the business hall, which binds the voiceprint to the user identity.
9. The method for high risk user detection based on voiceprint recognition of claim 8,
and the energy Internet marketing service system 360 is connected with a client portrait module, and the voiceprint library information is compared with the high-risk user information to obtain the high-risk voiceprint information by recording the client information with high risk user feature fields in the client portrait.
10. The method for high risk user detection based on voiceprint recognition of claim 9,
the step 4 specifically comprises the following steps:
step 4.1, collecting the voice of clients in the hall through the hall manager's intelligent work badge, extracting the voiceprint features through the method of step 1, and comparing the extracted voiceprint features with the voiceprint information in the voiceprint library;
and step 4.2, when the extracted voiceprint features match the high-risk voiceprint information in the voiceprint library, judging that the user is a high-risk person, and the intelligent voice analysis system of the business hall actively initiating a high-risk early warning.
11. A voiceprint recognition based high risk user detection system utilizing the method of any one of claims 1-10, comprising: the device comprises a collection and preprocessing module, a modeling module, a binding module and a detection module, and is characterized in that:
the collection and preprocessing module is used for collecting voice signals of business users handled by a business hall counter, performing front-end processing on the voice signals, and converting the voice signals into a voice feature vector set, wherein the front-end processing comprises voice signal preprocessing and feature parameter extraction;
the modeling module is used for inputting a voice feature vector set and training a GMM model;
the binding module is used for using the GMM model as a voiceprint model, establishing a user voiceprint library, synchronizing high-risk user crowd information in the power marketing system client view module, and carrying out information matching after synchronization is completed, so that one-to-one binding of the client voiceprint model and high-risk users is realized;
the detection module collects voice information through the intelligent work badge at the first moment after a customer enters the hall, extracts the voiceprint identification VID, compares it with the voiceprint identifications VID in the voiceprint library, and gives an early warning if a matching VID is found.
12. A terminal comprising a processor and a storage medium; the method is characterized in that:
the storage medium is used for storing instructions;
the processor is operative according to the instructions to perform the steps of a high risk user detection method based on voiceprint recognition according to any one of claims 1 to 10.
13. A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of a voiceprint recognition based high risk user detection method according to any one of claims 1 to 10.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310057792.5A | 2023-01-13 | 2023-01-13 | High-risk user detection method and system based on voiceprint recognition |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202310057792.5A | 2023-01-13 | 2023-01-13 | High-risk user detection method and system based on voiceprint recognition |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN116153319A | 2023-05-23 |
Family

ID=86373003

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202310057792.5A (pending) | High-risk user detection method and system based on voiceprint recognition | 2023-01-13 | 2023-01-13 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN116153319A (en) |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN117456981A * | 2023-12-25 | 2024-01-26 | 北京秒信科技有限公司 | Real-time voice wind control system based on RNN voice recognition |
| CN117456981B * | 2023-12-25 | 2024-03-05 | 北京秒信科技有限公司 | Real-time voice wind control system based on RNN voice recognition |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |