CN112738338A - Telephone recognition method, device, equipment and medium based on deep learning - Google Patents

Telephone recognition method, device, equipment and medium based on deep learning

Info

Publication number
CN112738338A
CN112738338A (application CN202011564958.5A; granted as CN112738338B)
Authority
CN
China
Prior art keywords
call voice
voice signal
call
network
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011564958.5A
Other languages
Chinese (zh)
Other versions
CN112738338B (en)
Inventor
凌波
马骏
王少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011564958.5A priority Critical patent/CN112738338B/en
Publication of CN112738338A publication Critical patent/CN112738338A/en
Application granted granted Critical
Publication of CN112738338B publication Critical patent/CN112738338B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/22 Arrangements for supervision, monitoring or testing
    • H04M 3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F 17/141 Discrete Fourier transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G06F 18/24137 Distances to cluster centroids
    • G06F 18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/12 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Discrete Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a telephone recognition method based on deep learning, which comprises the following steps: collecting a call voice signal of a client; extracting features of the call voice signal; and inputting the features of the call voice signal into a voice classification model to obtain the classification of the call voice signal, wherein the classification comprises normal calls, harassing calls and fraudulent calls. The invention also relates to a telephone recognition device, an electronic device and a medium based on deep learning. The invention improves the accuracy of capturing effective features and improves the recognition rate of harassing calls and fraudulent calls.

Description

Telephone recognition method, device, equipment and medium based on deep learning
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a telephone identification method and device based on deep learning, electronic equipment and a computer readable storage medium.
Background
With the rapid development of mobile communication services worldwide, users enjoy the convenience of the mobile network but also face problems: a large proportion of users frequently suffer harassment and fraud from malicious or illegal calls and messages, which has become a significant problem for the industry.
To prevent harassment and fraud over mobile phones, the industry currently relies on manual labeling: when a recipient marks an incoming call as a harassing call, the calling number is recorded in a cloud database as a potentially risky number, and once a number has been marked more than a certain number of times it is treated as a blacklisted number. In recent years there have also been cases of identifying suspicious numbers by combining processing methods such as Bayesian algorithms with data mining techniques that exploit relationships in the data.
At present, the identification and interception of harassing and fraudulent calls is still at the research stage, and analysis of existing identification and interception techniques reveals several defects: 1) a large amount of manual marking is needed to label the numbers stored in the database; 2) the recognition rate of traditional anti-harassment techniques for harassing calls is low, and voice calls with possible fraudulent intent cannot be effectively recognized; and 3) the value of large-scale voice data for anti-harassment and anti-fraud recognition is not exploited.
Disclosure of Invention
The invention provides a telephone recognition method, an apparatus, an electronic device and a computer-readable storage medium based on deep learning, with the main aim of improving the accuracy of capturing effective features, giving voice recognition better accuracy, and improving the recognition rate of harassing and fraudulent calls.
In order to achieve the above object, the present invention provides a phone identification method based on deep learning, which includes:
collecting a call voice signal of a client;
extracting the characteristics of the call voice signal;
inputting the features of the call voice signal into a voice classification model to obtain the classification of the call voice signal, wherein the classification comprises normal calls, harassing calls and fraudulent calls;
wherein the step of extracting the feature of the call voice signal comprises:
extracting PLP features in the call voice signal by using openSMILE;
calling the config file corresponding to the PLP features with a script to generate PLP feature data corresponding to the call voice signal;
and performing feature re-extraction on the PLP feature data by using a Faster RCNN network to obtain the features of the call voice signal.
Optionally, the step of extracting PLP features in the call voice signal by using openSMILE includes:
after sampling, windowing and discrete Fourier transform of the call voice signal, taking the sum of the squares of the real and imaginary parts of the short-time speech spectrum to obtain a short-time power spectrum,

P(f) = Re[X(f)]^2 + Im[X(f)]^2

where X(f) is the short-time spectrum of the call voice signal, f is the frequency axis of the short-time spectrum, Re[X(f)]^2 is the square of the real part of the short-time spectrum, Im[X(f)]^2 is the square of the imaginary part, and P(f) is the short-time power spectrum of the call voice signal;
performing critical-band analysis on the short-time power spectrum of the call voice signal to obtain a plurality of critical-bandwidth auditory spectra θ(k) of the call voice signal;
performing equal-loudness pre-emphasis on the plurality of critical-bandwidth auditory spectra θ(k) by the following formula,

Γ(k) = E[f0(k)]·θ(k), (k = 1, 2, ..., 17)

where Γ(k) is the auditory spectrum after equal-loudness pre-emphasis, f0(k) denotes the center frequency of the k-th critical-bandwidth auditory spectrum, and E[f0(k)] is the value at f0(k) of the equal-loudness curve, which is obtained by the following formula (with ω = 2πf the angular frequency):

E(f) = [(ω^2 + 56.8×10^6)·ω^4] / [(ω^2 + 6.3×10^6)^2·(ω^2 + 0.38×10^9)]
performing intensity-loudness conversion on the plurality of critical-bandwidth auditory spectra after equal-loudness pre-emphasis by

φ(k) = Γ(k)^0.33

where φ(k) is the plurality of critical-bandwidth auditory spectra after intensity-loudness conversion;

and performing an inverse Fourier transform on the plurality of critical-bandwidth auditory spectra φ(k) after intensity-loudness conversion to obtain the inverse-Fourier-transformed call voice signal, computing an all-pole model, and solving for the cepstral coefficients of the call voice signal to obtain the PLP features.
Optionally, the step of performing critical band analysis on the short-time power spectrum of the call voice signal includes:
performing critical-band analysis on the short-time power spectrum of the call voice signal by the following formula,

Z(f) = 6·ln{ f/600 + [(f/600)^2 + 1]^(1/2) }

where Z(f) is the Bark-domain frequency;

mapping the frequency axis f of the spectrum P(f) to the Bark frequency Z to obtain 17 frequency bands, and multiplying the energy spectrum of each frequency band by a weighting coefficient and summing to obtain the critical-bandwidth auditory spectrum θ(k),

θ(k) = Σ_Z ψ(Z - Z0(k))·P(f(Z))

ψ(Z) = 0 for Z < -1.3; ψ(Z) = 10^(2.5(Z+0.5)) for -1.3 ≤ Z ≤ -0.5; ψ(Z) = 1 for -0.5 < Z < 0.5; ψ(Z) = 10^(-(Z-0.5)) for 0.5 ≤ Z ≤ 2.5; ψ(Z) = 0 for Z > 2.5

where Z0(k) represents the center frequency of the k-th critical-bandwidth auditory spectrum, ψ(Z - Z0(k)) the weighting coefficient corresponding to each frequency band, and P(f(Z)) the energy spectrum corresponding to each frequency band.
Optionally,
the construction steps of the Faster RCNN network comprise:
constructing the Faster RCNN network from a convolutional layer, an RPN (Region Proposal Network), an RoI pooling layer and a fully-connected layer;
extracting a feature map of the voice features through the convolutional layer;
generating candidate regions through the RPN;
judging the type of the anchor boxes by using softmax, and obtaining candidate regions by correcting the anchor boxes;
collecting, in the RoI pooling layer, the feature map extracted by the convolutional layer and the candidate regions generated by the RPN, and extracting a plurality of candidate feature maps;
and synthesizing the plurality of candidate feature maps through the fully-connected layer.
Optionally, the speech classification model is a Transformer network.
Optionally, the step of constructing the Transformer network comprises:
Constructing a Transformer network through an encoder and a decoder;
coding the characteristics of the call voice signal extracted by the Faster RCNN network through the coder to obtain a context semantic vector;
and performing data decoding on the obtained context semantic vector through the decoder, and obtaining classification categories through a layer of softmax.
Optionally, the method further comprises: combining the Faster RCNN and the Transformer network into a voice category recognition network, and uploading the voice category recognition network to the cloud.
In order to solve the above problem, the present invention further provides a phone recognition apparatus based on deep learning, the apparatus comprising:
the acquisition module is used for acquiring a call voice signal of the client;
the feature extraction module is used for extracting the features of the call voice signals collected by the collection module;
the classification module is used for constructing a voice classification model, inputting the features of the call voice signals extracted by the feature extraction module into the voice classification model, and obtaining the classification of the call voice signals, wherein the classification comprises normal calls, harassing calls and fraudulent calls;
wherein the feature extraction module comprises:
the first feature extraction submodule extracts PLP features in the call voice signal by using openSMILE;
the characteristic data generation submodule is used for calling a config file corresponding to the PLP characteristic extracted by the first characteristic extraction submodule by using a script to generate PLP characteristic data corresponding to the call voice signal;
and the second feature extraction submodule performs feature re-extraction on the PLP feature data generated by the feature data generation submodule by using the Faster RCNN network.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
and the processor executes the instructions stored in the memory to realize the telephone identification method based on deep learning.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, where the at least one instruction is executed by a processor in an electronic device to implement the deep learning based phone recognition method described above.
The telephone recognition method, apparatus, electronic device and computer-readable storage medium based on deep learning extract PLP features to generate PLP data and then use the Faster RCNN network for secondary feature extraction; the secondary extraction discards useless feature data, improving the accuracy of capturing effective features and giving voice recognition better accuracy.
Drawings
Fig. 1 is a schematic flowchart of a deep learning-based phone recognition method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a method for extracting features of a call voice signal according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a method for extracting PLP features from a call voice signal by using openSMILE according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating a method for performing critical band analysis on a short-time power spectrum of a call voice signal according to an embodiment of the present invention;
FIG. 5 is a block diagram of a deep learning based phone identification apparatus according to an embodiment of the present invention;
fig. 6 is a schematic internal structural diagram of an electronic device implementing a deep learning-based phone recognition method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a telephone identification method based on deep learning. Referring to fig. 1, a flowchart of a deep learning-based phone recognition method according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the phone recognition method based on deep learning includes:
step S100: collecting a call voice signal of a client;
step S200: extracting the features of the call voice signal, as shown in fig. 2, which includes:
step S210, extracting PLP (Perceptual Linear Prediction) features from the call voice signal by using openSMILE;
step S220, calling the config file corresponding to the PLP features with a script to generate PLP feature data corresponding to the call voice signal, preferably storing the data in a CSV file (see the sketch after step S300 below);
and step S230, performing feature re-extraction on the PLP feature data by using the Faster RCNN network.
Step S300: inputting the features of the call voice signal into a voice classification model and obtaining the classification of the call voice signal, wherein the classification comprises normal calls, harassing calls and fraudulent calls.
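Before the details of each step, the following is a minimal sketch of step S220 under stated assumptions: the SMILExtract command-line tool shipped with openSMILE is driven from a Python script to write PLP features to a CSV file. The config file path "config/PLP_E_D_A.conf" is illustrative; substitute the PLP config actually shipped with the openSMILE installation in use.

```python
import subprocess

def extract_plp(wav_path: str, csv_path: str,
                config: str = "config/PLP_E_D_A.conf") -> None:
    """Run openSMILE's SMILExtract on one call recording and write CSV."""
    subprocess.run(
        ["SMILExtract", "-C", config, "-I", wav_path, "-O", csv_path],
        check=True,
    )

if __name__ == "__main__":
    extract_plp("call_0001.wav", "call_0001_plp.csv")
```

Storing the output as CSV keeps the PLP feature data directly loadable by the re-extraction step S230.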
In one embodiment, as shown in fig. 3, step S210 includes: obtaining the spectrum of the voice signal through a Fourier transform, squaring the amplitude, performing critical-band integration, equal-loudness pre-emphasis and intensity-loudness conversion, and then performing an inverse Fourier transform and linear prediction to obtain the PLP features. Specifically:
step S211, performing spectrum analysis on the call voice signal, that is, performing a Fourier transform on the call voice signal to obtain its spectrum and then squaring the amplitude. Specifically: after sampling, windowing and discrete Fourier transform, the sum of the squares of the real and imaginary parts of the short-time speech spectrum gives the short-time power spectrum,

P(f) = Re[X(f)]^2 + Im[X(f)]^2    (1)

where X(f) is the short-time spectrum of the call voice signal, f is the frequency axis of the short-time spectrum, Re[X(f)]^2 is the square of the real part of the short-time spectrum, Im[X(f)]^2 is the square of the imaginary part, and P(f) is the short-time power spectrum of the call voice signal.
Step S212, performing critical-band analysis on the short-time power spectrum of the call voice signal to obtain a plurality of critical-bandwidth auditory spectra θ(k) of the call voice signal, where the division into critical bands reflects the masking effect of human hearing.
Step S213, performing equal-loudness pre-emphasis on the plurality of critical-bandwidth auditory spectra θ(k) by the following formula (5); preferably, θ(k) is emphasized using a simulated human-ear equal-loudness curve E(f) (at about 40 dB),

Γ(k) = E[f0(k)]·θ(k), (k = 1, 2, ..., 17)    (5)

where Γ(k) is the auditory spectrum after equal-loudness pre-emphasis, f0(k) denotes the center frequency of the k-th critical-bandwidth auditory spectrum, and E[f0(k)] is the value at f0(k) of the equal-loudness curve, which is obtained by the following formula (6) (with ω = 2πf the angular frequency):

E(f) = [(ω^2 + 56.8×10^6)·ω^4] / [(ω^2 + 6.3×10^6)^2·(ω^2 + 0.38×10^9)]    (6)
Step S214, performing intensity-loudness conversion on the plurality of critical-bandwidth auditory spectra after equal-loudness pre-emphasis by the following formula (7), approximating the nonlinear relation between sound intensity and the loudness perceived by the human ear,

φ(k) = Γ(k)^0.33    (7)

where φ(k) is the plurality of critical-bandwidth auditory spectra after intensity-loudness conversion.
Step S215, performing an inverse Fourier transform on the plurality of critical-bandwidth auditory spectra φ(k) after intensity-loudness conversion to obtain the inverse-Fourier-transformed call voice signal, computing an all-pole model, and solving for the cepstral coefficients of the call voice signal to obtain the PLP features.
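The following NumPy sketch ties steps S213-S215 together, assuming the critical-band auditory spectrum θ(k) and its center frequencies have already been computed in step S212; the Levinson-Durbin recursion and the LPC-to-cepstrum conversion are textbook implementations standing in for the all-pole model and cepstral-coefficient computation, not code from the patent.

```python
import numpy as np

def equal_loudness(f):
    """40 dB equal-loudness approximation E(f) of formula (6), w = 2*pi*f."""
    w2 = (2.0 * np.pi * np.asarray(f, dtype=float)) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def levinson_durbin(r, order):
    """Autocorrelation -> all-pole (LPC) coefficients and prediction error."""
    a = np.zeros(order + 1)
    a[0], e = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a, e = new_a, e * (1.0 - k * k)
    return a, e

def lpc_to_cepstrum(a, gain, n_ceps):
    """Standard recursion from LPC coefficients to cepstral coefficients."""
    c = np.zeros(n_ceps)
    c[0] = np.log(gain)
    for n in range(1, n_ceps):
        acc = a[n] if n < len(a) else 0.0
        for k in range(1, n):
            if n - k < len(a):
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c

def plp_from_auditory_spectrum(theta, center_freqs, order=12, n_ceps=13):
    gamma = equal_loudness(center_freqs) * theta   # step S213, formula (5)
    phi = gamma ** 0.33                            # step S214, formula (7)
    # step S215: mirror the spectrum so the inverse DFT is real-valued, take
    # the autocorrelation, fit the all-pole model and derive the cepstrum
    spec = np.concatenate([phi, phi[-2:0:-1]])
    autocorr = np.fft.ifft(spec).real[: order + 1]
    a, err = levinson_durbin(autocorr, order)
    return lpc_to_cepstrum(a, err, n_ceps)
```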
In one embodiment, step S212 includes:
performing critical-band analysis on the short-time power spectrum of the call voice signal by the following formula (2),

Z(f) = 6·ln{ f/600 + [(f/600)^2 + 1]^(1/2) }    (2)

where Z(f) is the Bark-domain frequency;

mapping the frequency axis f of the spectrum P(f) to the Bark frequency Z to obtain 17 frequency bands (the Bark domain has 24 bands in the range 20 to 15500 Hz; 17 bands are obtained through the preceding processing step), and multiplying the energy spectrum in each of the 17 bands by the weighting coefficient of formula (4) and summing to obtain the critical-bandwidth auditory spectrum θ(k) of formula (3),

θ(k) = Σ_Z ψ(Z - Z0(k))·P(f(Z))    (3)

ψ(Z) = 0 for Z < -1.3; ψ(Z) = 10^(2.5(Z+0.5)) for -1.3 ≤ Z ≤ -0.5; ψ(Z) = 1 for -0.5 < Z < 0.5; ψ(Z) = 10^(-(Z-0.5)) for 0.5 ≤ Z ≤ 2.5; ψ(Z) = 0 for Z > 2.5    (4)

where Z0(k) represents the center frequency of the k-th critical-bandwidth auditory spectrum, ψ(Z - Z0(k)) the weighting coefficient corresponding to each frequency band, and P(f(Z)) the energy spectrum corresponding to each frequency band.
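The following sketch implements formulas (2)-(4), assuming the short-time power spectrum P(f) of formula (1) and its frequency axis are given; the Bark mapping and piecewise masking curve are the standard PLP ones, and the even placement of the 17 band centers is an illustrative choice rather than a value fixed by the patent.

```python
import numpy as np

def hz_to_bark(f):
    """Formula (2): Z(f) = 6*ln(f/600 + sqrt((f/600)^2 + 1))."""
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

def critical_band_weight(dz):
    """Formula (4): piecewise critical-band masking curve psi(Z - Z0(k))."""
    dz = np.asarray(dz, dtype=float)
    w = np.zeros_like(dz)
    rising = (dz >= -1.3) & (dz <= -0.5)
    flat = (dz > -0.5) & (dz < 0.5)
    falling = (dz >= 0.5) & (dz <= 2.5)
    w[rising] = 10.0 ** (2.5 * (dz[rising] + 0.5))
    w[flat] = 1.0
    w[falling] = 10.0 ** (-(dz[falling] - 0.5))
    return w

def auditory_spectrum(power_spec, freqs, n_bands=17):
    """Formula (3): integrate P(f) into critical-band auditory spectra."""
    z = hz_to_bark(freqs)
    # illustrative: spread the 17 band centers evenly over the Bark range
    centers = np.linspace(z.min() + 0.5, z.max() - 0.5, n_bands)
    theta = np.array(
        [np.sum(critical_band_weight(z - z0) * power_spec) for z0 in centers]
    )
    return theta, centers
```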
In one embodiment, in step S230, the Faster RCNN network includes Conv Layers, a Region Proposal Network (RPN), RoI Pooling and Classification, where the Conv Layers extract feature maps of the voice features using basic CNN layers (conv, relu and pooling); the Region Proposal Network generates region proposals, judging the type of the anchors by softmax and then correcting the anchors to obtain accurate region proposals; and RoI Pooling collects the input feature maps and region proposals and combines the two kinds of information to extract proposal feature maps. That is, the step of constructing the Faster RCNN network includes (see the sketch after this list):
constructing the Faster RCNN network from a convolutional layer, an RPN, an RoI pooling layer and a fully-connected layer;
extracting a feature map of the voice features through the convolutional layer;
generating candidate regions through the RPN;
judging the type of the anchor boxes by using softmax, and obtaining candidate regions by correcting the anchor boxes;
collecting, in the RoI pooling layer, the feature map extracted by the convolutional layer and the candidate regions generated by the RPN, and extracting a plurality of candidate feature maps;
and synthesizing the plurality of candidate feature maps through the fully-connected layer.
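As a hedged sketch of this construction, a stock torchvision Faster R-CNN (convolutional backbone, RPN, RoI pooling and fully-connected heads) stands in for the patent's custom network; the input shape, the replication of the single PLP channel to three channels, and the two-class setting are illustrative assumptions.

```python
import torch
import torchvision

# Stock Faster R-CNN as a stand-in for the feature re-extraction network.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None,      # untrained; weights would come from labeled call data
    num_classes=2,     # illustrative: salient voice feature vs. background
)
model.eval()

plp = torch.rand(1, 17, 200)  # dummy PLP "image": 17 critical bands x 200 frames
with torch.no_grad():
    # the backbone expects 3 input channels, so the PLP plane is replicated
    detections = model([plp.expand(3, -1, -1)])
print(detections[0]["boxes"].shape)  # candidate regions over the feature map
```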
In one embodiment, step S300 includes:
the voice classification model is a Transformer network; that is, a Transformer network is adopted to classify the call voice data.
The Transformer network consists of two parts, Encoders and Decoders. The Encoders first encode the voice data features extracted by the Faster RCNN network to obtain a context semantic vector (context); the Decoders then decode the data using this context semantic vector, and finally a classification category is obtained through a layer of softmax (a sketch follows the steps below). That is, the step of constructing the Transformer network comprises:
Constructing a Transformer network through an encoder and a decoder;
coding the characteristics of the call voice signal extracted by the Faster RCNN network through the coder to obtain a context semantic vector;
and performing data decoding on the obtained context semantic vector through the decoder, and obtaining the classification category through a layer of softmax.
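A minimal PyTorch sketch of the classifier under stated assumptions: the patent specifies an encoder-decoder Transformer, while for a three-way classification an encoder-only variant is sketched here; the model dimension, head count, layer count and mean pooling are illustrative choices, not values from the patent.

```python
import torch
import torch.nn as nn

class CallClassifier(nn.Module):
    """Transformer encoder over re-extracted call features, with a final
    softmax over {normal, harassing, fraudulent}."""

    def __init__(self, d_model: int = 256, n_classes: int = 3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, d_model) features from the Faster RCNN stage
        context = self.encoder(feats)       # context semantic vectors
        pooled = context.mean(dim=1)        # pool over the time axis
        return self.head(pooled).softmax(dim=-1)

probs = CallClassifier()(torch.rand(2, 50, 256))  # two calls, 50 frames each
```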
In one embodiment, positional encoding is added in both the Encoders and the Decoders. The positional encoding formulas are:

PE(pos, 2i) = sin( pos / 10000^(2i/d_model) )

PE(pos, 2i+1) = cos( pos / 10000^(2i/d_model) )

where pos represents the position of the voice content after the Faster RCNN has extracted the voice data features, i indexes the dimension of the voice content, d_model is the dimension of the model, and PE is the positional encoding.
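The two formulas above can be computed directly; the following NumPy sketch (assuming an even d_model) is a transcription of the standard sinusoidal encoding, not code from the patent.

```python
import numpy as np

def positional_encoding(max_pos: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding; d_model is assumed to be even."""
    pos = np.arange(max_pos)[:, None]        # positions 0 .. max_pos-1
    i = np.arange(0, d_model, 2)[None, :]    # even dimension indices 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_pos, d_model))
    pe[:, 0::2] = np.sin(angle)              # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)              # PE(pos, 2i+1)
    return pe
```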
In one embodiment, step S300 further comprises:
the speech classification model is trained.
In one embodiment, the deep learning based phone recognition method further comprises:
dialing and message interception for blacklisted numbers, comprising: if the voice recognition result for a newly dialed number is an abnormal call (harassment or fraud), prompting the user to add it to the blacklist; numbers included in the blacklist are intercepted in subsequent dialing, are also shared with the mobile phone system's blacklist, and their messages are likewise intercepted.
In one embodiment, the deep learning based phone recognition method further comprises:
and feeding back the recognition result of the call voice data to the client.
The telephone recognition method based on deep learning realizes anti-harassment and anti-fraud recognition and interception at the telephone terminal, can analyze in real time the call voice of numbers that users encounter in daily life, and effectively recognizes high-risk numbers for harassment and fraud; following the risk prompt, the user can directly add the dialed number to a blacklist with the system, without any additional processing. In addition, junk messages from identified numbers can be intercepted according to the shared blacklist. Interception of dialing and messaging effectively helps users filter blacklisted numbers: on the one hand, users no longer need to answer intercepted harassing calls; on the other hand, the frequency of telephone fraud cases is reduced, effectively reducing losses to users' lives and property.
In one embodiment, in order to improve recognition efficiency and reduce energy consumption, the Faster RCNN and the Transformer network are combined into a voice category recognition network, the voice category recognition network is uploaded to the cloud, and feature analysis and recognition are performed on call voice uploaded by the client. If the recognition result is a normal call, the call recognition process ends; if the recognition result is an abnormal call, the result is pushed to the user with a suggestion to add the number to the blacklist, and after the user confirms, the confirmation is returned to the cloud. Specifically, as shown in fig. 4, the telephone recognition method based on deep learning includes (a sketch of this flow follows the steps below):
collecting a voice call signal of a client;
carrying out voice audio feature extraction on the voice call signal;
constructing a voice category identification network, and uploading the voice category identification network to a cloud, wherein the voice category identification network comprises a Faster RCNN and a Transformer network;
recognizing voice audio features through a voice category recognition network;
if the voice audio features are normal calls, the call recognition process is ended;
if the voice audio features indicate an abnormal call, judging whether it is a harassing call or a fraudulent call;
if the call is a harassing or fraudulent call, sending an early-warning signal to the client;
obtaining the client's feedback on the recognition result, and if the feedback confirms a harassing or fraudulent call, putting the number into the blacklist;
and if the feedback does not confirm a harassing or fraudulent call, ending the recognition process.
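The flow above can be summarized in a compact sketch; recognizer, notify_client and blacklist are hypothetical hooks introduced for illustration, not interfaces defined by the patent.

```python
def recognize_call(number: str, features, recognizer, notify_client,
                   blacklist) -> str:
    """Hedged sketch of the cloud-side decision flow described above."""
    label = recognizer(features)        # 'normal' | 'harassing' | 'fraud'
    if label == "normal":
        return "ended"                  # normal call: recognition ends here
    # abnormal call: push an early warning and wait for the client's feedback
    if notify_client(number, label):    # True if the user confirms the risk
        blacklist.add(number)           # intercept future dialing and messages
    return "ended"
```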
Preferably, the deep learning based phone recognition method further includes:
the method comprises the steps of collecting voice call signals of a client, sending an instruction whether to upload a voice file to a cloud end, uploading the voice file to the cloud end when the uploading instruction of the client is received, and identifying the voice file through a voice category identification network.
Preferably, the step of sending the warning signal to the client includes:
and intercepting the data to the client by dialing or sending information.
In one embodiment, the deep learning based phone recognition method further comprises:
a database that stores the call voice signals recognized as abnormal calls by the voice category recognition network but fed back as normal calls by the client, as well as other call voice signals.
The telephone recognition method based on deep learning adopts deep learning technology; as users feed recognition results back to the cloud, the labeled voice data needed to train the voice category recognition network accumulates, and the growing volume of voice data together with adjustment of the network model raises the voice recognition rate. Therefore, as recognition proceeds, continuously collected harassing voice calls, fraudulent voice calls and normal-scenario voice calls yield a visible improvement in the model recognition rate.
Fig. 5 is a functional block diagram of the telephone recognition device based on deep learning according to the present invention. The phone recognition apparatus 100 based on deep learning according to the present invention can be installed in an electronic device. According to the implemented functions, the deep learning based phone recognition apparatus may include a number acquisition module 110, a feature extraction module 120, and a classification module 130. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the number acquisition module 110 acquires a call voice signal of the client;
the feature extraction module 120 extracts features of the call voice signal acquired by the acquisition module;
the classification module 130 constructs a voice classification model, inputs the features of the call voice signals extracted by the feature extraction module into the voice classification model, and obtains the classification of the call voice signals, wherein the classification comprises normal calls, harassing calls and fraudulent calls;
wherein the feature extraction module 120 comprises:
the first feature extraction submodule 121 extracts PLP features in the call voice signal by using openSMILE;
a feature data generation sub-module 122, which uses the script to call the config file corresponding to the PLP feature extracted by the first feature extraction sub-module to generate PLP feature data corresponding to the call voice signal;
the second feature extraction sub-module 123 performs feature re-extraction on the PLP feature data generated by the feature data generation sub-module by using the Faster RCNN network.
In one embodiment, the first feature extraction submodule 121 includes:
a short-time power spectrum obtaining unit, which, after sampling, windowing and discrete Fourier transform of the call voice signal, takes the sum of the squares of the real and imaginary parts of the short-time speech spectrum to obtain the short-time power spectrum,

P(f) = Re[X(f)]^2 + Im[X(f)]^2

where X(f) is the short-time spectrum of the call voice signal, f is the frequency axis of the short-time spectrum, Re[X(f)]^2 is the square of the real part of the short-time spectrum, Im[X(f)]^2 is the square of the imaginary part, and P(f) is the short-time power spectrum of the call voice signal;
a critical-band analysis unit, which performs critical-band analysis on the short-time power spectrum of the call voice signal to obtain a plurality of critical-bandwidth auditory spectra θ(k) of the call voice signal;
an equal-loudness pre-emphasis unit, which performs equal-loudness pre-emphasis on the plurality of critical-bandwidth auditory spectra θ(k) by the following formula,

Γ(k) = E[f0(k)]·θ(k), (k = 1, 2, ..., 17)

where Γ(k) is the auditory spectrum after equal-loudness pre-emphasis, f0(k) denotes the center frequency of the k-th critical-bandwidth auditory spectrum, and E[f0(k)] is the value at f0(k) of the equal-loudness curve, which is obtained by the following formula (with ω = 2πf):

E(f) = [(ω^2 + 56.8×10^6)·ω^4] / [(ω^2 + 6.3×10^6)^2·(ω^2 + 0.38×10^9)]
an intensity-loudness conversion unit, which performs intensity-loudness conversion on the plurality of critical-bandwidth auditory spectra after equal-loudness pre-emphasis by the following formula,

φ(k) = Γ(k)^0.33

where φ(k) is the plurality of critical-bandwidth auditory spectra after intensity-loudness conversion;

and a feature obtaining unit, which performs an inverse Fourier transform on the plurality of critical-bandwidth auditory spectra φ(k) after intensity-loudness conversion to obtain the inverse-Fourier-transformed call voice signal, computes an all-pole model, and solves for the cepstral coefficients of the call voice signal to obtain the PLP features.
Preferably, the critical band analyzing unit includes:
a band analysis subunit, which performs critical-band analysis on the short-time power spectrum of the call voice signal by the following formula,

Z(f) = 6·ln{ f/600 + [(f/600)^2 + 1]^(1/2) }

where Z(f) is the Bark-domain frequency;

and a critical-bandwidth auditory spectrum obtaining subunit, which maps the frequency axis f of the spectrum P(f) to the Bark frequency Z to obtain 17 frequency bands, and multiplies the energy spectrum of each frequency band by a weighting coefficient and sums to obtain the critical-bandwidth auditory spectrum θ(k),

θ(k) = Σ_Z ψ(Z - Z0(k))·P(f(Z))

ψ(Z) = 0 for Z < -1.3; ψ(Z) = 10^(2.5(Z+0.5)) for -1.3 ≤ Z ≤ -0.5; ψ(Z) = 1 for -0.5 < Z < 0.5; ψ(Z) = 10^(-(Z-0.5)) for 0.5 ≤ Z ≤ 2.5; ψ(Z) = 0 for Z > 2.5

where Z0(k) represents the center frequency of the k-th critical-bandwidth auditory spectrum, ψ(Z - Z0(k)) the weighting coefficient corresponding to each frequency band, and P(f(Z)) the energy spectrum corresponding to each frequency band.
In one embodiment, the Faster RCNN network includes a convolutional layer, an RPN, an RoI pooling layer and a fully-connected layer: the convolutional layer extracts a feature map of the voice features; the RPN generates candidate regions, judging the type of the anchor boxes by softmax and obtaining the candidate regions by correcting the anchor boxes; the RoI pooling layer combines the feature map extracted by the convolutional layer with the candidate regions from the RPN and extracts a plurality of candidate feature maps; and the fully-connected layer integrates the plurality of candidate feature maps.
In one embodiment, the voice classification model is a Transformer network comprising an encoder and a decoder: the encoder encodes the features of the call voice signal extracted by the Faster RCNN network to obtain a context semantic vector, the decoder performs data decoding on the obtained context semantic vector, and the classification category is obtained through a layer of softmax.
Fig. 6 is a schematic structural diagram of an electronic device implementing a deep learning-based phone recognition method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a deep learning based phone identification program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a phone recognition program based on deep learning, etc., but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., a phone recognition program based on deep learning, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 6 only shows an electronic device with certain components, and it will be understood by those skilled in the art that the structure shown in fig. 6 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than those shown, combine some components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The deep learning based phone recognition program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
collecting a call voice signal of a client;
extracting the characteristics of the call voice signal;
establishing a voice classification model, inputting the features of the call voice signal into the voice classification model, and obtaining the classification of the call voice signal, wherein the classification comprises normal calls, harassing calls and fraudulent calls;
wherein the step of extracting the feature of the call voice signal comprises:
extracting PLP features in the call voice signal by using openSMILE;
calling the config file corresponding to the PLP features with a script to generate PLP feature data corresponding to the call voice signal;
and performing feature re-extraction on the PLP feature data by using the Faster RCNN network.
Specifically, for the implementation of the above instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here. It should be emphasized that, to further ensure the privacy and security of the data, the data may also be stored in a node of a blockchain.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may be non-volatile or volatile, and the computer-readable storage medium includes a computer program, where the computer program is executed by a processor, and the computer program implements the following operations:
collecting a call voice signal of a client;
extracting the characteristics of the call voice signal;
establishing a voice classification model, inputting the characteristics of the call voice signals into the voice classification model, and obtaining the classification of the call voice signals, wherein the classification comprises normal calls, harassing calls and fraud calls;
wherein the step of extracting the feature of the call voice signal comprises:
extracting PLP features in the call voice signal by using openSMILE;
calling the config file corresponding to the PLP features with a script to generate PLP feature data corresponding to the call voice signal;
and performing feature re-extraction on the PLP feature data by using the Faster RCNN network.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the above-mentioned phone recognition method, device and electronic device based on deep learning, and will not be described herein again.
The telephone recognition method, device, electronic equipment and medium based on deep learning adopt deep learning technology, make better use of large amounts of voice data and the correlations among the data, and improve the recognition rate of call voice categories. The voice category recognition network model adopts the novel Transformer network structure, and GPU-accelerated computation can speed up model updating and recognition. Call voice data is reused: the voice data features are used to train the voice category recognition network model, which can analyze and classify the user's call voice content immediately and thus serve an early-warning function. A certain recognition rate is achieved for harassment prevention, greatly reducing the frequency of harassing calls, and blacklist processing is assisted for users. For fraud prevention, the system can give early warnings, reducing the rate of telephone fraud crimes and in particular protecting young and elderly people with weak precaution awareness.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for phone recognition based on deep learning, the method comprising:
collecting a call voice signal of a client;
extracting the characteristics of the call voice signal;
inputting the features of the call voice signal into a voice classification model to obtain the classification of the call voice signal, wherein the classification comprises normal calls, harassing calls and fraudulent calls;
wherein the step of extracting the feature of the call voice signal comprises:
extracting PLP features in the call voice signal by using openSMILE;
calling the config file corresponding to the PLP features with a script to generate PLP feature data corresponding to the call voice signal;
and performing feature re-extraction on the PLP feature data by using a Faster RCNN network to obtain the features of the call voice signal.
2. The method for phone recognition based on deep learning of claim 1, wherein the step of extracting PLP features in the call voice signal by openSMILE comprises:
after sampling, windowing and discrete Fourier transform of the call voice signal, taking the sum of the squares of the real and imaginary parts of the short-time speech spectrum to obtain a short-time power spectrum,

P(f) = Re[X(f)]^2 + Im[X(f)]^2

where X(f) is the short-time spectrum of the call voice signal, f is the frequency axis of the short-time spectrum, Re[X(f)]^2 is the square of the real part of the short-time spectrum, Im[X(f)]^2 is the square of the imaginary part, and P(f) is the short-time power spectrum of the call voice signal;
performing critical frequency band analysis on the short-time power spectrum of the call voice signal to obtain a plurality of critical bandwidth auditory spectrums theta (k) of the call voice signal;
performing equal-loudness pre-emphasis on the plurality of critical-bandwidth auditory spectra θ(k) by the following formula,

Γ(k) = E[f0(k)]·θ(k), (k = 1, 2, ..., 17)

where Γ(k) is the auditory spectrum after equal-loudness pre-emphasis, f0(k) denotes the center frequency of the k-th critical-bandwidth auditory spectrum, and E[f0(k)] is the value at f0(k) of the equal-loudness curve, which is obtained by the following formula (with ω = 2πf):

E(f) = [(ω^2 + 56.8×10^6)·ω^4] / [(ω^2 + 6.3×10^6)^2·(ω^2 + 0.38×10^9)]
performing intensity-loudness conversion on the plurality of critical-bandwidth auditory spectra after equal-loudness pre-emphasis by

φ(k) = Γ(k)^0.33

where φ(k) is the plurality of critical-bandwidth auditory spectra after intensity-loudness conversion;

and performing an inverse Fourier transform on the plurality of critical-bandwidth auditory spectra φ(k) after intensity-loudness conversion to obtain the inverse-Fourier-transformed call voice signal, computing an all-pole model, and solving for the cepstral coefficients of the call voice signal to obtain the PLP features.
3. The deep learning-based phone recognition method of claim 2, wherein the step of performing critical band analysis on the short-time power spectrum of the call voice signal comprises:
the short-time power spectrum of the call voice signal is subjected to critical band analysis by the following formula,
Figure FDA0002861551750000021
wherein Z (f) is Bark domain frequency;
mapping the frequency axis f of the short-time power spectrum P(f) to the Bark frequency Z to obtain 17 frequency bands, and multiplying the energy spectrum of each frequency band by a weighting coefficient to obtain the critical bandwidth auditory spectrum θ(k),
θ(k) = Σ ψ(Z − Z0(k))·P(f(Z)), (k = 1, 2, …, 17), the sum running over Z0(k) − 1.3 ≤ Z ≤ Z0(k) + 2.5
ψ(x) = 0, x < −1.3
ψ(x) = 10^(2.5·(x + 0.5)), −1.3 ≤ x ≤ −0.5
ψ(x) = 1, −0.5 < x < 0.5
ψ(x) = 10^(−(x − 0.5)), 0.5 ≤ x ≤ 2.5
ψ(x) = 0, x > 2.5
with x = Z − Z0(k),
wherein Z0(k) represents the center frequency of the k-th critical bandwidth auditory spectrum, ψ(Z − Z0(k)) is the weighting coefficient corresponding to each frequency band, and P(f(Z)) is the energy spectrum corresponding to each frequency band.
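A numpy sketch of this critical band analysis follows, using the standard PLP Bark mapping and the standard piecewise weighting curve; the choice of 17 band centers spread over the Bark axis is an illustrative assumption.

```python
import numpy as np

def hz_to_bark(f):
    """Bark mapping Z(f) = 6 * ln(f/600 + sqrt((f/600)^2 + 1))."""
    return 6.0 * np.log(f / 600.0 + np.sqrt((f / 600.0) ** 2 + 1.0))

def weighting(dz):
    """Piecewise critical-band weighting psi(Z - Z0(k))."""
    return np.select(
        [dz < -1.3, dz <= -0.5, dz < 0.5, dz <= 2.5],
        [0.0, 10.0 ** (2.5 * (dz + 0.5)), 1.0, 10.0 ** (-(dz - 0.5))],
        default=0.0,
    )

def critical_band_spectrum(P, freqs, n_bands=17):
    z = hz_to_bark(freqs)                          # map frequency axis to Bark
    z0 = np.linspace(z[1], z[-1] - 1.0, n_bands)   # assumed band centers Z0(k)
    # theta(k): weighted sum of the energy spectrum around each band center
    return np.array([np.sum(weighting(z - c) * P) for c in z0])

freqs = np.linspace(0.0, 8000.0, 257)    # rfft bins of a 512-point frame at 16 kHz
P = np.abs(np.fft.rfft(np.random.randn(512))) ** 2   # stand-in power spectrum
theta = critical_band_spectrum(P, freqs)             # 17 auditory spectrum values
```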
4. The deep learning-based phone recognition method of claim 1, wherein the Faster RCNN network is constructed by:
constructing the Faster RCNN network from a convolutional layer, an RPN (region proposal network), a comprehensive convolutional layer and a fully connected layer;
extracting a feature map of the voice features through the convolutional layer;
generating candidate regions through the RPN;
judging the category of each anchor box by using softmax, and obtaining the candidate regions by correcting the anchor boxes;
combining, in the comprehensive convolutional layer, the feature map extracted by the convolutional layer with the candidate regions obtained by the RPN, and extracting a plurality of candidate feature maps;
and synthesizing the plurality of candidate feature maps through the fully connected layer.
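A much-simplified PyTorch sketch of these four stages follows, treating the PLP feature matrix as a one-channel "image". Channel sizes, the anchor count, and the externally supplied proposals are illustrative stand-ins; a full Faster RCNN also decodes proposals from the RPN via anchor regression and non-maximum suppression, and the RoI pooling call here stands in for the comprehensive convolutional layer.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class TinyFasterRCNN(nn.Module):
    """Toy version of the four claim-4 stages for PLP feature maps."""
    def __init__(self, n_anchors=9, out_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(                  # convolutional layer
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.rpn_cls = nn.Conv2d(64, 2 * n_anchors, 1)  # RPN objectness scores
        self.fc = nn.Linear(64 * 7 * 7, out_dim)        # fully connected layer

    def forward(self, x, proposals):
        fmap = self.backbone(x)                 # feature map of the input
        scores = self.rpn_cls(fmap)
        # softmax over object / background per anchor position
        objectness = torch.softmax(scores.view(x.size(0), 2, -1), dim=1)
        # combine the feature map with the candidate regions and extract
        # fixed-size candidate feature maps
        rois = roi_pool(fmap, proposals, output_size=(7, 7))
        return self.fc(rois.flatten(1)), objectness

model = TinyFasterRCNN()
plp_map = torch.randn(1, 1, 64, 64)               # PLP frames as a 1-channel map
boxes = [torch.tensor([[0.0, 0.0, 32.0, 32.0]])]  # stand-in candidate region
features, objectness = model(plp_map, boxes)      # re-extracted features
```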
5. The deep learning-based phone recognition method of claim 1, wherein the voice classification model is a Transformer network.
6. The deep learning-based phone recognition method of claim 5, wherein the Transformer network is constructed by:
constructing the Transformer network from an encoder and a decoder;
encoding, by the encoder, the features of the call voice signal extracted by the Faster RCNN network to obtain a context semantic vector;
and decoding, by the decoder, the obtained context semantic vector, and obtaining the classification category through a softmax layer.
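A hedged PyTorch sketch of this claim-6 construction: an encoder-decoder Transformer whose encoder consumes the Faster RCNN features and whose decoder output passes through a softmax layer over the three call categories. The model dimensions, layer counts, and the single learned decoder query are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CallClassifier(nn.Module):
    """Encoder-decoder Transformer with a softmax layer over 3 call classes."""
    def __init__(self, d_model=256, n_classes=3):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))  # decoder input
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, feats):               # feats: (batch, seq, d_model)
        q = self.query.expand(feats.size(0), -1, -1)
        ctx = self.transformer(feats, q)    # encode features, decode context
        return torch.softmax(self.head(ctx[:, 0]), dim=-1)

clf = CallClassifier()
probs = clf(torch.randn(4, 10, 256))        # P(normal / harassing / fraud)
```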
7. The deep learning-based phone recognition method of claim 5, further comprising: combining the Faster RCNN network and the Transformer network into a voice type recognition network, and uploading the voice type recognition network to the cloud.
8. An apparatus for phone recognition based on deep learning, the apparatus comprising:
the acquisition module is used for acquiring a call voice signal of a client;
the feature extraction module is used for extracting features of the call voice signal acquired by the acquisition module;
the classification module is used for constructing a voice classification model and inputting the features of the call voice signal extracted by the feature extraction module into the voice classification model to obtain the category of the call voice signal, wherein the categories comprise normal calls, harassing calls and fraud calls;
wherein the feature extraction module comprises:
the first feature extraction submodule is used for extracting PLP features from the call voice signal by using openSMILE;
the feature data generation submodule is used for calling, by a script, the config file corresponding to the PLP features extracted by the first feature extraction submodule to generate PLP feature data corresponding to the call voice signal;
and the second feature extraction submodule is used for performing feature re-extraction on the PLP feature data generated by the feature data generation submodule by using a Faster RCNN network.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep learning-based phone recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the deep learning-based phone recognition method according to any one of claims 1 to 7.
CN202011564958.5A 2020-12-25 2020-12-25 Telephone recognition method, device, equipment and medium based on deep learning Active CN112738338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011564958.5A CN112738338B (en) 2020-12-25 2020-12-25 Telephone recognition method, device, equipment and medium based on deep learning

Publications (2)

Publication Number Publication Date
CN112738338A true CN112738338A (en) 2021-04-30
CN112738338B CN112738338B (en) 2022-10-14

Family

ID=75616376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011564958.5A Active CN112738338B (en) 2020-12-25 2020-12-25 Telephone recognition method, device, equipment and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN112738338B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120116763A1 (en) * 2009-07-16 2012-05-10 Nec Corporation Voice data analyzing device, voice data analyzing method, and voice data analyzing program
CN102982801A (en) * 2012-11-12 2013-03-20 中国科学院自动化研究所 Phonetic feature extracting method for robust voice recognition
CN109525700A (en) * 2018-12-25 2019-03-26 出门问问信息科技有限公司 Incoming call recognition methods, device, computer equipment and readable storage medium storing program for executing
CN111222025A (en) * 2019-12-27 2020-06-02 南京中新赛克科技有限责任公司 Fraud number identification method and system based on convolutional neural network
CN111970400A (en) * 2019-05-20 2020-11-20 中国移动通信集团陕西有限公司 Crank call identification method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449106A (en) * 2022-02-10 2022-05-06 恒安嘉新(北京)科技股份公司 Abnormal telephone number identification method, device, equipment and storage medium
CN114449106B (en) * 2022-02-10 2024-04-30 恒安嘉新(北京)科技股份公司 Method, device, equipment and storage medium for identifying abnormal telephone number
CN115334509A (en) * 2022-06-18 2022-11-11 阮荣军 Conversation wind control system applying big data service
CN115334509B (en) * 2022-06-18 2023-10-31 义乌中国小商品城大数据有限公司 Communication wind control system applying big data service
CN117555916A (en) * 2023-11-06 2024-02-13 广东电网有限责任公司佛山供电局 Voice interaction method and system based on natural language processing
CN117555916B (en) * 2023-11-06 2024-05-31 广东电网有限责任公司佛山供电局 Voice interaction method and system based on natural language processing

Also Published As

Publication number Publication date
CN112738338B (en) 2022-10-14

Similar Documents

Publication Publication Date Title
CN112738338B (en) Telephone recognition method, device, equipment and medium based on deep learning
CN110910901B (en) Emotion recognition method and device, electronic equipment and readable storage medium
CN106683680A (en) Speaker recognition method and device and computer equipment and computer readable media
CN107103903A (en) Acoustic training model method, device and storage medium based on artificial intelligence
CN110047510A (en) Audio identification methods, device, computer equipment and storage medium
CN109726372B (en) Method and device for generating work order based on call records and computer readable medium
CN113488024B (en) Telephone interrupt recognition method and system based on semantic recognition
CN113903363A (en) Violation detection method, device, equipment and medium based on artificial intelligence
CN114338623B (en) Audio processing method, device, equipment and medium
CN107492153A (en) Attendance checking system, method, work attendance server and attendance record terminal
CN111276119A (en) Voice generation method and system and computer equipment
CN113191787A Telecommunication data processing method, device, electronic equipment and storage medium
CN113707173A (en) Voice separation method, device and equipment based on audio segmentation and storage medium
CN114155832A (en) Speech recognition method, device, equipment and medium based on deep learning
CN109545226A (en) A kind of audio recognition method, equipment and computer readable storage medium
CN113539243A (en) Training method of voice classification model, voice classification method and related device
CN111552832A (en) Risk user identification method and device based on voiceprint features and associated map data
CN110556114A (en) Speaker identification method and device based on attention mechanism
CN116092503A (en) Fake voice detection method, device, equipment and medium combining time domain and frequency domain
CN116072119A (en) Voice control system, method, electronic equipment and medium for emergency command
CN108010533A (en) The automatic identifying method and device of voice data code check
CN113555003B (en) Speech synthesis method, device, electronic equipment and storage medium
CN111985231B (en) Unsupervised role recognition method and device, electronic equipment and storage medium
CN114842880A (en) Intelligent customer service voice rhythm adjusting method, device, equipment and storage medium
CN109065066B (en) Call control method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant