CN114220439A - Method, device, system, equipment and medium for acquiring voiceprint recognition model - Google Patents


Publication number
CN114220439A
Authority
CN
China
Prior art keywords
voiceprint
voiceprint recognition
model
layer
module
Legal status
Pending
Application number
CN202111594604.XA
Other languages
Chinese (zh)
Inventor
李森
Current Assignee
Wuxi Jinyun Zhilian Technology Co ltd
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202111594604.XA
Publication of CN114220439A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 Individual registration on entry or exit
    • G07C9/30 Individual registration on entry or exit not involving the use of a pass
    • G07C9/32 Individual registration on entry or exit not involving the use of a pass in combination with an identity check
    • G07C9/37 Individual registration on entry or exit not involving the use of a pass in combination with an identity check using biometric data, e.g. fingerprints, iris scans or voice recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The disclosure relates to a method, a device, a system, equipment and a computer-readable storage medium for acquiring a voiceprint recognition model. The acquisition method comprises the following steps: replacing the penultimate layer of a pre-trained convolutional neural network model with a restricted Boltzmann machine, and replacing the last layer of the convolutional neural network model with a normalized exponential function (softmax) layer to obtain a target model, wherein the convolutional neural network model is pre-trained on a first voiceprint sample set; and training the target model on a second voiceprint sample set to obtain a voiceprint recognition model, wherein the first voiceprint sample set contains more samples than the second voiceprint sample set, and the voiceprint recognition model is used to recognize a user's voiceprint. This method yields a voiceprint recognition model with high recognition accuracy.

Description

Method, device, system, equipment and medium for acquiring voiceprint recognition model
Technical Field
The present disclosure relates to the field of smart home technologies, and in particular, to a method, an apparatus, a system, a device, and a computer-readable storage medium for acquiring a voiceprint recognition model.
Background
With the development of electronic devices and communication networks, smart homes have entered more and more households. However, existing smart home systems still leave room for improvement in convenience and security.
Disclosure of Invention
To solve, or at least partially solve, the above technical problem, the present disclosure provides a method, an apparatus, a system, a device and a computer-readable storage medium for acquiring a voiceprint recognition model, by which a voiceprint recognition model with high recognition accuracy can be obtained.
In a first aspect, an embodiment of the present disclosure provides a method for acquiring a voiceprint recognition model, where the method includes:
replacing the penultimate layer of a pre-trained convolutional neural network model with a restricted Boltzmann machine (RBM), and replacing the last layer of the convolutional neural network model with a normalized exponential function (softmax) layer to obtain a target model, wherein the convolutional neural network model is pre-trained on a first voiceprint sample set;
training the target model on a second voiceprint sample set to obtain a voiceprint recognition model, wherein the first voiceprint sample set contains more samples than the second voiceprint sample set, and the voiceprint recognition model is used to recognize a user's voiceprint.
In a second aspect, an embodiment of the present disclosure provides an apparatus for obtaining a voiceprint recognition model, where the apparatus includes:
a model correction module, configured to replace the penultimate layer of a pre-trained convolutional neural network model with a restricted Boltzmann machine (RBM) and to replace the last layer of the convolutional neural network model with a normalized exponential function (softmax) layer to obtain a target model, wherein the convolutional neural network model is pre-trained on a first voiceprint sample set; and
a training module, configured to train the target model on a second voiceprint sample set to obtain a voiceprint recognition model, wherein the first voiceprint sample set contains more samples than the second voiceprint sample set, and the voiceprint recognition model is used to recognize a user's voiceprint.
In a third aspect, an embodiment of the present disclosure provides a smart home system, which comprises a voiceprint recognition module and an access control module;
the voiceprint recognition module is communicatively connected to the access control module, and is configured to perform voiceprint recognition on received voice information using the voiceprint recognition model of any one of claims 1 to 4 to obtain a voiceprint recognition result, and to send the voiceprint recognition result to the access control module;
the access control module is configured to determine, according to the voiceprint recognition result, whether to open the electronic lock.
In a fourth aspect, an embodiment of the present disclosure provides a control method for a smart home system, applied to the smart home system described above, the method including:
performing, by a voiceprint recognition module, voiceprint recognition on received voice information to obtain a voiceprint recognition result, and sending the voiceprint recognition result to an access control module;
determining, by the access control module, whether the voiceprint recognition result meets a preset condition, and opening the electronic lock if it does.
In a fifth aspect, an embodiment of the present disclosure provides an apparatus for acquiring a voiceprint recognition model, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first and fourth aspects.
In a sixth aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first and fourth aspects.
In the method for acquiring a voiceprint recognition model provided by the embodiments of the disclosure, a convolutional neural network model is first pre-trained on a first voiceprint sample set. The structure of the pre-trained model is then modified: its penultimate layer is replaced with a restricted Boltzmann machine (RBM) and its last layer with a normalized exponential function (softmax) layer, yielding a target model. The target model is then further trained on a second voiceprint sample set, which contains fewer samples than the first, to obtain the voiceprint recognition model used to recognize a user's voiceprint. The modified structure lets the model learn the specific high-order voiceprint features of the second voiceprint sample set, so that more voiceprint features are extracted, the recognition rate on small samples improves, and the problem of the second voiceprint sample set's small size is addressed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, those of ordinary skill in the art can derive other drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a method for acquiring a voiceprint recognition model according to an embodiment of the present disclosure;
Fig. 2 is a schematic structural diagram of a target model according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the process of acquiring a voiceprint recognition model according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a smart home system according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a smart home system according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of a smart home system according to an embodiment of the present disclosure;
Fig. 7 is a flowchart of a control method of a smart home system according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of an apparatus for acquiring a voiceprint recognition model according to an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of an apparatus for acquiring a voiceprint recognition model according to an embodiment of the present disclosure.
Detailed Description
To make the above objects, features and advantages of the present disclosure clearer, the solutions of the present disclosure are further described below. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
Numerous specific details are set forth in the following description to provide a thorough understanding of the present disclosure, but the present disclosure may also be practiced in ways other than those described here; evidently, the embodiments in this specification are only some, not all, of the embodiments of the present disclosure.
Beyond offering efficient and convenient solutions for daily life, the most important task of a smart home is to keep the home secure, and its smart access control system is an important safeguard of that security. Unlike traditional key-based access control, smart home access control systems usually use automatic or semi-automatic access control.
Typical ways of opening a door include keys, password recognition, card recognition, and biometric recognition. A key is easy to use but must be carried at all times and offers little intelligence. Password recognition requires no key, card or other object, but passwords are easily leaked, the cost is high, and the degree of intelligence is low. Card recognition covers magnetic cards and radio frequency (RF) cards: magnetic cards are cheap but wear easily, have a short service life, are easy to duplicate, and make bidirectional control difficult; RF cards are contactless and long-lived, but are single-purpose, offer little intelligent management, and require complex wiring. Biometric recognition currently comprises only fingerprint, face, and iris recognition. Voice is one of the most natural ways for humans to communicate and is undoubtedly an efficient means of controlling smart home devices; moreover, the human voice in its natural state is specific to each person. Voiceprint recognition should therefore be used in smart home access control, which makes improving its accuracy the primary problem to be solved.
In view of the above problems, embodiments of the present disclosure provide a method for obtaining a voiceprint recognition model, which aims to obtain a model suitable for voiceprint recognition and having a higher accuracy. Fig. 1 is a flowchart of a method for acquiring a voiceprint recognition model according to an embodiment of the present disclosure, where the method includes the following specific steps:
Step 110: replace the penultimate layer of the pre-trained convolutional neural network model with a restricted Boltzmann machine (RBM), and replace the last layer of the convolutional neural network model with a normalized exponential function (softmax) layer to obtain a target model.
The convolutional neural network model is pre-trained on the first voiceprint sample set. The first voiceprint sample set may be a public data set with a large number of samples, most of which are generated by simulation, for example by adding different kinds of noise to real speech. The set is therefore large but not targeted at the recognition task, so a convolutional neural network model pre-trained on it does not achieve high voiceprint recognition accuracy, and the pre-trained model needs further improvement.
Real voiceprint samples are few and differ from one another. To cope with this while still obtaining a voiceprint recognition model with high recognition accuracy, the solution of the embodiments replaces the penultimate layer of the pre-trained convolutional neural network model with a restricted Boltzmann machine (RBM) and the last layer with a softmax layer, and retains all other layers of the pre-trained model, to obtain the target model. The RBM then further learns the specific high-order voiceprint features of the second voiceprint sample set, extracting more targeted voiceprint features and improving the voiceprint recognition rate on small samples.
To speed up the convergence of the convolutional neural network model during pre-training, the method further includes, before training the convolutional neural network model on the first voiceprint sample set:
splitting each voiceprint sample in the first voiceprint sample set into audio segments of a preset duration (e.g., 4 s); sampling each audio segment at a preset frequency (e.g., 8000 Hz) to obtain a sampling result; applying a fast Fourier transform to the sampling result to obtain discrete audio segment signals; and feeding the discrete audio segment signals into the convolutional neural network model to pre-train it.
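The preprocessing steps above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the resampling method (linear interpolation with `np.interp`), the use of FFT magnitudes, the dropping of a trailing partial segment, and keeping the first 16000 of the 16001 one-sided spectrum bins (so that a 4 s segment at 8000 Hz matches the 1 × 16000 input described for the first convolutional layer) are all assumptions, and `preprocess` is a hypothetical helper name.

```python
import numpy as np

def preprocess(signal, orig_rate, seg_seconds=4.0, target_rate=8000, n_bins=16000):
    """Segment a waveform, resample each segment, and return FFT magnitude spectra.

    Returns an array of shape (n_segments, n_bins).
    """
    seg_len = int(round(seg_seconds * orig_rate))
    n_segs = len(signal) // seg_len  # assumption: a trailing partial segment is dropped
    n_out = int(round(seg_seconds * target_rate))  # 4 s at 8000 Hz -> 32000 samples
    t_in = np.arange(seg_len) / orig_rate
    t_out = np.arange(n_out) / target_rate
    features = []
    for i in range(n_segs):
        seg = signal[i * seg_len:(i + 1) * seg_len]
        # Resample by linear interpolation (a simple stand-in; a production system
        # would use an anti-aliased resampler).
        resampled = np.interp(t_out, t_in, seg)
        # One-sided FFT of the real segment (n_out // 2 + 1 bins); keep n_bins of them.
        spectrum = np.abs(np.fft.rfft(resampled))[:n_bins]
        features.append(spectrum)
    return np.array(features)
```

For example, a 9 s recording at 16000 Hz yields two 4 s segments, each mapped to a 16000-point spectrum; with 32000 samples per segment, bin k corresponds to k / 4 Hz.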
In one embodiment, the pre-trained convolutional neural network model has a four-layer structure: the first and second layers are convolutional layers, the first layer having one convolution kernel and the second layer having three convolution kernels; the third and fourth layers are fully connected layers. The penultimate layer is thus the third layer, and the last layer is the fourth layer.
Specifically, when the voiceprint recognition model is applied to a smart home access control system, the first convolutional layer of the convolutional neural network model may have 16000 nodes, a 3 × 3 convolution kernel and a stride of 1; the second convolutional layer has 3 × 15998 nodes and three 3 × 3 convolution kernels with a stride of 1; the third (fully connected) layer has 3 × 15996 nodes; and the fourth (fully connected) layer has 1 × 47988 nodes. The penultimate layer of the pre-trained model, i.e. the third (fully connected) layer, is replaced with an RBM; the last layer, i.e. the fourth (fully connected) layer, is replaced with a softmax layer; and the first and second convolutional layers are retained together with their model parameters (such as the number of nodes, kernel size and stride). The 3 × 15996-dimensional features output by the second convolutional layer thus serve as the input of the RBM, and the output of the RBM serves as the input of the softmax layer. The advantage of this modification is that it retains the most useful part of the large-sample model, its convolutional layers, while the softmax output identifies the voiceprint with the highest probability.
For example, Fig. 2 shows the structure of a target model comprising a first convolutional layer, a second convolutional layer, an RBM layer and a softmax layer. The first convolutional layer takes 1 × 16000-dimensional feature data as input and outputs 3 × 15998-dimensional feature data, which is the input of the second convolutional layer. The second convolutional layer outputs 3 × 15996-dimensional feature data, which is the input of the RBM. The RBM layer is fully connected to all of the convolved voiceprint feature maps and can further learn, from those feature maps, the high-order voiceprint features specific to the small samples. The RBM outputs 1 × 1000-dimensional feature data, which is the input of the softmax layer, and the softmax layer outputs 1 × 100-dimensional feature data.
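The layer sizes above are consistent with "valid" (unpadded) convolutions of width 3 and stride 1 applied along the 16000-point input: each such layer shortens the signal by 2, and 3 × 15996 = 47988. The sketch below checks that arithmetic and performs the layer replacement on a plain list of layer descriptions. Treating the 3 × 3 kernel as acting with width 3 along the time axis, the dictionary-based layer representation, and the function names are illustrative assumptions; the 1000 and 100 output sizes are taken from the Fig. 2 description.

```python
def conv_out_len(n_in: int, kernel: int, stride: int = 1) -> int:
    """Output length of a 'valid' (unpadded) convolution."""
    return (n_in - kernel) // stride + 1

INPUT_LEN = 16000                          # first conv layer input: 1 x 16000
N_KERNELS = 3                              # three kernels -> three feature maps
len1 = conv_out_len(INPUT_LEN, kernel=3)   # first conv layer output length
len2 = conv_out_len(len1, kernel=3)        # second conv layer output length
rbm_input_dim = N_KERNELS * len2           # flattened feature maps fed to the RBM

# Hypothetical description of the pre-trained four-layer network.
pretrained = [
    {"name": "conv1", "type": "conv", "kernel": 3, "stride": 1},
    {"name": "conv2", "type": "conv", "kernel": 3, "stride": 1, "n_kernels": N_KERNELS},
    {"name": "fc3", "type": "fc", "n_in": rbm_input_dim},
    {"name": "fc4", "type": "fc"},
]

def build_target_model(layers, n_visible, n_hidden=1000, n_classes=100):
    """Keep every layer except the last two (with their pre-trained parameters),
    then put an RBM in place of the penultimate layer and a softmax layer in
    place of the last layer."""
    kept = [dict(layer) for layer in layers[:-2]]  # copy the retained layers
    return kept + [
        {"name": "rbm", "type": "rbm", "n_visible": n_visible, "n_hidden": n_hidden},
        {"name": "softmax", "type": "softmax", "n_in": n_hidden, "n_out": n_classes},
    ]

target = build_target_model(pretrained, n_visible=rbm_input_dim)
print(len1, len2, rbm_input_dim)             # 15998 15996 47988
print([layer["type"] for layer in target])  # ['conv', 'conv', 'rbm', 'softmax']
```

In a real framework the retained entries would carry the pre-trained weight tensors rather than just hyperparameters; the list form only makes the "drop the last two, append two" surgery explicit.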
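The patent does not specify the RBM variant or how the RBM layer is trained. A common choice, assumed here, is a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1); in the minimal NumPy sketch below, the hidden-unit activation probabilities are the features passed on to the softmax layer. Real-valued convolutional features would first need to be scaled into [0, 1] (or a Gaussian-Bernoulli RBM used instead); the class and method names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible-unit biases
        self.b_h = np.zeros(n_hidden)   # hidden-unit biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch v0 of shape (n, n_visible); returns the
        mean squared reconstruction error of the batch."""
        # Positive phase: hidden probabilities and a sampled binary hidden state.
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        # Approximate gradient: <v h>_data - <v h>_reconstruction.
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))

    def transform(self, v):
        # Features handed to the softmax layer: hidden activation probabilities.
        return self.hidden_probs(v)
```

In the Fig. 2 configuration the layer would be constructed as `RBM(n_visible=47988, n_hidden=1000)`; CD-1 is used here only because it is the standard, inexpensive approximation for RBM training.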
Step 120: train the target model on the second voiceprint sample set to obtain the voiceprint recognition model, wherein the first voiceprint sample set contains more samples than the second voiceprint sample set, and the voiceprint recognition model is used to recognize a user's voiceprint.
The samples in the second voiceprint sample set are real voiceprint samples, so the set is small and its samples differ from one another; as a result, voiceprint features extracted directly from it give a low recognition rate. To cope with the small number of real voiceprint samples while still obtaining a voiceprint recognition model with high recognition accuracy, a convolutional neural network model is first pre-trained on the first voiceprint sample set (the large sample set), its structure is then modified to obtain the target model, and the target model is further trained on the second voiceprint sample set (the small sample set) to obtain the voiceprint recognition model used to recognize a user's voiceprint.
Further, to speed up convergence when training the target model, the method further includes, before training the target model on the second voiceprint sample set:
splitting each voiceprint sample in the second voiceprint sample set into audio segments of a preset duration (e.g., 4 s); sampling each audio segment at a preset frequency (e.g., 8000 Hz) to obtain a sampling result; and applying a fast Fourier transform to the sampling result to obtain discrete audio segment signals. Training the target model on the second voiceprint sample set then comprises training the target model on these discrete audio segment signals.
Specifically, to adapt the target model to a particular recognition task, for example recognizing the homeowner's voiceprint in a smart home access control system, the solution of the embodiments fine-tunes the weights and biases of the target model with the backpropagation (BP) algorithm while continuing to train it on the second voiceprint sample set. That is, a cross-entropy cost function measures the deviation between the predicted value and the actual value for each sample, and the target model adjusts its parameters based on that deviation. The target model trained on the second voiceprint sample set, i.e. the voiceprint recognition model, can then be used in voiceprint recognition tasks to recognize the voiceprints of certain preset users.
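The fine-tuning step can be illustrated with a minimal NumPy sketch of backpropagation under the cross-entropy cost for the RBM and softmax layers. Assumptions: the RBM weights initialise a sigmoid hidden layer, the softmax layer is an affine map followed by softmax, training is full-batch gradient descent, and the retained convolutional layers act as a fixed feature extractor whose outputs form the input `x` (the patent does not state whether those layers are also updated). The function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    n = labels.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))

def finetune(x, labels, W1, b1, n_classes, epochs=200, lr=0.5, seed=0):
    """Backpropagation fine-tuning of an RBM-initialised hidden layer (W1, b1)
    and a softmax output layer (W2, b2) under the cross-entropy cost."""
    rng = np.random.default_rng(seed)
    W1, b1 = W1.copy(), b1.copy()  # do not modify the caller's RBM weights
    W2 = 0.01 * rng.standard_normal((W1.shape[1], n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    n = x.shape[0]
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = sigmoid(x @ W1 + b1)
        p = softmax(h @ W2 + b2)
        losses.append(cross_entropy(p, labels))
        # Backward pass: for softmax with cross-entropy, dL/dz = p - y.
        dz2 = (p - onehot) / n
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * h * (1.0 - h)  # chain rule through the sigmoid
        dW1 = x.T @ dz1
        db1 = dz1.sum(axis=0)
        # Gradient-descent updates of all weights and biases.
        W2 -= lr * dW2
        b2 -= lr * db2
        W1 -= lr * dW1
        b1 -= lr * db1
    return (W1, b1, W2, b2), losses
```

Because the softmax and cross-entropy gradients combine into the simple difference p - y, the backward pass starts from `p - onehot` and needs no separate softmax Jacobian.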
In the method for acquiring a voiceprint recognition model provided in this embodiment, a convolutional neural network model is pre-trained on a first voiceprint sample set; the structure of the pre-trained model is modified by replacing its penultimate layer with a restricted Boltzmann machine (RBM) and its last layer with a softmax layer, yielding a target model; and the target model is further trained on a second voiceprint sample set, smaller than the first, to obtain the voiceprint recognition model used to recognize a user's voiceprint. The modified structure learns the specific high-order voiceprint features of the second voiceprint sample set, so that more voiceprint features are extracted, the recognition rate on small samples improves, and the small size of the second voiceprint sample set is compensated for. Because voiceprint data are scarce and voiceprint data sets differ from one another, features extracted directly give a low recognition rate. To improve small-sample voiceprint recognition, an improved CNN model based on transfer learning is therefore proposed, in which a fully connected layer of the CNN is replaced with an RBM. This layer is fully connected to all the convolved voiceprint feature maps and can further learn the high-order voiceprint features specific to the small samples from those feature maps, so that more voiceprint features are extracted and the small-sample recognition rate improves.
In summary, as shown in the schematic diagram of the acquisition process in Fig. 3, a convolutional neural network model is pre-trained on large-sample voiceprint data, its structure is modified, and the modified model is further trained on small-sample voiceprint data, yielding a model that can perform the voiceprint recognition task. In Fig. 3, "migration model" (i.e., model transfer) refers to modifying the structure of the pre-trained convolutional neural network model. "Preprocessing of the large sample voice data" and "small sample voice data preprocessing" both refer to splitting each sample into audio segments of a preset duration (e.g., 4 s), sampling each segment at a preset frequency (e.g., 8000 Hz) to obtain a sampling result, and applying a fast Fourier transform to the sampling result to obtain discrete audio segment signals; the former is applied to the large sample set and the latter to the small sample set. "Adaptive adjustment of the new network" refers to measuring the deviation between the predicted value and the actual value for each sample with a cross-entropy cost function and adjusting the model parameters based on that deviation.
On the basis of the above embodiments, the present embodiment provides an intelligent home system using the above voiceprint recognition model. As shown in fig. 4, a schematic structural diagram of an intelligent home system is provided, where the intelligent home system includes: a voiceprint recognition module 410 and an access control module 420.
The voiceprint recognition module 410 is communicatively connected to the access control module 420. It performs voiceprint recognition on received voice information using the voiceprint recognition model of the above embodiment to obtain a voiceprint recognition result, and sends the result to the access control module 420, which determines, according to the result, whether to open the electronic lock. The voice information is the voice of the user currently standing at the door. If the voiceprint recognition result shows that the voice belongs to a preset user, the access control module 420 opens the electronic lock. A preset user is a user whose voice has previously been enrolled in the smart home system, typically a family member, friend or relative.
Further, as shown in the schematic structural diagram in Fig. 5, the smart home system further includes an alarm module 510 communicatively connected to the access control module 420. The alarm module 510 performs an alarm operation on receiving an alarm instruction from the access control module 420, and the access control module 420 sends the alarm instruction to the alarm module 510 when it determines that the voiceprint recognition result does not meet the preset condition. The alarm operation may be sending a short message or alarm information to an associated terminal, or sending alarm information through an application client: specifically, the short message is sent through a GSM module, and the alarm information can be sent through an app over the home Wi-Fi network.
Further, as shown in the schematic structural diagram in Fig. 6, the smart home system further includes a recording module 610 communicatively connected to the access control module 420. On receiving a recording instruction from the access control module 420, the recording module 610 records video of the user together with the voice information. The voiceprint recognition module 410 also sends the voice information to the recording module 610 so that it can be recorded when a recording instruction is received, which facilitates later tracing. When the voiceprint recognition module 410 detects an unauthorized user, the access control module 420 has the recording module 610 record a short video of that moment for later use as evidence.
The smart home system provided by this embodiment applies voiceprint recognition to the smart home access control scenario for the first time. It ensures high recognition accuracy and, while keeping the smart home system intelligent, achieves greater convenience, security and a better user experience. The key component is voiceprint detection: because voiceprint data are scarce and voiceprint data sets differ from one another, features extracted directly give a low recognition rate. To improve small-sample voiceprint recognition, the improved transfer-learning CNN model replaces a fully connected layer of the CNN with an RBM, which is fully connected to all the convolved voiceprint feature maps and further learns the high-order voiceprint features specific to the small samples from those feature maps, so that more voiceprint features are extracted and the small-sample recognition rate improves.
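The patent leaves the "preset condition" unspecified. One plausible rule, assumed here purely for illustration, is to accept the most probable class in the softmax output only if it belongs to an enrolled (preset) user and its probability reaches a threshold. `should_unlock`, `enrolled_ids` and the 0.8 threshold are hypothetical.

```python
import numpy as np

def should_unlock(probs, enrolled_ids, threshold=0.8):
    """Return (unlock, speaker_id) for a softmax probability vector.

    Hypothetical rule: unlock only if the top-scoring speaker is enrolled
    and the model is at least `threshold` confident.
    """
    probs = np.asarray(probs, dtype=float)
    speaker = int(np.argmax(probs))
    confident = probs[speaker] >= threshold
    return bool(confident and speaker in enrolled_ids), speaker
```

A confidence threshold is used so that a near-uniform output, where the model cannot tell speakers apart, is rejected even if its arg-max happens to be an enrolled user.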
On the basis of the foregoing embodiment, fig. 7 shows a control method of a smart home system, applied to the smart home system of the foregoing embodiment. As shown in fig. 7, the control method includes the following steps:
Step 710: Perform voiceprint recognition on the received voice information through the voiceprint recognition module to obtain a voiceprint recognition result, and send the voiceprint recognition result to the access control module.
Step 720: Determine, through the access control module, whether the voiceprint recognition result meets a preset condition, and control the electronic lock to unlock if it does.
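The patent leaves the "preset condition" unspecified. The sketch below assumes, purely for illustration, that the recognition result is a predicted user label plus a confidence score (for example a softmax probability), and that the condition is "label is an authorized user and confidence is at least a threshold". `VoiceprintResult`, `ElectronicLock`, and the 0.8 threshold are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class VoiceprintResult:
    user_id: str       # label predicted by the voiceprint recognition model
    confidence: float  # assumed: softmax probability of that label


class ElectronicLock:
    """Hypothetical stand-in for the electronic lock driver."""

    def __init__(self):
        self.unlocked = False

    def unlock(self):
        self.unlocked = True


def meets_preset_condition(result, authorized_users, threshold=0.8):
    # Assumed condition: an authorized user recognized with sufficient confidence.
    return result.user_id in authorized_users and result.confidence >= threshold


def access_control_step(result, authorized_users, lock, threshold=0.8):
    """Step 720 sketch: unlock only if the recognition result meets the preset condition."""
    if meets_preset_condition(result, authorized_users, threshold):
        lock.unlock()
        return True
    return False
```

A threshold on confidence is one plausible reading; a system could equally compare a speaker embedding against enrolled templates, which the source does not describe.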
Further, the method also includes: when the access control module determines that the voiceprint recognition result does not meet the preset condition, sending an alarm instruction to an alarm module through the access control module, and raising an alarm through the alarm module.
Raising an alarm through the alarm module includes:
sending a short message or alarm information to an associated terminal through the alarm module.
Further, the method also includes:
when the access control module determines that the voiceprint recognition result does not meet the preset condition, sending a recording instruction to a recording module through the access control module, and recording video information and the voice information of the user through the recording module.
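The failure branch of the control method can be sketched as follows. This is a simplified, single-process illustration: the module classes, the `recognize` and `condition` callables, the terminal identifier, and the 10-second clip length are all hypothetical, and real modules would communicate over a network connection rather than by direct method calls. As described for the recording module, the voice information is forwarded to the recorder first, and it is saved only when a recording instruction arrives.

```python
class AlarmModule:
    """Hypothetical alarm module: logs the messages it would send to associated terminals."""

    def __init__(self, terminals):
        self.terminals = terminals
        self.sent = []

    def alarm(self, reason):
        for terminal in self.terminals:
            # Stand-in for an SMS or push-notification gateway.
            self.sent.append((terminal, "ALERT: " + reason))


class RecordingModule:
    """Hypothetical recorder: buffers forwarded voice, saves a clip on instruction."""

    def __init__(self):
        self.pending_voice = None
        self.clips = []

    def receive_voice(self, voice_info):
        self.pending_voice = voice_info

    def on_record_instruction(self, video_seconds=10):
        self.clips.append({"voice": self.pending_voice, "video_seconds": video_seconds})


def control_method(voice_info, recognize, condition, alarm, recorder):
    result = recognize(voice_info)        # step 710: voiceprint recognition module
    recorder.receive_voice(voice_info)    # voice forwarded for possible recording
    if condition(result):                 # step 720: access control module decides
        return "unlock"
    alarm.alarm("voiceprint not recognized")  # alarm instruction
    recorder.on_record_instruction()          # recording instruction
    return "rejected"
```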
Fig. 8 is a schematic structural diagram of an apparatus for acquiring a voiceprint recognition model according to an embodiment of the present disclosure. The apparatus may execute the processing procedure provided in the method embodiment for acquiring a voiceprint recognition model. As shown in fig. 8, the apparatus for acquiring a voiceprint recognition model includes a model modification module 810 and a training module 820.
The model modification module 810 is configured to replace the penultimate layer of a pre-trained convolutional neural network model with a restricted Boltzmann machine and replace the last layer of the convolutional neural network model with a normalized exponential function (softmax) layer to obtain a target model, where the convolutional neural network model is obtained by pre-training on a first voiceprint sample set. The training module 820 is configured to train the target model on a second voiceprint sample set to obtain a voiceprint recognition model, where the sample size of the first voiceprint sample set is larger than that of the second voiceprint sample set, and the voiceprint recognition model is used to recognize a user's voiceprint.
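The layer replacement performed by the model modification module can be sketched by treating a model as a list of layer functions. In this illustration (an assumption, not the patent's implementation), the pre-trained layers before the last two are reused unchanged, which is the transfer-learning part; the penultimate layer is swapped for an RBM up-pass `sigmoid(b_h + v.W)`, and the last layer for an affine map followed by softmax. The helper names and the toy weights in the usage check are hypothetical; in practice the softmax width would presumably equal the number of speakers in the second sample set.

```python
import math
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def rbm_hidden_layer(W, b_h) -> Layer:
    """RBM used as a feature layer: p(h_j = 1 | v) = sigmoid(b_h[j] + sum_i v[i] * W[i][j])."""
    def layer(v):
        return [1.0 / (1.0 + math.exp(-(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))))
                for j in range(len(b_h))]
    return layer


def softmax_layer(W, b) -> Layer:
    """Affine map followed by the normalized exponential (softmax) function."""
    def layer(x):
        z = [b[k] + sum(x[i] * W[i][k] for i in range(len(x))) for k in range(len(b))]
        m = max(z)
        e = [math.exp(zk - m) for zk in z]  # subtract max for numerical stability
        s = sum(e)
        return [ek / s for ek in e]
    return layer


def build_target_model(pretrained: List[Layer], rbm: Layer, softmax: Layer) -> List[Layer]:
    """Keep all pre-trained layers except the last two; replace them with RBM and softmax."""
    if len(pretrained) < 2:
        raise ValueError("need at least two layers to replace")
    return pretrained[:-2] + [rbm, softmax]


def forward(model: List[Layer], x: List[float]) -> List[float]:
    for layer in model:
        x = layer(x)
    return x
```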
Optionally, the training module 820 is specifically configured to determine, through a cross-entropy cost function, the deviation between the predicted value and the actual value corresponding to each sample, so that the target model adjusts its model parameters based on the deviation.
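For a softmax output layer, the cross-entropy cost has a simple gradient with respect to the logits: the predicted probabilities minus the one-hot label, `p - y`. The sketch below uses that difference as the "deviation" and applies one plain gradient-descent step to a softmax layer. It is a hedged illustration: the patent does not name an optimizer or learning rate, so `softmax_layer_step` and `lr=0.1` are assumptions, and only the last layer is updated here.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(probs, label):
    """Cross-entropy cost for one sample with an integer class label."""
    return -math.log(max(probs[label], 1e-12))


def softmax_layer_step(W, b, x, label, lr=0.1):
    """One gradient step on z = x.W + b under cross-entropy; returns the pre-update loss.

    Uses dC/dz_k = p_k - y_k, where y is the one-hot encoding of `label`.
    """
    z = [b[k] + sum(x[i] * W[i][k] for i in range(len(x))) for k in range(len(b))]
    p = softmax(z)
    loss = cross_entropy(p, label)
    delta = [p[k] - (1.0 if k == label else 0.0) for k in range(len(p))]  # deviation
    for i in range(len(x)):
        for k in range(len(b)):
            W[i][k] -= lr * x[i] * delta[k]
    for k in range(len(b)):
        b[k] -= lr * delta[k]
    return loss
```

With all-zero weights, the first call sees uniform probabilities over three classes, so its loss is `log(3)`.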
Optionally, the apparatus further includes a preprocessing module configured to, before the target model is trained on the second voiceprint sample set, divide each voiceprint sample in the second voiceprint sample set into audio segments of a preset duration; sample each audio segment at a preset frequency to obtain a sampling result; and perform a fast Fourier transform on the sampling result to obtain discrete audio segment signals. The training module 820 is specifically configured to train the target model on the discrete audio segment signals.
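The three preprocessing steps (segment, sample, FFT) might look like the following pure-Python sketch. Several details are assumptions not found in the source: "sampling at a preset frequency" is interpreted as naive resampling of an already-digitized waveform (no anti-aliasing filter), the trailing partial segment is dropped, and the "discrete audio segment signal" is taken to be the FFT magnitude spectrum. The resampled segment length must be a power of two for this radix-2 FFT.

```python
import cmath
import math


def segment(signal, sample_rate, seconds):
    """Split a waveform into non-overlapping segments of `seconds` (remainder dropped)."""
    n = int(sample_rate * seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def resample(samples, src_rate, dst_rate):
    """Naive resampling by index mapping (assumption; no anti-alias filtering)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    return [samples[int(k * src_rate / dst_rate)] for k in range(n_out)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def preprocess(signal, src_rate, dst_rate, seconds):
    """Segment -> resample -> FFT magnitude spectrum, one list per segment."""
    out = []
    for seg in segment(signal, src_rate, seconds):
        s = resample(seg, src_rate, dst_rate)
        out.append([abs(c) for c in fft(s)])
    return out
```

For example, a 4 Hz sine sampled at 64 Hz, split into 1-second segments and resampled to 32 Hz, yields a 32-bin spectrum per segment whose peak in the lower half lies at bin 4.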
Optionally, the convolutional neural network model has a four-layer structure: the first and second layers are convolutional layers, with the first layer including one convolution kernel and the second layer including three convolution kernels; the third and fourth layers are fully connected layers. The penultimate layer is the third layer, and the last layer is the fourth layer.
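The four-layer structure can be sketched as a forward pass: one convolution kernel in layer 1 produces a single feature map, three kernels in layer 2 produce three feature maps, and two fully connected layers follow. The source does not give kernel sizes, input shape, hidden width, or class count, so this sketch assumes 1-D "valid" convolutions over a spectrum vector, kernel width 3, 16 hidden units, 4 output classes, and ReLU activations; all of these are hypothetical. Layers 3 and 4 are the ones the target model replaces with the RBM and softmax layers.

```python
import random


def conv1d_valid(x, kernel, bias=0.0):
    """'Valid' 1-D cross-correlation followed by ReLU."""
    k = len(kernel)
    return [max(0.0, bias + sum(x[i + j] * kernel[j] for j in range(k)))
            for i in range(len(x) - k + 1)]


def dense(x, W, b, relu=True):
    out = [b[o] + sum(x[i] * W[i][o] for i in range(len(x))) for o in range(len(b))]
    return [max(0.0, v) for v in out] if relu else out


class FourLayerCNN:
    """Layer 1: conv with 1 kernel. Layer 2: conv with 3 kernels. Layers 3-4: fully connected.

    Kernel width, hidden size and class count are illustrative assumptions.
    """

    def __init__(self, input_len, k=3, hidden=16, n_classes=4, seed=0):
        rng = random.Random(seed)

        def r():
            return rng.uniform(-0.1, 0.1)

        self.k1 = [r() for _ in range(k)]                        # layer 1: one kernel
        self.k2 = [[r() for _ in range(k)] for _ in range(3)]    # layer 2: three kernels
        flat = 3 * (input_len - 2 * (k - 1))                     # size after two valid convs
        self.W3 = [[r() for _ in range(hidden)] for _ in range(flat)]
        self.b3 = [0.0] * hidden
        self.W4 = [[r() for _ in range(n_classes)] for _ in range(hidden)]
        self.b4 = [0.0] * n_classes

    def forward(self, x):
        f1 = conv1d_valid(x, self.k1)                     # 1 feature map
        f2 = [conv1d_valid(f1, kern) for kern in self.k2]  # 3 feature maps
        flat = [v for fmap in f2 for v in fmap]
        h3 = dense(flat, self.W3, self.b3)                 # layer 3 (penultimate)
        return dense(h3, self.W4, self.b4, relu=False)     # layer 4 (last): logits
```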
The apparatus for acquiring a voiceprint recognition model in the embodiment shown in fig. 8 can be used to implement the technical solution of the above method embodiment; its implementation principle and technical effects are similar and are not repeated here.
Fig. 9 is a schematic structural diagram of an apparatus for acquiring a voiceprint recognition model according to an embodiment of the present disclosure. The apparatus may execute the processing procedure provided in the method embodiment for acquiring a voiceprint recognition model. As shown in fig. 9, the apparatus 90 for acquiring a voiceprint recognition model includes a memory 91, a processor 92, a computer program, and a communication interface 93, where the computer program is stored in the memory 91 and is configured to be executed by the processor 92 to implement the method for acquiring a voiceprint recognition model described above.
In addition, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for acquiring a voiceprint recognition model according to the foregoing embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for acquiring a voiceprint recognition model, the method comprising:
replacing the penultimate layer of a pre-trained convolutional neural network model with a restricted Boltzmann machine, and replacing the last layer of the convolutional neural network model with a normalized exponential function (softmax) layer, to obtain a target model, wherein the convolutional neural network model is obtained by pre-training on a first voiceprint sample set;
training the target model based on a second voiceprint sample set to obtain a voiceprint recognition model, wherein the sample size of the first voiceprint sample set is larger than that of the second voiceprint sample set, and the voiceprint recognition model is used for recognizing the voiceprint of the user.
2. The method of claim 1, wherein training the target model based on the second voiceprint sample set comprises:
determining, through a cross-entropy cost function, the deviation between the predicted value and the actual value corresponding to a sample, so that the target model adjusts its model parameters based on the deviation.
3. The method of claim 1, further comprising, before training the target model based on the second voiceprint sample set:
dividing each voiceprint sample in the second voiceprint sample set into audio segments of a preset duration;
sampling each audio segment at a preset frequency to obtain a sampling result;
performing a fast Fourier transform on the sampling result to obtain discrete audio segment signals;
wherein training the target model based on the second voiceprint sample set comprises:
training the target model based on the discrete audio segment signals.
4. The method of any one of claims 1-3, wherein the convolutional neural network model has a four-layer structure, the first layer and the second layer being convolutional layers, wherein the first layer comprises one convolution kernel and the second layer comprises three convolution kernels; the third layer and the fourth layer are fully connected layers; the penultimate layer is the third layer, and the last layer is the fourth layer.
5. A smart home system, characterized by comprising a voiceprint recognition module and an access control module, wherein:
the voiceprint recognition module is communicatively connected to the access control module, and is configured to perform voiceprint recognition on received voice information using the voiceprint recognition model according to any one of claims 1 to 4 to obtain a voiceprint recognition result, and to send the voiceprint recognition result to the access control module;
the access control module is configured to determine, according to the voiceprint recognition result, whether to control an electronic lock to unlock.
6. The system of claim 5, further comprising an alarm module communicatively connected to the access control module and configured to perform an alarm operation upon receiving an alarm instruction from the access control module;
wherein the access control module is further configured to send the alarm instruction to the alarm module when determining that the voiceprint recognition result does not meet a preset condition.
7. The system of claim 5 or 6, further comprising a recording module communicatively connected to the access control module and configured to record video information of a user and the voice information upon receiving a recording instruction from the access control module;
wherein the voiceprint recognition module is further configured to send the voice information to the recording module.
8. A control method of a smart home system, applied to the smart home system according to any one of claims 5 to 7, characterized by comprising:
performing voiceprint recognition on received voice information through the voiceprint recognition module to obtain a voiceprint recognition result, and sending the voiceprint recognition result to the access control module;
determining, through the access control module, whether the voiceprint recognition result meets a preset condition, and controlling the electronic lock to unlock if it does.
9. The method of claim 8, further comprising:
when the access control module determines that the voiceprint recognition result does not meet the preset condition, sending an alarm instruction to an alarm module through the access control module;
and raising an alarm through the alarm module.
10. The method of claim 9, wherein raising an alarm through the alarm module comprises:
sending a short message or alarm information to an associated terminal through the alarm module.
11. The method of claim 8, further comprising:
when the access control module determines that the voiceprint recognition result does not meet the preset condition, sending a recording instruction to a recording module through the access control module;
and recording video information and the voice information of the user through the recording module.
12. An apparatus for acquiring a voiceprint recognition model, comprising:
a model modification module, configured to replace the penultimate layer of a pre-trained convolutional neural network model with a restricted Boltzmann machine and replace the last layer of the convolutional neural network model with a normalized exponential function (softmax) layer to obtain a target model, wherein the convolutional neural network model is obtained by pre-training on a first voiceprint sample set; and
a training module, configured to train the target model based on a second voiceprint sample set to obtain a voiceprint recognition model, wherein the sample size of the first voiceprint sample set is larger than that of the second voiceprint sample set, and the voiceprint recognition model is used to recognize a user's voiceprint.
13. An apparatus for acquiring a voiceprint recognition model, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-4.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN202111594604.XA 2021-12-24 2021-12-24 Method, device, system, equipment and medium for acquiring voiceprint recognition model Pending CN114220439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111594604.XA CN114220439A (en) 2021-12-24 2021-12-24 Method, device, system, equipment and medium for acquiring voiceprint recognition model

Publications (1)

Publication Number Publication Date
CN114220439A true CN114220439A (en) 2022-03-22

Family

ID=80705469

Country Status (1)

Country Link
CN (1) CN114220439A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116573508A (en) * 2023-07-13 2023-08-11 深圳市万物云科技有限公司 High-resolution elevator fault identification method, device and related medium
CN116573508B (en) * 2023-07-13 2023-10-10 深圳市万物云科技有限公司 High-resolution elevator fault identification method, device and related medium

Similar Documents

Publication Publication Date Title
CN107395352B (en) Personal identification method and device based on vocal print
CN109769099B (en) Method and device for detecting abnormality of call person
US20220317641A1 (en) Device control method, conflict processing method, corresponding apparatus and electronic device
CN103971680B (en) A kind of method, apparatus of speech recognition
EP3047622B1 (en) Method and apparatus for controlling access to applications
CN108009521A (en) Humanface image matching method, device, terminal and storage medium
CN106599866A (en) Multidimensional user identity identification method
CN111524527B (en) Speaker separation method, speaker separation device, electronic device and storage medium
CN108399671A (en) A kind of Internet of Things vena metacarpea video gate inhibition integrated system
CN109036412A (en) voice awakening method and system
CN113033490B (en) Industrial equipment general fault detection method and system based on sound signals
CN106991312B (en) Internet anti-fraud authentication method based on voiceprint recognition
CN110797031A (en) Voice change detection method, system, mobile terminal and storage medium
CN109901955A (en) A kind of method and system for testing power-on screen state
CN112491844A (en) Voiceprint and face recognition verification system and method based on trusted execution environment
CN113936298A (en) Feature recognition method and device and computer readable storage medium
CN116490920A (en) Method for detecting an audio challenge, corresponding device, computer program product and computer readable carrier medium for a speech input processed by an automatic speech recognition system
CN114491525A (en) Android malicious software detection feature extraction method based on deep reinforcement learning
CN114220439A (en) Method, device, system, equipment and medium for acquiring voiceprint recognition model
KR20090089674A (en) An apparatus of sound recognition in a portable terminal and a method thereof
CN115881126A (en) Switch control method and device based on voice recognition and switch equipment
CN115862634A (en) Voiceprint recognition method and embedded device
CN112926126B (en) Federal learning method based on Markov random field
US11562173B2 (en) Method, device, and computer program product for model updating
CN114387968A (en) Voice unlocking method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240527

Address after: No.006, 6th floor, building 4, No.33 yard, middle Xierqi Road, Haidian District, Beijing 100085

Applicant after: BEIJING KINGSOFT CLOUD NETWORK TECHNOLOGY Co.,Ltd.

Country or region after: China

Applicant after: Wuxi Jinyun Zhilian Technology Co.,Ltd.

Address before: No.006, 6th floor, building 4, No.33 yard, middle Xierqi Road, Haidian District, Beijing 100085

Applicant before: BEIJING KINGSOFT CLOUD NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China