WO2022161025A1 - Voiceprint recognition method and apparatus, electronic device, and readable storage medium - Google Patents

Voiceprint recognition method and apparatus, electronic device, and readable storage medium

Info

Publication number
WO2022161025A1
WO2022161025A1 (PCT/CN2021/139745)
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint model
voiceprint
internet
model
electronic device
Prior art date
Application number
PCT/CN2021/139745
Other languages
English (en)
French (fr)
Inventor
胡宁宁
陈喆
曹冰
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022161025A1 publication Critical patent/WO2022161025A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building

Definitions

  • the present application relates to the technical field of information security, and more particularly, to a voiceprint recognition method, apparatus, electronic device and readable storage medium.
  • The voiceprint is an important human biometric feature with strong individual specificity, and is often used as an identity-authentication feature in fields such as voiceprint recognition and voiceprint authentication.
  • Voiceprint wake-up technology is already widely used in terminal devices such as mobile phones.
  • Its application in IoT (Internet of Things) products is also becoming increasingly common.
  • In voiceprint training, when a user registers a voiceprint, a certain number of voice samples are collected at one time; voiceprint information is then extracted from the collected audio to generate a voiceprint wake-up model.
  • The voiceprint training process for an IoT device is usually carried out through an app on a mobile terminal such as a mobile phone. How to make better use of voiceprint recognition across IoT devices and electronic devices is therefore a technical problem in urgent need of a solution.
  • In view of this, the present application proposes a voiceprint recognition method, apparatus, electronic device, and readable storage medium to remedy the above-mentioned defects.
  • an embodiment of the present application provides a voiceprint recognition method, which is applied to an electronic device.
  • The method includes: acquiring an Internet of Things (IoT) device connected to the electronic device, and determining whether the IoT device includes a first voiceprint model; if the IoT device does not include the first voiceprint model, determining whether the IoT device is compatible with a second voiceprint model generated by the electronic device; and, if the IoT device is compatible with the second voiceprint model, sending the second voiceprint model to the IoT device, the IoT device being configured to use the second voiceprint model to perform voiceprint recognition on second audio data when the second audio data is received.
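The decision flow summarized above can be sketched in Python. The device dictionaries, field names, and return labels below are illustrative assumptions made for this sketch, not anything disclosed in the application.

```python
# Hypothetical sketch of the claimed flow: check for a first voiceprint
# model on the IoT device, check compatibility with the electronic
# device's second model, and send the model only when both checks allow it.
# All names here are assumptions made for illustration.

def push_voiceprint_model(electronic_device, iot_device):
    """Decide what the electronic device does for one connected IoT device."""
    if iot_device.get("has_first_model"):
        return "keep_existing_model"          # device already has a model
    if not iot_device.get("compatible_with_second_model"):
        return "skip_incompatible_device"     # cannot run the second model
    # Send the second voiceprint model; the IoT device will later use it
    # to perform voiceprint recognition on second audio data it receives.
    iot_device["model"] = electronic_device["second_model"]
    return "model_sent"

phone = {"second_model": "second-voiceprint-model"}
tv = {"has_first_model": False, "compatible_with_second_model": True}
print(push_voiceprint_model(phone, tv))  # model_sent
```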
  • An embodiment of the present application further provides a voiceprint recognition apparatus, applied to an electronic device. The apparatus includes an acquiring module, a determining module, and a sending module.
  • the acquiring module is configured to acquire the IoT device connected to the electronic device, and determine whether the IoT device includes the first voiceprint model.
  • a determining module configured to determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device if it is determined that the IoT device does not include the first voiceprint model.
  • The sending module is configured to send the second voiceprint model to the IoT device if the IoT device is compatible with the second voiceprint model, the IoT device using the second voiceprint model to perform voiceprint recognition on second audio data when the second audio data is received.
  • Embodiments of the present application further provide an electronic device, including one or more processors, a memory, and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the above-described method.
  • An embodiment of the present application further provides a computer-readable storage medium storing program code that can be invoked by a processor to execute the foregoing method.
  • An embodiment of the present application further provides a computer program product including computer programs/instructions which, when executed by a processor, implement the above method.
  • FIG. 1 shows an application scenario diagram of the voiceprint recognition method and device provided by an embodiment of the present application
  • FIG. 2 shows a method flowchart of a voiceprint recognition method provided by an embodiment of the present application
  • FIG. 3 shows a method flowchart of a voiceprint recognition method provided by another embodiment of the present application.
  • FIG. 4 shows a flowchart of step S304 in FIG. 3.
  • FIG. 5 shows a method flowchart of a voiceprint recognition method provided by another embodiment of the present application.
  • FIG. 6 shows a method flowchart of a voiceprint recognition method provided by still another embodiment of the present application.
  • FIG. 7 shows a block diagram of a module of a voiceprint recognition device provided by an embodiment of the present application.
  • FIG. 8 shows a module block diagram of a voiceprint recognition device provided by another embodiment of the present application.
  • FIG. 9 shows a structural block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 10 shows a storage unit provided by an embodiment of the present application for storing or carrying a program code for implementing the voiceprint recognition method according to the embodiment of the present application.
  • FIG. 11 shows a structural block diagram of a computer program product provided by an embodiment of the present application.
  • Referring to FIG. 1, an application scenario diagram of the voiceprint recognition method and apparatus provided by the embodiments of the present application is shown.
  • The electronic device 1 and the IoT device 2 are located in a wireless or wired network, and the electronic device 1 and the IoT device 2 perform data interaction.
  • the electronic device 1 may be a mobile terminal device, for example, may include a smart phone, a tablet computer, an e-book reader, a laptop computer, a vehicle-mounted computer, a wearable mobile terminal, and the like.
  • the electronic device 1 can be connected to at least one IoT device 2 through a wired or wireless network, and the IoT device 2 can be a smart home device, such as a TV, an air conditioner, a refrigerator, a sensor, etc.
  • The electronic device 1 can transmit data to the IoT device 2, and the IoT device 2 can likewise transmit acquired data to the electronic device 1; when there are multiple IoT devices 2, data can also be exchanged among the multiple IoT devices 2.
  • In practice, the voiceprint training process for an IoT device is usually carried out with the help of an app on a mobile terminal such as a mobile phone.
  • As a result, voiceprint training for an IoT device requires an electronic device such as a mobile phone and the download of the corresponding app, after which the voiceprint training must be performed again; the whole process is cumbersome and complicated, so the voiceprint recognition operation is far from simple and convenient.
  • Moreover, when users register for voiceprint wake-up, they usually need to record multiple voice samples at one time to generate a wake-up model. Because a user's voiceprint continues to change over time while the voiceprint wake-up function is in use (that is, voiceprint drift occurs), the voiceprint wake-up rate decreases over time.
  • an embodiment of the present application provides a voiceprint recognition method.
  • FIG. 2 shows a voiceprint recognition method provided by an embodiment of the present application, which is applied to an electronic device.
  • the voiceprint recognition method includes steps S201 to S204 .
  • Step S201 Acquire an Internet of Things device connected to the electronic device.
  • The embodiments of the present application may acquire the IoT devices connected to an electronic device, where the electronic device may be any device capable of running applications, such as a smartphone, tablet computer, or e-book reader, and the IoT device may be a smart home device, such as a TV, an air conditioner, a refrigerator, or an information sensor, infrared sensor, or laser scanner, among others.
  • The Internet of Things is the "Internet of Everything": an extension and expansion of the Internet, that is, a huge network formed by combining various information-sensing devices with the Internet.
  • The IoT device in this embodiment of the present application may be any single device in the Internet of Things, multiple devices in the Internet of Things, or a designated device; what the IoT device specifically refers to is not limited here and can be chosen according to the actual situation.
  • When the electronic device finds that multiple IoT devices are connected to it, it can sort them by priority from high to low to obtain a priority ranking, and then determine, in that order, whether each IoT device includes the first voiceprint model. Specifically, the electronic device can determine the priority of an IoT device according to how frequently the user uses it: the more frequent the use, the higher the priority. Determining the sending order of the second voiceprint model from this priority ranking can improve the user experience to a certain extent. For example, if the refrigerator is the IoT device the user uses most frequently, configuring the voiceprint recognition model for the refrigerator first improves the user experience.
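The priority ordering described above amounts to a sort by usage frequency. The device records and the `use_count` field below are assumptions made for this illustration.

```python
# Rank connected IoT devices by how often the user uses them: more
# frequent use means higher priority, and the second voiceprint model
# is sent to (and checked on) higher-priority devices first.
devices = [
    {"name": "tv", "use_count": 12},
    {"name": "refrigerator", "use_count": 48},
    {"name": "air_conditioner", "use_count": 30},
]

send_order = sorted(devices, key=lambda d: d["use_count"], reverse=True)
print([d["name"] for d in send_order])
# ['refrigerator', 'air_conditioner', 'tv']
```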
  • Step S202 Determine whether the IoT device includes a first voiceprint model.
  • In this embodiment, the electronic device may determine whether the IoT device includes the first voiceprint model. Specifically, the electronic device may send a voiceprint-model detection instruction to the IoT device to instruct it to check whether it holds the first voiceprint model. When the electronic device receives confirmation information from the IoT device, it determines that the IoT device includes the first voiceprint model; when no confirmation information is received, it determines that the IoT device does not include the first voiceprint model.
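A minimal sketch of interpreting the reply to such a detection instruction follows; the confirmation message value is purely hypothetical, since the application does not specify a wire format.

```python
# Hypothetical reply value; the application does not define one.
CONFIRM = "CONFIRM_MODEL_PRESENT"

def has_first_voiceprint_model(reply):
    """Treat a confirmation reply as presence of the first voiceprint model.

    Any other reply, including no reply at all (None), is treated as the
    model being absent, matching the behavior described above.
    """
    return reply == CONFIRM

print(has_first_voiceprint_model(CONFIRM))  # True
print(has_first_voiceprint_model(None))     # False
```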
  • Alternatively, after acquiring the connected IoT device, the electronic device may send a data-acquisition instruction to it and, upon receiving the returned data, determine whether the IoT device includes the first voiceprint model.
  • the electronic device when the electronic device acquires the IoT device connected to it, it can also acquire the relevant historical data of interaction with the IoT device stored in the electronic device, and determine whether the IoT device includes the first voiceprint model based on the historical data.
  • The electronic device can use artificial intelligence to analyze the historical data of the IoT device to determine whether the historical data includes data related to the first voiceprint model, and if so, determine that the IoT device includes the first voiceprint model.
  • the electronic device can also determine whether the first voiceprint model is included in the IoT device through data sent by the server.
  • the first voiceprint model may be a network model related to voiceprint recognition.
  • The main function of the first voiceprint model is to receive the voice input by the user and recognize it; when the input voice matches a preset voice, the operation associated with the input voice can be executed. For example, user A says "Xiaomei, I want to increase the temperature"; when the IoT device receives this voice, it can determine whether user A is a registered user and, if so, increase the temperature.
  • The first voiceprint model may be obtained by training the first voiceprint recognition network on audio data the user inputs through the IoT device, or on audio data sent by other devices.
  • The first voiceprint recognition network may be one configured before the IoT device leaves the factory. After the IoT device obtains 3 to 5 pieces of audio data input by the user, it can use that audio data to train the first voiceprint recognition network to obtain the first voiceprint model.
  • the first voiceprint recognition network may be constructed based on hardware parameters of the IoT device, and the corresponding stored first voiceprint recognition network may also be different for different IoT devices.
  • the embodiment of the present application may determine whether the first voiceprint model is the latest voiceprint model, and if it is the latest voiceprint model, then The first voiceprint model is not updated, and if it is not the latest voiceprint model, the first voiceprint model can be updated by using the second voiceprint model.
  • When it is determined that the IoT device does not include the first voiceprint model, this embodiment of the present application may determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device, that is, step S203 is entered.
  • Step S203 If it is determined that the IoT device does not include the first voiceprint model, determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device.
  • When the electronic device determines that the IoT device does not include the first voiceprint model, it may determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device. Like the first voiceprint model, the second voiceprint model may be a network model related to voiceprint recognition: its main function is to receive the voice input by the user and recognize it, and when the input voice matches a preset voice, the operation associated with the input voice can be executed.
  • The first voiceprint model is obtained by training the first voiceprint recognition network on the first audio data, where the first voiceprint recognition network may be obtained according to the hardware parameters of the IoT device, and the first audio data may be audio data input by the user on the IoT device.
  • Similarly, the second voiceprint model is obtained by training the second voiceprint recognition network on the second audio data, where the second voiceprint recognition network may be obtained according to the hardware parameters of the electronic device, and the second audio data may be audio data input by the user on the electronic device.
  • The difference between the first voiceprint model and the second voiceprint model is that the first voiceprint model is stored in the IoT device and obtained by training the first voiceprint recognition network on the first audio data, whereas the second voiceprint model is stored in the electronic device and obtained by training the second voiceprint recognition network on the second audio data.
  • Likewise, the second voiceprint model may be obtained by training the second voiceprint recognition network on audio data the user inputs through the electronic device, or on audio data sent by other devices. The second voiceprint recognition network may be one configured before the electronic device leaves the factory, or one configured afterwards. After the electronic device acquires 3 to 5 pieces of audio data input by the user, it can use that audio data to train the second voiceprint recognition network. The second voiceprint recognition network may be constructed based on the hardware parameters of the electronic device, and the network stored may differ across electronic devices.
  • The network structures of the first voiceprint recognition network and the second voiceprint recognition network may or may not be the same, and the electronic device and the IoT device may or may not belong to the same manufacturer. Similarly, the first audio data used to train the first voiceprint model and the second audio data used to train the second voiceprint model may be the same or different.
  • Further, this embodiment of the present application determines whether the IoT device is compatible with the second voiceprint model generated by the electronic device. Here, compatibility refers to whether the IoT device can run the second voiceprint model normally, and whether the accuracy of voiceprint recognition performed by the IoT device using the second voiceprint model is greater than an accuracy threshold. The accuracy threshold may be the average accuracy of the electronic device itself when performing voiceprint recognition with the second voiceprint model, or may be set by the user according to an empirical value.
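The two-part compatibility test described here (the device can run the model, and its accuracy with the model exceeds the threshold) can be sketched as follows; the callables, field names, and accuracy figures are assumptions made for this sketch.

```python
def is_compatible(iot_device, model, accuracy_threshold):
    """Compatibility as described above: the IoT device must be able to
    run the second voiceprint model normally, and its recognition
    accuracy with that model must exceed the accuracy threshold."""
    if not iot_device["can_run"](model):
        return False
    return iot_device["accuracy_with"](model) > accuracy_threshold

# The threshold may be the electronic device's own average accuracy with
# the model, or a user-set empirical value; 0.90 here is an assumption.
device = {"can_run": lambda m: True, "accuracy_with": lambda m: 0.93}
print(is_compatible(device, "second-model", 0.90))  # True
print(is_compatible(device, "second-model", 0.95))  # False
```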
  • If the IoT device is compatible with the second voiceprint model, the electronic device may send the second voiceprint model to the IoT device, that is, step S204 is entered.
  • If the IoT device is not compatible with the second voiceprint model, the electronic device may refrain from sending it, ending the model-sending operation. At the same time, the second audio data used to train the second voiceprint model can be sent to the IoT device to instruct it to update the first voiceprint model with the second audio data. Alternatively, the electronic device may re-acquire the connected IoT devices and determine whether they include the first voiceprint model, that is, step S201 is entered.
  • Step S204: Send the second voiceprint model to the IoT device, where the IoT device is configured to use the second voiceprint model to perform voiceprint recognition on second audio data when the second audio data is received.
  • When the electronic device determines that the IoT device does not include the first voiceprint model and that the IoT device is compatible with the second voiceprint model generated by the electronic device, it may send the second voiceprint model to the IoT device, which is configured to perform voiceprint recognition on second audio data using the second voiceprint model when the second audio data is received.
  • In the voiceprint recognition method proposed by this embodiment of the present application, sending the second voiceprint model included in the electronic device to the IoT device improves the user experience of training voiceprint models.
  • Specifically, the method acquires the IoT device connected to the electronic device and determines whether it includes the first voiceprint model; when it does not, the method determines whether the IoT device is compatible with the second voiceprint model and, if so, sends the second voiceprint model included in the electronic device to the IoT device, which uses it to perform voiceprint recognition on second audio data when the second audio data is received.
  • In this way, the user is spared the tedium of training a voiceprint model separately on each IoT device.
  • the voiceprint recognition method may include steps S301 to S306.
  • Step S301 Acquire an Internet of Things device connected to the electronic device.
  • Step S302 Determine whether the IoT device includes a first voiceprint model.
  • In this embodiment, when the IoT device includes the first voiceprint model, the electronic device can obtain the first generation time of the first voiceprint model and the second generation time of the second voiceprint model, that is, step S303 is entered. When the IoT device does not include the first voiceprint model, the electronic device may determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device, that is, step S305 is entered.
  • Step S303 Obtain a first generation time of the first voiceprint model, and obtain a second generation time of the second voiceprint model.
  • In this embodiment, the first generation time of the first voiceprint model may be the time at which the IoT device obtains the first voiceprint model, that is, the time at which the first voiceprint model is obtained by training the first voiceprint recognition network with the first audio data. For example, at 14:00 on January 10, 2021, IoT device A obtains audio data input by the user on IoT device A; IoT device A then uses this audio data to train the pre-acquired first voiceprint recognition network and finally obtains the first voiceprint model at 15:00 on January 10, 2021, so the first generation time is 15:00 on January 10, 2021.
  • Similarly, the second generation time of the second voiceprint model may be the time at which the electronic device obtains the second voiceprint model, that is, the time at which the second voiceprint model is obtained by training the second voiceprint recognition network with the second audio data. For example, at 9:00 on January 20, 2021, electronic device B obtains audio data input by the user on electronic device B; electronic device B then uses this audio data to train the pre-acquired second voiceprint recognition network and finally obtains the second voiceprint model at 11:00 on January 20, 2021, so the second generation time is 11:00 on January 20, 2021.
  • In some embodiments, the first generation time and the second generation time may instead be the model's update time. That is, the first generation time may be the time at which the first voiceprint model was updated with the second voiceprint model, updated with the second audio data corresponding to the second voiceprint model, or updated by the user with the latest audio data input on the IoT device. Likewise, the second generation time may be the time at which the user updated the second voiceprint model with the latest audio data input on the electronic device, or the time at which the second voiceprint model was updated with a voiceprint model sent by another electronic device, and so on.
  • In other words, the first generation time refers to the generation time of the latest version of the first voiceprint model held by the IoT device, and the second generation time refers to the generation time of the latest version of the second voiceprint model held by the electronic device.
  • After obtaining the first generation time and the second generation time, the electronic device may determine whether the first generation time is earlier than the second generation time. If the first generation time is earlier than the second generation time, the second voiceprint model is sent to the IoT device, that is, step S304 is entered. If the first generation time is later than the second generation time, the first voiceprint model stored in the IoT device is not updated, that is, it is kept unchanged.
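The generation-time comparison can be expressed directly with timestamps, reusing the example dates from the description; the function name is an illustrative assumption.

```python
from datetime import datetime

def should_update(first_generation_time, second_generation_time):
    """Update the IoT device's first model only when it is strictly
    older than the electronic device's second model."""
    return first_generation_time < second_generation_time

first_time = datetime(2021, 1, 10, 15, 0)   # first model: 15:00, Jan 10, 2021
second_time = datetime(2021, 1, 20, 11, 0)  # second model: 11:00, Jan 20, 2021
print(should_update(first_time, second_time))  # True: send the second model
```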
  • Step S304 When the first generation time is earlier than the second generation time, send the second voiceprint model to the IoT device, and instruct the IoT device to use the second voiceprint The model updates the first voiceprint model.
  • When the electronic device determines that the first generation time is earlier than the second generation time, it sends the second voiceprint model to the IoT device to instruct the IoT device to update the first voiceprint model with the second voiceprint model.
  • For example, the first generation time of the first voiceprint model is 15:00 on January 10, 2021, and the second generation time of the second voiceprint model is 11:00 on January 20, 2021; evidently, the first generation time is earlier than the second generation time.
  • the electronic device may send the second voiceprint model to the IoT device, and instruct the IoT device to update the first voiceprint model by using the second voiceprint model.
  • Instructing the IoT device to update the first voiceprint model with the second voiceprint model may mean instructing it to replace the first voiceprint model directly with the second voiceprint model; thereafter, when the IoT device receives audio data input by the user, it can use the second voiceprint model to identify that audio data.
  • Alternatively, the electronic device may instruct the IoT device to fuse the second voiceprint model with the first voiceprint model to obtain a third voiceprint model; when the IoT device then receives audio data input by the user, it can use the third voiceprint model to identify that audio data.
  • step S304 may include steps S3041 to S3042.
  • Step S3041 Determine whether the IoT device is compatible with the second voiceprint model.
  • Before sending the second voiceprint model, the electronic device may also determine whether the IoT device is compatible with the second voiceprint model. If the IoT device is compatible, the electronic device sends the second voiceprint model to it, that is, step S3042 is entered. If the IoT device is not compatible, the voiceprint model on the IoT device need not be updated; alternatively, the second audio data corresponding to the second voiceprint model can be sent to the IoT device so that the first voiceprint model is updated from that audio data.
  • Step S3042: When the IoT device is compatible with the second voiceprint model, send the second voiceprint model to the IoT device and instruct the IoT device to update the first voiceprint model with the second voiceprint model.
  • In some embodiments, the electronic device may acquire the second audio data corresponding to the second voiceprint model, send the second audio data to the IoT device, and instruct the IoT device to update the first voiceprint model with the second audio data.
  • The second audio data may be audio training data input by the user on the electronic device; after receiving the second audio data, the electronic device may use it to train the second voiceprint recognition network to obtain the second voiceprint model.
  • That is, the second audio data may be the audio data used to train the second voiceprint recognition network to obtain the second voiceprint model, or the data used to update the second voiceprint model.
  • For example, the memory of the electronic device stores a second voiceprint model; if the user updates it during use, the data used for that update may also be referred to as second audio data.
  • Step S305 Determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device.
  • Step S306: Send the second voiceprint model to the IoT device, where the IoT device is configured to use the second voiceprint model to perform voiceprint recognition on second audio data when the second audio data is received.
  • The voiceprint recognition method proposed in this embodiment improves the user experience of training voiceprint models by sending the second voiceprint model included in the electronic device to the IoT device: the method acquires the connected IoT device and determines whether it includes the first voiceprint model; when it does not, the method determines whether the IoT device is compatible with the second voiceprint model and, if so, sends the second voiceprint model to the IoT device, which uses it to perform voiceprint recognition on second audio data when the second audio data is received.
  • By sending the second voiceprint model included in the electronic device to the IoT device, this embodiment of the present application spares users, to a certain extent, the tedium of training a voiceprint model on each IoT device separately.
  • In addition, by comparing the generation times of the first voiceprint model and the second voiceprint model, this embodiment can update the first voiceprint model accurately and in a timely manner, which to a certain extent avoids the voiceprint-drift phenomenon and the resulting drop in wake-up rate.
  • Yet another embodiment of the present application provides a voiceprint recognition method applied to an electronic device; referring to FIG. 5, the voiceprint recognition method may include steps S501 to S510.
  • Step S501 Acquire an Internet of Things device connected to the electronic device.
  • Step S502 Determine whether the IoT device includes a first voiceprint model.
  • Step S503 Obtain a first generation time of the first voiceprint model, and obtain a second generation time of the second voiceprint model.
  • In other implementations, to reduce the power consumption caused by model updating while avoiding the influence of voiceprint drift, the implementation of the present application may determine whether the first generation time of the first voiceprint model is earlier than the second generation time of the second voiceprint model. If the first generation time is earlier than the second generation time, the time difference between the first generation time and the second generation time is obtained, that is, step S504 is entered.
  • Step S504 When the first generation time is earlier than the second generation time, obtain a time difference between the first generation time and the second generation time.
  • the first generation time may be the time when the first voiceprint model is obtained by training the first voiceprint recognition network.
  • Similarly, the second generation time may be the time when the second voiceprint model is obtained by training the second voiceprint recognition network.
  • When determining that the first generation time is earlier than the second generation time, the implementation of the present application may also obtain the time difference between the first generation time and the second generation time, and then determine whether the time difference is greater than a time threshold, that is, step S505 is entered.
  • Step S505 Determine whether the time difference is greater than a time threshold.
  • The time threshold may be preset, or may be determined according to the structural complexity of the first voiceprint recognition network: the more complex the network structure of the first voiceprint recognition network, the larger the time threshold; conversely, the simpler the network structure, the smaller the time threshold.
  • In other words, the network structure complexity and the time threshold can be stored in one-to-one correspondence, so that once the first voiceprint recognition network is obtained, the corresponding time threshold can be obtained.
  • the embodiment of the present application may first obtain the network structure complexity of the first voiceprint recognition network, and then obtain the time threshold corresponding to the network structure complexity.
  • the complexity of the network structure can be comprehensively determined by combining the depth, weight parameters and loss function of the neural network.
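The update decision described in steps S503 to S506 can be sketched as follows. This is a minimal illustration only: the complexity tiers and the threshold values (in hours) are assumptions for the example, not values from the specification.

```python
# Update the IoT device's first voiceprint model only when it is older than
# the electronic device's second voiceprint model by more than a threshold
# that grows with the recognition network's structural complexity.
COMPLEXITY_THRESHOLDS = {  # complexity tier -> minimum age gap (hours), assumed values
    "simple": 24,
    "medium": 72,
    "complex": 168,
}

def should_push_model(first_gen_time, second_gen_time, complexity):
    """Return True if the device model (first) should be replaced by the
    electronic device's model (second)."""
    if first_gen_time >= second_gen_time:   # device model is not older
        return False
    age_gap = second_gen_time - first_gen_time
    return age_gap > COMPLEXITY_THRESHOLDS[complexity]

# Device model trained 100 hours before the phone model, simple network:
assert should_push_model(0, 100, "simple") is True
# Same gap, but a complex network uses a larger threshold, so no update yet:
assert should_push_model(0, 100, "complex") is False
```

The lookup table stands in for the one-to-one correspondence between network structure complexity and time threshold described above.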
  • Step S506 When the time difference is greater than the time threshold, send the second voiceprint model to the Internet of Things device.
  • In the embodiment of the present application, when the time difference between the first generation time of the first voiceprint model and the second generation time of the second voiceprint model is greater than the time threshold, the electronic device may send the second voiceprint model to the Internet of Things device.
  • Before sending the second voiceprint model to the Internet of Things device, the electronic device may also determine whether the IoT device is compatible with the second voiceprint model. If it is determined that the IoT device is compatible with the second voiceprint model, the second voiceprint model is sent directly to the Internet of Things device to instruct the Internet of Things device to use the second voiceprint model for audio recognition when receiving audio data. If the IoT device is not compatible with the second voiceprint model, the electronic device may send the second audio data used for training the second voiceprint model to the IoT device, and instruct the IoT device to use the second audio data to update the first voiceprint model.
  • As one way, when the time difference between the first generation time and the second generation time is less than or equal to the time threshold, the electronic device may stop the update operation on the voiceprint model in the IoT device, that is, leave the first voiceprint model stored in the IoT device unchanged.
  • In addition, when it is determined that the IoT device does not include the first voiceprint model, the embodiment of the present application may determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device, that is, step S507 is entered.
  • Step S507 Determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device.
  • Step S508 Send the second voiceprint model to the Internet of Things device, where the Internet of Things device is configured to perform voiceprint recognition on the second audio data using the second voiceprint model upon receiving the second audio data.
  • Step S509 Acquire second audio data corresponding to the second voiceprint model.
  • Step S510 Send the second audio data to the IoT device, and instruct the IoT device to use the second audio data to train the first voiceprint recognition network to obtain the first voiceprint model.
  • The voiceprint recognition method proposed in this embodiment of the present application improves the user experience of training a voiceprint model by sending the second voiceprint model included in the electronic device to the Internet of Things device. Specifically, the present application first acquires the IoT device connected to the electronic device and determines whether the IoT device includes the first voiceprint model; when it is determined that the IoT device does not include the first voiceprint model, it determines whether the IoT device is compatible with the second voiceprint model, and if so, sends the second voiceprint model included in the electronic device to the IoT device, where the IoT device is mainly configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data.
  • By sending the second voiceprint model included in the electronic device to the Internet of Things device, the user is spared, to a certain extent, from tediously training voiceprint models on different Internet of Things devices.
  • In addition, by comparing the time difference between the first generation time and the second generation time, this embodiment of the present application not only avoids the influence of voiceprint drift, but also reduces unnecessary power consumption caused by model updating.
  • Still another embodiment of the present application provides a voiceprint recognition method applied to an electronic device; referring to FIG. 6, the voiceprint recognition method may include steps S601 to S604.
  • Step S601 Acquire an Internet of Things device connected to the electronic device.
  • Step S602 When it is determined that the IoT device does not include the first voiceprint model, determine whether the structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network.
  • When it is determined that the IoT device does not include the first voiceprint model, the electronic device may determine whether the network structure of the first voiceprint recognition network matches the network structure of the second voiceprint recognition network, that is, determine whether the two network structures are basically the same. If they are the same or approximately the same, it is determined that the structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network, and at this time it can be determined that the IoT device is compatible with the second voiceprint model generated by the electronic device.
  • In a specific implementation, the first voiceprint recognition network is composed of an input layer, a convolutional layer, a pooling layer and an activation function layer, and the second voiceprint recognition network is likewise composed of an input layer, a convolutional layer, a pooling layer and an activation function layer. If the numbers of input, convolutional, pooling and activation function layers in the first voiceprint recognition network are the same as the numbers of the corresponding layers in the second voiceprint recognition network, it can be determined that the structures of the first voiceprint recognition network and the second voiceprint recognition network match.
  • As can be seen from the above, regardless of whether the IoT device includes the first voiceprint model, the embodiment of the present application needs to determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device, and compatibility may be judged as described above, that is, by judging whether the structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network. In addition, the embodiment of the present application may also determine whether the versions of the first voiceprint model and the second voiceprint model are the same; if the versions are the same, it is determined that the IoT device is compatible with the second voiceprint model generated by the electronic device, and if they are not the same, it is determined that the IoT device is not compatible with the second voiceprint model generated by the electronic device.
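The structure-matching and version checks described above can be sketched as follows. This is a minimal illustration under assumed representations: describing a network as an ordered list of (layer type, layer count) pairs, and the version strings, are choices for the example and not part of the specification.

```python
def structures_match(net_a, net_b):
    """Two networks match when their ordered (layer_type, layer_count)
    descriptions line up exactly."""
    return net_a == net_b

def is_compatible(net_a, net_b, version_a=None, version_b=None):
    """Compatible when the network structures match, or when the model
    versions are the same."""
    if structures_match(net_a, net_b):
        return True
    return version_a is not None and version_a == version_b

first_net = [("input", 1), ("conv", 3), ("pool", 2), ("activation", 3)]
second_net = [("input", 1), ("conv", 3), ("pool", 2), ("activation", 3)]
assert is_compatible(first_net, second_net)
# Different convolutional layer counts and no version information: not compatible.
assert not is_compatible(first_net, [("input", 1), ("conv", 5), ("pool", 2), ("activation", 3)])
```

A real implementation would compare whatever structural metadata the two devices expose; the point is only that either criterion (structure or version) can establish compatibility.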
  • As another way, the electronic device can also obtain the manufacturers of the IoT device and the electronic device and determine whether they are the same; if the manufacturers of the IoT device and the electronic device are the same, it can be determined that the IoT device is compatible with the second voiceprint model generated by the electronic device. It should be noted that, when determining whether the Internet of Things device is compatible with the second voiceprint model generated by the electronic device, the embodiment of the present application may consider at least one of the manufacturer, the version, the network structure, and the like; the order in which these are judged is not specifically limited here and can be chosen according to the actual situation.
  • Step S603 If there is a match, determine that the IoT device is compatible with the second voiceprint model generated by the electronic device.
  • Step S604 Send the second voiceprint model to the Internet of Things device, where the Internet of Things device is configured to perform voiceprint recognition on the second audio data using the second voiceprint model upon receiving the second audio data.
  • the electronic device may be a mobile terminal, for example, the Internet of Things device is a TV, and the electronic device is a mobile phone.
  • the electronic device may also be an Internet of Things device, for example, the Internet of Things device is a TV, and the electronic device is an air conditioner.
  • the Internet of Things device is TV 1 and the electronic device is TV 2.
  • the IoT device can also be a mobile terminal, for example, the electronic device is a mobile phone 1 , and the IoT device can be a mobile phone 2 .
  • Electronic devices and IoT devices are not specifically limited here.
  • The voiceprint recognition method proposed in this embodiment of the present application improves the user experience of training a voiceprint model by sending the second voiceprint model included in the electronic device to the Internet of Things device. Specifically, the present application first acquires the IoT device connected to the electronic device and determines whether the IoT device includes the first voiceprint model; when it is determined that the IoT device does not include the first voiceprint model, it determines whether the IoT device is compatible with the second voiceprint model, and if so, sends the second voiceprint model included in the electronic device to the IoT device, where the IoT device is mainly configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data.
  • By sending the second voiceprint model included in the electronic device to the Internet of Things device, the user is spared, to a certain extent, from tediously training voiceprint models on different Internet of Things devices.
  • In addition, by checking compatibility before sending, erroneous transmission of the model can be avoided, and the power consumption caused by such erroneous transmission can be reduced.
  • an embodiment of the present application provides a voiceprint recognition apparatus 700, which is applied to electronic equipment.
  • the voiceprint recognition apparatus 700 includes: an acquisition module 701 , a determination module 702 and a transmission module 703 .
  • the obtaining module 701 is configured to obtain an IoT device connected to the electronic device, and determine whether the IoT device includes a first voiceprint model.
  • the determining module 702 is configured to determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device if it is determined that the IoT device does not include the first voiceprint model.
  • Optionally, the determining module 702 is further configured to, when it is determined that the IoT device does not include the first voiceprint model, determine whether the structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network; if they match, it is determined that the IoT device is compatible with the second voiceprint model generated by the electronic device.
  • The sending module 703 is configured to, if the Internet of Things device is compatible with the second voiceprint model, send the second voiceprint model to the Internet of Things device, where the Internet of Things device performs voiceprint recognition on the second audio data using the second voiceprint model upon receiving the second audio data.
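The interaction of the acquisition, determination and sending modules above can be sketched as follows. This is a minimal illustration under assumed dictionary-based device state; the field names (`first_model`, `compatible`, `second_model`) and the fallback of sending training audio are assumptions for the example, not API names from the specification.

```python
def sync_voiceprint_model(electronic_device, iot_device):
    # Acquisition module: if a first voiceprint model is already present,
    # this branch is handled by the generation-time comparison instead.
    if iot_device.get("first_model") is not None:
        return "skip"
    # Determination module: compatibility check against the second model.
    if not iot_device.get("compatible", False):
        return "send_training_audio"  # fall back to sending second audio data
    # Sending module: push the electronic device's second voiceprint model.
    iot_device["first_model"] = electronic_device["second_model"]
    return "sent_model"

phone = {"second_model": "voiceprint-model-v2"}
tv = {"first_model": None, "compatible": True}
assert sync_voiceprint_model(phone, tv) == "sent_model"
assert tv["first_model"] == "voiceprint-model-v2"
```

The three return values correspond to the three outcomes the apparatus distinguishes: no action needed, fall back to audio-based training, or model transfer.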
  • the voiceprint recognition apparatus 800 may further include a time acquisition module 804 and a model update module 805 .
  • The time acquisition module 804 is configured to, if the IoT device includes the first voiceprint model, acquire a first generation time of the first voiceprint model and acquire a second generation time of the second voiceprint model.
  • The model updating module 805 is configured to, when the first generation time is earlier than the second generation time, send the second voiceprint model to the IoT device and instruct the IoT device to update the first voiceprint model using the second voiceprint model.
  • Optionally, the model updating module 805 is further configured to determine whether the IoT device is compatible with the second voiceprint model, and, when the IoT device is compatible with the second voiceprint model, send the second voiceprint model to the Internet of Things device and instruct the Internet of Things device to update the first voiceprint model using the second voiceprint model.
  • model updating module 805 is further configured to acquire the second audio data corresponding to the second voiceprint model when the IoT device is not compatible with the second voiceprint model; send the second audio data to the IoT device, and instruct the IoT device to use the second audio data to update the first voiceprint model.
  • Optionally, the model updating module 805 is further configured to, when the first generation time is earlier than the second generation time, acquire the time difference between the first generation time and the second generation time and determine whether the time difference is greater than the time threshold; when the time difference is greater than the time threshold, the second voiceprint model is sent to the IoT device.
  • Optionally, the voiceprint recognition apparatus 800 is further configured to, if the IoT device is not compatible with the second voiceprint model, obtain the second audio data corresponding to the second voiceprint model, send the second audio data to the Internet of Things device, and instruct the Internet of Things device to train the first voiceprint recognition network using the second audio data to obtain the first voiceprint model.
  • the coupling between the modules may be electrical, mechanical or other forms of coupling.
  • In summary, the present application improves the user experience of training a voiceprint model by sending the second voiceprint model included in the electronic device to the Internet of Things device. Specifically, the present application first acquires the IoT device connected to the electronic device and determines whether the IoT device includes the first voiceprint model; when it is determined that the IoT device does not include the first voiceprint model, it determines whether the IoT device is compatible with the second voiceprint model, and if so, sends the second voiceprint model included in the electronic device to the Internet of Things device, where the Internet of Things device is mainly configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data. By sending the second voiceprint model included in the electronic device to the Internet of Things device, the user is spared, to a certain extent, from tediously training voiceprint models on different Internet of Things devices.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules.
  • FIG. 9 shows a structural block diagram of an electronic device 900 provided by an embodiment of the present application.
  • the electronic device 900 may be an electronic device capable of running an application program, such as a smart phone, a tablet computer, an electronic book, or the like.
  • the electronic device 900 in the present application may include one or more of the following components: a processor 910, a memory 920, and one or more application programs, wherein the one or more application programs may be stored in the memory 920 and configured by One or more processors 910 execute, and one or more programs are configured to perform the methods described in the foregoing method embodiments.
  • Processor 910 may include one or more processing cores.
  • The processor 910 uses various interfaces and lines to connect various parts of the entire electronic device 900, and performs various functions of the electronic device 900 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 920 and calling the data stored in the memory 920.
  • Optionally, the processor 910 may be implemented in hardware in at least one of the forms of digital signal processing (Digital Signal Processing, DSP), a field-programmable gate array (Field-Programmable Gate Array, FPGA), and a programmable logic array (Programmable Logic Array, PLA).
  • The processor 910 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface and application programs, etc.
  • the GPU is used for rendering and drawing of the display content
  • The modem is used to handle wireless communication. It can be understood that the above-mentioned modem may not be integrated into the processor 910 and may instead be implemented by a communication chip alone.
  • The memory 920 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory, ROM). The memory 920 may be used to store instructions, programs, code, code sets, or instruction sets.
  • the memory 920 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.
  • the storage data area may also store data (such as phone book, audio and video data, chat record data) created by the electronic device 900 during use.
  • FIG. 10 shows a structural block diagram of a computer-readable storage medium 1000 provided by an embodiment of the present application.
  • Program codes are stored in the computer-readable storage medium 1000, and the program codes can be invoked by a processor to execute the methods described in the above method embodiments.
  • the computer-readable storage medium 1000 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 1000 includes a non-transitory computer-readable storage medium.
  • Computer readable storage medium 1000 has storage space for program code 1010 to perform any of the method steps in the above-described methods. These program codes can be read from or written to one or more computer program products.
  • Program code 1010 may be compressed, for example, in a suitable form.
  • FIG. 11 shows a computer program product 1100 provided by an embodiment of the present application, including a computer program/instruction 1110 , which implements the above method when the computer program/instruction is executed by a processor.


Abstract

The present application discloses a voiceprint recognition method and apparatus, an electronic device, and a readable storage medium, belonging to the technical field of information security. The method is applied to an electronic device and includes: acquiring an Internet of Things (IoT) device connected to the electronic device, and determining whether the IoT device includes a first voiceprint model; if it is determined that the IoT device does not include the first voiceprint model, determining whether the IoT device is compatible with a second voiceprint model generated by the electronic device; and if the IoT device is compatible with the second voiceprint model, sending the second voiceprint model to the IoT device, the IoT device being configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data. By sending the second voiceprint model on the electronic device to the IoT device, the present application spares the user from tediously training voiceprint models.

Description

Voiceprint Recognition Method and Apparatus, Electronic Device, and Readable Storage Medium

Cross-Reference to Related Applications

This application claims priority to Chinese Patent Application No. CN202110118214.9, filed with the China Patent Office on January 28, 2021 and entitled "Voiceprint Recognition Method and Apparatus, Electronic Device, and Readable Storage Medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of information security, and more particularly, to a voiceprint recognition method and apparatus, an electronic device, and a readable storage medium.

Background

Voiceprint features are among the important biometric characteristics of the human body; they are highly individual-specific and are often used as a form of identity authentication in fields such as voiceprint recognition and voiceprint authentication. Voiceprint wake-up technology is already widely used in terminal device products such as mobile phones, and its application in IOT (The Internet of Things) products is also gradually becoming popular. During voiceprint training, a certain number of user voice samples are generally collected in one pass when the user enrolls a voiceprint, and voiceprint information is then extracted from the collected audio to generate a voiceprint wake-up model. In addition, the voiceprint training process of an IOT device is usually carried out with the aid of an APP (application) on a mobile terminal such as a mobile phone. Therefore, how to better use voiceprint recognition across IOT devices and electronic devices is a technical problem to be solved urgently.
Summary

The present application proposes a voiceprint recognition method and apparatus, an electronic device, and a readable storage medium to remedy the above-mentioned drawbacks.

In a first aspect, an embodiment of the present application provides a voiceprint recognition method applied to an electronic device. The method includes: acquiring an Internet of Things (IoT) device connected to the electronic device, and determining whether the IoT device includes a first voiceprint model; if it is determined that the IoT device does not include the first voiceprint model, determining whether the IoT device is compatible with a second voiceprint model generated by the electronic device; and if the IoT device is compatible with the second voiceprint model, sending the second voiceprint model to the IoT device, the IoT device being configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data.

In a second aspect, an embodiment of the present application further provides a voiceprint recognition apparatus applied to an electronic device. The apparatus includes an acquisition module, a determination module, and a sending module. The acquisition module is configured to acquire an IoT device connected to the electronic device and determine whether the IoT device includes a first voiceprint model. The determination module is configured to, if it is determined that the IoT device does not include the first voiceprint model, determine whether the IoT device is compatible with a second voiceprint model generated by the electronic device. The sending module is configured to, if the IoT device is compatible with the second voiceprint model, send the second voiceprint model to the IoT device, the IoT device performing voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data.

In a third aspect, an embodiment of the present application further provides an electronic device including one or more processors, a memory, and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the above method.

In a fourth aspect, an embodiment of the present application further provides a computer-readable medium. The computer-readable storage medium stores program code, and the program code can be invoked by a processor to perform the above method.

In a fifth aspect, an embodiment of the present application further provides a computer program product including a computer program/instructions, where the computer program/instructions, when executed by a processor, implement the above method.

Other features and advantages of the embodiments of the present application will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the embodiments of the present application. The objectives and other advantages of the embodiments of the present application may be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present application, and a person skilled in the art may derive other drawings from these accompanying drawings without creative effort.

FIG. 1 shows an application scenario diagram of the voiceprint recognition method and apparatus provided by embodiments of the present application;

FIG. 2 shows a flowchart of a voiceprint recognition method provided by an embodiment of the present application;

FIG. 3 shows a flowchart of a voiceprint recognition method provided by another embodiment of the present application;

FIG. 4 shows a flowchart of step S304 in FIG. 3;

FIG. 5 shows a flowchart of a voiceprint recognition method provided by yet another embodiment of the present application;

FIG. 6 shows a flowchart of a voiceprint recognition method provided by still another embodiment of the present application;

FIG. 7 shows a module block diagram of a voiceprint recognition apparatus provided by an embodiment of the present application;

FIG. 8 shows a module block diagram of a voiceprint recognition apparatus provided by another embodiment of the present application;

FIG. 9 shows a structural block diagram of an electronic device provided by an embodiment of the present application;

FIG. 10 shows a storage unit provided by an embodiment of the present application for storing or carrying program code that implements the voiceprint recognition method according to the embodiments of the present application.

FIG. 11 shows a structural block diagram of a computer program product provided by an embodiment of the present application.
Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Apparently, the described embodiments are merely some rather than all of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the accompanying drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.

It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not need to be further defined or explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description and cannot be understood as indicating or implying relative importance.

Referring to FIG. 1, an application scenario diagram of the voiceprint recognition method and apparatus provided by the embodiments of the present application is shown. As shown in FIG. 1, an electronic device 1 and an IoT device 2 are located in a wireless or wired network, and the electronic device 1 exchanges data with the IoT device 2.

In the embodiments of the present application, the electronic device 1 may be a mobile terminal device, which may include, for example, a smartphone, a tablet computer, an e-book reader, a laptop portable computer, an in-vehicle computer, a wearable mobile terminal, and so on. In some implementations, the electronic device 1 may be connected to at least one IoT device 2 via a wired or wireless network, and the IoT device 2 may be a smart home device, such as a television, an air conditioner, a refrigerator, or a sensor. The electronic device 1 may transmit data to the IoT device 2, the IoT device 2 may also transmit data it acquires to the electronic device 1, and when there are multiple IoT devices 2, data may also be exchanged among the multiple IoT devices 2.

Voiceprint features are among the important biometric characteristics of the human body; they are highly individual-specific and are often used as a form of identity authentication in fields such as voiceprint recognition and voiceprint authentication. Voiceprint wake-up technology is already widely used in terminal device products such as mobile phones, and its application in IOT (The Internet of Things) products is also gradually becoming popular. During voiceprint training, a certain number of user voice samples are generally collected in one pass when the user enrolls a voiceprint, and voiceprint information is then extracted from the collected audio to generate a voiceprint wake-up model. In addition, the voiceprint training process of an IOT device is usually carried out with the aid of an APP (application) on a mobile terminal such as a mobile phone. At present, voiceprint training for an IOT device requires training with an electronic device such as a mobile phone and downloading the corresponding APP before voiceprint training can be carried out; the whole process is tedious and complex, making voiceprint recognition insufficiently simple and convenient to operate. Moreover, when enrolling for voiceprint wake-up, a user usually needs to record multiple voice samples in one pass to generate the wake-up model; as the user uses the voiceprint wake-up function, the user's voiceprint keeps changing over time, that is, voiceprint drift occurs, causing the voiceprint wake-up rate to decline over time.
Therefore, to solve the above problems, an embodiment of the present application provides a voiceprint recognition method. Referring to FIG. 2, a voiceprint recognition method provided by an embodiment of the present application and applied to an electronic device is shown; the voiceprint recognition method includes steps S201 to S204.

Step S201: Acquire an IoT device connected to the electronic device.

As one way, the embodiment of the present application may acquire an IoT device connected to the electronic device, where the electronic device may be an electronic device capable of running application programs, such as a smartphone, a tablet computer, or an e-book reader, and the IoT device may be a smart home device, which may include, for example, a television, an air conditioner, a refrigerator, as well as information sensors, infrared sensors, and laser scanners. The Internet of Things, that is, the "Internet of everything connected", is a network extended and expanded on the basis of the Internet: it is a vast network formed by combining various information-sensing devices with the Internet, enabling the interconnection of people, machines, and things at any time and in any place. The IoT device in the embodiments of the present application may be any device in the Internet of Things, multiple devices in the Internet of Things, or a designated device in the Internet of Things; exactly which device the IoT device refers to is not specifically limited here and can be chosen according to the actual situation.

In the embodiment of the present application, when the electronic device finds that multiple IoT devices are connected to it, it may sort the priorities of the multiple IoT devices from high to low to obtain a priority ranking result, and then determine, according to the priority ranking result, whether each IoT device includes the first voiceprint model. Specifically, the electronic device may determine the priority of an IoT device according to the user's frequency of use, with a higher frequency of use corresponding to a higher priority; or it may determine the priority of an IoT device according to its duration of use, with a longer duration of use corresponding to a higher priority. Determining the sending order of the second voiceprint model according to the priority ranking result can, to a certain extent, improve the user experience. For example, if the refrigerator is the IoT device used most frequently by the user, configuring the voiceprint recognition model for the refrigerator first can improve the user experience.
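The frequency-based priority ranking just described can be sketched as follows. This is a minimal illustration; the device names and use counts are invented for the example and the specification does not prescribe any particular data structure.

```python
devices = [
    {"name": "refrigerator", "daily_uses": 120},
    {"name": "tv", "daily_uses": 300},
    {"name": "air_conditioner", "daily_uses": 45},
]

# Higher frequency of use -> higher priority; the second voiceprint model is
# offered to the highest-priority device first.
by_priority = sorted(devices, key=lambda d: d["daily_uses"], reverse=True)
assert [d["name"] for d in by_priority] == ["tv", "refrigerator", "air_conditioner"]
```

Ranking by duration of use works identically, with the sort key swapped for cumulative usage time.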
Step S202: Determine whether the IoT device includes a first voiceprint model.

In some implementations, after acquiring the IoT device connected to it, the electronic device may determine whether the IoT device includes a first voiceprint model. Specifically, the electronic device may send a voiceprint model detection instruction to the IoT device to instruct the IoT device to determine whether it includes the first voiceprint model; when the electronic device receives confirmation information sent by the IoT device, it can determine that the IoT device includes the first voiceprint model, and when the electronic device does not receive confirmation information, it determines that the IoT device does not include the first voiceprint model.

In other implementations, after acquiring the IoT device connected to it, the electronic device may also send a data acquisition instruction to the IoT device and, upon receiving the data sent by the IoT device, determine from that data whether the IoT device includes the first voiceprint model. In addition, when acquiring the IoT device connected to it, the electronic device may also retrieve its stored historical data of interactions with the IoT device and determine, based on that historical data, whether the IoT device includes the first voiceprint model. Specifically, the electronic device may use artificial intelligence to analyze the historical data of the IoT device and determine whether the historical data contains data related to the first voiceprint model; if it does, it is determined that the IoT device includes the first voiceprint model. The electronic device may also determine whether the IoT device includes the first voiceprint model from data sent by a server.
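The detection-instruction exchange described above can be sketched as follows. This is a minimal illustration only: the instruction and reply strings are invented for the example, and the specification does not define a wire protocol.

```python
def device_has_model(send_instruction):
    """Send a voiceprint-model detection instruction to the device.
    Receiving confirmation information means the first voiceprint model is
    present; no reply (None) means the model is absent."""
    reply = send_instruction("VOICEPRINT_MODEL_CHECK")  # assumed instruction name
    return reply == "CONFIRM"                           # assumed confirmation token

# A device that answers the check holds a first voiceprint model:
assert device_has_model(lambda instruction: "CONFIRM") is True
# A device that never replies is treated as having no model:
assert device_has_model(lambda instruction: None) is False
```

The same boolean could equally be derived from returned device data or stored interaction history, as the paragraph above notes.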
In the embodiment of the present application, the first voiceprint model may be a network model related to voiceprint recognition. The main function of the first voiceprint model is to receive speech input by a user and recognize that speech; when the input speech matches preset speech, the content related to the input speech can be executed. For example, user A inputs the speech "Xiaomei, I want to raise the temperature"; upon receiving this speech, the IoT device can determine whether user A is a registered user and, if so, raise the temperature.

As one way, the first voiceprint model may be obtained by training a first voiceprint recognition network with audio data input by the user through the IoT device, or by training with audio data sent by another device. The first voiceprint recognition network may be a voiceprint recognition network configured on the IoT device before it leaves the factory; after acquiring 3 to 5 pieces of audio data input by the user, the IoT device can use the audio data to train the first voiceprint recognition network to obtain the first voiceprint model. In addition, the first voiceprint recognition network may be constructed based on the hardware parameters of the IoT device, and different IoT devices may store different first voiceprint recognition networks.

In other implementations, when it is determined that the IoT device includes the first voiceprint model, the embodiment of the present application may determine whether the first voiceprint model is the latest voiceprint model; if it is the latest voiceprint model, the first voiceprint model is not updated, and if it is not the latest voiceprint model, the first voiceprint model may be updated using the second voiceprint model. In addition, when it is determined that the IoT device does not include the first voiceprint model, the embodiment of the present application may determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device, that is, step S203 is entered.

Step S203: If it is determined that the IoT device does not include the first voiceprint model, determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device.

In some implementations, when the electronic device determines that the IoT device does not include the first voiceprint model, it may determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device. The second voiceprint model may be a network model related to voiceprint recognition; its main function is to receive speech input by a user and recognize that speech, and when the input speech matches preset speech, the content related to the input speech can be executed. In the embodiment of the present application, the first voiceprint model is obtained by training the first voiceprint recognition network with first audio data, where the first voiceprint recognition network may be obtained according to the hardware parameters of the IoT device and the first audio data may be audio data input by the user on the IoT device. The second voiceprint model is obtained by training a second voiceprint recognition network with second audio data, where the second voiceprint recognition network may be obtained according to the hardware parameters of the electronic device and the second audio data may be audio data input by the user on the electronic device. It can be seen that the difference between the first voiceprint model and the second voiceprint model is that the first voiceprint model is stored in the IoT device and is obtained by training the first voiceprint recognition network with the first audio data, whereas the second voiceprint model is stored in the electronic device and is obtained by training the second voiceprint recognition network with the second audio data.

As one way, the second voiceprint model may be obtained by training the second voiceprint recognition network with audio data input by the user through the electronic device, or by training with audio data sent by another device. The second voiceprint recognition network may be a voiceprint recognition network configured on the electronic device before it leaves the factory, or one configured after it leaves the factory; after acquiring 3 to 5 pieces of audio data input by the user, the electronic device can use the audio data to train the second voiceprint recognition network. The second voiceprint recognition network may be constructed based on the hardware parameters of the electronic device, and different electronic devices may store different second voiceprint recognition networks.

It should be noted that the network structures of the first voiceprint recognition network and the second voiceprint recognition network may be the same or different, the electronic device and the IoT device may or may not belong to the same manufacturer, and the first audio data used to train the first voiceprint model and the second audio data used to train the second voiceprint model may be the same or different. The embodiment of the present application may combine these factors to comprehensively determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device, or may determine this based on only one of the above conditions; how the determination is made is not specifically limited here.

From the above description, it can be seen that when the present application determines that the IoT device does not include the first voiceprint model, it can determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device. Here, compatibility refers to whether the IoT device can run the second voiceprint model normally, and whether the accuracy of voiceprint recognition performed by the IoT device using the second voiceprint model is greater than an accuracy threshold; the accuracy threshold may be the average accuracy of voiceprint recognition performed by the electronic device using the second voiceprint model, or may be set by the user based on experience. When it is determined that the IoT device is compatible with the second voiceprint model generated by the electronic device, the electronic device may send the second voiceprint model to the IoT device, that is, step S204 is entered.
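The two-part compatibility criterion just described (the device can run the model normally, and its accuracy exceeds the threshold) can be sketched as follows. This is a minimal illustration; the function and parameter names are invented, and taking the phone's average accuracy as the threshold is one of the two options the text allows.

```python
def model_compatible(runs_normally, device_accuracy, phone_accuracies):
    """Compatible when the IoT device can run the second voiceprint model
    normally AND its recognition accuracy with that model exceeds the
    accuracy threshold, taken here as the electronic device's average
    accuracy with the same model."""
    threshold = sum(phone_accuracies) / len(phone_accuracies)
    return runs_normally and device_accuracy > threshold

assert model_compatible(True, 0.95, [0.90, 0.92]) is True    # above the 0.91 average
assert model_compatible(True, 0.85, [0.90, 0.92]) is False   # below the 0.91 average
assert model_compatible(False, 0.99, [0.90, 0.92]) is False  # cannot run the model at all
```

A user-configured threshold would simply replace the computed average.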
In addition, when it is determined that the IoT device is not compatible with the second voiceprint model generated by the electronic device, the electronic device may refrain from sending the second voiceprint model to the IoT device, that is, end the model sending operation, and may at the same time send the second audio data used to train the second voiceprint model to the IoT device to instruct the IoT device to update the first voiceprint model using the second audio data; alternatively, the embodiment of the present application may re-acquire an IoT device connected to the electronic device and determine whether that IoT device includes the first voiceprint model, that is, return to step S201.

Step S204: Send the second voiceprint model to the IoT device, where the IoT device is configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data.

In the embodiment of the present application, when the electronic device determines that the IoT device does not include the first voiceprint model and that the IoT device is compatible with the second voiceprint model generated by the electronic device, it may send the second voiceprint model to the IoT device, where the IoT device is configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data.

The voiceprint recognition method proposed in the embodiment of the present application improves the user experience of training a voiceprint model by sending the second voiceprint model included in the electronic device to the IoT device. Specifically, the present application first acquires the IoT device connected to the electronic device and determines whether the IoT device includes the first voiceprint model; when it is determined that the IoT device does not include the first voiceprint model, it determines whether the IoT device is compatible with the second voiceprint model, and if so, sends the second voiceprint model included in the electronic device to the IoT device, where the IoT device is mainly configured to perform voiceprint recognition on second audio data using the second voiceprint model upon receiving the second audio data. By sending the second voiceprint model included in the electronic device to the IoT device, the present application spares the user, to a certain extent, from tediously training voiceprint models on different IoT devices.
Another embodiment of the present application provides a voiceprint recognition method applied to an electronic device. Referring to FIG. 3, the voiceprint recognition method may include steps S301 to S306.

Step S301: Acquire an IoT device connected to the electronic device.

Step S302: Determine whether the IoT device includes a first voiceprint model.

In some implementations, when the IoT device includes the first voiceprint model, the electronic device may acquire a first generation time of the first voiceprint model and a second generation time of the second voiceprint model, that is, step S303 is entered. In addition, when the IoT device does not include the first voiceprint model, the electronic device may determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device, that is, step S305 is entered.

Step S303: Acquire a first generation time of the first voiceprint model, and acquire a second generation time of the second voiceprint model.

In some implementations, the first generation time of the first voiceprint model may be the time at which the IoT device obtained the first voiceprint model, that is, the time at which the first voiceprint model was obtained by training the first voiceprint recognition network with the first audio data. For example, IoT device A acquires audio data input by the user on IoT device A at 14:00 on January 10, 2021, and then trains the pre-acquired first voiceprint recognition network with that audio data to finally obtain the first voiceprint model; if the first voiceprint model is generated at 15:00 on January 10, 2021, the first generation time is 15:00 on January 10, 2021.

Similarly, the second generation time of the second voiceprint model may be the time at which the electronic device obtained the second voiceprint model, that is, the time at which the second voiceprint model was obtained by training the second voiceprint recognition network with the second audio data. For example, electronic device B acquires audio data input by the user on electronic device B at 9:00 on January 20, 2021, and then trains the pre-acquired second voiceprint recognition network with that audio data to finally obtain the second voiceprint model; if the second voiceprint model is generated at 11:00 on January 20, 2021, the second generation time is 11:00 on January 20, 2021.

In other implementations, the first generation time and the second generation time may also be model update times. That is, the first generation time may be the time at which the first voiceprint model was updated using the second voiceprint model, the time at which the first voiceprint model was updated using the second audio data corresponding to the second voiceprint model, or the time at which the first voiceprint model was updated with the latest audio data input by the user on the IoT device. Similarly, the second generation time may be the time at which the second voiceprint model was updated with the latest audio data input by the user on the electronic device, or the time at which the second voiceprint model was updated by a voiceprint model sent from another electronic device, and so on. In summary, the first generation time refers to the generation time of the latest version of the first voiceprint model included in the IoT device, and the second generation time refers to the generation time of the latest version of the second voiceprint model included in the electronic device.

As one way, after acquiring the first generation time of the first voiceprint model and the second generation time of the second voiceprint model, the electronic device may determine whether the first generation time is earlier than the second generation time. If the first generation time is earlier than the second generation time, the second voiceprint model is sent to the IoT device, that is, step S304 is entered. If the first generation time is later than the second generation time, the first voiceprint model stored in the IoT device is not updated, that is, the first voiceprint model remains unchanged.

Step S304: When the first generation time is earlier than the second generation time, send the second voiceprint model to the IoT device, and instruct the IoT device to update the first voiceprint model using the second voiceprint model.

In the embodiment of the present application, when the electronic device determines that the first generation time is earlier than the second generation time, it sends the second voiceprint model to the IoT device to instruct the IoT device to update the first voiceprint model using the second voiceprint model. For example, in the above example the first generation time of the first voiceprint model is 15:00 on January 10, 2021, while the second generation time of the second voiceprint model is 11:00 on January 20, 2021; clearly, the first generation time is earlier than the second generation time. In this case, the electronic device may send the second voiceprint model to the IoT device, instructing the IoT device to update the first voiceprint model using the second voiceprint model.

In some implementations, the electronic device instructing the IoT device to update the first voiceprint model using the second voiceprint model may mean instructing the IoT device to directly replace the first voiceprint model with the second voiceprint model; thereafter, upon receiving audio data, the IoT device can recognize the audio data using the second voiceprint model. In addition, when it is determined that the first time is earlier than the second time, after sending the second voiceprint model to the IoT device, the electronic device may also instruct the IoT device to fuse the second voiceprint model with the first voiceprint model to obtain a third voiceprint model. Similarly, upon receiving audio data input by the user, the IoT device can recognize the audio data using the third voiceprint model.

Referring to FIG. 4, step S304 may include steps S3041 to S3042.

Step S3041: Determine whether the IoT device is compatible with the second voiceprint model.

As one way, when it is determined that the IoT device includes the first voiceprint model and that the generation time of the first voiceprint model is earlier than the generation time of the second voiceprint model, the electronic device may also determine whether the IoT device is compatible with the second voiceprint model. If the IoT device is compatible with the second voiceprint model, the second voiceprint model is sent to the IoT device, that is, step S3042 is entered; if the IoT device is not compatible with the second voiceprint model, the voiceprint model in the IoT device may be left without updating, or the second audio data corresponding to the second voiceprint model may be sent to the IoT device so that the first voiceprint model is updated through that audio data.

Step S3042: When the IoT device is compatible with the second voiceprint model, send the second voiceprint model to the IoT device, and instruct the IoT device to update the first voiceprint model using the second voiceprint model.

In other implementations, when the IoT device is not compatible with the second voiceprint model, the electronic device may acquire the second audio data corresponding to the second voiceprint model, then send the second audio data to the IoT device, and instruct the IoT device to update the first voiceprint model using the second audio data. The second audio data may be audio training data input by the user on the electronic device; after receiving the second audio data, the electronic device can use it to train the second voiceprint recognition network to obtain the second voiceprint model. It should be noted that the second audio data may be the audio data used to train the second voiceprint recognition network to obtain the second voiceprint model, or it may be the data used to update the second voiceprint model. For example, a second voiceprint model is stored in the memory of the electronic device, and the user updates the second voiceprint model during use; in this case, the data used to update the second voiceprint model may also be referred to as second audio data.

Step S305: Determine whether the IoT device is compatible with the second voiceprint model generated by the electronic device.
步骤S306:将所述第二声纹模型发送至所述物联网设备,所述物联网设备用于在接收到第二音频数据时利用所述第二声纹模型对所述第二音频数据进行声纹识别。
本申请实施例提出的一种声纹识别方法通过将电子设备包括的第二声纹模型发送至物联网设备可以提高用户训练声纹模型的使用体验,具体的,本申请首先可以获取与电子设备连接的物联网设备,并确定该物联网设备是否包括第一声纹模型,当确定物联网设备不包括第一声纹模型时,确定物联网设备是否可以兼容第一声纹模型,如果可以兼容,则将电子设备包括的第二声纹模型发送至物联网设备,而物联网设备则主要用于在接收到第二音频数据时利用第二声纹模型对第二音频数据进行声纹识别。本申请通过将电子设备包括的第二声纹模型发送至物联网设备,其在一定程度上可以避免用户繁琐的在不同物联网设备上训练声纹模型。另外,本申请实施例通过对第一声纹模型和第二声纹模型的生成时间的判断可以准确及时的对第一声纹模型进行更新,其在一定程度上可以避免由于声纹漂移现象导致的唤醒率下降问题。
本申请又一实施例提供了一种声纹识别方法,应用于电子设备,请参阅图5,该声纹识别方法可以包括步骤S501至步骤S510。
步骤S501:获取与所述电子设备连接的物联网设备。
步骤S502:确定所述物联网设备是否包括第一声纹模型。
步骤S503:获取所述第一声纹模型的第一生成时间,以及获取所述第二声纹模型的第二生成时间。
在另一些实施方式中,为了在避免声纹漂移产生的影响的同时降低模型更新带来的功耗,本申请实施可以确定第一声纹模型的第一生成时间是否早于第二声纹模型的第二生成时间,如果第一生成时间早于第二生成时间,则获取第一生成时间与第二生成时间之间的时间差,即进入步骤S504。
步骤S504:当所述第一生成时间早于所述第二生成时间时,获取所述第一生成时间与所述第二生成时间之间的时间差。
本申请实施例中,第一生成时间可以是通过训练第一声纹识别网络得到第一声纹模型的时间,同理,第二生成时间可以是通过训练第二声纹识别网络得到第二声纹模型的时间。在确定第一生成时间早于第二生成时间时,本申请实施也可以获取第一生成时间与第二生成时间之间的时间差,然后确定第一生成时间与第二生成时间之间的所述时间差是否大于时间阈值,即进入步骤S505。
步骤S405:确定所述时间差是否大于时间阈值。
本申请实施例中,时间阈值可以是预先设置的,也可以是根据第一声纹识别网络的结构复杂程度确定,第一声纹识别网络的网络结构越复杂则所述时间阈值越大,反之,第一声纹识别网络的网络结构越简单则时间阈值越小。换句话说,本申请实施例可以通过一一对应的方式,将网络结构复杂度和时间阈值进行对应存储,获取到第一声纹识别网络就可以获取到与其对应的时间阈值,即在获取到第一生成时间与第二生成时间之间的时间差之后,本申请实施例可以先获取第一声纹识别网络的网络结构复杂度,然后获取该网络结构复杂度对应的时间阈值。其中,网络结构复杂度可以通过结合神经网络的深度、权值参数以及损失函数等综合确定。
Step S506: when the time difference is greater than the time threshold, sending the second voiceprint model to the Internet-of-Things device.
In the embodiments of the present application, when it is determined that the time difference between the first generation time of the first voiceprint model and the second generation time of the second voiceprint model is greater than the time threshold, the electronic device may send the second voiceprint model to the Internet-of-Things device. In addition, before sending the second voiceprint model to the Internet-of-Things device, the electronic device may also determine whether the Internet-of-Things device is compatible with the second voiceprint model; if so, the second voiceprint model is sent directly to the Internet-of-Things device to instruct the Internet-of-Things device to perform audio recognition with the second voiceprint model upon receiving audio data. If the Internet-of-Things device is not compatible with the second voiceprint model, the electronic device may instead send the second audio data used to train the second voiceprint model to the Internet-of-Things device and instruct the Internet-of-Things device to update the first voiceprint model with the second audio data.
As one approach, when it is determined that the time difference between the first generation time of the first voiceprint model and the second generation time of the second voiceprint model is less than or equal to the time threshold, the electronic device may stop the update operation on the voiceprint model in the Internet-of-Things device, i.e., leave the first voiceprint model stored in the Internet-of-Things device unchanged. In addition, when it is determined that the Internet-of-Things device does not include the first voiceprint model, the embodiments of the present application may determine whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device, i.e., the method proceeds to step S507.
Step S507: determining whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device.
Step S508: sending the second voiceprint model to the Internet-of-Things device, where the Internet-of-Things device is configured to, upon receiving second audio data, perform voiceprint recognition on the second audio data with the second voiceprint model.
Step S509: obtaining second audio data corresponding to the second voiceprint model.
Step S510: sending the second audio data to the Internet-of-Things device, and instructing the Internet-of-Things device to train the first voiceprint recognition network with the second audio data to obtain the first voiceprint model.
The voiceprint recognition method proposed in the embodiments of the present application can improve the user experience of training voiceprint models by sending the second voiceprint model included in the electronic device to the Internet-of-Things device. Specifically, the present application first obtains the Internet-of-Things device connected to the electronic device and determines whether the Internet-of-Things device includes the first voiceprint model; when it is determined that the Internet-of-Things device does not include the first voiceprint model, it determines whether the Internet-of-Things device is compatible with the second voiceprint model, and if so, sends the second voiceprint model included in the electronic device to the Internet-of-Things device, where the Internet-of-Things device is mainly configured to perform voiceprint recognition on second audio data with the second voiceprint model upon receiving the second audio data. By sending the second voiceprint model included in the electronic device to the Internet-of-Things device, the present application can, to some extent, spare the user the tedium of training voiceprint models on different Internet-of-Things devices. In addition, by comparing the time difference between the first generation time and the second generation time against a threshold, the embodiments of the present application can not only avoid the effects of voiceprint drift but also reduce the unnecessary power consumption caused by model updates.
A further embodiment of the present application provides a voiceprint recognition method applied to an electronic device. Referring to FIG. 6, the voiceprint recognition method may include steps S601 to S604.
Step S601: obtaining the Internet-of-Things device connected to the electronic device.
Step S602: when it is determined that the Internet-of-Things device does not include the first voiceprint model, determining whether the structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network.
In some implementations, when it is determined that the Internet-of-Things device does not include the first voiceprint model, the electronic device may determine whether the network structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network, i.e., determine whether the network structures of the first and second voiceprint recognition networks are substantially the same. If they are the same or approximately the same, it is determined that the structure of the first voiceprint recognition network matches that of the second voiceprint recognition network, and in this case it can be determined that the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device.
In one specific implementation, the first voiceprint recognition network consists of an input layer, convolutional layers, pooling layers, and activation function layers, and the second voiceprint recognition network likewise consists of an input layer, convolutional layers, pooling layers, and activation function layers; moreover, the numbers of input, convolutional, pooling, and activation function layers in the first voiceprint recognition network are the same as those in the second voiceprint recognition network. In this case, it can be determined that the structures of the first and second voiceprint recognition networks match.
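A minimal sketch of this layer-count comparison, assuming each network is described by an ordered list of layer-type strings (the representation is an illustrative assumption):

```python
from collections import Counter

def structures_match(first_layers, second_layers):
    """Hypothetical structure check: two voiceprint recognition
    networks match when they use the same layer types and the same
    number of layers of each type (input/conv/pool/activation).

    first_layers / second_layers: lists of layer-type strings."""
    # Counter compares multisets, so only the per-type layer counts
    # matter, mirroring the "same numbers of layers" criterion above.
    return Counter(first_layers) == Counter(second_layers)
```

A stricter variant could compare the ordered lists directly, which would additionally require the layers to appear in the same sequence.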
As can be seen from the above description, regardless of whether the Internet-of-Things device includes the first voiceprint model, the embodiments of the present application need to determine whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device, and the compatibility check may be performed as described above, i.e., by determining whether the structure of the first voiceprint recognition network matches that of the second voiceprint recognition network. In addition, the embodiments of the present application may also determine whether the versions of the first voiceprint model and the second voiceprint model are the same; if the versions are the same, it is determined that the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device, and if not, it is determined that the Internet-of-Things device is not compatible with it.
As another approach, the electronic device may also obtain the manufacturers of the Internet-of-Things device and the electronic device and determine whether they are the same; if the manufacturers of the Internet-of-Things device and the electronic device are the same, it can be determined that the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device. It should be noted that when determining whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device, the embodiments of the present application may rely on at least one of the manufacturer, the version, and the network structure; the order in which these are checked is not expressly limited here and may be chosen according to the actual situation.
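The combined manufacturer/version/structure check might be sketched as follows; the dict keys and the check ordering are illustrative assumptions (the embodiment expressly leaves the ordering open):

```python
def is_compatible(iot, phone):
    """Hypothetical combined compatibility check between an IoT
    device and the electronic device. `iot` and `phone` are dicts
    with illustrative keys: 'vendor', 'model_version', and 'layers'
    (the layer-type list of the voiceprint recognition network).
    Satisfying any one criterion is treated as compatible."""
    if iot["vendor"] == phone["vendor"]:
        return True  # same manufacturer
    if iot["model_version"] == phone["model_version"]:
        return True  # same voiceprint-model version
    # Fall back to comparing network structures.
    return iot["layers"] == phone["layers"]
```

Whether the criteria are combined with "any of" (as here) or "all of", and in which order they are tried, is a design choice the embodiment leaves to the actual situation.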
Step S603: if they match, determining that the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device.
Step S604: sending the second voiceprint model to the Internet-of-Things device, where the Internet-of-Things device is configured to, upon receiving second audio data, perform voiceprint recognition on the second audio data with the second voiceprint model.
In some implementations, the electronic device may be a mobile terminal; for example, the Internet-of-Things device is a television and the electronic device is a mobile phone. Alternatively, the electronic device may itself be an Internet-of-Things device; for example, the Internet-of-Things device is a television and the electronic device is an air conditioner, or the Internet-of-Things device is television 1 and the electronic device is television 2. Likewise, the Internet-of-Things device may be a mobile terminal; for example, the electronic device is mobile phone 1 and the Internet-of-Things device is mobile phone 2. What specific devices the electronic device and the Internet-of-Things device are is not expressly limited here.
The voiceprint recognition method proposed in the embodiments of the present application can improve the user experience of training voiceprint models by sending the second voiceprint model included in the electronic device to the Internet-of-Things device. Specifically, the present application first obtains the Internet-of-Things device connected to the electronic device and determines whether the Internet-of-Things device includes the first voiceprint model; when it is determined that the Internet-of-Things device does not include the first voiceprint model, it determines whether the Internet-of-Things device is compatible with the second voiceprint model, and if so, sends the second voiceprint model included in the electronic device to the Internet-of-Things device, where the Internet-of-Things device is mainly configured to perform voiceprint recognition on second audio data with the second voiceprint model upon receiving the second audio data. By sending the second voiceprint model included in the electronic device to the Internet-of-Things device, the present application can, to some extent, spare the user the tedium of training voiceprint models on different Internet-of-Things devices. In addition, by determining whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device, the embodiments of the present application can avoid erroneous model transmission and reduce the power consumption it would cause.
Referring to FIG. 7, an embodiment of the present application provides a voiceprint recognition apparatus 700 applied to an electronic device. In a specific embodiment, the voiceprint recognition apparatus 700 includes an obtaining module 701, a determining module 702, and a sending module 703.
The obtaining module 701 is configured to obtain the Internet-of-Things device connected to the electronic device and determine whether the Internet-of-Things device includes a first voiceprint model.
The determining module 702 is configured to, if it is determined that the Internet-of-Things device does not include the first voiceprint model, determine whether the Internet-of-Things device is compatible with a second voiceprint model generated by the electronic device.
Further, the determining module 702 is also configured to, when it is determined that the Internet-of-Things device does not include the first voiceprint model, determine whether the structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network, and, if they match, determine that the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device.
The sending module 703 is configured to, if the Internet-of-Things device is compatible with the second voiceprint model, send the second voiceprint model to the Internet-of-Things device, where the Internet-of-Things device performs voiceprint recognition on second audio data with the second voiceprint model upon receiving the second audio data.
Referring to FIG. 8, in other implementations the voiceprint recognition apparatus 800 may further include a time obtaining module 804 and a model updating module 805.
The time obtaining module 804 is configured to, if the Internet-of-Things device includes the first voiceprint model, obtain a first generation time of the first voiceprint model and a second generation time of the second voiceprint model.
The model updating module 805 is configured to, when the first generation time is earlier than the second generation time, send the second voiceprint model to the Internet-of-Things device and instruct the Internet-of-Things device to update the first voiceprint model with the second voiceprint model.
Further, the model updating module 805 is also configured to determine whether the Internet-of-Things device is compatible with the second voiceprint model, and, when the Internet-of-Things device is compatible with the second voiceprint model, send the second voiceprint model to the Internet-of-Things device and instruct the Internet-of-Things device to update the first voiceprint model with the second voiceprint model.
Further, the model updating module 805 is also configured to, when the Internet-of-Things device is not compatible with the second voiceprint model, obtain second audio data corresponding to the second voiceprint model, send the second audio data to the Internet-of-Things device, and instruct the Internet-of-Things device to update the first voiceprint model with the second audio data.
Further, the model updating module 805 is also configured to, when the first generation time is earlier than the second generation time, obtain the time difference between the first generation time and the second generation time and determine whether the time difference is greater than a time threshold, and, when the time difference is greater than the time threshold, send the second voiceprint model to the Internet-of-Things device.
Further, the voiceprint recognition apparatus 800 is also configured to, if the Internet-of-Things device is not compatible with the second voiceprint model, obtain second audio data corresponding to the second voiceprint model, send the second audio data to the Internet-of-Things device, and instruct the Internet-of-Things device to train the first voiceprint recognition network with the second audio data to obtain the first voiceprint model.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in the present application, the coupling between modules may be electrical, mechanical, or of other forms.
In the voiceprint recognition apparatus proposed in the embodiments of the present application, the user experience of training voiceprint models can be improved by sending the second voiceprint model included in the electronic device to the Internet-of-Things device. Specifically, the present application first obtains the Internet-of-Things device connected to the electronic device and determines whether the Internet-of-Things device includes the first voiceprint model; when it is determined that the Internet-of-Things device does not include the first voiceprint model, it determines whether the Internet-of-Things device is compatible with the second voiceprint model, and if so, sends the second voiceprint model included in the electronic device to the Internet-of-Things device, where the Internet-of-Things device is mainly configured to perform voiceprint recognition on second audio data with the second voiceprint model upon receiving the second audio data. By sending the second voiceprint model included in the electronic device to the Internet-of-Things device, the present application can, to some extent, spare the user the tedium of training voiceprint models on different Internet-of-Things devices.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Referring to FIG. 9, a structural block diagram of an electronic device 900 provided by an embodiment of the present application is shown. The electronic device 900 may be a smartphone, a tablet computer, an e-book reader, or another electronic device capable of running application programs. The electronic device 900 of the present application may include one or more of the following components: a processor 910, a memory 920, and one or more application programs, where the one or more application programs may be stored in the memory 920 and configured to be executed by the one or more processors 910, the one or more programs being configured to perform the methods described in the foregoing method embodiments.
The processor 910 may include one or more processing cores. The processor 910 connects the various parts of the electronic device 900 via various interfaces and lines, and performs the various functions of the electronic device 900 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 920 and invoking the data stored in the memory 920. Optionally, the processor 910 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 910 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 910 and may instead be implemented by a separate communication chip.
The memory 920 may include random access memory (RAM) or read-only memory (ROM). The memory 920 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 920 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the electronic device 900 during use (such as a phone book, audio and video data, and chat record data).
Referring to FIG. 10, a structural block diagram of a computer-readable storage medium 1000 provided by an embodiment of the present application is shown. The computer-readable storage medium 1000 stores program code that can be invoked by a processor to perform the methods described in the foregoing method embodiments.
The computer-readable storage medium 1000 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 1000 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1000 has storage space for program code 1010 that performs any of the method steps in the above methods. The program code can be read from, or written to, one or more computer program products. The program code 1010 may, for example, be compressed in an appropriate form.
Referring to FIG. 11, a computer program product 1100 provided by an embodiment of the present application is shown, including a computer program/instructions 1110 that, when executed by a processor, implement the above methods.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application rather than to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent substitutions for some of their technical features, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (22)

  1. A voiceprint recognition method, applied to an electronic device, the method comprising:
    obtaining an Internet-of-Things device connected to the electronic device, and determining whether the Internet-of-Things device includes a first voiceprint model;
    if it is determined that the Internet-of-Things device does not include the first voiceprint model, determining whether the Internet-of-Things device is compatible with a second voiceprint model generated by the electronic device; and
    if the Internet-of-Things device is compatible with the second voiceprint model, sending the second voiceprint model to the Internet-of-Things device, the Internet-of-Things device being configured to, upon receiving second audio data, perform voiceprint recognition on the second audio data with the second voiceprint model.
  2. The method according to claim 1, wherein the method further comprises:
    if the Internet-of-Things device includes the first voiceprint model, obtaining a first generation time of the first voiceprint model and a second generation time of the second voiceprint model; and
    when the first generation time is earlier than the second generation time, sending the second voiceprint model to the Internet-of-Things device and instructing the Internet-of-Things device to update the first voiceprint model with the second voiceprint model.
  3. The method according to claim 2, wherein the first generation time is the time at which the Internet-of-Things device obtained the first voiceprint model, and the second generation time is the time at which the electronic device obtained the second voiceprint model.
  4. The method according to claim 2, wherein the first generation time is the generation time of the latest version of the first voiceprint model in the Internet-of-Things device, and the second generation time is the generation time of the latest version of the second voiceprint model in the electronic device.
  5. The method according to claim 2, wherein the method further comprises:
    when the first generation time is later than the second generation time, keeping the first voiceprint model unchanged.
  6. The method according to claim 2, wherein the sending the second voiceprint model to the Internet-of-Things device and instructing the Internet-of-Things device to update the first voiceprint model with the second voiceprint model comprises:
    determining whether the Internet-of-Things device is compatible with the second voiceprint model; and
    when the Internet-of-Things device is compatible with the second voiceprint model, sending the second voiceprint model to the Internet-of-Things device and instructing the Internet-of-Things device to update the first voiceprint model with the second voiceprint model.
  7. The method according to claim 6, wherein the method further comprises:
    when the Internet-of-Things device is not compatible with the second voiceprint model, obtaining second audio data corresponding to the second voiceprint model; and
    sending the second audio data to the Internet-of-Things device, and instructing the Internet-of-Things device to update the first voiceprint model with the second audio data.
  8. The method according to any one of claims 2 to 7, wherein the sending the second voiceprint model to the Internet-of-Things device when the first generation time is earlier than the second generation time comprises:
    when the first generation time is earlier than the second generation time, obtaining a time difference between the first generation time and the second generation time, and determining whether the time difference is greater than a time threshold; and
    when the time difference is greater than the time threshold, sending the second voiceprint model to the Internet-of-Things device.
  9. The method according to claim 8, wherein before the determining whether the time difference is greater than a time threshold, the method further comprises:
    determining the time threshold according to a network structure complexity of the first voiceprint recognition network.
  10. The method according to claim 8, wherein the method further comprises:
    when the time difference is less than or equal to the time threshold, keeping the first voiceprint model unchanged.
  11. The method according to any one of claims 1 to 6, wherein the first voiceprint model is obtained by training a first voiceprint recognition network, and the method further comprises:
    if the Internet-of-Things device is not compatible with the second voiceprint model, obtaining second audio data corresponding to the second voiceprint model; and
    sending the second audio data to the Internet-of-Things device, and instructing the Internet-of-Things device to train the first voiceprint recognition network with the second audio data to obtain the first voiceprint model.
  12. The method according to any one of claims 1 to 7, wherein the first voiceprint model is obtained by training the first voiceprint recognition network, and the second voiceprint model is obtained by training the second voiceprint recognition network;
    the if it is determined that the Internet-of-Things device does not include the first voiceprint model, determining whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device comprises:
    when it is determined that the Internet-of-Things device does not include the first voiceprint model, determining whether the structure of the first voiceprint recognition network matches the structure of the second voiceprint recognition network; and
    if they match, determining that the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device.
  13. The method according to claim 1, wherein the obtaining an Internet-of-Things device connected to the electronic device and determining whether the Internet-of-Things device includes a first voiceprint model comprises:
    when the electronic device obtains multiple Internet-of-Things devices connected to it, ranking the priorities of the multiple Internet-of-Things devices from high to low to obtain a priority ranking result; and
    determining, according to the priority ranking result, whether the multiple Internet-of-Things devices include the first voiceprint model.
  14. The method according to claim 1, wherein the determining whether the Internet-of-Things device includes a first voiceprint model comprises:
    sending a voiceprint model detection instruction to the Internet-of-Things device to instruct the Internet-of-Things device to determine whether it includes the first voiceprint model.
  15. The method according to claim 1, wherein the determining whether the Internet-of-Things device includes a first voiceprint model comprises:
    obtaining pre-stored historical data obtained from interaction with the Internet-of-Things device, and determining, based on the historical data, whether the Internet-of-Things device includes the first voiceprint model.
  16. The method according to claim 1, wherein the method further comprises:
    when it is determined that the Internet-of-Things device is not compatible with the second voiceprint model, re-obtaining an Internet-of-Things device connected to the electronic device, and determining whether that Internet-of-Things device includes a first voiceprint model.
  17. The method according to claim 1, wherein the determining whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device comprises:
    determining whether the versions of the second voiceprint model and the first voiceprint model are the same, and, if the versions of the first voiceprint model and the second voiceprint model are the same, determining that the Internet-of-Things device is compatible with the second voiceprint model.
  18. The method according to claim 1, wherein the determining whether the Internet-of-Things device is compatible with the second voiceprint model generated by the electronic device comprises:
    obtaining the manufacturers of the Internet-of-Things device and the electronic device; and
    determining whether the manufacturers of the Internet-of-Things device and the electronic device are the same, and, if the manufacturers of the Internet-of-Things device and the electronic device are the same, determining that the Internet-of-Things device is compatible with the second voiceprint model.
  19. A voiceprint recognition apparatus, applied to an electronic device, the apparatus comprising:
    an obtaining module, configured to obtain an Internet-of-Things device connected to the electronic device and determine whether the Internet-of-Things device includes a first voiceprint model;
    a determining module, configured to, if it is determined that the Internet-of-Things device does not include the first voiceprint model, determine whether the Internet-of-Things device is compatible with a second voiceprint model generated by the electronic device; and
    a sending module, configured to, if the Internet-of-Things device is compatible with the second voiceprint model, send the second voiceprint model to the Internet-of-Things device, the Internet-of-Things device performing voiceprint recognition on second audio data with the second voiceprint model upon receiving the second audio data.
  20. An electronic device, comprising:
    one or more processors;
    a memory; and
    one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method according to any one of claims 1 to 18.
  21. A computer-readable storage medium, wherein program code is stored in the computer-readable storage medium, the program code being invocable by a processor to perform the method according to any one of claims 1 to 18.
  22. A computer program product, comprising a computer program/instructions that, when executed by a processor, implement the method according to any one of claims 1 to 18.
PCT/CN2021/139745 2021-01-28 2021-12-20 Voiceprint recognition method and apparatus, electronic device, and readable storage medium WO2022161025A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110118214.9 2021-01-28
CN202110118214.9A CN112820302B (zh) 2021-01-28 2021-01-28 Voiceprint recognition method and apparatus, electronic device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022161025A1 true WO2022161025A1 (zh) 2022-08-04

Family

ID=75860211

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139745 WO2022161025A1 (zh) 2021-01-28 2021-12-20 声纹识别方法、装置、电子设备和可读存储介质

Country Status (2)

Country Link
CN (1) CN112820302B (zh)
WO (1) WO2022161025A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112820302B (zh) * 2021-01-28 2024-04-12 Oppo广东移动通信有限公司 声纹识别方法、装置、电子设备和可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140330566A1 (en) * 2013-05-06 2014-11-06 Linkedin Corporation Providing social-graph content based on a voice print
CN110047492A (zh) * 2019-03-08 2019-07-23 佛山市云米电器科技有限公司 Method and system for networking through voiceprint recognition
CN110166424A (zh) * 2019-04-03 2019-08-23 西安电子科技大学 Privacy-preserving voiceprint recognition method and system for Internet-of-Things services, and mobile terminal
CN111161745A (zh) * 2019-12-26 2020-05-15 珠海格力电器股份有限公司 Wake-up method, apparatus, device, and medium for a smart device
CN111462760A (zh) * 2019-01-21 2020-07-28 阿里巴巴集团控股有限公司 Voiceprint recognition system, method, apparatus, and electronic device
CN112820302A (zh) * 2021-01-28 2021-05-18 Oppo广东移动通信有限公司 Voiceprint recognition method and apparatus, electronic device, and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545889B (zh) * 2016-06-23 2020-10-23 华为终端有限公司 Optimization method and apparatus for a model suitable for pattern recognition, and terminal device
CN107580237A (zh) * 2017-09-05 2018-01-12 深圳Tcl新技术有限公司 Television operation method, apparatus, system, and storage medium
CN110858479B (zh) * 2018-08-08 2022-04-22 Oppo广东移动通信有限公司 Speech recognition model update method and apparatus, storage medium, and electronic device
CN109683938B (zh) * 2018-12-26 2022-08-02 思必驰科技股份有限公司 Voiceprint model upgrading method and apparatus for a mobile terminal
CN111081258B (zh) * 2019-11-07 2022-12-06 厦门快商通科技股份有限公司 Voiceprint model management method and system, storage medium, and apparatus


Also Published As

Publication number Publication date
CN112820302A (zh) 2021-05-18
CN112820302B (zh) 2024-04-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 21922615; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase — Ref country code: DE
122 Ep: pct application non-entry in european phase — Ref document number: 21922615; Country of ref document: EP; Kind code of ref document: A1