CN117524200A - Wakeup model iteration method and device, storage medium and electronic device - Google Patents

Info

Publication number
CN117524200A
Authority
CN (China)
Prior art keywords
wake, model, data set, parameters, training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210906691.6A
Other languages
Chinese (zh)
Inventor
葛路奇
Current Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd filed Critical Qingdao Haier Technology Co Ltd
Priority to CN202210906691.6A
Publication of CN117524200A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • G10L 2015/223 Execution procedure of a spoken command


Abstract

The application discloses a wake-up model iteration method and apparatus, a storage medium, and an electronic device, relating to the technical field of smart home/smart family. The method includes: determining a first wake-up model, the first wake-up model being the same as a second wake-up model in a target device; performing optimization training on the first wake-up model based on an acquired user wake-up data set to obtain optimization parameters of the first wake-up model, the user wake-up data set being determined after accuracy verification, based on wake-up keywords, of the wake-up voice data recognized by the second wake-up model in the target device; and sending the optimization parameters to the target device so that the target device performs an optimization iteration on the second wake-up model based on the optimization parameters. The method and apparatus give the wake-up model higher recognition accuracy after the parameter-optimization iteration, bring it closer to the user's usage habits, and improve the user experience.

Description

Wakeup model iteration method and device, storage medium and electronic device
Technical Field
The application relates to the technical field of smart home/smart family, and in particular to a wake-up model iteration method and apparatus, a storage medium, and an electronic device.
Background
With the rise of intelligent technology, household appliances of all kinds have been developing toward intelligence. For example, a user can control a home appliance by means of voice wake-up.
When existing voice wake-up models are deployed on household appliances, they are constrained by the appliances' limited hardware resources: the model structure is simple, performance is poor, the needs of different users in different usage scenarios cannot be met, and the user experience suffers.
Disclosure of Invention
The application provides a wake-up model iteration method and apparatus, a storage medium, and an electronic device, which are used to solve the technical problems that existing voice wake-up models perform poorly, cannot meet the needs of different users in different usage scenarios, and give users a poor experience.
The application provides a wake-up model iteration method, which comprises the following steps:
determining a first wake-up model; the first wake-up model is the same as a second wake-up model in the target device;
performing optimization training on the first wake-up model based on the acquired user wake-up data set to obtain optimization parameters of the first wake-up model; the user wake-up data set is determined after accuracy verification, based on wake-up keywords, of the wake-up voice data recognized by the second wake-up model in the target device;
and sending the optimization parameters to the target device so that the target device performs an optimization iteration on the second wake-up model based on the optimization parameters.
According to the wake-up model iteration method provided by the application, the user wake-up data set is determined based on the following steps:
receiving the wake-up voice data recognized by the second wake-up model and sent by the target device;
inputting the wake-up voice data into a wake-up voice verification model, which determines an accuracy check result for the wake-up voice data based on the wake-up keywords;
and determining the user wake-up data set based on the wake-up voice data and the corresponding accuracy check results.
According to the wake-up model iteration method provided by the application, the number of neural network layers of the wake-up voice verification model is greater than that of the second wake-up model, and/or the number of neurons per neural network layer in the wake-up voice verification model is greater than that in the second wake-up model.
According to the wake-up model iteration method provided by the application, the user wake-up data set is augmented based on the following steps:
receiving the network addresses sent by candidate devices, the candidate devices being of the same device type as the target device;
determining the geographic location of each candidate device based on its network address;
determining the wake-up language category of each candidate device based on its geographic location;
and adding to the user wake-up data set the wake-up voice data sent by candidate devices whose wake-up language category is the same as the target device's.
According to the wake-up model iteration method provided by the application, performing optimization training on the first wake-up model based on the acquired user wake-up data set to obtain the optimization parameters of the first wake-up model includes:
determining a basic training data set and a basic test data set;
training the first wake-up model based on the basic training data set and the training data in the user wake-up data set;
testing the first wake-up model before and after training based on the basic test data set and the test data in the user wake-up data set, and determining the model's recognition accuracy before and after training;
and determining the model parameters of the trained first wake-up model as the optimization parameters of the first wake-up model when the difference between the post-training and pre-training recognition accuracy is greater than or equal to a preset difference value.
According to the wake-up model iteration method provided by the application, sending the optimization parameters to the target device includes:
sending a model iteration instruction containing the optimization parameters to the target device, and storing the current model parameters of the second wake-up model as historical model parameters;
receiving from the target device the recognition success rate corresponding to the historical model parameters and the recognition success rate corresponding to the optimization parameters; the recognition success rate is determined from the number of voice data items received by the target device and the number recognized as wake-up voice data;
and sending a model iteration instruction containing the historical model parameters to the target device when the recognition success rate corresponding to the optimization parameters is lower than that corresponding to the historical model parameters.
According to the wake-up model iteration method provided by the application, determining the first wake-up model includes:
determining the number of wake-up voice data items in the user wake-up data set;
sending a model-parameter acquisition instruction to the target device when the number of wake-up voice data items is greater than or equal to a preset number;
receiving the current model parameters of the second wake-up model sent by the target device in response to the instruction;
and constructing the first wake-up model based on the current model parameters and the model structure of the second wake-up model.
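As a minimal sketch of those steps, the server could gate model construction on the accumulated data count and rebuild the model from the device's reported parameters. `PRESET_COUNT` and the two callbacks are illustrative assumptions, not values or interfaces defined by the patent.

```python
# Hypothetical server-side construction of the first wake-up model.

PRESET_COUNT = 500  # example preset number of wake-up voice data items

def build_first_model(dataset_size, request_current_params, model_from_params):
    """Request the device's current parameters and rebuild an identical model."""
    if dataset_size < PRESET_COUNT:
        return None  # not enough user wake-up data yet; skip this round
    params = request_current_params()   # the model-parameter acquisition instruction
    return model_from_params(params)    # same structure as the second wake-up model
```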
The application provides a wake-up model iteration device, which comprises:
a determining unit, configured to determine a first wake-up model, the first wake-up model being the same as a second wake-up model in the target device;
a training unit, configured to perform optimization training on the first wake-up model based on the acquired user wake-up data set to obtain optimization parameters of the first wake-up model, the user wake-up data set being determined after accuracy verification, based on wake-up keywords, of the wake-up voice data recognized by the second wake-up model in the target device;
and an iteration unit, configured to send the optimization parameters to the target device so that the target device performs an optimization iteration on the second wake-up model based on the optimization parameters.
The application provides a computer-readable storage medium comprising a stored program, wherein the program, when run, performs the wake-up model iteration method.
The application provides an electronic device comprising a memory in which a computer program is stored, and a processor arranged to execute the wake-up model iteration method by means of the computer program.
With the wake-up model iteration method and apparatus, storage medium, and electronic device provided by the application, a first wake-up model identical to the second wake-up model in the target device is determined; the first wake-up model is trained on the user wake-up data set to obtain its optimization parameters; and the optimization parameters are sent to the target device so that it can iterate the second wake-up model. Running a model identical to the device's wake-up model on a remote server and sending back the trained parameters achieves parameter iteration of the device's wake-up model without consuming the device's hardware resources; and because the user wake-up data set is determined after accuracy verification, it is highly reliable and carries the voice characteristics of the user corresponding to the target device.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
To illustrate the technical solutions of the present application or the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described show some embodiments of the present application; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a wake-up model iteration method provided in the present application;
FIG. 2 is a second flow chart of the wake-up model iteration method provided in the present application;
FIG. 3 is a schematic structural diagram of a wake-up model iteration apparatus provided in the present application;
FIG. 4 is a schematic diagram of a hardware environment of the wake-up model iteration method provided by the present application;
fig. 5 is a schematic structural diagram of an electronic device provided in the present application.
Reference numerals:
401: a terminal device; 402: a server.
Detailed Description
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like herein are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is one of the flow diagrams of the wake-up model iteration method provided in the present application, and as shown in fig. 1, the method includes step 110, step 120, and step 130.
Step 110, determining a first wake-up model; the first wake-up model is the same as the second wake-up model in the target device.
Specifically, the wake-up model iteration method provided by this embodiment of the application is executed by a wake-up model iteration apparatus. The apparatus may be implemented in software running on a smart-appliance remote monitoring server, or as a hardware module in such a server; the remote monitoring server is a cloud server.
The target device may be any household appliance used in a home scenario, i.e. a device equipped with a network communication module that can connect to the Internet or the Internet of Things, report its own operating state, and receive and execute remote control instructions. Examples include smart speakers, smart air conditioners, smart refrigerators, smart televisions, smart bathroom heaters, smart lamps, and smart switches.
Voice wake-up is the detection of specific keywords in a continuous voice stream, which trigger the device into a specific operating state. For example, a user can say "turn on the air conditioner"; a smart air conditioner in a sleep or standby state detects the user's speech and, on recognizing "turn on the air conditioner", triggers its internal compressor and other components to start running and enters the working state.
Voice wake-up is achieved through a wake-up model in the target device. The wake-up model may be implemented as a neural network or a similar model; for example, it may be obtained by taking a neural network as the initial model and training it with wake-up voice samples and their labels.
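To make the idea of an on-device wake-up model concrete, the sketch below shows a deliberately small keyword-spotting network of the kind a resource-limited appliance might run. The feature dimension, layer sizes, and threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

class TinyWakeModel:
    """A minimal two-layer scorer: acoustic features in, keyword probability out."""

    def __init__(self, n_features=40, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights stand in for trained parameters.
        self.w1 = rng.standard_normal((n_features, n_hidden)) * 0.1
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.standard_normal((n_hidden, 1)) * 0.1
        self.b2 = np.zeros(1)

    def score(self, features):
        """Probability that an utterance (as a feature vector) contains the keyword."""
        hidden = np.tanh(features @ self.w1 + self.b1)
        logit = hidden @ self.w2 + self.b2
        return float(1.0 / (1.0 + np.exp(-logit[0])))

    def is_wake(self, features, threshold=0.5):
        """The device wakes when the score crosses a decision threshold."""
        return self.score(features) >= threshold
```

A model this small fits the hardware constraints the patent describes, which is exactly why its recognition accuracy is limited and benefits from server-side iteration.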
The first wake-up model runs on the remote monitoring server and the second wake-up model runs on the target device; the two models are identical, i.e. the same in both model structure and model parameters.
The target device can communicate with the remote monitoring server and send it the model parameters of the second wake-up model directly; the server can then construct the first wake-up model from the received parameters and the model structure of the second wake-up model.
Step 120, training the first wake-up model based on the user wake-up data set to obtain the optimization parameters of the first wake-up model. The user wake-up data set is determined after accuracy verification, based on the wake-up keywords, of the wake-up voice data recognized by the second wake-up model in the target device.
Specifically, the wake-up voice data are the user utterances that the second wake-up model in the target device has recognized as able to wake the device. Because of the target device's limited hardware resources, the second wake-up model has a relatively simple structure and relatively few parameters, so the recognized wake-up voice data may include misrecognitions.
Therefore, the accuracy of the wake-up voice data recognized by the second wake-up model can be checked. This can be done by deploying on the remote monitoring server a verification model with a more complex structure and/or more parameters, trained on a large number of sample wake-up utterances and their labels.
From the accuracy-checked wake-up voice data sent by the target device, a user wake-up data set can be constructed; training the first wake-up model on this data set yields its optimization parameters.
Because the user wake-up data set is determined after accuracy verification, it is highly reliable and can improve the recognition accuracy of the first wake-up model; because it is sent by the target device, the first wake-up model can learn the voice characteristics of that device's user, such as speaking and pronunciation habits and the usage scenario. The trained first wake-up model therefore not only achieves higher recognition accuracy but also comes closer to the user's habits, achieving a customized effect for the user.
The accuracy check compares the wake-up voice data recognized by the second wake-up model against the wake-up keywords, which can be set as needed. If the similarity between the speech recognition result of the wake-up voice data and the wake-up keyword is greater than a preset similarity threshold, the accuracy check result is determined to be accurate; if the similarity is smaller than the threshold, the result is determined to be inaccurate.
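The threshold comparison above can be sketched in a few lines. The patent only requires "a preset similarity threshold"; the concrete threshold value and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions for illustration.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # example value for the preset similarity threshold

def check_accuracy(recognized_text: str, wake_keyword: str) -> bool:
    """True ("accurate") if the recognition result is close enough to the keyword."""
    similarity = SequenceMatcher(None, recognized_text, wake_keyword).ratio()
    return similarity >= SIMILARITY_THRESHOLD
```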
Step 130, sending the optimization parameters to the target device so that the target device performs an optimization iteration on the second wake-up model based on them.
Specifically, the wake-up model iteration apparatus may export the optimization parameters of the first wake-up model and send them to the target device. After receiving them, the target device iterates the parameters of the second wake-up model, i.e. its parameters are updated.
The wake-up model iteration method can be triggered periodically, for example with a preset period of 1, 6, or 12 months; it can also be triggered by the number of wake-up voice data items in the user wake-up data set, for example once that number reaches a preset count such as 100, 500, or 1000.
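The two trigger conditions can be combined as a simple disjunction; the concrete period and count below are examples the text itself lists as possibilities.

```python
# Sketch of the iteration trigger: fire on a fixed period or once enough
# wake-up voice data has accumulated, whichever comes first.

PRESET_PERIOD_S = 30 * 24 * 3600   # roughly one month, as an example period
PRESET_COUNT = 500                 # example preset number of data items

def should_trigger(seconds_since_last_run: int, dataset_size: int) -> bool:
    """Decide whether to start a wake-up model iteration round."""
    return seconds_since_last_run >= PRESET_PERIOD_S or dataset_size >= PRESET_COUNT
```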
In summary, the wake-up model iteration method provided by this embodiment determines a first wake-up model identical to the second wake-up model in the target device, trains the first wake-up model on the user wake-up data set to obtain its optimization parameters, and sends those parameters to the target device so that it can iterate the second wake-up model. Running a model identical to the device's wake-up model on a remote server and sending back the trained parameters achieves parameter iteration of the device's wake-up model without consuming the device's hardware resources; and because the user wake-up data set is determined after accuracy verification, it is highly reliable and carries the voice characteristics of the user corresponding to the target device.
Based on the above embodiment, the user wake-up data set is determined based on the following steps:
receiving the wake-up voice data recognized by the second wake-up model and sent by the target device;
inputting the wake-up voice data into a wake-up voice verification model, which determines an accuracy check result for the wake-up voice data based on the wake-up keywords;
and determining the user wake-up data set based on the wake-up voice data and the corresponding accuracy check results.
Specifically, the target device may send the wake-up voice data recognized by the second wake-up model to the remote monitoring server.
The wake-up model iteration apparatus on the server inputs the wake-up voice data into the wake-up voice verification model, which extracts features from the speech signal, recognizes the corresponding text from those features, compares the recognition result with the wake-up keywords, and determines the accuracy check result.
If the similarity between the recognition result and the wake-up keyword is greater than a preset similarity threshold, the accuracy check result is determined to be accurate; if it is smaller, the result is determined to be inaccurate.
A user wake-up data set can then be built from the wake-up voice data items and the accuracy check result of each item.
Items whose accuracy check result is accurate can serve as positive samples in the user wake-up data set, and items whose result is inaccurate as negative samples; optimization training of the wake-up model with these positive and negative samples can improve its recognition accuracy.
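Assembling the data set from checked utterances then reduces to a split on the check result. The `(data, check_passed)` record format below is an assumption about how the pairs might be represented.

```python
# Sketch: split accuracy-checked wake-up utterances into positive samples
# (check passed) and negative samples (check failed) for optimization training.

def build_user_wake_dataset(checked_records):
    """checked_records: iterable of (speech_data, check_passed) pairs."""
    positives = [data for data, passed in checked_records if passed]
    negatives = [data for data, passed in checked_records if not passed]
    return positives, negatives
```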
Based on any of the above embodiments, the number of neural network layers of the wake-up voice verification model is greater than that of the second wake-up model, and/or the number of neurons per neural network layer in the wake-up voice verification model is greater than that in the second wake-up model.
Specifically, the wake-up voice verification model may be deployed on the remote monitoring server. Since the server's hardware configuration is far higher than the target device's, the verification model can use a neural network with a complex structure.
The verification model has more neural network layers than the second wake-up model, or more neurons in each layer. The more layers and neurons, the more complex the model structure, the stronger its ability to learn features, and accordingly the higher its recognition accuracy.
For example, the wake-up voice verification model may include a feature extraction layer, a feature recognition layer, and a feature comparison layer connected in sequence. The feature extraction layer extracts speech features of the wake-up voice data, such as time-domain and frequency-domain features. The feature recognition layer recognizes the text corresponding to the wake-up voice data from the extracted features. The feature comparison layer compares the similarity between the recognized text and the wake-up keywords and outputs the accuracy check result according to the comparison. Each layer may use a fully connected network structure, with the number of neurons set as needed.
Based on any of the above embodiments, the user wake-up data set is augmented based on the following steps:
receiving the network addresses sent by candidate devices, the candidate devices being of the same device type as the target device;
determining the geographic location of each candidate device based on its network address;
determining the wake-up language category of each candidate device based on its geographic location;
and adding to the user wake-up data set the wake-up voice data sent by candidate devices whose wake-up language category is the same as the target device's.
Specifically, to improve the training effect, the user wake-up data set can be augmented.
A device of the same device type as the target device can be taken as a candidate device. Because the device types match, the candidate device's hardware and software configurations are similar or identical to the target device's, so its data has reference value.
Because each candidate device is connected to the remote monitoring server over the network, the server can obtain each candidate device's network address (such as its IP address), determine its geographic location from the address's registered region, and from that location determine its wake-up language category. For example, candidate devices in the same country may share a language such as English, German, or Japanese; candidate devices in the same administrative region may share a dialect, such as Sichuan dialect, Cantonese, or Fujian (Min) dialect.
Wake-up voice data sent by candidate devices in the same wake-up language category as the target device are likely to share speech characteristics and can therefore be used to augment the user wake-up data set: for the same wake-up keyword, speech features within the same language or language family are more similar.
Training the wake-up model with the augmented user wake-up data set improves the model's robustness.
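The address-to-language filtering can be sketched as below. The lookup tables stand in for a real IP-geolocation service, and all addresses, region names, and the record format are invented for illustration.

```python
# Hypothetical geolocation tables; a production system would resolve the
# network address via an IP-geolocation service instead.
ADDRESS_TO_REGION = {"203.0.113.5": "Guangdong", "198.51.100.7": "Sichuan"}
REGION_TO_LANGUAGE = {"Guangdong": "Cantonese", "Sichuan": "Sichuan dialect"}

def augment_dataset(dataset, target_language, candidate_reports):
    """candidate_reports: (network_address, wake_speech_data) pairs from
    candidate devices of the same device type as the target device."""
    for address, speech in candidate_reports:
        region = ADDRESS_TO_REGION.get(address)
        # Only keep data from devices in the same wake-up language category.
        if REGION_TO_LANGUAGE.get(region) == target_language:
            dataset.append(speech)
    return dataset
```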
Based on any of the above embodiments, step 120 includes:
determining a basic training data set and a basic test data set;
training the first wake model based on the base training data set and the training data set in the user wake data set;
based on a basic test data set and a test data set in a user wake-up data set, testing a first wake-up model before training and a first wake-up model after training respectively, and determining the recognition accuracy before training and the recognition accuracy after training of the first wake-up model;
and when the difference between the post-training recognition accuracy and the pre-training recognition accuracy is greater than or equal to a preset difference, determining the model parameters of the trained first wake-up model as the optimization parameters of the first wake-up model.
Specifically, the user wake-up data set may be divided into a training data set and a test data set according to a split ratio, which can be chosen as needed.
The basic training data set and the basic test data set are each a speech sample set established from standard speech.
If training and testing used only the user wake-up data set, problems could arise: the set is user data collected by the target device, the target device may serve more than one user, pronunciation may differ widely between users, and training on this data alone could overfit the wake-up model. The basic training data set is therefore combined with the training data set in the user wake-up data set to optimize the first wake-up model.
After training, the basic test data set is combined with the test data set in the user wake-up data set to test the trained first wake-up model and determine its post-training recognition accuracy. For comparison, the first wake-up model before training is tested in the same way to determine its pre-training recognition accuracy.
A preset difference may be set to measure the gap between the post-training and pre-training recognition accuracy, i.e. the degree to which the recognition accuracy of the first wake-up model has improved.
If the difference between the post-training and pre-training recognition accuracy is smaller than the preset difference, the optimization training brought only a small accuracy gain and no parameter iteration is needed. If the difference is greater than or equal to the preset difference, the optimization training brought a substantial accuracy gain, so parameter iteration is performed: the model parameters of the trained first wake-up model are determined as the optimization parameters of the first wake-up model.
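The split-then-gate logic above can be sketched as follows. The 80/20 split ratio, the fixed shuffle seed, and both function names are illustrative assumptions; the actual ratio and preset difference are design choices left open by the disclosure.

```python
import random

def split_user_wake_data(user_wake_data, train_ratio=0.8, seed=0):
    """Divide the user wake-up data set into training and test subsets
    according to an allocation ratio (80/20 here is an assumption)."""
    data = list(user_wake_data)
    random.Random(seed).shuffle(data)   # seeded shuffle for a reproducible split
    cut = int(len(data) * train_ratio)
    return data[:cut], data[cut:]

def accept_optimized_parameters(acc_before, acc_after, preset_diff):
    """Gate for parameter iteration: the trained parameters become the
    optimization parameters only when the post-training accuracy exceeds
    the pre-training accuracy by at least the preset difference."""
    return (acc_after - acc_before) >= preset_diff
```

Both subsets would then be merged with the basic training and basic test data sets before training and evaluation, as the text describes.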
Based on any of the above embodiments, the target device iterates the second wake-up model based on the steps of:
Receiving optimization parameters;
storing the current model parameters of the second wake-up model as historical model parameters, and iterating the optimization parameters into the current model parameters of the second wake-up model;
identifying voice data sent by a user based on the iterated second wake-up model, and determining the voice data as wake-up voice data under the condition that the identification is successful;
determining the number of received voice data and the number of recognized wake-up voice data, and determining the recognition success rate corresponding to the current model parameters based on the number of voice data and the number of wake-up voice data;
and under the condition that the recognition success rate corresponding to the current model parameter is lower than the recognition success rate corresponding to the historical model parameter, determining the historical model parameter as the current model parameter of the second wake-up model.
Specifically, after receiving the optimization parameters, the target device stores the model parameters currently used by the second wake-up model as historical model parameters, then imports the optimization parameters into the second wake-up model as its new current model parameters.
The iterated second wake-up model then recognizes voice data uttered by the user; when recognition succeeds, the voice data is determined to be wake-up voice data. Within a time window after the iteration, the target device also counts the received voice data and the recognized wake-up voice data, and determines the recognition success rate of the current model parameters from these two counts. For example, the recognition success rate may be the ratio of the number of wake-up voice data to the number of voice data.
The recognition success rate of the current model parameters is compared with that of the historical model parameters. If the current rate is higher, the parameter iteration of the second wake-up model provides the user with better wake-up service. If the current rate is lower, the parameter iteration has degraded the wake-up service; in that case the historical model parameters are restored as the current model parameters of the second wake-up model, i.e. a parameter rollback is performed, returning the second wake-up model to its previous state.
Comparing the recognition success rates of the current and historical model parameters in this way provides the user with a better experience.
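The device-side update-and-rollback procedure can be sketched as a small state holder. The class and method names are illustrative assumptions, not the disclosed interface; the success rate is computed as the ratio described above.

```python
class SecondWakeModelParams:
    """Hypothetical device-side view of the second wake-up model's parameters,
    supporting parameter iteration with rollback."""

    def __init__(self, current_params):
        self.current = current_params
        self.history = None
        self.history_rate = None

    def apply_update(self, optimized_params, current_rate):
        """Save the in-use parameters (and their success rate) as history,
        then iterate the optimization parameters in as the current ones."""
        self.history = self.current
        self.history_rate = current_rate
        self.current = optimized_params

    def check_and_maybe_rollback(self, n_speech, n_wake):
        """Success rate = recognized wake-up utterances / received utterances.
        Roll back to the historical parameters when the new rate is lower;
        return True if the update is kept, False if rolled back."""
        rate = n_wake / n_speech if n_speech else 0.0
        if self.history is not None and rate < self.history_rate:
            self.current = self.history   # parameter rollback to previous state
            return False
        return True
```

Counting `n_speech` and `n_wake` over a fixed time window after the iteration, as the text suggests, keeps the two rates comparable.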
Based on any of the above embodiments, step 130 includes:
sending a model iteration instruction containing optimization parameters to target equipment, and storing current model parameters of a second wake-up model as historical model parameters;
receiving the recognition success rate corresponding to the historical model parameters and the recognition success rate corresponding to the optimization parameters sent by the target equipment; the recognition success rate is determined based on the number of voice data received by the target device and the number of recognized wake-up voice data;
And sending a model iteration instruction containing the historical model parameters to the target equipment under the condition that the recognition success rate corresponding to the optimized parameters is lower than the recognition success rate corresponding to the historical model parameters.
Specifically, the parameter rollback operation may also be triggered by the remote server.
The remote server sends a model iteration instruction containing the optimization parameters to the target device and saves the current model parameters of the second wake-up model (the parameters in use before the optimization iteration) as historical model parameters.
Before the optimization iteration, the target device can determine the recognition success rate of the historical model parameters from the number of voice data received and the number of wake-up voice data recognized, and send it to the remote server.
After receiving the optimization parameters, the target device can likewise determine the recognition success rate of the optimization parameters from the number of voice data received and the number of wake-up voice data recognized, and send it to the remote server.
The remote server compares the success rates before and after the iteration. If the rate for the optimization parameters is lower than that for the historical model parameters, the optimization parameters provide the user with worse wake-up service, so the server sends a model iteration instruction containing the historical model parameters to the target device, i.e. it triggers the parameter rollback operation.
Based on any of the above embodiments, step 110 includes:
determining the quantity of wake-up voice data in a user wake-up data set;
sending a model parameter acquisition instruction to target equipment under the condition that the number of the awakening voice data is greater than or equal to the preset number;
receiving current model parameters of a second wake-up model sent by target equipment based on a model parameter acquisition instruction;
and constructing a first wake-up model based on the current model parameters and the model structure of the second wake-up model.
Specifically, frequent optimization iterations consume hardware resources on both the remote monitoring server and the target device, and may conflict with other tasks running on the target device. Parameter iteration is therefore triggered only when the wake-up voice data reaches a certain quantity.
When the number of wake-up voice data in the user wake-up data set is greater than or equal to a preset number, the wake-up model iteration apparatus sends a model parameter acquisition instruction to the target device and receives the current model parameters of the second wake-up model sent back by the target device.
The wake-up model iteration apparatus can then construct, in the remote monitoring server, a first wake-up model from the current model parameters and the model structure of the second wake-up model. The constructed first wake-up model is identical to the second wake-up model.
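The quantity-based trigger can be sketched as a single guard on the server side. The instruction string, the callback, and the function name are assumptions made for illustration.

```python
def maybe_start_parameter_iteration(user_wake_data, preset_count, send_to_device):
    """Server-side trigger: request the device's current model parameters only
    when the verified wake-up speech count reaches a preset quantity, so that
    frequent iterations do not waste server or device hardware resources.
    Returns the device's reply, or None when the threshold is not yet met."""
    if len(user_wake_data) >= preset_count:
        return send_to_device("get_model_parameters")
    return None
```

The `preset_count` value trades off model freshness against resource use; the disclosure leaves its concrete value open.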
Based on any of the above embodiments, fig. 2 is a second flowchart of an iterative method of wake-up models provided in the present application, as shown in fig. 2, where the method includes:
step 210, the user interacts by voice with the device; the wake-up model generates wake-up data, which is uploaded to the cloud over the network;
step 220, after receiving the data, the cloud verifies it with a more complex model, judging whether each sample is a true wake-up or a false wake-up;
step 230, the verified data is processed, stored in a database in the required form (features/audio), and divided into a training set and a test set according to a chosen ratio;
step 240, judging, from the quantity of wake-up data, whether the optimization condition is met;
step 250, when the optimization condition is met, optimizing the model with the basic training set and the user-data training set to obtain terminal model parameters, performing iterative learning on the basis of the existing model;
step 260, testing the optimized model on the basic test set and the accumulated user-data test set and comparing it with the original model; if the optimization is effective, pushing the optimized model to the device side for iteration and notifying the user of the iteration;
step 270, if the user finds the experience degraded, the user can, in the settings, roll back to either of the two previous models stored on the device and keep whichever performs better.
Based on any of the above embodiments, fig. 3 is a schematic structural diagram of a wake-up model iteration apparatus provided in the present application, as shown in fig. 3, where the apparatus includes:
a determining unit 310, configured to determine a first wake-up model; the first wake model is the same as the second wake model in the target device;
the training unit 320 is configured to perform optimization training on the first wake-up model based on the obtained user wake-up data set, so as to obtain optimization parameters of the first wake-up model; the user wake-up data set is determined after accuracy verification is carried out on wake-up voice data identified by the second wake-up model in the target equipment based on the wake-up keywords;
and an iteration unit 330, configured to send the optimization parameter to the target device, so that the target device performs optimization iteration on the second wake-up model based on the optimization parameter.
The wake-up model iteration apparatus provided by this embodiment determines a first wake-up model identical to the second wake-up model in the target device; trains the first wake-up model on the user wake-up data set to obtain its optimization parameters; and sends the optimization parameters to the target device so that the target device iterates the second wake-up model accordingly. By hosting a copy of the target device's wake-up model in the remote server and sending only the optimized parameters to the target device, parameter iteration of the wake-up model is achieved without consuming the target device's hardware resources for training. Moreover, the user wake-up data set is determined after accuracy verification, so it is highly reliable and contains the voice characteristics of the users of the target device.
Based on any of the above embodiments, the apparatus further comprises:
the user wake-up data set determining unit is used for receiving wake-up voice data identified by the second wake-up model sent by the target equipment;
inputting the awakening voice data into an awakening voice verification model, and determining an accuracy verification result corresponding to the awakening voice data based on awakening keywords by the awakening voice verification model;
and determining a user awakening data set based on the awakening voice data and an accuracy check result corresponding to the awakening voice data.
Based on any of the above embodiments, the number of neural network layers of the wake-up voice verification model is greater than that of the second wake-up model, and/or the number of neurons per neural network layer in the wake-up voice verification model is greater than that in the second wake-up model.
Based on any of the above embodiments, the apparatus further comprises:
the user awakening data set amplifying unit is used for receiving the network address sent by each candidate device; the candidate device and the target device are the same in device type;
determining a geographic location of each candidate device based on the network address of each candidate device;
determining wake-up language categories of the candidate devices based on the geographic locations of the candidate devices;
And adding the wake-up voice data sent by the candidate device with the same wake-up language type as the target device to a user wake-up data set.
Based on any of the above embodiments, the training unit is specifically configured to:
determining a basic training data set and a basic test data set;
training the first wake model based on the base training data set and the training data set in the user wake data set;
based on a basic test data set and a test data set in a user wake-up data set, testing a first wake-up model before training and a first wake-up model after training respectively, and determining the recognition accuracy before training and the recognition accuracy after training of the first wake-up model;
and under the condition that the difference between the recognition accuracy after training and the recognition accuracy before training is larger than or equal to a preset difference value, determining the model parameters in the first wake-up model after training as the optimized parameters of the first wake-up model.
Based on any of the above embodiments, the iteration unit is configured to:
sending a model iteration instruction containing optimization parameters to target equipment, and storing current model parameters of a second wake-up model as historical model parameters;
receiving the recognition success rate corresponding to the historical model parameters and the recognition success rate corresponding to the optimization parameters sent by the target equipment; the recognition success rate is determined based on the number of voice data received by the target device and the number of recognized wake-up voice data;
And sending a model iteration instruction containing the historical model parameters to the target equipment under the condition that the recognition success rate corresponding to the optimized parameters is lower than the recognition success rate corresponding to the historical model parameters.
Based on any of the above embodiments, the determining unit is configured to:
determining the quantity of wake-up voice data in a user wake-up data set;
sending a model parameter acquisition instruction to target equipment under the condition that the number of the awakening voice data is greater than or equal to the preset number;
receiving current model parameters of a second wake-up model sent by target equipment based on a model parameter acquisition instruction;
and constructing a first wake-up model based on the current model parameters and the model structure of the second wake-up model.
Based on any of the above embodiments, the present application further provides a wake-up model iteration method, widely applicable to whole-house intelligent digital control scenarios such as Smart Home, smart home device ecosystems, and intelligent house (Intelligence House) ecosystems. In this embodiment, fig. 4 is a schematic diagram of the hardware environment of the wake-up model iteration method provided in the present application; the method may run in a hardware environment formed by a terminal device 401 and a server 402 as shown in fig. 4. The server 402 is connected to the terminal device 401 through a network and may provide services (such as application services) for the terminal or for a client installed on it. A database may be set up on the server or independently of it to provide data storage services for the server 402, and cloud computing and/or edge computing services may be configured on the server or independently of it to provide data computation services for the server 402.
The network may include, but is not limited to, at least one of a wired network and a wireless network. The wired network may include, but is not limited to, at least one of a wide area network, a metropolitan area network, and a local area network; the wireless network may include, but is not limited to, at least one of Wi-Fi (Wireless Fidelity) and Bluetooth. The terminal device 401 is not limited to a PC or mobile phone, and may be a tablet computer, intelligent air conditioner, intelligent range hood, intelligent refrigerator, intelligent oven, intelligent cooking range, intelligent washing machine, intelligent water heater, intelligent washing device, intelligent dishwasher, intelligent projection device, intelligent television, intelligent clothes hanger, intelligent curtain, intelligent video device, intelligent socket, intelligent sound box, intelligent fresh-air device, intelligent kitchen and toilet device, intelligent bathroom device, intelligent sweeping robot, intelligent window-cleaning robot, intelligent mopping robot, intelligent air-purifying device, intelligent steam box, intelligent microwave oven, intelligent kitchen appliance, intelligent purifier, intelligent water dispenser, intelligent door lock, and the like.
Based on any of the foregoing embodiments, fig. 5 is a schematic structural diagram of an electronic device provided in the present application. As shown in fig. 5, the electronic device may include a processor (Processor) 510, a communication interface (Communications Interface) 520, a memory (Memory) 530, and a communication bus (Communications Bus) 540, where the processor 510, the communication interface 520, and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform the following method:
Determining a first wake-up model, the first wake-up model being the same as the second wake-up model in the target device; performing optimization training on the first wake-up model based on the acquired user wake-up data set to obtain optimization parameters of the first wake-up model, the user wake-up data set being determined after accuracy verification of wake-up voice data recognized by the second wake-up model in the target device based on the wake-up keywords; and sending the optimization parameters to the target device so that the target device performs an optimization iteration on the second wake-up model based on them. In addition, the logic instructions in the memory 530 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The processor in the electronic device provided by this embodiment may call the logic instructions in the memory to implement the above method. The specific implementation is consistent with the foregoing method embodiments and achieves the same beneficial effects, which are not repeated here.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the methods provided by the above embodiments.
This specific embodiment is consistent with the foregoing method embodiments and achieves the same beneficial effects, which are not repeated here.
Embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of this embodiment. Those of ordinary skill in the art can understand and implement them without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution, in essence or in the part contributing to the prior art, may be embodied as a software product stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in each embodiment or certain parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A wake-up model iteration method, comprising:
determining a first wake-up model; the first wake-up model is the same as a second wake-up model in the target device;
performing optimization training on the first wake-up model based on the acquired user wake-up data set to obtain optimization parameters of the first wake-up model; the user wake-up data set is determined after accuracy verification is performed on wake-up voice data identified by a second wake-up model in the target equipment based on wake-up keywords;
and sending the optimization parameters to the target equipment so that the target equipment performs optimization iteration on the second wake-up model based on the optimization parameters.
2. The wake model iterative method of claim 1, wherein the user wake data set is determined based on the steps of:
receiving wake-up voice data identified by a second wake-up model sent by target equipment;
inputting the awakening voice data into an awakening voice verification model, and determining an accuracy verification result corresponding to the awakening voice data based on awakening keywords by the awakening voice verification model;
and determining the user awakening data set based on the awakening voice data and an accuracy check result corresponding to the awakening voice data.
3. The wake model iteration method of claim 2, wherein a number of neural network layers of the wake speech verification model is greater than a number of neural network layers of the second wake model, and/or a number of neurons of a neural network layer in the wake speech verification model is greater than a number of neurons of a neural network layer in the second wake model.
4. The wake model iterative method of claim 2, wherein the user wake data set is augmented based on the steps of:
receiving network addresses sent by candidate devices; the candidate device is the same as the target device in device type;
determining a geographic location of each candidate device based on the network address of each candidate device;
determining wake-up language categories of the candidate devices based on the geographic locations of the candidate devices;
and adding the wake-up voice data sent by the candidate device with the same wake-up language class as the target device to the user wake-up data set.
5. The wake model iteration method of claim 1, wherein the performing optimization training on the first wake model based on the obtained user wake data set to obtain the optimization parameters of the first wake model includes:
Determining a basic training data set and a basic test data set;
training the first wake model based on the base training data set and a training data set in the user wake data set;
based on the basic test data set and the test data set in the user wake-up data set, testing a first wake-up model before training and a first wake-up model after training respectively, and determining the recognition accuracy before training and the recognition accuracy after training of the first wake-up model;
and determining model parameters in the first wake-up model after training as optimization parameters of the first wake-up model under the condition that the difference between the recognition accuracy after training and the recognition accuracy before training is larger than or equal to a preset difference value.
6. The wake model iterative method of claim 1, wherein the sending the optimization parameters to the target device comprises:
sending a model iteration instruction containing the optimization parameters to the target equipment, and storing the current model parameters of the second wake-up model as historical model parameters;
receiving the identification success rate corresponding to the historical model parameters and the identification success rate corresponding to the optimization parameters sent by the target equipment; the recognition success rate is determined based on the number of voice data received by the target device and the number of recognized wake-up voice data;
And sending a model iteration instruction containing the historical model parameters to the target equipment under the condition that the identification success rate corresponding to the optimized parameters is lower than the identification success rate corresponding to the historical model parameters.
7. The wake model iterative method of any of claims 1 to 6, wherein the determining a first wake model comprises:
determining the number of wake-up voice data in the user wake-up data set;
sending a model parameter acquisition instruction to the target equipment under the condition that the number of the awakening voice data is larger than or equal to the preset number;
receiving current model parameters of the second wake-up model sent by the target equipment based on the model parameter acquisition instruction;
and constructing the first awakening model based on the current model parameters and the model structure of the second awakening model.
8. A wake-up model iteration device, comprising:
a determining unit, configured to determine a first wake-up model; the first wake-up model is the same as a second wake-up model in the target device;
the training unit is used for carrying out optimization training on the first wake-up model based on the acquired user wake-up data set to obtain optimization parameters of the first wake-up model; the user wake-up data set is determined after accuracy verification is performed on wake-up voice data identified by a second wake-up model in the target equipment based on wake-up keywords;
And the iteration unit is used for sending the optimization parameters to the target equipment so that the target equipment performs optimization iteration on the second wake-up model based on the optimization parameters.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the wake model iteration method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the wake-up model iteration method of any one of claims 1 to 7 by means of the computer program.
CN202210906691.6A 2022-07-29 2022-07-29 Wakeup model iteration method and device, storage medium and electronic device Pending CN117524200A (en)


Similar Documents

Publication Publication Date Title
AU2019351894B2 (en) System and methods of operation of a smart plug
CN114821236A (en) Smart home environment sensing method, system, storage medium and electronic device
CN117524200A (en) Wakeup model iteration method and device, storage medium and electronic device
CN114915514B (en) Method and device for processing intention, storage medium and electronic device
CN110970019A (en) Control method and device of intelligent home system
CN116072124A (en) User identity recognition method, storage medium and electronic device
CN116312624A (en) Voiceprint recognition-based test method, voiceprint recognition-based test system, storage medium and electronic device
CN211086175U (en) Bionic olfaction acquisition and recognition system
CN116389179A (en) Speech recognition method and device, storage medium and electronic device
CN115547331A (en) Voice processing method, processing device, storage medium and electronic device
CN117575619A (en) After-sales service time determining method, device and storage medium
CN117524231A (en) Voice person identification method, voice interaction method and device
CN116432658A (en) Voice data processing method and device, storage medium and electronic device
CN116524935A (en) Audio registration method and device, storage medium and electronic device
CN116306682A (en) Sentence recognition method and device, storage medium and electronic device
CN117010378A (en) Semantic conversion method and device, storage medium and electronic device
CN116483961A (en) Training method and device of dialogue model, storage medium and electronic equipment
CN116524922A (en) Distributed voice awakening method and device, storage medium and electronic device
CN117542355A (en) Distributed voice awakening method and device, storage medium and electronic device
CN115171699A (en) Wake-up parameter adjusting method and device, storage medium and electronic device
CN117914635A (en) Method and device for controlling smart home equipment to leave home based on vehicle terminal
CN116364079A (en) Equipment control method, device, storage medium and electronic device
CN117520292A (en) Rule matching method, storage medium and electronic device
CN117131860A (en) Address quality determining method and device based on knowledge graph and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination