CN113763920B - Air conditioner, voice generating method thereof, voice generating device and readable storage medium - Google Patents


Info

Publication number
CN113763920B
Authority
CN
China
Prior art keywords
voice
information
air conditioner
text
generation model
Prior art date
Legal status
Active
Application number
CN202010479791.6A
Other languages
Chinese (zh)
Other versions
CN113763920A
Inventor
钟鸿飞
罗彪
刘景春
Current Assignee
Midea Group Co Ltd
GD Midea Air Conditioning Equipment Co Ltd
Original Assignee
Midea Group Co Ltd
GD Midea Air Conditioning Equipment Co Ltd
Priority date
Filing date
Publication date
Application filed by Midea Group Co Ltd and GD Midea Air Conditioning Equipment Co Ltd
Priority claimed from CN202010479791.6A
Publication of CN113763920A
Application granted
Publication of CN113763920B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
    • F - MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 - HEATING; RANGES; VENTILATING
    • F24F - AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00 - Control or safety arrangements
    • F24F11/50 - Control or safety arrangements characterised by user interfaces or communication
    • F24F11/52 - Indication arrangements, e.g. displays
    • F24F11/526 - Indication arrangements, e.g. displays giving audible indications
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B30/00 - Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B30/70 - Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a voice generating method of an air conditioner, which comprises the following steps: acquiring a text to be converted and a voice generation model corresponding to a target object; inputting the text to be converted into the voice generation model; and taking an output result of the voice generation model as target voice information, wherein the target voice information has the pronunciation characteristics of the target object. The application also discloses a voice generating device, an air conditioner, and a readable storage medium. The application aims to enable the air conditioner to output voice with the pronunciation characteristics of the user, so that the voice output by the air conditioner meets the user's personalized requirements.

Description

Air conditioner, voice generating method thereof, voice generating device and readable storage medium
Technical Field
The present application relates to the field of air conditioning technologies, and in particular, to a voice generating method, a voice generating device, an air conditioner, and a readable storage medium.
Background
With the continuous development of technology and the growth of user demands, the functions of air conditioners have become increasingly diversified. Besides responding to voice control instructions issued by the user, some air conditioners also provide a voice broadcasting function, for example, announcing the running state of the air conditioner through voice.
However, the voice played by current air conditioners is generally preconfigured before the air conditioner leaves the factory, with fixed tone and content, so the voice output by the air conditioner may not match the user's requirements, seriously affecting the user experience.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present application and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The application mainly aims to provide a voice generation method, which aims to realize that an air conditioner can output voice with the pronunciation characteristics of a user so that the voice output by the air conditioner meets the personalized requirements of the user.
In order to achieve the above object, the present application provides a voice generating method of an air conditioner, the voice generating method comprising the steps of:
acquiring a text to be converted and a voice generation model corresponding to a target object;
inputting the text to be converted into the voice generation model;
and taking an output result of the voice generation model as target voice information, wherein the target voice information has pronunciation characteristics of the target object.
Optionally, the step of using the output result of the speech generation model as target speech information includes:
acquiring characteristic information of an application scene and/or an action object corresponding to the target voice information;
and extracting data corresponding to the characteristic information from the output result as the target voice information.
Optionally, before the step of inputting the text to be converted into the speech generation model, the method further includes:
executing the labeling operation of the pronunciation characteristics of the text to be converted to obtain the text to be converted with the pronunciation characteristics labeling;
the step of inputting the text to be converted into the speech generation model comprises the following steps:
and inputting the text to be converted with the pronunciation characteristic label into the voice generation model.
Optionally, before the step of performing the labeling operation on the pronunciation characteristics of the text to be converted, the method further includes:
executing word segmentation operation on the text to be converted to obtain word segmentation results;
the step of executing the labeling operation of the pronunciation characteristics of the text to be converted comprises the following steps:
and executing the labeling operation of the pronunciation characteristics of the word segmentation result.
Optionally, after the step of using the output result of the speech generation model as the target speech information, the method further includes:
outputting the target voice information;
when a voice correction instruction is received, acquiring first voice data of a target object corresponding to the voice correction instruction;
determining second voice data corresponding to the voice correction instruction in the target voice information;
according to the data deviation between the first voice data and the second voice data, adjusting the voice generation model;
and returning to the step of inputting the text to be converted into the voice generation model.
Optionally, before the step of obtaining the text to be converted and the speech generation model corresponding to the target object, the method further includes:
acquiring third voice data of the target object, and acquiring an initial voice generation model;
extracting pronunciation characteristic information of the target object in the third voice data; the pronunciation characteristic information includes at least one of sound energy, sound frequency, and sound intensity;
and embedding the pronunciation characteristic information into the initial voice generation model to obtain the voice generation model.
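The pronunciation characteristic information named above (sound energy, sound frequency, and sound intensity) can be illustrated with a minimal sketch. The patent does not specify how these features are extracted; the RMS-energy, zero-crossing, and peak-amplitude measures below are assumptions for illustration only, applied to a raw sample sequence:

```python
import math

def extract_pronunciation_features(samples, sample_rate=16000):
    """Toy extractor for the three features named in the text: energy
    (root-mean-square), a crude frequency estimate from zero crossings,
    and intensity (peak amplitude). Illustration only."""
    n = len(samples)
    energy = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between consecutive samples; a sine wave
    # crosses zero twice per cycle.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    frequency = crossings * sample_rate / (2.0 * n)
    intensity = max(abs(s) for s in samples)
    return {"energy": energy, "frequency": frequency, "intensity": intensity}

# Example: one second of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
features = extract_pronunciation_features(tone)
```

For a pure tone the estimates are close to the analytic values (RMS of a unit sine is about 0.707; the zero-crossing estimate recovers roughly 440 Hz); real speech would need windowing and a proper pitch tracker.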
Optionally, the step of acquiring the third voice data of the target object includes:
outputting a setting text;
acquiring voice data input by the target object based on the set text as the third voice data; or,
acquiring voice interaction data of an air conditioner and voiceprint information of the target object;
extracting voice data matched with the voiceprint information from the voice interaction data as the third voice data; and/or,
after the step of taking the output result of the voice generation model as target voice information, returning to the step of acquiring the third voice data of the target object and acquiring an initial voice generation model;
before the step of extracting the pronunciation characteristic information of the target object in the third voice data, the method further includes:
extracting quality characteristic information corresponding to the third voice data; the quality characteristic information comprises at least one of intensity information, content information and quantity information;
and when the quality characteristic information meets the set quality condition, executing the step of extracting the pronunciation characteristic information of the target object in the third voice data.
In order to achieve the above object, the present application also provides a speech generating apparatus including: a memory, a processor, and a speech generating program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the speech generating method of the air conditioner as set forth in any one of the above.
In addition, in order to achieve the above object, the present application also proposes an air conditioner including:
the above-described voice generating means for generating target voice information having a target object pronunciation feature;
and the voice playing module is used for outputting the target voice information.
In addition, in order to achieve the above object, the present application also proposes a readable storage medium having stored thereon a speech generating program which, when executed by a processor, implements the steps of the speech generating method according to any one of the above.
The application provides a voice generating method of an air conditioner, which is characterized in that a text to be converted is input into a voice generating model corresponding to a target object, target voice information with the pronunciation characteristics of the target object is obtained based on the output result of the voice generating model, the target voice information output by the air conditioner is not fixed any more, and the air conditioner can output voice with the pronunciation characteristics of a user based on the generated target voice information, so that the voice output by the air conditioner meets the personalized requirements of the user.
Drawings
FIG. 1 is a schematic diagram of a hardware architecture involved in the operation of an embodiment of a speech generating device according to the present application;
FIG. 2 is a flowchart of a speech generating method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a speech generating method according to another embodiment of the present application;
fig. 4 is a flowchart of a speech generating method according to another embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The main solutions of the embodiments of the present application are: acquiring a text to be converted and a voice generation model corresponding to a target object; inputting the text to be converted into the voice generation model; and taking an output result of the voice generation model as target voice information, wherein the target voice information has pronunciation characteristics of the target object.
In the prior art, the voice played by the air conditioner is generally preconfigured before the air conditioner leaves the factory, with fixed tone and content, so the voice output by the air conditioner may not match the user's requirements, seriously affecting the user experience.
The application provides the solution, and aims to realize that the air conditioner can output the voice with the pronunciation characteristics of the user, so that the voice output by the air conditioner meets the personalized requirements of the user.
The embodiment of the application provides an air conditioner.
In this embodiment, the air conditioner is specifically an air conditioner with a voice playing function. The air conditioner comprises a voice playing module 01. The air conditioner may output the target voice information through the voice playing module 01.
Further, the embodiment of the application also provides a voice generating device for generating the target voice information with the pronunciation characteristics of the target object. The voice generating device may be built into the air conditioner or arranged independently of it. When the voice generating device is arranged independently of the air conditioner, it is communicatively connected to the air conditioner, and the air conditioner can acquire the target voice information generated by the device and play it.
In an embodiment of the present application, referring to fig. 1, a voice generating apparatus includes: a processor 1001 (e.g., a CPU), a memory 1002, a voice acquisition module 1003, and the like. The memory 1002 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1002 may alternatively be a storage device separate from the processor 1001 described above.
The processor 1001 is communicatively connected to the memory 1002, the voice acquisition module 1003, and the voice playing module 01. The voice acquisition module 1003 can be used to collect voice data of the user; it may be provided in a mobile terminal (such as a mobile phone or smart watch) connected to the air conditioner, or in any other device. The processor 1001 may be configured to control the voice playing module 01 to perform voice playback.
It will be appreciated by those skilled in the art that the device structure shown in fig. 1 is not limiting of the device and may include more or fewer components than shown, or may be combined with certain components, or a different arrangement of components.
As shown in fig. 1, a speech generation program may be included in a memory 1002 as a readable storage medium. In the apparatus shown in fig. 1, a processor 1001 may be used to invoke a speech generating program stored in a memory 1002 and to perform the relevant step operations of the speech generating method in the following embodiments.
The embodiment of the application also provides a voice generating method of the air conditioner, which is used for generating target voice information required to be output by the air conditioner.
Referring to fig. 2, an embodiment of the speech generating method of the present application is presented. In this embodiment, the voice generating method includes:
step S10, acquiring a text to be converted and a voice generation model corresponding to a target object;
the text to be converted specifically refers to text content corresponding to the individual voice to be played by the air conditioner. The text to be converted can be text information (such as an air conditioner control instruction, prompt information related to the operation of the air conditioner and the like) stored in the air conditioner in advance, can be text data acquired from other terminals based on operation requirements in the operation process of the air conditioner, and can also be text data input by a user based on own requirements. Based on the text to be converted can be obtained by reading the data of the set position in the memory; the method can also request the terminal to obtain the text to be converted when the air conditioner operates or the set state information is monitored; and when receiving the setting instruction, acquiring text information corresponding to the instruction to obtain the text to be converted. For example, when a parent needs an air conditioner to read a story for a child, a text input instruction containing the story to be read can be input through an application mode of the mobile terminal, and when the text input instruction is detected, story text in the text input instruction information can be analyzed to serve as a text to be converted.
The speech generation model here refers to a data processing model that can convert text information into target voice information having the pronunciation characteristics of the target object. It is specifically a machine learning model (for example, a deep learning model) obtained by training on the voice data of the target object with a set algorithm. The voice generation model may be generated and stored in the air conditioner, generated on a server and stored in the cloud, or generated on a server and downloaded into the memory of the air conditioner. Accordingly, the model can be read from the air conditioner's stored data, or obtained by sending a request to the cloud.
The target object here refers to the user who needs the air conditioner to output voice having that user's pronunciation characteristics. Different target objects correspond to different speech generation models. Specifically, identity characteristic information of the target object (such as voiceprint information, face information, or identification information input by the user) can be obtained, and the corresponding voice generation model acquired based on that identity characteristic information, so that users with different identities can generate the personalized voices they require.
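The identity-to-model mapping just described can be sketched as a small registry. The class, the voiceprint-ID key format, and the factory-default fallback are illustrative assumptions; the patent only states that different target objects correspond to different models:

```python
class VoiceModelRegistry:
    """Hypothetical registry mapping a user's identity characteristic
    (here, a voiceprint ID string) to that user's trained
    speech-generation model."""

    def __init__(self):
        self._models = {}

    def register(self, identity_id, model):
        self._models[identity_id] = model

    def lookup(self, identity_id):
        # Fall back to a generic factory voice when no personalised
        # model has been trained for this user yet (an assumption;
        # the patent does not describe the fallback behaviour).
        return self._models.get(identity_id, "factory_default_model")

registry = VoiceModelRegistry()
registry.register("voiceprint:parent_a", "model_parent_a")
model = registry.lookup("voiceprint:parent_a")
```

A real device would key the registry on extracted voiceprint or face embeddings rather than literal strings.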
Step S20, inputting the text to be converted into the voice generation model;
the voice generation model converts the input text to be converted to obtain target voice information which contains the content corresponding to the text to be converted and has the pronunciation characteristics of the target object.
And step S30, taking the output result of the voice generation model as target voice information.
The speech generation model converts the text to be converted into speech data with the pronunciation characteristics of the target object as an output result. And taking part or all of the output result of the voice generation model as target voice information with target object pronunciation characteristics. The pronunciation characteristics here specifically include tone color, pronunciation style, and the like of the target object.
When the speech generation model outputs a single result, that result can be used directly as the target voice information. When the model outputs more than one result, one of them can be selected as the target voice information based on a set rule; alternatively, when the results apply to different scenes, each result can be identified based on its corresponding application scene (such as the corresponding operating mode of the air conditioner), so that when the air conditioner subsequently needs to output voice, the result matching the current application scene is acquired based on that identification and output.
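The scene-based selection described above can be sketched as follows. The tag names, the fallback to the first result, and the `.pcm` file names are illustrative assumptions:

```python
def select_by_scene(outputs, current_scene):
    """From several generated results, each tagged with an application
    scene identification, pick the one matching the air conditioner's
    current scene; fall back to the first result if none matches
    (the fallback rule is an assumption for illustration)."""
    for item in outputs:
        if item["scene"] == current_scene:
            return item["audio"]
    return outputs[0]["audio"]

outputs = [
    {"scene": "sleep_mode", "audio": "soft_voice.pcm"},
    {"scene": "normal_mode", "audio": "bright_voice.pcm"},
]
chosen = select_by_scene(outputs, "sleep_mode")
```

At playback time the air conditioner would pass its current operating mode as `current_scene` and output the matching audio.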
According to the voice generation method of the air conditioner in this embodiment, the text to be converted is input into the voice generation model corresponding to the target object, and target voice information with the pronunciation characteristics of the target object is obtained from the model's output result. The target voice information output by the air conditioner is no longer fixed: the air conditioner can output voice with the user's pronunciation characteristics, so the output voice meets the user's personalized requirements, and the air conditioner can even output voice in the user's place (for example, telling stories to children in a parent's voice), freeing the user from such tasks.
Specifically, in the present embodiment, step S30 includes:
step S31, obtaining the feature information of an application scene and/or an action object corresponding to the target voice information;
the application scenario specifically refers to various aspects of the air conditioner, such as time, equipment, social interaction, user emotion and the like, in a space where the air conditioner is located when the air conditioner outputs target voice information, and information about the state of at least one aspect can be obtained as feature information of the application scenario. The acting object specifically refers to a user, an animal, etc. receiving the target voice information when the air conditioner outputs the target voice information, and information such as identity, age, sex, type, etc. of the acting object can be obtained as characteristic information of the acting object.
The feature information corresponding to the application scene and/or the acting object can be obtained from parameters input by the user, or determined by analyzing the content of the text to be converted. In addition, the air conditioner operation mode associated with the text to be converted can be used as the feature information of the application scene. For example, when the text to be converted contains content related to emotional soothing, the feature information of the application scene can be determined to be soothing and that of the acting object an infant; when it contains story-related content, the application scene can be determined to be sleep and the acting object a child (such as a user over 1 and under 7 years old); when it contains a pet's name, the application scene can be determined to be pet supervision and the acting object the pet, and so on. For another example, when the text to be converted is associated with the sleep mode of the air conditioner, its target voice information can be considered to be output in sleep mode, and sleep mode used as the feature information of the application scene; when it is associated with a non-sleep mode of the air conditioner, that non-sleep mode can likewise be used as the feature information of the application scene.
And step S32, extracting data corresponding to the characteristic information from the output result as the target voice information.
The output result of the speech generation model may include voice data for different application scenes and/or different acting objects. The voice data for the different scenes and/or objects have the same content (corresponding to the text to be converted) and the pronunciation characteristics of the target object, but differ in emotional characteristics such as mood, speaking speed, and emotion. For example, when the acting object is an infant, or the application scene is soothing or sleep mode, the corresponding voice data can sound softer; when the acting object is an adult, or the application scene is a non-sleep mode, it can sound brighter; when the acting object is a child, or the application scene is a bedtime story, it can carry richer emotion; and when the acting object is an adult, or the application scene is a status prompt, it can carry lighter emotion.
Based on the method, the data matched with the characteristic information is selected from the output result as a final result, so that the target voice information is obtained, the target voice information output by the air conditioner can be matched with an application scene and/or an action object, the matching degree of the output voice of the air conditioner and the user requirement is further improved, the interaction effect of voice output is enhanced, and the user experience is improved.
Further, after step S30, it may include:
step S40, outputting the target voice information;
specifically, the air conditioner may output the target voice information when the air conditioner monitors the setting scene information or receives a specific instruction output by the user.
Step S50, when a voice correction instruction is received, acquiring first voice data of a target object corresponding to the voice correction instruction;
after the user listens to the sound corresponding to the target voice information, if the voice output by the air conditioner is found to be inaccurate, a voice correction instruction can be sent out through the air conditioner or a mobile terminal connected with the air conditioner. And starting timing when the air conditioner or the mobile terminal receives the voice correction instruction, and taking the voice data as first voice data when the air conditioner detects the voice data of the target object within a set time length. And the voice data which is sent by the target object of the specific instruction of the first voice data and is collected by the voice collection module and is used for correcting the target voice information.
Step S60, determining second voice data corresponding to the voice correction instruction in the target voice information;
the voice correction instruction may specifically include an identification of voice data to be corrected (for example, content of the voice to be corrected, a position of the voice to be corrected in the target voice information, etc.), based on which, by parsing the voice correction instruction, the corresponding voice data in the target voice information may be determined as the second voice data.
Step S70, adjusting the voice generation model according to the data deviation between the first voice data and the second voice data.
And extracting pronunciation characteristic information in the first voice data and the second voice data, and performing difference analysis to obtain data deviation. Model parameters in the speech generation model are modified based on the data bias. The larger the data deviation, the larger the correction amplitude of the model parameters in the speech generation model.
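The deviation-proportional correction described above can be sketched numerically. The parameter names, the linear update rule, and the fixed rate are assumptions for illustration; the patent only states that a larger deviation yields a larger correction:

```python
def adjust_model(params, first_features, second_features, rate=0.1):
    """Nudge model parameters toward the user's corrected pronunciation:
    the larger the deviation between the user's recording (first voice
    data) and the generated audio (second voice data), the larger the
    correction step. A linear update with a fixed rate is assumed."""
    adjusted = dict(params)
    for name in first_features:
        deviation = first_features[name] - second_features[name]
        adjusted[name] = params.get(name, 0.0) + rate * deviation
    return adjusted

params = {"energy": 0.5, "frequency": 200.0}
user = {"energy": 0.6, "frequency": 220.0}       # first voice data
generated = {"energy": 0.5, "frequency": 200.0}  # second voice data
new_params = adjust_model(params, user, generated)
```

Here the 20 Hz frequency deviation produces a ten-times-larger parameter shift than the 0.1 energy deviation, matching the "larger deviation, larger correction" behaviour.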
After step S70, the process may return to step S20 to regenerate the target voice information. In addition, the speech generation model may also be stored for later recall after step S70.
In this embodiment, after the target voice information is output, the first voice data is obtained and the second voice data is determined based on the voice correction instruction, and the voice generation model is corrected based on the deviation between the two voice data, so that the result output by the corrected voice generation model is closer to the pronunciation characteristics of the target object, thereby ensuring that the voice generation model is more accurate, improving the accuracy of the air conditioner when outputting the voice with the pronunciation characteristics of the user, and further meeting the user requirements.
Further, based on the above embodiment, another embodiment of the air conditioner-based voice generating method of the present application is proposed. In this embodiment, referring to fig. 3, before step S20, the method further includes:
step S21, performing word segmentation operation on the text to be converted to obtain a word segmentation result;
specifically, word segmentation operation includes identifying word groups in a text to be converted and part of speech of each word group, and marking the part of speech of each word group. Taking the text to be converted with the part of speech label as a word segmentation result;
step S22, performing the labeling operation of the pronunciation characteristics of the word segmentation result to obtain a text to be converted with the pronunciation characteristics labeling;
the pronunciation characteristics specifically refer to characteristic information corresponding to standard pronunciation of a text in a specific language. For example, the pronunciation features may include pinyin, syllables, phonemes, and the like. And marking the pronunciation characteristics of the text to be converted after the word segmentation operation is performed, and obtaining the text to be converted with the pronunciation characteristic marking.
step S23, inputting the text to be converted with the pronunciation characteristic labels into the voice generation model;
inputting the text to be converted with the pronunciation characteristic label into a voice generation model.
In this embodiment, the word segmentation operation and the pronunciation feature labeling operation are performed before the text to be converted is input into the speech generation model. Compared with inputting the unprocessed text directly, the output result obtained is more accurate: the sound of the air conditioner when outputting the target speech information is closer to the pronunciation of the target object reading the text to be converted, so the air conditioner better meets the personalized requirements of the user.
It should be noted that, in other embodiments, the text to be converted may also be input into the speech generation model after only performing word segmentation operation or annotation operation of pronunciation features according to requirements.
Further, based on any one of the above embodiments, a further embodiment of the speech generating method of the present application is provided. In this embodiment, referring to fig. 4, before step S10, the method further includes:
step S01, an initial voice generation model is obtained, and third voice data of the target object is obtained;
the initial speech generation model here is specifically a model that can convert text into speech.
The initial speech generation model may be a speech generation model whose output speech result does not have the pronunciation characteristics of the target object. In that case, a large amount of training voice data about a specific text can be collected from other users who are not the target object; voice features are extracted from this training data, the specific text is labeled with features such as part of speech and pronunciation, the labeled specific text and the extracted voice features are used as training samples for a neural network, and the initial speech generation model is obtained by training.
Alternatively, the initial speech generation model may be a speech generation model whose output speech result already has the pronunciation characteristics of the target object. Here, steps S01 to S30 may be performed in a loop; in that case, the initial speech generation model is the speech generation model obtained by performing steps S01 to S03 in the previous loop. When step S01 is performed for the first time, the initial speech generation model may be a speech generation model whose output result does not have the pronunciation characteristics of the target object.
The third voice data may be collected at present, or may be obtained by analyzing previously recorded voice data. Specifically, the step of acquiring the third voice data of the target object includes: outputting a set text; and acquiring the voice data that the target object inputs based on the set text as the third voice data. Alternatively, the step may include: acquiring voice interaction data of the air conditioner and voiceprint information of the target object; and extracting the voice data matching the voiceprint information from the voice interaction data as the third voice data.

Specifically, when the voice generation model of the target object is generated for the first time, the third voice data may be obtained by outputting the set text; when it is not the first time, the third voice data may be obtained from the voice interaction data of the air conditioner. The target object may record voiceprint and identity information in advance, for example through a terminal application, and associate the voiceprint information with the identity information. During voice interaction between the user and the air conditioner, the air conditioner records the voice files generated in the interaction, invokes the recorded voiceprints to identify them, and tags the voice data corresponding to each recorded voiceprint with the identity information associated with that voiceprint. The data in the voice files that matches the target voiceprint is taken as the third voice data.
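The voiceprint-matching extraction described above can be sketched as follows, under the assumption that each recorded utterance is reduced to a feature vector that is compared to the enrolled voiceprint by cosine similarity; the data layout, field names, and threshold are illustrative, not the patent's actual matching scheme.

```python
# Illustrative voiceprint matching: keep only the utterances whose feature
# vector is close enough (by cosine similarity) to the enrolled voiceprint.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def extract_matching(utterances, voiceprint, threshold=0.9):
    """Keep utterances whose feature vector matches the enrolled voiceprint."""
    return [u["audio"] for u in utterances
            if cosine(u["features"], voiceprint) >= threshold]

enrolled = [1.0, 0.0, 1.0]                                 # target voiceprint
log = [{"audio": "a.wav", "features": [0.9, 0.1, 1.0]},    # same speaker
       {"audio": "b.wav", "features": [0.0, 1.0, 0.1]}]    # different speaker
print(extract_matching(log, enrolled))   # only the matching utterance remains
```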
In order to ensure the accuracy of the subsequently extracted pronunciation characteristic information, quality characteristic information corresponding to the third voice data may be extracted; the quality characteristic information includes at least one of intensity information, content information and quantity information. Step S02 is executed when the quality characteristic information meets the set quality condition; otherwise, the process returns to step S01.

The intensity information may include the sound intensity of the background noise, the sound intensity of the human voice, and/or the sound intensity of the voice matching the target object's voiceprint. The content information includes the semantics characterized by the voice data. The quantity information may include the number of pieces of third voice data or the number of acquisitions. For example, when the sound intensity of the background noise is less than or equal to a first intensity threshold, and/or the sound intensity of the voice matching the target object's voiceprint is greater than or equal to a second intensity threshold, and/or the semantics characterized by the voice data match the output set text, and/or the number of collected pieces of third voice data is greater than or equal to a set threshold, it is determined that the quality characteristic information of the third voice data meets the set quality condition.
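A sketch of this quality gate, with hypothetical thresholds and field names for the intensity, content, and quantity checks:

```python
# Illustrative quality gate over collected third voice data. The decibel
# thresholds, minimum count, and record fields are assumptions for the sketch.

def meets_quality(samples, noise_db_max=40.0, voice_db_min=55.0, min_count=5):
    """Return True when the samples satisfy the set quality condition."""
    if len(samples) < min_count:                 # quantity information
        return False
    for s in samples:
        if s["noise_db"] > noise_db_max:         # background-noise intensity
            return False
        if s["voice_db"] < voice_db_min:         # matched-voice intensity
            return False
        if not s["matches_prompt"]:              # content (semantic) check
            return False
    return True

good = [{"noise_db": 30.0, "voice_db": 60.0, "matches_prompt": True}] * 5
print(meets_quality(good))        # quality condition met: proceed to step S02
print(meets_quality(good[:3]))    # too few samples: return to step S01
```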
After step S30, the process may return to step S01, so that the speech generation model is optimized in a loop based on the voice data of the target object. Specifically, while the air conditioner is in use, voice data matching the voiceprint information can be continuously extracted from the voice interaction data as third voice data; its quality characteristic information is extracted, and whether it meets the set quality condition is judged. When it does (for example, when the number of pieces of third voice data is greater than or equal to a set number), the step of acquiring the initial voice generation model and the subsequent steps S02, S10, S20, S30 and so on are executed, and the voice data meeting the set quality condition is used to optimize the training of the voice generation model of the target object. In this way, the air conditioner can iterate and optimize automatically during use, ensuring that the target voice information it outputs grows ever closer to the pronunciation characteristics of the target object.
Step S02, extracting pronunciation characteristic information of the target object in the third voice data;
Specifically, the voice data of the target object can be extracted from the third voice data based on the voiceprint of the target object, and operations such as noise reduction, blank-speech removal, pre-emphasis, framing, windowing and voice feature extraction are performed on it to obtain the pronunciation characteristic information of the target object. The pronunciation characteristic information includes at least one of sound energy, sound frequency and sound intensity. Specifically, the MFCC algorithm or the FBank algorithm may be used to extract the pronunciation characteristic information.
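A heavily simplified sketch of the framing-and-windowing front end mentioned above, computing per-frame sound energy after a Hamming window; a real pipeline would continue to MFCC or FBank features, so this is an illustrative assumption rather than the patent's actual front end.

```python
# Illustrative framing + Hamming windowing + per-frame energy. Frame length
# and hop are toy values; real front ends use e.g. 25 ms frames, 10 ms hop.
import math

def frame_energies(signal, frame_len=4, hop=2):
    """Split the signal into overlapping frames, window each, return energies."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]             # Hamming window
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        energies.append(sum(x * x for x in frame))   # sound energy per frame
    return energies

samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
print(frame_energies(samples))   # one energy value per 4-sample frame
```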
Step S03, embedding the pronunciation characteristic information into the initial voice generation model to obtain the voice generation model.
Specifically, the pronunciation characteristic information is added at a specific position of the initial speech generation model as model parameters, forming a speech generation model that can generate speech with the pronunciation characteristics of the target object. On this basis, in step S70 of the above embodiment, when the speech generation model is adjusted, the pronunciation characteristic information corresponding to the second speech data can be located at that specific position in the model and corrected according to the data deviation described in the above embodiment.
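Step S03 might be sketched as follows, under the assumption that the initial model reserves a speaker slot into which the extracted pronunciation features are written, so no retraining is needed; the dict-based "model" and slot name are hypothetical.

```python
# Illustrative embedding of pronunciation features into a reserved slot of a
# speaker-agnostic initial model. The model structure is a toy assumption.

def embed_speaker(initial_model, pronunciation_features):
    """Return a copy of the model with the target speaker's features embedded."""
    model = dict(initial_model)                            # leave the original intact
    model["speaker_embedding"] = list(pronunciation_features)
    return model

initial = {"layers": 12, "speaker_embedding": None}   # speaker-agnostic model
features = [0.12, 0.85, 0.33]                         # extracted in step S02
personalized = embed_speaker(initial, features)
print(personalized["speaker_embedding"])   # [0.12, 0.85, 0.33]
```

Because the features sit in one known slot, a later correction step only has to rewrite that slot instead of retraining the whole model.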
In this embodiment, the speech generation model is formed by embedding the pronunciation characteristic information extracted from the third speech data of the target object into the initial speech generation model. A speech generation model that outputs target speech information with the pronunciation characteristics of the target object can therefore be generated quickly from only a small amount of the target object's speech data, without collecting a large number of data samples and retraining the initial speech generation model.
In addition, the embodiment of the application also provides a readable storage medium, wherein the readable storage medium stores a voice generation program, and the voice generation program realizes the relevant steps of any embodiment of the voice generation method when being executed by a processor.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described method may be implemented by means of software plus a necessary general hardware platform, or by means of hardware, although in many cases the former is the preferred embodiment. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the method according to the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (9)

1. A voice generating method of an air conditioner, the voice generating method comprising the steps of:
acquiring a text to be converted and a voice generation model corresponding to a target object;
inputting the text to be converted into the voice generation model;
taking the output result of the voice generation model as target voice information, wherein the target voice information has the pronunciation characteristics of the target object;
outputting the target voice information;
when a voice correction instruction is received, acquiring first voice data of a target object corresponding to the voice correction instruction;
determining second voice data corresponding to the voice correction instruction in the target voice information;
according to the data deviation between the first voice data and the second voice data, adjusting the voice generation model;
and returning to the step of inputting the text to be converted into the voice generation model.
2. The voice generating method of an air conditioner as claimed in claim 1, wherein the step of using an output result of the voice generating model as target voice information comprises:
acquiring characteristic information of an application scene and/or an action object corresponding to the target voice information;
and extracting data corresponding to the characteristic information from the output result as the target voice information.
3. The method for generating voice for an air conditioner according to claim 1, wherein before the step of inputting the text to be converted into the voice generation model, further comprising:
executing the labeling operation of the pronunciation characteristics of the text to be converted to obtain the text to be converted with the pronunciation characteristics labeling;
the step of inputting the text to be converted into the speech generation model comprises the following steps:
and inputting the text to be converted with the pronunciation characteristic label into the voice generation model.
4. The method for generating voice of an air conditioner as recited in claim 3, wherein before said step of performing a labeling operation of the pronunciation characteristics of said text to be converted, further comprising:
executing word segmentation operation on the text to be converted to obtain word segmentation results;
the step of executing the labeling operation of the pronunciation characteristics of the text to be converted comprises the following steps:
and executing the labeling operation of the pronunciation characteristics of the word segmentation result.
5. The method for generating voice of an air conditioner according to any one of claims 1 to 4, wherein before the step of obtaining the text to be converted and the voice generation model corresponding to the target object, further comprises:
acquiring third voice data of the target object, and acquiring an initial voice generation model;
extracting pronunciation characteristic information of the target object in the third voice data; the pronunciation characteristic information includes at least one of sound energy, sound frequency, and sound intensity;
and embedding the pronunciation characteristic information into the initial voice generation model to obtain the voice generation model.
6. The voice generating method of an air conditioner as claimed in claim 5, wherein the step of acquiring the third voice data of the target object comprises:
outputting a setting text;
acquiring voice data of the target object input based on the set text as the third voice data; or,
acquiring voice interaction data of an air conditioner and voiceprint information of the target object;
extracting voice data matched with the voiceprint information from the voice interaction data to serve as the third voice data; and/or,
after the step of taking the output result of the voice generation model as target voice information, returning to the step of acquiring the third voice data of the target object and acquiring an initial voice generation model;
before the step of extracting the pronunciation characteristic information of the target object in the third voice data, the method further includes:
extracting quality characteristic information corresponding to the third voice data; the quality characteristic information comprises at least one of intensity information, content information and quantity information;
and when the quality characteristic information meets the set quality condition, executing the step of extracting the pronunciation characteristic information of the target object in the third voice data.
7. A speech generating apparatus, characterized in that the speech generating apparatus comprises: a memory, a processor, and a speech generating program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the speech generating method of the air conditioner according to any one of claims 1 to 6.
8. An air conditioner, characterized in that the air conditioner comprises:
the speech generating apparatus according to claim 7, the speech generating apparatus being configured to generate target speech information having target object pronunciation characteristics;
and the voice playing module is used for outputting the target voice information.
9. A readable storage medium, wherein a speech generation program is stored on the readable storage medium, which when executed by a processor, implements the steps of the speech generation method of the air conditioner according to any one of claims 1 to 6.
CN202010479791.6A 2020-05-29 2020-05-29 Air conditioner, voice generating method thereof, voice generating device and readable storage medium Active CN113763920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010479791.6A CN113763920B (en) 2020-05-29 2020-05-29 Air conditioner, voice generating method thereof, voice generating device and readable storage medium

Publications (2)

Publication Number Publication Date
CN113763920A CN113763920A (en) 2021-12-07
CN113763920B true CN113763920B (en) 2023-09-08

Family

ID=78782414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010479791.6A Active CN113763920B (en) 2020-05-29 2020-05-29 Air conditioner, voice generating method thereof, voice generating device and readable storage medium

Country Status (1)

Country Link
CN (1) CN113763920B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113948062B (en) * 2021-12-20 2022-08-16 阿里巴巴达摩院(杭州)科技有限公司 Data conversion method and computer storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1255011A (en) * 1998-11-03 2000-05-31 International Business Machines Corp Edition system for recording telephone message and method
JP2002023781A (en) * 2000-07-12 2002-01-25 Sanyo Electric Co Ltd Voice synthesizer, correction method for phrase units therein, rhythm pattern editing method therein, sound setting method therein, and computer-readable recording medium with voice synthesis program recorded thereon
CN105427855A (en) * 2015-11-09 2016-03-23 上海语知义信息技术有限公司 Voice broadcast system and voice broadcast method of intelligent software
CN108962217A (en) * 2018-07-28 2018-12-07 华为技术有限公司 Phoneme synthesizing method and relevant device
CN110797006A (en) * 2020-01-06 2020-02-14 北京海天瑞声科技股份有限公司 End-to-end speech synthesis method, device and storage medium

Also Published As

Publication number Publication date
CN113763920A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN109523986B (en) Speech synthesis method, apparatus, device and storage medium
US11127416B2 (en) Method and apparatus for voice activity detection
CN108075892B (en) Voice processing method, device and equipment
JP6876752B2 (en) Response method and equipment
JP2019212288A (en) Method and device for outputting information
EP3803846A1 (en) Autonomous generation of melody
CN108899033B (en) Method and device for determining speaker characteristics
CN114708869A (en) Voice interaction method and device and electric appliance
CN112232276B (en) Emotion detection method and device based on voice recognition and image recognition
CN114121006A (en) Image output method, device, equipment and storage medium of virtual character
CN109074809B (en) Information processing apparatus, information processing method, and computer-readable storage medium
CN114598933B (en) Video content processing method, system, terminal and storage medium
CN113763920B (en) Air conditioner, voice generating method thereof, voice generating device and readable storage medium
CN109065019B (en) Intelligent robot-oriented story data processing method and system
CN116564269A (en) Voice data processing method and device, electronic equipment and readable storage medium
CN116403583A (en) Voice data processing method and device, nonvolatile storage medium and vehicle
CN111554300B (en) Audio data processing method, device, storage medium and equipment
CN111182409B (en) Screen control method based on intelligent sound box, intelligent sound box and storage medium
CN110232911B (en) Singing following recognition method and device, storage medium and electronic equipment
US11430435B1 (en) Prompts for user feedback
CN109241331B (en) Intelligent robot-oriented story data processing method
CN110931020A (en) Voice detection method and device
CN110728165A (en) Method and system for analyzing intention and emotion of children
CN112530456B (en) Language category identification method and device, electronic equipment and storage medium
CN110444053B (en) Language learning method, computer device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant