CN113763920A - Air conditioner, voice generation method thereof, voice generation device and readable storage medium - Google Patents


Info

Publication number
CN113763920A
Authority
CN
China
Prior art keywords
voice
information
air conditioner
text
speech
Prior art date
Legal status
Granted
Application number
CN202010479791.6A
Other languages
Chinese (zh)
Other versions
CN113763920B (en)
Inventor
钟鸿飞 (Zhong Hongfei)
罗彪 (Luo Biao)
刘景春 (Liu Jingchun)
Current Assignee
Midea Group Co Ltd
GD Midea Air Conditioning Equipment Co Ltd
Original Assignee
Midea Group Co Ltd
GD Midea Air Conditioning Equipment Co Ltd
Priority date
Filing date
Publication date
Application filed by Midea Group Co Ltd, GD Midea Air Conditioning Equipment Co Ltd filed Critical Midea Group Co Ltd
Priority to CN202010479791.6A priority Critical patent/CN113763920B/en
Publication of CN113763920A publication Critical patent/CN113763920A/en
Application granted granted Critical
Publication of CN113763920B publication Critical patent/CN113763920B/en
Status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
    • F - MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 - HEATING; RANGES; VENTILATING
    • F24F - AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F 11/00 - Control or safety arrangements
    • F24F 11/50 - Control or safety arrangements characterised by user interfaces or communication
    • F24F 11/52 - Indication arrangements, e.g. displays
    • F24F 11/526 - Indication arrangements, e.g. displays giving audible indications
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B 30/00 - Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B 30/70 - Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating

Abstract

The invention discloses a voice generation method for an air conditioner, comprising the following steps: acquiring a text to be converted and a voice generation model corresponding to a target object; inputting the text to be converted into the voice generation model; and taking the output of the voice generation model as target voice information, the target voice information having the pronunciation characteristics of the target object. The invention also discloses a voice generation device, an air conditioner, and a readable storage medium. The invention aims to enable the air conditioner to output voice with the user's pronunciation characteristics, so that the voice output by the air conditioner meets the user's individual requirements.

Description

Air conditioner, voice generation method thereof, voice generation device and readable storage medium
Technical Field
The present invention relates to the field of air conditioning technologies, and in particular, to a voice generation method, a voice generation device, an air conditioner, and a readable storage medium.
Background
With the continuous development of technology and growing user demands, the functions of air conditioners have become increasingly diversified. Besides responding to voice control commands issued by the user, some air conditioners also provide a voice broadcast function, for example announcing the running state of the air conditioner through voice.
However, at present the voice played by an air conditioner is generally pre-configured before the unit leaves the factory, with fixed tone and content, so the voice output by the air conditioner may not match the user's requirements, seriously affecting the user experience.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The main object of the present invention is to provide a voice generation method that enables an air conditioner to output voice with the user's pronunciation characteristics, so that the voice output by the air conditioner meets the user's personalized requirements.
In order to achieve the above object, the present invention provides a voice generating method of an air conditioner, the voice generating method comprising the steps of:
acquiring a text to be converted and a voice generation model corresponding to a target object;
inputting the text to be converted into the voice generation model;
and taking an output result of the voice generation model as target voice information, wherein the target voice information has pronunciation characteristics of the target object.
Optionally, the step of using the output result of the speech generation model as target speech information includes:
acquiring characteristic information of an application scene and/or an action object corresponding to the target voice information;
and extracting data corresponding to the characteristic information from the output result as the target voice information.
Optionally, before the step of inputting the text to be converted into the speech generation model, the method further includes:
executing the marking operation of the pronunciation characteristics of the text to be converted to obtain the text to be converted with pronunciation characteristic marks;
the step of inputting the text to be converted into the speech generation model comprises:
and inputting the text to be converted with the pronunciation feature label into the voice generation model.
Optionally, before the step of performing the operation of labeling the pronunciation features of the text to be converted, the method further includes:
executing word segmentation operation on the text to be converted to obtain a word segmentation result;
the step of executing the marking operation of the pronunciation characteristics of the text to be converted comprises the following steps:
and executing the marking operation of the pronunciation characteristics of the word segmentation result.
Optionally, after the step of using the output result of the speech generation model as the target speech information, the method further includes:
outputting the target voice information;
when a voice correction instruction is received, acquiring first voice data of a target object corresponding to the voice correction instruction;
determining second voice data corresponding to the voice correction instruction in the target voice information;
adjusting the voice generation model according to the data deviation between the first voice data and the second voice data;
and returning to the step of inputting the text to be converted into the speech generation model.
Optionally, before the step of obtaining the text to be converted and the speech generation model corresponding to the target object, the method further includes:
acquiring third voice data of the target object, and acquiring an initial voice generation model;
extracting pronunciation characteristic information of the target object in the third voice data; the pronunciation characteristic information comprises at least one of sound energy, sound frequency and sound intensity;
and embedding the pronunciation characteristic information into the initial voice generation model to obtain the voice generation model.
Optionally, the step of acquiring third speech data of the target object includes:
outputting a set text;
acquiring voice data input by the target object based on the set text as the third voice data; or, alternatively,
acquiring voice interaction data of an air conditioner and voiceprint information of the target object;
extracting voice data matched with the voiceprint information from the voice interaction data to serve as the third voice data; and/or,
after the step of taking the output result of the voice generation model as the target voice information, returning to execute the step of obtaining the third voice data of the target object and obtaining an initial voice generation model;
before the step of extracting pronunciation feature information of the target object in the third speech data, the method further includes:
extracting quality characteristic information corresponding to the third voice data; the quality characteristic information comprises at least one of intensity information, content information and quantity information;
and when the quality characteristic information meets a set quality condition, executing the step of extracting the pronunciation characteristic information of the target object in the third voice data.
In order to achieve the above object, the present application also provides a speech generating apparatus comprising: a memory, a processor, and a speech generation program stored in the memory and executable on the processor, wherein the speech generation program, when executed by the processor, implements the steps of the voice generation method of the air conditioner as described in any one of the above.
Further, in order to achieve the above object, the present application also proposes an air conditioner including:
the speech generating apparatus as described above, the speech generating apparatus being configured to generate target speech information having a target object pronunciation feature;
and the voice playing module is used for outputting the target voice information.
Further, in order to achieve the above object, the present application also proposes a readable storage medium having stored thereon a speech generation program which, when executed by a processor, implements the steps of the speech generation method as recited in any one of the above.
The invention provides a voice generation method for an air conditioner. A text to be converted is input into a voice generation model corresponding to a target object, and target voice information having the pronunciation characteristics of the target object is obtained from the output of the voice generation model. The target voice information output by the air conditioner is therefore no longer fixed: based on the generated target voice information, the air conditioner can output voice with the user's pronunciation characteristics, so that the voice output by the air conditioner meets the user's personalized requirements.
Drawings
FIG. 1 is a diagram of a hardware configuration involved in the operation of an embodiment of the speech generating apparatus of the present invention;
FIG. 2 is a flowchart illustrating a speech generating method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another embodiment of a speech generating method according to the present invention;
FIG. 4 is a flowchart illustrating a speech generating method according to another embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: acquiring a text to be converted and a voice generation model corresponding to a target object; inputting the text to be converted into the voice generation model; and taking an output result of the voice generation model as target voice information, wherein the target voice information has pronunciation characteristics of the target object.
In the prior art, the voice played by an air conditioner is generally pre-configured before the unit leaves the factory, with fixed tone and content, so the voice output by the air conditioner may not match the user's requirements, seriously affecting the user experience.
The present invention provides the above solution, aiming to enable the air conditioner to output voice with the user's pronunciation characteristics, so that the voice output by the air conditioner meets the user's personalized requirements.
The embodiment of the invention provides an air conditioner.
In this embodiment, the air conditioner is specifically an air conditioner with a voice playing function. The air conditioner includes a voice playing module 01. The air conditioner may output the target voice information through the voice playing module 01.
Furthermore, an embodiment of the present invention provides a speech generating apparatus configured to generate target voice information having the pronunciation characteristics of a target object. The speech generating apparatus may be built into the air conditioner or provided independently of it. When provided independently, the speech generating apparatus is communicatively connected with the air conditioner, and the air conditioner can acquire the target voice information generated by the speech generating apparatus and play it.
In an embodiment of the present invention, referring to fig. 1, the speech generating apparatus includes: a processor 1001 (e.g., a CPU), a memory 1002, a voice collecting module 1003, and the like. The memory 1002 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory); alternatively, the memory 1002 may be a storage device separate from the processor 1001.
The processor 1001 is communicatively connected with the memory 1002, the voice collecting module 1003, and the voice playing module 01. The voice collecting module 1003 may be used to collect voice data of the user, and may be arranged in a mobile terminal connected with the air conditioner (such as a mobile phone or a smart watch), in the air conditioner itself, or in any other device. The processor 1001 may be used to control the voice playing module 01 to perform voice playback.
Those skilled in the art will appreciate that the configuration of the device shown in fig. 1 is not intended to be limiting of the device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a speech generating program may be included in the memory 1002 as a readable storage medium. In the apparatus shown in fig. 1, the processor 1001 may be configured to call a speech generation program stored in the memory 1002 and perform operations of the relevant steps of the speech generation method in the following embodiments.
The embodiment of the invention also provides a voice generating method of the air conditioner, which is used for generating the target voice information required to be output by the air conditioner.
Referring to fig. 2, an embodiment of a speech generation method of the present application is provided. In this embodiment, the speech generation method includes:
step S10, acquiring a text to be converted and a voice generation model corresponding to a target object;
the text to be converted specifically refers to text content corresponding to the personalized voice required to be played by the air conditioner. The text to be converted may be text information (such as an air conditioner control instruction, prompt information related to air conditioner operation, and the like) pre-stored in the air conditioner, or may be text data acquired by the air conditioner from other terminals based on operation requirements during the operation process of the air conditioner, or may be text data input by a user based on own requirements. Based on the method, the text to be converted can be obtained by reading the data of the set position in the memory; or requesting the terminal to obtain a text to be converted when the air conditioner operates or monitors set state information; and when a setting instruction is received, acquiring text information corresponding to the instruction to obtain a text to be converted. For example, when a parent needs the air conditioner to read a story for a child, a text input instruction containing the story to be read can be input by means of an application of the mobile terminal and the like, and when the text input instruction is detected, a story text in the text input instruction information can be analyzed as a text to be converted.
The speech generation model refers to a data processing model that can convert text information into target voice information having the pronunciation characteristics of the target object. It is specifically a machine learning model (e.g., a deep learning model) trained on the voice data of the target object using a set algorithm. The speech generation model may be generated and stored in the air conditioner, generated on a server and stored in the cloud, or generated on a server and then downloaded to the air conditioner's memory. Accordingly, the speech generation model may be read from the air conditioner's stored data, or obtained by sending a request to the cloud.
The target object refers to the user whose pronunciation characteristics the air conditioner is required to reproduce. Different target objects correspond to different speech generation models. Specifically, identity feature information of the target object (such as voiceprint information, face information, or identification information input by the user) can be acquired, and the corresponding speech generation model obtained based on it, so that users with different identities can generate the personalized voice they each require.
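The per-user model lookup can be pictured as a registry keyed by identity features. This is a hypothetical sketch; the `ModelRegistry` class is not named in the patent, and real identity matching (voiceprint or face recognition) is out of scope here:

```python
class ModelRegistry:
    """Maps an identity feature (e.g. a voiceprint ID) to that user's
    trained speech generation model."""

    def __init__(self):
        self._models = {}  # identity -> model object

    def register(self, identity, model):
        self._models[identity] = model

    def model_for(self, identity, fallback=None):
        """Return the model trained for this target object, or a
        fallback (e.g. a factory-default voice) if none exists yet."""
        return self._models.get(identity, fallback)
```

A lookup miss falling back to a factory default mirrors the prior-art behavior the invention improves on: only registered users get personalized voices.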
Step S20, inputting the text to be converted into the speech generation model;
and the voice generation model converts the input text to be converted to obtain target voice information which contains the content corresponding to the text to be converted and has the pronunciation characteristics of the target object.
Step S30, the output result of the speech generation model is used as target speech information.
The speech generation model converts the text to be converted into voice data with the pronunciation characteristics of the target object as its output. Part or all of this output is taken as the target voice information having the target object's pronunciation characteristics. The pronunciation characteristics here specifically include the tone, pronunciation manner, and so on of the target object.
When the speech generation model produces a single output, that result can be used directly as the target voice information. When it produces more than one result, one of them can be selected as the target voice information based on a set rule. Alternatively, when the model outputs multiple results intended for different scenarios, each result can be labeled with its application scenario (such as a corresponding operation mode of the air conditioner), so that when the air conditioner later needs to output voice, the target voice information matching the current application scenario is retrieved based on the label and output.
In the voice generation method provided by this embodiment of the invention, the text to be converted is input into the voice generation model corresponding to the target object, and target voice information having the pronunciation characteristics of the target object is obtained from the model's output. The target voice information output by the air conditioner is therefore no longer fixed, and the air conditioner can output voice with the user's pronunciation characteristics. The voice output by the air conditioner thus meets the user's personalized requirements, and the air conditioner can even output voice in the user's place (for example, telling a story to a child on behalf of a parent), freeing the user.
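Steps S10 to S30 can be sketched end to end as follows. `SpeechModel` is a toy stand-in assumed for illustration; a real implementation would be a neural text-to-speech model, which the patent does not specify:

```python
class SpeechModel:
    """Toy stand-in: 'synthesis' just tags the text with the speaker
    whose pronunciation characteristics the model was trained on."""

    def __init__(self, speaker):
        self.speaker = speaker

    def generate(self, text):
        return {"speaker": self.speaker, "audio_for": text}


def generate_target_speech(text_to_convert, model):
    # S20: feed the text to be converted into the speech generation model
    output = model.generate(text_to_convert)
    # S30: the model output is taken as the target voice information,
    # carrying the target object's pronunciation features
    return output


result = generate_target_speech("Cooling mode on", SpeechModel("parent"))
```

The point of the structure is that swapping the model (per target object, from step S10) changes the voice while the text content stays the same.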
Specifically, in this embodiment, step S30 includes:
step S31, acquiring the characteristic information of the application scene and/or the action object corresponding to the target voice information;
the application scenario herein specifically refers to that when the air conditioner outputs the target voice information, information about the state of at least one of the time, the device, the social contact, the user emotion, and the like in the space where the air conditioner is located can be obtained as feature information of the application scenario. The action object specifically refers to a user, an animal and the like receiving the target voice information when the air conditioner outputs the target voice information, and information such as the identity, age, sex, type and the like of the action object can be acquired as characteristic information of the action object.
The feature information corresponding to the application scenario and/or the action object may be obtained from parameters input by the user, or determined by analyzing the content of the text to be converted. In addition, the air conditioner operation mode associated with the text to be converted may be obtained and used as the feature information of the application scenario. For example, when the text to be converted contains content related to emotional soothing, the feature information of the application scenario may be determined to be soothing and the feature information of the action object to be an infant; when the text to be converted contains story-related content, the feature information of the application scenario may be determined to be sleep and the action object to be a child (such as a user over 1 and under 7 years old); when the text to be converted contains the name of a pet, the feature information of the application scenario may be determined to be pet supervision and the action object to be the pet; and so on.
For another example, when the text to be converted is associated with the sleep mode of the air conditioner, it may be considered that the target voice information corresponding to the text to be converted needs to be output in the sleep mode of the air conditioner, and the sleep mode may be used as the feature information of the application scenario; when the text to be converted is associated with the non-sleep mode of the air conditioner, it can be considered that the target voice information corresponding to the text to be converted needs to be output in the non-sleep mode of the air conditioner, and then the non-sleep mode can be used as the characteristic information of the application scene.
Step S32, extracting data corresponding to the feature information as the target speech information from the output result.
The output of the speech generation model may include voice data for different application scenarios and/or different action objects. These voice data share the same content (all corresponding to the text to be converted) and all carry the pronunciation characteristics of the target object, but differ in expressive characteristics such as tone, speech rate, and emotion. For example, when the action object is an infant, or the application scenario is soothing or sleep mode, the pronunciation may be softer; when the action object is an adult, or the application scenario is a non-sleep mode, the pronunciation may be brighter; when the action object is a child, or the application scenario is a bedtime story, the pronunciation may carry rich emotion; and when the action object is an adult, or the application scenario is a status prompt, the pronunciation may be relatively flat.
On this basis, the data matching the feature information is selected from the output results as the final result, yielding the target voice information. The target voice information output by the air conditioner can thus match the application scenario and/or the action object, which further improves the match between the air conditioner's output voice and the user's requirements, enhances the interactive effect of the voice output, and improves the user experience.
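Steps S31 and S32 can be sketched as a variant lookup. The assumption here, labeled as such, is that the model emits one speech variant per (scenario, action object) pair; the matching priority (exact pair, then scenario only, then any variant) is an illustrative choice, not prescribed by the patent:

```python
def select_variant(model_outputs, scene=None, target=None):
    """model_outputs: dict mapping (scene, action_object) -> speech data.
    Prefer an exact (scene, object) match, then a scene-only match,
    then fall back to the first available output."""
    if (scene, target) in model_outputs:
        return model_outputs[(scene, target)]
    for (s, _t), data in model_outputs.items():
        if s == scene:
            return data
    return next(iter(model_outputs.values()))


outputs = {
    ("sleep", "child"): "soft, story-telling voice",
    ("normal", "adult"): "bright, flat voice",
}
```

With these example outputs, a sleep-mode request for a child yields the soft variant, while a normal-mode request for any listener falls back to the bright one.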
Further, after step S30, the method may include:
step S40, outputting the target voice information;
specifically, when the air conditioner monitors setting scene information or receives a specific instruction output by a user, the air conditioner outputs target voice information.
Step S50, when a voice correction instruction is received, acquiring first voice data of a target object corresponding to the voice correction instruction;
after the user receives the sound corresponding to the target voice information, if the voice output by the air conditioner is found to be inaccurate, the voice correction instruction can be sent out through the air conditioner or a mobile terminal connected with the air conditioner. The air conditioner or the mobile terminal starts timing when receiving the voice correction instruction, and when the air conditioner detects the voice data of the target object within the set time length, the voice data can be used as first voice data. And the voice data which are sent by the target object of the specific instruction of the first voice data, collected by the voice collection module and formed and used for correcting the target voice information.
Step S60, determining second voice data corresponding to the voice correction instruction in the target voice information;
the voice modification instruction may specifically include an identifier of the voice data to be modified (e.g., content of the voice to be modified, location of the voice to be modified in the target voice information, etc.), and based on this, the corresponding voice data in the target voice information may be determined as the second voice data by parsing the voice modification instruction.
Step S70, adjusting the speech generation model according to the data deviation between the first speech data and the second speech data.
Pronunciation feature information is extracted from the first voice data and the second voice data, and a difference analysis is performed to obtain the data deviation. Model parameters in the speech generation model are then corrected based on the data deviation: the larger the deviation, the larger the magnitude of the correction.
After step S70, the process may return to step S20 to regenerate the target voice information. In addition, after step S70 the speech generation model may be stored for later retrieval.
In this embodiment, after the target voice information is output, the first voice data is acquired and the second voice data is determined based on the voice correction instruction, and the speech generation model is corrected based on the deviation between the two. The output of the corrected model is closer to the pronunciation characteristics of the target object, making the model more accurate, improving the accuracy with which the air conditioner outputs voice with the user's pronunciation characteristics, and better meeting the user's requirements.
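The correction step S70 can be sketched as a proportional update. This is a deliberately reduced illustration: pronunciation features are collapsed to a single pitch value and the `learning_rate` is an assumed parameter, whereas a real system would compare features such as sound energy, frequency, and intensity across many model parameters:

```python
def adjust_model(model_pitch, user_pitch, learning_rate=0.5):
    """Move the model's pronunciation parameter toward the user's own;
    per the description, the larger the data deviation between the
    second voice data (model) and first voice data (user), the larger
    the correction applied."""
    deviation = user_pitch - model_pitch
    return model_pitch + learning_rate * deviation


# one correction round: the model rendered a word at 200 Hz,
# but the user's corrective utterance measures 180 Hz
corrected = adjust_model(200.0, 180.0)
```

Repeating the loop (regenerate, compare, adjust) converges the model's output toward the target object's pronunciation, which is the behavior the embodiment describes.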
Further, based on the above embodiments, another embodiment of the air conditioner-based speech generation method of the present application is provided. In this embodiment, referring to fig. 3, before step S20, the method further includes:
step S21, executing word segmentation operation on the text to be converted to obtain word segmentation results;
specifically, the word segmentation operation includes identifying phrases in the text to be converted and the part of speech of each phrase, and performing part of speech tagging on each phrase. Taking the text to be converted with part-of-speech labels as a word segmentation result;
step S22, executing the labeling operation of the pronunciation characteristics of the word segmentation result to obtain the text to be converted with pronunciation characteristic labels;
the pronunciation characteristics specifically refer to characteristic information corresponding to standard pronunciation of a text in a specific language. For example, the pronunciation characteristics may specifically include pinyin, syllables, phonemes, or the like. And marking the pronunciation characteristics of the text to be converted after the word segmentation operation is performed to obtain the text to be converted with the pronunciation characteristic marks.
Step S23, inputting the text to be converted with pronunciation feature label into the speech generating model;
and inputting the text to be converted with the pronunciation feature label into a speech generation model.
In this embodiment, the word segmentation operation and the pronunciation feature tagging operation are performed before the text to be converted is input into the speech generation model. Compared with inputting the raw text directly, the resulting output is more accurate, and the sound produced when the air conditioner outputs the target voice information is closer to the target object reading the text aloud, so the air conditioner better meets the user's personalized requirements.
It should be noted that, in other embodiments, the text to be converted may also be input into the speech generation model after performing only the word segmentation operation or only the pronunciation-feature tagging operation, as required.
Further, based on any of the above embodiments, a further embodiment of the speech generation method of the present application is provided. In this embodiment, referring to fig. 4, before the step S10, the method further includes:
step S01, acquiring an initial voice generation model and acquiring third voice data of the target object;
The initial speech generation model here is a model capable of converting text into speech.
The initial speech generation model may be a speech generation model whose output speech results do not have the pronunciation features of the target object. In this case, a large amount of training speech data about a specific text may be collected from users other than the target object; speech features are extracted from the training speech data, the specific text is tagged with features such as part of speech and pronunciation, and the tagged text and the extracted speech features are used as training samples for a neural network, from which the initial speech generation model is obtained through training.
Alternatively, the initial speech generation model may be a speech generation model whose output speech results already have the pronunciation features of the target object. In this case, steps S01 to S30 may be executed in a loop, and the initial speech generation model is the speech generation model obtained by executing steps S01 to S03 in the previous iteration. When step S01 is executed for the first time, however, the initial speech generation model is a speech generation model whose output results do not have the pronunciation features of the target object.
The third voice data may be collected at present, or obtained by analyzing voice data recorded previously. Specifically, the step of acquiring the third voice data of the target object includes: outputting a set text; and acquiring the voice data input by the target object based on the set text as the third voice data. Alternatively, the step of acquiring the third voice data of the target object may include: acquiring the voice interaction data of the air conditioner and the voiceprint information of the target object; and extracting the voice data matching the voiceprint information from the voice interaction data as the third voice data. For example, when the speech generation model of the target object is generated for the first time, the third voice data may be acquired by outputting the set text; when it is not generated for the first time, the third voice data may be acquired from the voice interaction data of the air conditioner.

Specifically, the target object may enter its voiceprint and identity information in advance through a terminal application or the like, and associate the voiceprint information with the identity information. During voice interaction between the user and the air conditioner, the air conditioner records the voice files generated in the interaction, uses the entered voiceprints to identify the voice files, and labels the recorded voice data corresponding to different voiceprints with the identity information associated with each voiceprint. The data in the voice files that match the voiceprint of the target object are taken as the third voice data.
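Selecting the recorded voice data that matches the target object's voiceprint can be sketched as an embedding comparison; the cosine-similarity threshold of 0.8 and the record layout below are illustrative assumptions, not values from this application.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two voiceprint embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_third_voice_data(enrolled, utterances, threshold=0.8):
    """Keep recorded utterances whose voiceprint embedding matches the
    enrolled voiceprint of the target object (threshold is assumed)."""
    return [u for u in utterances if cosine(enrolled, u["embedding"]) >= threshold]

enrolled = np.array([1.0, 0.0])                      # target object's voiceprint
recordings = [
    {"id": 1, "embedding": np.array([0.99, 0.01])},  # same speaker
    {"id": 2, "embedding": np.array([0.0, 1.0])},    # different speaker
]
print([u["id"] for u in select_third_voice_data(enrolled, recordings)])
# [1]
```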
To ensure the accuracy of the subsequently extracted pronunciation feature information, quality feature information corresponding to the third voice data is extracted; the quality feature information includes at least one of intensity information, content information, and quantity information. Step S02 is executed when the quality feature information satisfies a set quality condition; otherwise, the method returns to step S01. Specifically, the intensity information may include the sound intensity of the background noise, the sound intensity of the human voice, and/or the sound intensity of the human voice matching the voiceprint of the target object. The content information includes the semantics represented by the voice data. The quantity information may include the number of times the third voice data has been collected. For example, when the sound intensity of the background noise is less than or equal to a first intensity threshold, and/or the sound intensity of the human voice matching the voiceprint of the target object is greater than or equal to a second intensity threshold, and/or the semantics represented by the voice data match the output set text, and/or the amount of collected third voice data is greater than or equal to a set threshold, it is determined that the quality feature information of the third voice data satisfies the set quality condition.
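The quality gate described above can be sketched as a simple predicate; every threshold value here is an illustrative assumption, and the content condition is skipped when no set text is supplied.

```python
def meets_quality_condition(sample, collected_count,
                            max_noise_db=45.0,   # first intensity threshold (assumed)
                            min_voice_db=55.0,   # second intensity threshold (assumed)
                            expected_text=None,  # the output set text, if any
                            min_count=3):        # set quantity threshold (assumed)
    """Return True when the intensity, content and quantity conditions all hold."""
    if sample["noise_db"] > max_noise_db:        # background noise too loud
        return False
    if sample["voice_db"] < min_voice_db:        # target object's voice too quiet
        return False
    if expected_text is not None and sample["transcript"] != expected_text:
        return False                             # semantics do not match the set text
    return collected_count >= min_count          # enough samples collected so far

sample = {"noise_db": 30.0, "voice_db": 62.0, "transcript": "hello"}
print(meets_quality_condition(sample, 3, expected_text="hello"))
# True
```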
After step S30, the method may return to step S01, so that the speech generation model of the target object is optimized in a loop based on its voice data. Specifically, during use of the air conditioner, voice data matching the voiceprint information may be continuously extracted from the voice interaction data and taken as the third voice data; the quality feature information corresponding to the third voice data is extracted and checked against the set quality condition, and the step of acquiring the initial speech generation model and the subsequent steps S02, S10, S20, S30 and the like are executed once the quality condition is satisfied. For example, when the amount of third voice data is greater than or equal to a set amount, the third voice data may be considered to satisfy the set quality condition, and the qualifying voice data is then used to further train the speech generation model of the target object. In this way, automatic iterative optimization is performed while the air conditioner is in use, ensuring that the target speech information output by the air conditioner comes ever closer to the pronunciation features of the target object.
Step S02, extracting pronunciation feature information of the target object in the third speech data;
Specifically, the voice data in the third voice data may be extracted based on the voiceprint of the target object, and operations such as noise reduction, blank-speech removal, pre-emphasis, framing, windowing, and speech feature extraction may be performed on it to obtain the pronunciation feature information of the target object. The pronunciation feature information includes at least one of sound energy, sound frequency, and sound intensity. Specifically, the MFCC algorithm or the FBank algorithm may be used to extract the pronunciation feature information.
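The pre-emphasis, framing, and windowing steps mentioned above can be sketched as follows. The frame length of 400 samples and hop of 160 correspond to 25 ms / 10 ms at 16 kHz, a common but here assumed configuration; a full MFCC or FBank front end would continue with an FFT and a mel filter bank.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split the signal into overlapping frames.
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame to reduce spectral leakage.
    return frames * np.hamming(frame_len)

frames = preprocess(np.random.randn(16000))  # one second of audio at 16 kHz
print(frames.shape)
# (98, 400)
```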
And step S03, embedding the pronunciation characteristic information into the initial voice generation model to obtain the voice generation model.
Specifically, the pronunciation feature information is added as model parameters at a specific location of the initial speech generation model, forming a speech generation model that can generate speech with the pronunciation features of the target object. When the speech generation model is adjusted in step S70 of the above embodiment, the pronunciation feature information corresponding to the second voice data is identified at this specific location in the model, and the identified pronunciation feature information is corrected according to the data deviation described above.
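How the feature information might occupy a "specific location" of the model, and how the correction of step S70 would touch only that slot, can be sketched as follows; the single linear map standing in for the acoustic network is of course a simplification, and all shapes are assumed.

```python
import numpy as np

class SpeakerConditionedTTS:
    """Sketch: the acoustic model is conditioned on a stored speaker
    embedding. 'Embedding the pronunciation feature information' amounts
    to writing the target object's feature vector into this slot; the
    base-model weights stay frozen."""

    def __init__(self, weights, speaker_embedding):
        self.weights = weights            # frozen base-model weights
        self.speaker = speaker_embedding  # the adaptable "specific location"

    def acoustic_features(self, text_features):
        # Concatenate linguistic features with the speaker vector,
        # then apply the (stand-in) acoustic network.
        x = np.concatenate([text_features, self.speaker])
        return self.weights @ x

    def correct_speaker(self, delta):
        # Step S70 analogue: adjust only the embedded pronunciation
        # features according to the observed data deviation.
        self.speaker = self.speaker + delta

model = SpeakerConditionedTTS(np.ones((2, 5)), np.array([0.5, 0.5]))
print(model.acoustic_features(np.array([1.0, 1.0, 1.0])))
# [4. 4.]
```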
In this embodiment, the pronunciation feature information of the third voice data of the target object is embedded into the initial speech generation model to form the speech generation model. In this way, a speech generation model that outputs target speech information with the pronunciation features of the target object can be generated quickly using only a small amount of voice data of the target object, without collecting a large number of data samples and retraining the initial speech generation model.
Furthermore, an embodiment of the present invention also provides a readable storage medium on which a speech generation program is stored; when executed by a processor, the speech generation program implements the relevant steps of any embodiment of the above speech generation method.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A voice generating method of an air conditioner, characterized in that the voice generating method comprises the steps of:
acquiring a text to be converted and a voice generation model corresponding to a target object;
inputting the text to be converted into the voice generation model;
and taking an output result of the voice generation model as target voice information, wherein the target voice information has pronunciation characteristics of the target object.
2. The voice generating method of an air conditioner according to claim 1, wherein the step of using the output result of the voice generating model as the target voice information comprises:
acquiring characteristic information of an application scene and/or an action object corresponding to the target voice information;
and extracting data corresponding to the characteristic information from the output result as the target voice information.
3. The speech generating method of an air conditioner as claimed in claim 1, wherein the step of inputting the text to be converted into the speech generating model is preceded by the step of:
executing the marking operation of the pronunciation characteristics of the text to be converted to obtain the text to be converted with pronunciation characteristic marks;
the step of inputting the text to be converted into the speech generation model comprises:
and inputting the text to be converted with the pronunciation feature label into the voice generation model.
4. The speech generating method of an air conditioner according to claim 3, wherein the step of performing the labeling operation for the pronunciation characteristics of the text to be converted further comprises:
executing word segmentation operation on the text to be converted to obtain a word segmentation result;
the step of executing the marking operation of the pronunciation characteristics of the text to be converted comprises the following steps:
and executing the marking operation of the pronunciation characteristics of the word segmentation result.
5. The voice generating method of an air conditioner according to claim 1, further comprising, after the step of using the output result of the voice generating model as the target voice information:
outputting the target voice information;
when a voice correction instruction is received, acquiring first voice data of a target object corresponding to the voice correction instruction;
determining second voice data corresponding to the voice correction instruction in the target voice information;
adjusting the voice generation model according to the data deviation between the first voice data and the second voice data;
and returning to the step of inputting the text to be converted into the speech generation model.
6. The method for generating speech of an air conditioner according to any one of claims 1 to 5, wherein the step of obtaining the text to be converted and the speech generation model corresponding to the target object is preceded by the steps of:
acquiring third voice data of the target object, and acquiring an initial voice generation model;
extracting pronunciation characteristic information of the target object in the third voice data; the pronunciation characteristic information comprises at least one of sound energy, sound frequency and sound intensity;
and embedding the pronunciation characteristic information into the initial voice generation model to obtain the voice generation model.
7. The voice generating method of an air conditioner according to claim 6, wherein the step of acquiring the third voice data of the target object comprises:
outputting a set text;
acquiring voice data of the target object input based on the set text as the third voice data; or, alternatively,
acquiring voice interaction data of an air conditioner and voiceprint information of the target object;
extracting voice data matched with the voiceprint information from the voice interaction data to serve as the third voice data; and/or,
after the step of taking the output result of the voice generation model as the target voice information, returning to execute the step of obtaining the third voice data of the target object and obtaining an initial voice generation model;
before the step of extracting pronunciation feature information of the target object in the third speech data, the method further includes:
extracting quality characteristic information corresponding to the third voice data; the quality characteristic information comprises at least one of intensity information, content information and quantity information;
and when the quality characteristic information meets a set quality condition, executing the step of extracting the pronunciation characteristic information of the target object in the third voice data.
8. A speech generating apparatus, characterized in that the speech generating apparatus comprises: a memory, a processor, and a voice generation program stored on the memory and executable on the processor, the voice generation program, when executed by the processor, implementing the steps of the voice generation method of the air conditioner as claimed in any one of claims 1 to 7.
9. An air conditioner, characterized in that the air conditioner comprises:
the speech generating apparatus according to claim 8, said speech generating apparatus being configured to generate target speech information having a target object pronunciation characteristic;
and the voice playing module is used for outputting the target voice information.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a voice generation program, which when executed by a processor, implements the steps of the voice generation method of an air conditioner according to any one of claims 1 to 7.
CN202010479791.6A 2020-05-29 2020-05-29 Air conditioner, voice generating method thereof, voice generating device and readable storage medium Active CN113763920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010479791.6A CN113763920B (en) 2020-05-29 2020-05-29 Air conditioner, voice generating method thereof, voice generating device and readable storage medium

Publications (2)

Publication Number Publication Date
CN113763920A true CN113763920A (en) 2021-12-07
CN113763920B CN113763920B (en) 2023-09-08

Family

ID=78782414


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023116243A1 (en) * 2021-12-20 2023-06-29 阿里巴巴达摩院(杭州)科技有限公司 Data conversion method and computer storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1255011A (en) * 1998-11-03 2000-05-31 国际商业机器公司 Edition system for recording telephone message and method
JP2002023781A (en) * 2000-07-12 2002-01-25 Sanyo Electric Co Ltd Voice synthesizer, correction method for phrase units therein, rhythm pattern editing method therein, sound setting method therein, and computer-readable recording medium with voice synthesis program recorded thereon
CN105427855A (en) * 2015-11-09 2016-03-23 上海语知义信息技术有限公司 Voice broadcast system and voice broadcast method of intelligent software
CN108962217A (en) * 2018-07-28 2018-12-07 华为技术有限公司 Phoneme synthesizing method and relevant device
CN110797006A (en) * 2020-01-06 2020-02-14 北京海天瑞声科技股份有限公司 End-to-end speech synthesis method, device and storage medium





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant