CN111210809B - Voice training data adaptation method and device, voice data conversion method and electronic equipment

Info

Publication number: CN111210809B
Authority: CN (China)
Prior art keywords: data, channel, model, training, conversion
Legal status: Active
Application number: CN201811400134.7A
Other languages: Chinese (zh)
Other versions: CN111210809A (en)
Inventor: 张平
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by: Alibaba Group Holding Ltd
Priority to: CN201811400134.7A
Publication of CN111210809A
Application granted; publication of CN111210809B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training

Abstract

The embodiments of the invention provide a voice training data adaptation method and device, a voice data conversion method, and electronic equipment. The voice training data adaptation method comprises the following steps: acquiring original voice data for data conversion, wherein the original voice data has audio data information in various directions; and converting the original voice data through a channel conversion algorithm to obtain training data applicable to different channels. Because the existing original voice data is converted through the channel conversion algorithm into training data adapted to different channels, there is no need to collect a large amount of voice data for training each time a new voice recognition product appears. Training data adapted to the new product can be obtained simply by updating and maintaining the channel conversion algorithm, which improves the efficiency of building the new voice matching model and saves labor cost.

Description

Voice training data adaptation method and device, voice data conversion method and electronic equipment
Technical Field
The invention relates to the technical field of smart homes, and in particular to a voice training data adaptation method and device, a voice data conversion method, and electronic equipment.
Background
A smart speaker is an upgraded loudspeaker product. It lets household users request songs, weather forecasts, news, and similar content from the cloud by voice input, and it can also control other smart home devices by voice, for example opening curtains, setting the temperature of a refrigerator, or preheating a water heater.
Different smart speaker products differ in microphone configuration and voice signal processing technology. A service provider (which supplies services such as songs, weather, and news) therefore needs to set up a voice database matched to each smart speaker model and use the voice data in that database as training data to train a matching model suited to that model. After a user speaks to a smart speaker of a given model, the corresponding matching model performs matching on aspects such as voiceprint and speech, thereby achieving voiceprint recognition or voice recognition.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem: as technology develops, new voice recognition products are continually introduced to the market. After a new product is released, the stock voice data in the existing voice database does not match it, so the service provider must collect a large amount of voice data for the new product to obtain voice training data suited to that model for modeling. This acquisition is very inefficient.
Disclosure of Invention
The embodiment of the invention provides a voice training data adaptation method and device, a voice data conversion method and electronic equipment, and aims to overcome the defect of low training data acquisition efficiency in the prior art.
To achieve the above objective, an embodiment of the present invention provides a method for adapting voice training data, including:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in various directions;
and converting the original voice data through a channel conversion algorithm to obtain training data applicable to different channels.
The embodiment of the invention also provides a voice data conversion method, which comprises the following steps:
converting original voice data through a channel conversion algorithm matched with playing equipment to obtain training data suitable for the playing equipment, wherein the original voice data has audio data information in all directions;
model training is carried out according to the training data, and a data conversion model is obtained;
and converting the data to be output of the playing equipment according to the data conversion model so as to obtain the playing data suitable for the playing equipment.
The embodiment of the invention also provides a voice training data adapting device, which comprises:
an original voice data acquisition module for acquiring original voice data for data conversion, the original voice data having audio data information in various directions;
and the data conversion module is used for carrying out conversion processing on the original voice data through a channel conversion algorithm so as to obtain training data applicable to different channels.
The embodiment of the invention also provides electronic equipment, which comprises:
a memory for storing a program;
a processor for running the program stored in the memory for:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in various directions;
and converting the original voice data through a channel conversion algorithm to obtain training data applicable to different channels.
According to the voice training data adaptation method and device, the voice data conversion method, and the electronic equipment provided by the embodiments of the invention, the existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training each time a new voice recognition product appears: training data adapted to the new product can be obtained simply by updating and maintaining the channel conversion algorithm, which improves the efficiency of building the new voice matching model and saves labor cost.
The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a system block diagram of a service system provided in an embodiment of the present invention;
FIG. 2 is a flowchart of one embodiment of a method for adapting speech training data provided by the present invention;
FIG. 3 is a flowchart of another embodiment of a method for adapting speech training data according to the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of a voice training data adaptation apparatus according to the present invention;
FIG. 5 is a schematic structural diagram of another embodiment of a voice training data adapting device provided by the present invention;
FIG. 6 is a flowchart of an embodiment of a voice data conversion method according to the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of an electronic device according to the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the prior art, different speech recognition products (e.g., smart speaker products) differ in microphone configuration and speech signal processing techniques. The service provider needs to provide a voice database matched to each smart speaker model, and trains a matching model applicable to each type of voice recognition product by taking the voice data in that database as training data. After a user inputs voice using a voice recognition product of a certain model, matching operations on voiceprint, speech, and the like can be performed through the corresponding matching model, so that voiceprint recognition or voice recognition is realized. When a new speech recognition product is introduced, the stock speech data in the existing speech database does not match it, so the service provider needs to collect a large amount of speech data for the new product to obtain training data suited to that model for modeling, and this acquisition is very inefficient. The application therefore proposes a voice training data adaptation scheme whose main principle is as follows. Existing or pre-acquired original voice data is converted through a channel conversion algorithm to obtain training data applicable to different channels (such as two-microphone, four-microphone, and six-microphone channels). Here, original voice data means voice data with audio data information in all directions, for example noise-removed voice data with more complete channel information and richer high-frequency information. In this way, collecting a large amount of voice data for training each new voice recognition product is avoided, and training data adapted to the product can be obtained simply by updating and maintaining the channel conversion algorithm, which improves the efficiency of building the matching model for the new product and saves labor cost.
The method provided by the embodiment of the invention can be applied to any service system with voice data processing capability. FIG. 1 is a system block diagram of a service system provided by an embodiment of the present invention, and the structure shown in FIG. 1 is only one example of a service system to which the technical solution of the present invention can be applied. As shown in FIG. 1, the service system includes a training data adapting device. The device comprises a raw speech data acquisition module and a data conversion module, and may be used to perform the process flows shown in FIG. 2 and FIG. 3 described below.
In the service system, original voice data for data conversion, which has audio data information in various directions, is first acquired; the acquired original voice data is then converted by a channel conversion algorithm to obtain training data applicable to different channels. Specifically, there are three ways to obtain the original voice data. Existing original voice data (that is, high-quality, noise-removed voice data with complete channel information and rich high-frequency information) can be obtained directly. Existing stock data can be re-recorded with high-fidelity recording equipment to obtain original voice data. In addition, for content not covered by the existing data, the voices of recording personnel can be recorded with high-fidelity recording equipment as a supplement. After the conversion processing by the channel conversion algorithm, training data suitable for different channels (such as two-microphone data, four-microphone data, and six-microphone data) is obtained and used to train the respective matching models (such as two-microphone, four-microphone, and six-microphone models).
The foregoing embodiments are illustrative of the technical principles and exemplary application frameworks of embodiments of the present invention, and the detailed description of specific technical solutions of the embodiments of the present invention will be further described below by means of a plurality of embodiments.
Example 1
Fig. 2 is a flowchart of an embodiment of a voice training data adaptation method provided by the present invention, where the execution body of the method may be the service system, or may be various server devices with voice data processing capabilities, or may be a device or a chip integrated on these server devices. As shown in fig. 2, the voice training data adaptation method includes the following steps:
S201, acquiring original voice data for data conversion.
In an embodiment of the present invention, the original voice data has audio data information in various directions. The existing original voice data can be obtained in the first database, the original voice data obtained by recording the existing stock data with a high-fidelity recording device can be obtained in the second database, and the original voice data obtained by recording the recording personnel with a high-fidelity recording device can be obtained in the third database.
S202, converting the original voice data through a channel conversion algorithm to obtain training data applicable to different channels.
In the embodiment of the present invention, step S201 (the acquisition of the original voice data) is independent of the data conversion process. The original voice data serves as the input to the channel conversion algorithm, so its acquisition is a preparatory, pre-processing step. Step S202 (the data conversion) may be performed whenever the corresponding training data is required.
According to the voice training data adaptation method provided by the embodiment of the invention, the existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training each time a new voice recognition product appears: training data adapted to the product can be obtained simply by updating and maintaining the channel conversion algorithm, which improves the efficiency of building the new voice matching model and saves labor cost.
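By way of illustration only, steps S201 and S202 can be sketched as follows. The sketch assumes that each channel conversion can be modelled as a function from a list of samples to a list of samples; the names `ChannelConverter`, `make_gain_converter`, and `adapt_training_data`, and the toy gain-based conversion, are hypothetical, since the embodiment does not fix the form of the channel conversion algorithm.

```python
from typing import Callable, Dict, List

Samples = List[float]
# Hypothetical type: one conversion function per target channel.
ChannelConverter = Callable[[Samples], Samples]


def make_gain_converter(gain: float) -> ChannelConverter:
    """Toy stand-in for a real channel conversion algorithm (assumption)."""
    def convert(samples: Samples) -> Samples:
        return [gain * s for s in samples]
    return convert


def adapt_training_data(
    raw_utterances: List[Samples],
    converters: Dict[str, ChannelConverter],
) -> Dict[str, List[Samples]]:
    """S202: convert every raw utterance once per target channel."""
    return {
        channel: [convert(u) for u in raw_utterances]
        for channel, convert in converters.items()
    }


# S201: original (high-fidelity) voice data, here two tiny made-up utterances.
raw = [[0.5, -0.5, 0.25], [1.0, 0.0]]
converters = {
    "two_mic": make_gain_converter(0.5),
    "four_mic": make_gain_converter(0.25),
}
training_sets = adapt_training_data(raw, converters)
```

Under this reading, supporting a new product amounts to registering one more entry in `converters`; the original voice data itself is left untouched and reused.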
Example 2
Fig. 3 is a flowchart of another embodiment of a voice training data adaptation method provided by the present invention. As shown in fig. 3, on the basis of the embodiment shown in fig. 2, the voice training data adaptation method provided in this embodiment may further include the following steps:
S301, acquiring the existing original voice data in a first database.
S302, obtaining original voice data obtained by recording the existing stock data through high-fidelity recording equipment in a second database.
S303, obtaining the original voice data obtained by recording the voice recorder through the high-fidelity recording equipment in the third database.
In the embodiment of the present invention, steps S301 to S303 need not be executed in any particular order: they may be performed simultaneously or sequentially in any order, and it is also possible to perform only one or two of the three steps.
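The optional, order-independent nature of steps S301 to S303 can be sketched as below. The loader names and the in-memory stand-ins for the first, second, and third databases are hypothetical; the sketch runs the selected steps one after another, although nothing prevents running them concurrently.

```python
from typing import Callable, Dict, Iterable, List

Samples = List[float]
# Hypothetical loaders standing in for the first, second and third databases.
Loader = Callable[[], List[Samples]]


def collect_raw_speech(loaders: Dict[str, Loader],
                       enabled: Iterable[str]) -> List[Samples]:
    """Run any subset of S301 to S303 and pool the resulting utterances."""
    pool: List[Samples] = []
    for name in enabled:
        pool.extend(loaders[name]())
    return pool


loaders = {
    "existing_raw": lambda: [[0.1, 0.2]],          # S301, first database
    "rerecorded_stock": lambda: [[0.3], [0.4]],    # S302, second database
    "new_recordings": lambda: [[0.5, 0.6, 0.7]],   # S303, third database
}

all_sources = collect_raw_speech(
    loaders, ["existing_raw", "rerecorded_stock", "new_recordings"]
)
only_two = collect_raw_speech(loaders, ["new_recordings", "existing_raw"])
```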
In addition, in the voice training data adaptation method provided by the embodiment of the present invention, an acquisition step of a channel conversion algorithm may be further included, as shown in the following steps S304 to S305.
S304, recording data aiming at the fixed text under different channels is obtained.
In the embodiment of the invention, a passage of fixed text can be set first. When the channel conversion algorithm is acquired, the fixed text is recorded under different channels, for example under two-microphone, four-microphone, and six-microphone channel environments as well as the original-voice (high-fidelity) channel environment, to obtain different recording data.
Further, for the same channel environment, data acquisition at different distances can be performed, and recording data for the fixed text at different distances can be obtained.
S305, obtaining a channel conversion algorithm according to the difference parameter distribution functions of the different recording data.
In the embodiment of the invention, for recording data under different channels, a channel conversion algorithm can be obtained according to the Gaussian distribution function of the recording data; for recording data at different distances, a channel conversion algorithm can be obtained according to the energy distribution function of the recording data. Together these yield the channel conversion algorithm that is used for data conversion.
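The embodiment states that the algorithm can be obtained from the Gaussian distribution function of the recordings but gives no formula. One possible reading, offered purely as an assumption, is to fit a Gaussian (mean and standard deviation) to a one-dimensional feature of the fixed-text recording in the high-fidelity channel and in the target channel, and then map values so that the first distribution matches the second. The function names and the sample values below are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Tuple


def fit_gaussian(values: List[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) of a one-dimensional feature."""
    return fmean(values), pstdev(values)


def gaussian_matching_converter(
    reference: List[float], target: List[float]
) -> Callable[[List[float]], List[float]]:
    """Build a mapping that moves reference-channel statistics onto the target's.

    Assumption: the channel difference is captured by mean and variance only.
    """
    mu_ref, sigma_ref = fit_gaussian(reference)
    mu_tgt, sigma_tgt = fit_gaussian(target)
    if sigma_ref == 0.0:
        raise ValueError("reference recording has no variation")
    scale = sigma_tgt / sigma_ref

    def convert(values: List[float]) -> List[float]:
        return [mu_tgt + scale * (v - mu_ref) for v in values]

    return convert


# Made-up feature values for the same fixed text in two channel environments.
hifi_fixed_text = [0.0, 2.0, 4.0, 6.0]
two_mic_fixed_text = [1.0, 2.0, 3.0, 4.0]

to_two_mic = gaussian_matching_converter(hifi_fixed_text, two_mic_fixed_text)
converted = to_two_mic(hifi_fixed_text)
```

A real system would more likely fit such statistics per feature dimension (for example per spectral band) rather than on a single scalar; the scalar case is used here only to keep the sketch short.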
S306, converting the original voice data through a channel conversion algorithm to obtain training data applicable to different channels.
In the embodiment of the present invention, steps S301 to S303 (the acquisition of the original voice data) are independent of steps S304 to S305 (the acquisition of the channel conversion algorithm). The original voice data serves as the input to the channel conversion algorithm, and its acquisition can be regarded as a preparatory, pre-processing step. The acquisition of the channel conversion algorithm, by contrast, needs to be executed each time a new smart speaker is released, so as to update and maintain the existing channel conversion algorithm.
According to the voice training data adaptation method provided by the embodiment of the invention, the existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training each time a new voice recognition product appears: training data adapted to the product can be obtained simply by updating and maintaining the channel conversion algorithm, which improves the efficiency of building the new voice matching model and saves labor cost.
Example 3
Fig. 4 is a schematic structural diagram of an embodiment of a voice training data adapting device according to the present invention, which may be used to perform the method steps shown in fig. 2. As shown in fig. 4, the voice training data adaptation apparatus may include: the raw speech data acquisition module 41 and the data conversion module 42.
Wherein, the original voice data obtaining module 41 may be used for obtaining original voice data for data conversion; the data conversion module 42 may be configured to perform conversion processing on the original voice data acquired by the original voice data acquisition module 41 through a channel conversion algorithm, so as to obtain training data applicable to different channels.
In an embodiment of the present invention, the original voice data has audio data information in various directions. After the original voice data is acquired by the original voice data acquisition module 41, the data conversion module 42 may convert it through a channel conversion algorithm to obtain training data applicable to different channels. The acquisition of the original voice data by the original voice data acquisition module 41 is independent of the data conversion performed by the data conversion module 42. The original voice data serves as the input to the channel conversion algorithm, so its acquisition is a preparatory, pre-processing step. The data conversion can be performed whenever the corresponding training data is needed.
According to the voice training data adapting device provided by the embodiment of the invention, the existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training each time a new voice recognition product appears: training data adapted to the product can be obtained simply by updating and maintaining the channel conversion algorithm, which improves the efficiency of building the new voice matching model and saves labor cost.
Example 4
FIG. 5 is a schematic structural diagram of another embodiment of the voice training data adapting apparatus provided in the present invention, which may be used to perform the method steps shown in FIG. 3. As shown in FIG. 5, on the basis of the embodiment shown in FIG. 4, the voice training data adapting device provided by the embodiment of the present invention may further include an algorithm acquisition module 51. The algorithm acquisition module 51 may be configured to obtain recording data for a fixed text under different channels, and to obtain a channel conversion algorithm according to the difference parameter distribution functions of the different recording data.
In the embodiment of the present invention, a passage of fixed text may be set first. When the channel conversion algorithm is acquired, the algorithm acquisition module 51 may record the fixed text under different channels, for example under two-microphone, four-microphone, and six-microphone channel environments as well as the high-fidelity channel environment, to obtain different recording data.
Further, the algorithm acquisition module 51 may also be used to acquire recording data for the fixed text at different distances for the same channel environment.
In the embodiment of the present invention, for recording data under different channels, the algorithm acquisition module 51 may obtain a channel conversion algorithm according to the Gaussian distribution function of the recording data; for recording data at different distances, it may obtain a channel conversion algorithm according to the energy distribution function of the recording data. Together these yield the channel conversion algorithm that is used for data conversion.
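For the distance case, the embodiment refers to an energy distribution function without defining it. The following sketch assumes, for illustration only, that energy means the mean squared amplitude of a recording and that the effect of distance can be approximated by a single amplitude gain per distance; the function names and sample values are hypothetical.

```python
import math
from typing import Dict, List


def mean_energy(samples: List[float]) -> float:
    """Average power (mean squared amplitude) of a recording."""
    return sum(s * s for s in samples) / len(samples)


def distance_gains(reference: List[float],
                   by_distance: Dict[float, List[float]]) -> Dict[float, float]:
    """Amplitude gain per distance, estimated from the energy ratio."""
    e_ref = mean_energy(reference)
    return {
        distance: math.sqrt(mean_energy(recording) / e_ref)
        for distance, recording in by_distance.items()
    }


def simulate_distance(samples: List[float], gain: float) -> List[float]:
    """Apply the estimated gain to original voice data."""
    return [gain * s for s in samples]


# Made-up recordings of the same fixed text in one channel environment.
reference = [1.0, -1.0, 1.0, -1.0]            # close to the microphone
recordings = {
    1.0: [0.5, -0.5, 0.5, -0.5],              # recorded at 1 m
    3.0: [0.25, -0.25, 0.25, -0.25],          # recorded at 3 m
}

gains = distance_gains(reference, recordings)
far_field = simulate_distance([0.8, 0.4], gains[3.0])
```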
In the embodiment of the present invention, the process by which the original voice data acquisition module 41 acquires the original voice data is independent of the process by which the algorithm acquisition module 51 acquires the channel conversion algorithm. The original voice data serves as the input to the channel conversion algorithm, and its acquisition can be regarded as a preparatory, pre-processing step. The acquisition of the channel conversion algorithm, by contrast, needs to be executed each time a new smart speaker is released, so as to update and maintain the existing channel conversion algorithm.
Still further, the original voice data acquisition module 41 may include: a first acquisition unit 411, the first acquisition unit 411 may be configured to acquire existing original voice data in a first database.
The original voice data acquisition module 41 may further include: a second obtaining unit 412, where the second obtaining unit 412 may be configured to obtain, in a second database, original voice data obtained by recording existing stock data by a high-fidelity recording device.
The original voice data acquisition module 41 may further include: a third obtaining unit 413, where the third obtaining unit 413 may be configured to obtain, in a third database, original voice data obtained by recording a recording person by a hi-fi recording device.
In the embodiment of the present invention, the first acquisition unit 411, the second acquisition unit 412, and the third acquisition unit 413 need not operate in any particular order: their acquisitions may be executed simultaneously or sequentially in any order, and it is also possible for only one or two of the three units to perform acquisition.
According to the voice training data adapting device provided by the embodiment of the invention, the existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training each time a new voice recognition product appears: training data adapted to the product can be obtained simply by updating and maintaining the channel conversion algorithm, which improves the efficiency of building the new voice matching model and saves labor cost.
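The module structure of FIG. 5 can be sketched as three small classes. This is an illustrative arrangement only: the class names, method names, and the toy conversion are hypothetical, and the reference numerals appear in comments merely to tie the sketch back to the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Samples = List[float]
Converter = Callable[[Samples], Samples]


@dataclass
class RawSpeechAcquisitionModule:      # module 41
    # One callable per acquisition unit (411, 412, 413); any subset may be set.
    units: List[Callable[[], List[Samples]]] = field(default_factory=list)

    def acquire(self) -> List[Samples]:
        return [utterance for unit in self.units for utterance in unit()]


@dataclass
class AlgorithmAcquisitionModule:      # module 51
    converters: Dict[str, Converter] = field(default_factory=dict)

    def update(self, channel: str, converter: Converter) -> None:
        """Register or replace the conversion for one channel."""
        self.converters[channel] = converter


class DataConversionModule:            # module 42
    def convert(self, raw: List[Samples],
                converters: Dict[str, Converter]) -> Dict[str, List[Samples]]:
        return {ch: [f(u) for u in raw] for ch, f in converters.items()}


acquisition = RawSpeechAcquisitionModule(
    units=[lambda: [[1.0, 2.0]], lambda: [[3.0]]]
)
algorithms = AlgorithmAcquisitionModule()
algorithms.update("six_mic", lambda s: [0.5 * x for x in s])  # toy conversion

training = DataConversionModule().convert(
    acquisition.acquire(), algorithms.converters
)
```

Keeping module 51 separate from module 41 mirrors the independence noted in the description: a new product changes only the registered conversions, not the pool of original voice data.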
Example 5
Fig. 6 is a flowchart of an embodiment of a voice data conversion method according to the present invention. The execution subject of the method may be various server devices with voice data processing capability, or may be devices or chips integrated on these server devices. As shown in fig. 6, the voice data conversion method includes the steps of:
S601, converting the original voice data through a channel conversion algorithm matched with the playing device to obtain training data suitable for the playing device.
In the embodiment of the present invention, the original voice data refers to voice data having audio data information in various directions.
Regarding the acquisition of the original voice data, the existing original voice data may be acquired in the first database, the original voice data obtained by recording the existing stock data by the hi-fi recording device may be acquired in the second database, and the original voice data obtained by recording the recording person by the hi-fi recording device may be acquired in the third database.
When a voice playing device performs TTS (Text To Speech) playback, it needs to play voice according to its configured voice database, and playing devices of different models need voice databases for different channels. According to the voice data conversion method provided by the embodiment of the invention, when a new playing device is released, a server providing support for the playing device can acquire the channel conversion algorithm matched with the playing device according to the channel type of the playing device, so as to obtain training data suitable for the playing device.
Specifically, the channel conversion algorithm matched with the playing device may be acquired through the following steps: acquiring recording data for a fixed text under different channels, including recording data of the fixed text made with the playing device; and then obtaining a channel conversion algorithm according to the difference parameter distribution functions of the different recording data.
In the embodiment of the invention, a passage of fixed text can be set first. When the channel conversion algorithm is acquired, the fixed text is recorded under different channels, for example under two-microphone, four-microphone, and six-microphone channel environments as well as the original-voice (high-fidelity) channel environment, to obtain different recording data.
Aiming at recording data under different channels, a channel conversion algorithm can be obtained according to a Gaussian distribution function of the recording data.
S602, performing model training according to the training data to obtain a data conversion model.
And S603, converting the data to be output of the playing device according to the data conversion model so as to obtain playing data suitable for the playing device.
In the embodiment of the invention, after obtaining the training data suitable for the playing device, the server performs model training, so as to obtain a data conversion model.
When the playing device is to play voice, it can send the data to be output to the server. The server inputs the data to be output into the data conversion model, and the model outputs playing data suitable for the playing device. Once the playing device receives the playing data from the server, it can play it.
According to the voice data conversion method provided by the embodiment of the invention, the existing original voice data is converted through the channel conversion algorithm matched with the playing device to obtain training data matched with the playing device. This avoids collecting a large amount of voice data each time a new voice playing product appears: training data matched with the product can be obtained simply by updating and maintaining the channel conversion algorithm. A data conversion model can then be trained and used to convert the data to be played by the new product, which improves voice playback quality and saves labor cost in data acquisition.
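Steps S601 to S603 can be sketched end to end as follows. The embodiment does not say what kind of model the data conversion model is; the sketch assumes, purely for illustration, a one-dimensional linear model fitted by ordinary least squares. The function names, the stand-in channel conversion, and the sample values are hypothetical.

```python
from typing import List, Tuple

Samples = List[float]


def fit_linear_model(xs: Samples, ys: Samples) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def device_channel_conversion(samples: Samples) -> Samples:
    """Hypothetical stand-in for the device-matched channel conversion."""
    return [0.5 * s + 0.1 for s in samples]


# S601: convert original voice data into training data for the playing device.
raw = [0.0, 1.0, 2.0, 3.0]
training_targets = device_channel_conversion(raw)

# S602: train the data conversion model on (original, converted) pairs.
a, b = fit_linear_model(raw, training_targets)


# S603: on the server, convert the data the device is about to output.
def convert_for_playback(pending: Samples) -> Samples:
    return [a * s + b for s in pending]


playback = convert_for_playback([4.0])
```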
Example 6
The internal functions and structure of the voice training data adaptation apparatus are described above. The apparatus may be implemented as an electronic device. FIG. 7 is a schematic structural diagram of an embodiment of an electronic device according to the present invention. As shown in FIG. 7, the electronic device includes a memory 71 and a processor 72.
A memory 71 for storing a program. In addition to the programs described above, the memory 71 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 71 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The processor 72 is coupled to the memory 71 and executes the program stored in the memory 71 for:
acquiring original voice data for data conversion, the original voice data having audio data information in various directions;
and converting the acquired original voice data through a channel conversion algorithm to acquire training data applicable to different channels.
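The two steps executed by the processor can be pictured with the following minimal sketch. It assumes, purely for illustration, that each channel conversion adds a Gaussian-distributed difference to every sample; the parameter table `CHANNEL_PARAMS`, its values, and the function names are hypothetical and are not taken from the source.

```python
import random
from typing import Dict, List, Tuple

Samples = List[float]

# Hypothetical (mean, std) difference parameters per target channel.
CHANNEL_PARAMS: Dict[str, Tuple[float, float]] = {
    "two_mic": (0.0, 0.05),
    "four_mic": (0.0, 0.03),
    "six_mic": (0.0, 0.02),
}


def convert_channel(original: Samples, mean: float, std: float,
                    rng: random.Random) -> Samples:
    # Assumed channel conversion: add a Gaussian difference to each sample.
    return [s + rng.gauss(mean, std) for s in original]


def adapt_training_data(original: Samples, seed: int = 0) -> Dict[str, Samples]:
    """Convert one original utterance into training data for every channel."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return {
        channel: convert_channel(original, mean, std, rng)
        for channel, (mean, std) in CHANNEL_PARAMS.items()
    }


# Step 1: acquire original voice data (a toy utterance stands in here).
original_voice = [0.0, 0.1, 0.2, 0.1, 0.0]
# Step 2: convert it into training data applicable to different channels.
training_data = adapt_training_data(original_voice)
```

Each entry of `training_data` could then be fed to model training for the corresponding channel.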
Further, as shown in fig. 7, the electronic device may further include: communication component 73, power component 74, audio component 75, display 76, and the like. Only some of the components are schematically shown in fig. 7, which does not mean that the electronic device only comprises the components shown in fig. 7.
The communication component 73 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 73 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 73 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 74 provides power to the various components of the electronic device. The power component 74 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 75 is configured to output and/or input audio signals. For example, the audio component 75 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 71 or transmitted via the communication component 73. In some embodiments, the audio component 75 further comprises a speaker for outputting audio signals.
The display 76 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the method embodiments described above may be performed by hardware under the control of program instructions. The program may be stored in a computer readable storage medium. When executed, the program performs the steps of the method embodiments described above. The storage medium includes various media that can store program code, such as ROM, RAM, and magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (11)

1. A method for adapting speech training data, comprising:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in various directions;
acquiring recording data for a fixed text under different channels, wherein the channels comprise a two-microphone channel, a four-microphone channel, a six-microphone channel and a high-fidelity channel;
obtaining a channel conversion algorithm according to different difference parameter distribution functions of the recording data, wherein the difference parameter distribution functions comprise Gaussian distribution functions;
and converting the original voice data through the channel conversion algorithm to obtain training data suitable for different channels, wherein the training data are used for model training of a data conversion model so as to convert data to be output of playing equipment according to the data conversion model trained by the model to obtain playing data suitable for the playing equipment, and the data conversion model comprises a two-microphone model, a four-microphone model and/or a six-microphone model.
2. The method for adapting speech training data according to claim 1, further comprising:
acquiring recording data for the fixed text at different distances.
3. The method for adapting speech training data according to claim 1, wherein the difference parameter distribution function of the recording data under different channels is a Gaussian distribution function.
4. The speech training data adaptation method of claim 2, wherein the difference parameter distribution function of the recorded data at different distances is an energy distribution function.
5. The method for adapting speech training data according to any one of claims 1 to 4, wherein the obtaining the original speech data for data conversion comprises:
acquiring existing original voice data from a first database.
6. The method for adapting speech training data according to any one of claims 1 to 4, wherein the obtaining the original speech data for data conversion comprises:
and acquiring the original voice data obtained by recording the existing stock data through the high-fidelity recording equipment in the second database.
7. The method for adapting speech training data according to any one of claims 1 to 4, wherein the obtaining the original speech data for data conversion comprises:
and acquiring the original voice data obtained by recording the voice recorder through the high-fidelity recording equipment in a third database.
8. A method for converting voice data, comprising:
acquiring recording data for a fixed text under different channels, wherein the channels comprise a two-microphone channel, a four-microphone channel, a six-microphone channel and a high-fidelity channel;
obtaining a channel conversion algorithm according to different difference parameter distribution functions of the recording data, wherein the difference parameter distribution functions comprise Gaussian distribution functions;
converting original voice data through a channel conversion algorithm matched with playing equipment to obtain training data suitable for the playing equipment, wherein the original voice data has audio data information in all directions;
performing model training according to the training data to obtain a data conversion model, wherein the data conversion model comprises a two-microphone model, a four-microphone model and/or a six-microphone model;
and converting the data to be output of the playing equipment according to the data conversion model so as to obtain the playing data suitable for the playing equipment.
9. The voice data conversion method of claim 8, wherein the recording data comprises recording data of the playback device for the fixed text.
10. A speech training data adaptation apparatus, comprising:
an original voice data acquisition module for acquiring original voice data for data conversion, the original voice data having audio data information in various directions;
the recording data acquisition module is used for acquiring recording data for a fixed text under different channels, wherein the channels comprise a two-microphone channel, a four-microphone channel, a six-microphone channel and a high-fidelity channel;
the channel conversion algorithm acquisition module is used for acquiring a channel conversion algorithm according to different difference parameter distribution functions of the recording data, wherein the difference parameter distribution functions comprise Gaussian distribution functions;
the data conversion module is used for converting the original voice data through the channel conversion algorithm to obtain training data suitable for different channels, wherein the training data are used for model training of a data conversion model, so that the data to be output of the playing device are converted according to the trained data conversion model to obtain playing data suitable for the playing device, and the data conversion model comprises a two-microphone model, a four-microphone model and/or a six-microphone model.
11. An electronic device, comprising:
a memory for storing a program;
a processor for running the program stored in the memory for:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in various directions;
acquiring recording data for a fixed text under different channels, wherein the channels comprise a two-microphone channel, a four-microphone channel, a six-microphone channel and a high-fidelity channel;
obtaining a channel conversion algorithm according to different difference parameter distribution functions of the recording data, wherein the difference parameter distribution functions comprise Gaussian distribution functions;
and converting the original voice data through the channel conversion algorithm to obtain training data suitable for different channels, wherein the training data are used for model training of a data conversion model so as to convert data to be output of playing equipment according to the data conversion model trained by the model to obtain playing data suitable for the playing equipment, and the data conversion model comprises a two-microphone model, a four-microphone model and/or a six-microphone model.
CN201811400134.7A 2018-11-22 2018-11-22 Voice training data adaptation method and device, voice data conversion method and electronic equipment Active CN111210809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811400134.7A CN111210809B (en) 2018-11-22 2018-11-22 Voice training data adaptation method and device, voice data conversion method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811400134.7A CN111210809B (en) 2018-11-22 2018-11-22 Voice training data adaptation method and device, voice data conversion method and electronic equipment

Publications (2)

Publication Number Publication Date
CN111210809A CN111210809A (en) 2020-05-29
CN111210809B CN111210809B (en) 2024-03-19

Family

ID=70789391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811400134.7A Active CN111210809B (en) 2018-11-22 2018-11-22 Voice training data adaptation method and device, voice data conversion method and electronic equipment

Country Status (1)

Country Link
CN (1) CN111210809B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6456697B1 (en) * 1998-09-23 2002-09-24 Industrial Technology Research Institute Device and method of channel effect compensation for telephone speech recognition
US6502070B1 (en) * 2000-04-28 2002-12-31 Nortel Networks Limited Method and apparatus for normalizing channel specific speech feature elements
CN101674087A (en) * 2009-09-27 2010-03-17 电子科技大学 Method for obtaining channel mismatching error of time alternative ADC system
KR20100081165A (en) * 2009-01-05 2010-07-14 경희대학교 산학협력단 Method for calculating security capacity of gaussian mimo wiretap channel
CN102129859A (en) * 2010-01-18 2011-07-20 盛乐信息技术(上海)有限公司 Voiceprint authentication system and method for rapid channel compensation
CN104064204A (en) * 2013-03-22 2014-09-24 宏达国际电子股份有限公司 Audio playback system and method used in handheld electronic device
JP2014204316A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
CN104464786A (en) * 2014-11-21 2015-03-25 西安诺瓦电子科技有限公司 Audio frequency controlling device and method
CN106941007A (en) * 2017-05-12 2017-07-11 北京理工大学 A kind of audio event model composite channel adaptive approach
CN107123432A (en) * 2017-05-12 2017-09-01 北京理工大学 A kind of Self Matching Top N audio events recognize channel self-adapted method
CN107481723A (en) * 2017-08-28 2017-12-15 清华大学 A kind of channel matched method and its device for Application on Voiceprint Recognition
CN107889044A (en) * 2017-12-19 2018-04-06 维沃移动通信有限公司 The processing method and processing device of voice data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7453794B2 (en) * 2003-12-16 2008-11-18 University Of Florida Research Foundation, Inc. Channel estimation and synchronization with preamble using polyphase code
US20070239441A1 (en) * 2006-03-29 2007-10-11 Jiri Navratil System and method for addressing channel mismatch through class specific transforms
US8364084B2 (en) * 2009-10-30 2013-01-29 Action Star Enterprise, Co. Ltd. Audio broadcasting system and method for broadcasting the same
JP6234060B2 (en) * 2013-05-09 2017-11-22 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Generation method, generation apparatus, and generation program for target domain learning voice data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
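Channel compensation method based on maximum a posteriori probability in speaker recognition; Gao Rongchun; Han Jiqing; Zhang Lei; Journal on Communications (03); full text *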

Also Published As

Publication number Publication date
CN111210809A (en) 2020-05-29

Similar Documents

Publication Publication Date Title
CN109378006B (en) Cross-device voiceprint recognition method and system
US20140350933A1 (en) Voice recognition apparatus and control method thereof
EP4114011A1 (en) Display apparatus and method for controlling the display apparatus
US20130339015A1 (en) Terminal apparatus and control method thereof
JP6783339B2 (en) Methods and devices for processing audio
CN104904227A (en) Display apparatus and method for controlling the same
CN103517094B (en) Server and the method for controlling the server
CN104123938A (en) Voice control system, electronic device and voice control method
CN103516711A (en) Display apparatus, method for controlling display apparatus, and interactive system
CN107864410B (en) Multimedia data processing method and device, electronic equipment and storage medium
DE202013012886U1 (en) Display device, electronic device and interactive system
CN108959634A (en) Video recommendation method, device, equipment and storage medium
CN109961786A (en) Products Show method, apparatus, equipment and storage medium based on speech analysis
CN103220425B (en) A kind of way of recording based on multiple mobile terminal and system
CN103546763A (en) Method for providing contents information and broadcast receiving apparatus
CN111640434A (en) Method and apparatus for controlling voice device
CN105549876A (en) Method and apparatus for performing input in input box
CN105208189A (en) Audio processing method and mobile terminal
WO2019101099A1 (en) Video program identification method and device, terminal, system, and storage medium
CN111862965A (en) Awakening processing method and device, intelligent sound box and electronic equipment
CN111210809B (en) Voice training data adaptation method and device, voice data conversion method and electronic equipment
CN106331393A (en) Control method and control device
CN107483993B (en) Voice input method of television, television and computer readable storage medium
CN103208062B (en) Messaging device and information processing method
CN104317404A (en) Voice-print-control audio playing equipment, control system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant