CN113409815A - Voice alignment method based on multi-source voice data - Google Patents
Voice alignment method based on multi-source voice data
- Publication number
- CN113409815A (application CN202110591658.4A; granted publication CN113409815B)
- Authority
- CN
- China
- Prior art keywords: voice, data, voice data, frame, module
- Prior art date: 2021-05-28
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Abstract
The invention discloses a voice alignment method based on multi-source voice data. It belongs to the field of voice processing, relates to voice alignment technology, and aligns the starting points of voice data, avoiding the manual alignment approach, which costs a large amount of time and suffers from low processing efficiency and alignment accuracy. The method comprises the following steps: a plurality of voice acquisition modules acquire voice data of the same sound source at different positions and send the acquired voice data to a voice processing module; the voice processing module processes the voice data sent by the voice acquisition modules and sends the processed voice data to a voice analysis module; the voice analysis module performs voice alignment on the processed voice data and sends the aligned voice data to a voice combination module; and the voice combination module performs voice combination on the aligned voice data.
Description
Technical Field
The invention belongs to the field of voice processing, relates to voice alignment technology, and particularly relates to a voice alignment method based on multi-source voice data.
Background
Generally, for the voice of the same speaker in the same recording scene, multiple recording devices are used to collect voice data, and the starting points of the voice data collected by different recording devices cannot be guaranteed to be completely consistent. Therefore, how to align the voices is a technical problem that must be solved, both to ensure consistency of the collection starting points of the voice data gathered by multiple recording devices and to facilitate subsequent processing such as synthesis of the voice data.
In the prior art, the alignment operation is generally performed on voice data manually. For example, when faced with voice data having different collection starting points, technicians must manually compare the sound waves of the voice data and align the starting points. This manual approach requires a great deal of time, has low processing efficiency and alignment accuracy, and is poorly suited to processing voice data of large volume.
Therefore, a voice alignment method based on multi-source voice data is provided.
Disclosure of Invention
The invention provides a voice alignment method based on multi-source voice data, which aligns the starting points of voice data and thereby avoids manual alignment, which consumes a large amount of time and yields low processing efficiency and alignment accuracy. A plurality of voice acquisition modules acquire voice data of the same sound source at different positions and send the acquired voice data to a voice processing module; the voice processing module processes the voice data sent by the voice acquisition modules and sends the processed voice data to a voice analysis module; the voice analysis module performs voice alignment on the processed voice data and sends the aligned voice data to a voice combination module. Specifically, the voice analysis module arranges the data characteristic coefficients TZij of the acquired single-frame voice data by frame number and by voice acquisition module, and randomly selects the voice data acquired by one of the acquisition modules as the reference voice data; the data characteristic coefficient of each single frame of voice data is divided by that of the previous single frame, i.e., TZij/TZi(j-1), and the resulting quotient is taken as a comparison value and marked as Dij; the remaining single-frame voice data are processed in the same way to obtain the other comparison values. The comparison values are combined into different number sequences, and the Dij in each sequence are compared with the Dij in the reference sequence respectively; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values lie within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned. Finally, the voice combination module performs voice combination on the aligned voice data.
The purpose of the invention can be realized by the following technical scheme:
A voice alignment method based on multi-source voice data uses a voice alignment system based on the multi-source voice data. The system comprises a plurality of voice acquisition modules, a voice processing module, a voice analysis module and a voice combination module. The voice acquisition modules are positioned around the sound source and are used for acquiring voice data of the same sound source at different positions and sending the acquired voice data to the voice processing module;
the voice processing module is used for processing the voice data sent by the voice acquisition modules; the processed voice data are sent to a voice analysis module;
the voice analysis module is used for carrying out voice alignment on the processed voice data; sending the aligned voice data to a voice combination module;
and the voice combination module performs voice combination on the aligned voice data.
It should be noted that the voice acquisition module is specifically a device with a recording function, or a microphone; the voice acquisition modules are distributed around the sound source at different spatial distances from it, and by default are identical equipment;
the voice acquisition modules send acquired voice data to the voice processing module;
the voice processing module numbers the voice acquisition modules and marks the number as i, where i denotes the number of a voice acquisition module, i = 1, 2, ..., n;
the voice processing module acquires the space linear distance between the voice acquisition module and the sound source, and marks the space linear distance between the voice acquisition module and the sound source as Li;
the voice processing module acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame, obtains an amplitude value and a frequency value, and marks them as Zfij and Plij respectively, where j denotes the number of a single frame of voice data, j = 1, 2, ..., m;
the voice processing module calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula, not reproduced here, in which c is a proportionality coefficient related to the timbre of the sound source;
the voice processing module sends the calculated data characteristic coefficient TZij of the single-frame voice data to the voice analysis module;
the voice analysis module is used for analyzing the data characteristic coefficient TZij of the single-frame voice data, and the specific analysis process comprises the following steps:
the voice analysis module acquires a spatial linear distance Li between the voice acquisition module and a sound source; the voice analysis module acquires a data characteristic coefficient TZij of single-frame voice data;
the voice analysis module carries out data arrangement on the acquired data characteristic coefficient TZij of the single-frame voice data according to different frame numbers and different voice acquisition modules, and the arrangement form is as follows:
TZ11, TZ12, TZ13, TZ14, TZ15, ..., TZ1m;
TZ21, TZ22, TZ23, TZ24, TZ25, ..., TZ2m;
...
TZn1, TZn2, TZn3, TZn4, TZn5, ..., TZnm;
it should be noted that, when the voice data collected by different voice acquisition modules are processed into single-frame voice data, the total number of single frames may differ; that is, the value of m may differ from one voice acquisition module to another;
the voice analysis module randomly selects the voice data acquired by one of the acquisition modules as the reference voice data; the data characteristic coefficient of each single frame of voice data is divided by the data characteristic coefficient of the previous single frame, i.e., TZij/TZi(j-1); the resulting quotient is taken as a comparison value and marked as Dij;
the remaining single-frame voice data are processed in the same way to obtain the other comparison values;
the comparison values are combined into different number sequences, namely a reference sequence and sequences 1, 2, ..., n-1:
D11, D12, D13, D14, D15, ..., D1(m-1); (reference sequence)
D21, D22, D23, D24, D25, ..., D2(m-1); (sequence 1)
...
Dn1, Dn2, Dn3, Dn4, Dn5, ..., Dn(m-1); (sequence n-1)
The Dij in sequence 1, sequence 2, ..., sequence n-1 are compared with the Dij in the reference sequence respectively; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values lie within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned;
the voice analysis module sends the single-frame voice data to be aligned to the voice combination module; the voice combination module obtains the first comparison value of the run of more than 10 consecutive consistent comparison values (or of quotients within (0.95, 1.05)), and from it the position of the corresponding single frame of voice data; taking that single frame as the alignment standard, it performs voice combination frame by frame starting from that single frame, finally completing the voice alignment.
Compared with the prior art, the invention has the beneficial effects that:
1. The voice acquisition module of the invention is specifically a device with a recording function, or a microphone; the voice acquisition modules are distributed around the sound source at different spatial distances from it and by default are identical equipment. This guarantees the consistency of the sound source's voice data, avoids the inaccuracy in later voice alignment that different acquisition devices would cause, and improves the accuracy of voice alignment.
2. The voice processing module acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame to obtain an amplitude value and a frequency value, marked as Zfij and Plij respectively; the voice processing module then calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula, not reproduced here, in which c is related to the timbre of the sound source; the voice processing module sends the calculated data characteristic coefficient TZij to the voice analysis module. Processing the voice data in this way facilitates the later voice alignment.
3. The voice analysis module of the invention randomly selects the voice data collected by one of the acquisition modules as the reference voice data; the data characteristic coefficient of each single frame of voice data is divided by that of the previous single frame, i.e., TZij/TZi(j-1), and the resulting quotient is taken as a comparison value and marked as Dij; the remaining single-frame voice data are processed in the same way to obtain the other comparison values; the comparison values are combined into different number sequences, namely a reference sequence and sequences 1, 2, ..., n-1:
D11, D12, D13, D14, D15, ..., D1(m-1); (reference sequence)
D21, D22, D23, D24, D25, ..., D2(m-1); (sequence 1)
...
Dn1, Dn2, Dn3, Dn4, Dn5, ..., Dn(m-1); (sequence n-1)
The Dij in sequence 1, sequence 2, ..., sequence n-1 are compared with the Dij in the reference sequence respectively; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values lie within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned. Alignment of the voices is thus realized by means of number sequences.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in their description are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a speech alignment method based on multi-source speech data according to the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a voice alignment method based on multi-source voice data uses a voice alignment system based on the multi-source voice data, comprising a plurality of voice acquisition modules, a voice processing module, a voice analysis module and a voice combination module. The voice acquisition modules are positioned around the sound source and are configured to acquire voice data of the same sound source at different positions and send the acquired voice data to the voice processing module;
the voice processing module is used for processing the voice data sent by the voice acquisition modules; the processed voice data are sent to a voice analysis module;
the voice analysis module is used for carrying out voice alignment on the processed voice data; sending the aligned voice data to a voice combination module;
and the voice combination module performs voice combination on the aligned voice data.
It should be noted that the voice acquisition module is specifically a device with a recording function, or a microphone; the voice acquisition modules are distributed around the sound source at different spatial distances from it, and by default are identical equipment;
the voice acquisition modules send acquired voice data to the voice processing module;
the voice processing module numbers the voice acquisition modules and marks the number as i, where i denotes the number of a voice acquisition module, i = 1, 2, ..., n;
the voice processing module acquires the space linear distance between the voice acquisition module and the sound source, and marks the space linear distance between the voice acquisition module and the sound source as Li;
the voice processing module acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame, obtains an amplitude value and a frequency value, and marks them as Zfij and Plij respectively, where j denotes the number of a single frame of voice data, j = 1, 2, ..., m;
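A minimal sketch of this framing and splitting step follows. The patent does not fix a frame length or state how the amplitude and frequency values are computed from a decoded frame, so the 1024-sample frames, the peak amplitude as Zf, and the dominant FFT bin as Pl are all assumptions:

```python
# Sketch only: frame_len, peak amplitude as Zf, and dominant FFT bin as Pl
# are assumed choices, not taken from the patent text.
import numpy as np

def frame_features(signal: np.ndarray, sample_rate: int, frame_len: int = 1024):
    """Split one recording into single frames; return one (Zf, Pl) pair per frame."""
    n_frames = len(signal) // frame_len
    features = []
    for j in range(n_frames):
        frame = signal[j * frame_len:(j + 1) * frame_len]
        zf = float(np.max(np.abs(frame)))  # amplitude value Zf_ij (assumed: peak amplitude)
        spectrum = np.abs(np.fft.rfft(frame))
        pl = float(np.argmax(spectrum)) * sample_rate / frame_len  # frequency value Pl_ij (assumed: dominant frequency)
        features.append((zf, pl))
    return features
```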
the voice processing module calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula, not reproduced here, in which c is a proportionality coefficient related to the timbre of the sound source;
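Because the formula image for TZij did not survive in this text, any concrete expression is an assumption. The sketch below uses a simple stand-in, c·Zf·Pl, purely to show where the patent's actual formula would go:

```python
def characteristic_coefficient(zf: float, pl: float, c: float) -> float:
    # Placeholder expression: the patent's actual formula for TZ_ij is not
    # reproduced in the source text; c is the timbre-dependent coefficient.
    return c * zf * pl
```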
the voice processing module sends the calculated data characteristic coefficient TZij of the single-frame voice data to the voice analysis module;
the voice analysis module is used for analyzing the data characteristic coefficient TZij of the single-frame voice data, and the specific analysis process comprises the following steps:
the voice analysis module acquires a spatial linear distance Li between the voice acquisition module and a sound source; the voice analysis module acquires a data characteristic coefficient TZij of single-frame voice data;
the voice analysis module carries out data arrangement on the acquired data characteristic coefficient TZij of the single-frame voice data according to different frame numbers and different voice acquisition modules, and the arrangement form is as follows:
TZ11, TZ12, TZ13, TZ14, TZ15, ..., TZ1m;
TZ21, TZ22, TZ23, TZ24, TZ25, ..., TZ2m;
...
TZn1, TZn2, TZn3, TZn4, TZn5, ..., TZnm;
it should be noted that, when the voice data collected by different voice acquisition modules are processed into single-frame voice data, the total number of single frames may differ; that is, the value of m may differ from one voice acquisition module to another;
the voice analysis module randomly selects the voice data acquired by one of the acquisition modules as the reference voice data; the data characteristic coefficient of each single frame of voice data is divided by the data characteristic coefficient of the previous single frame, i.e., TZij/TZi(j-1); the resulting quotient is taken as a comparison value and marked as Dij;
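This ratio step can be stated compactly. A sketch following the text above, where tz holds the sequence TZi1, ..., TZim for one acquisition module:

```python
def ratio_sequence(tz: list[float]) -> list[float]:
    """Comparison values D_ij = TZ_ij / TZ_i(j-1) for j = 2..m (m-1 values).
    Assumes the coefficients are nonzero."""
    return [tz[j] / tz[j - 1] for j in range(1, len(tz))]
```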
the remaining single-frame voice data are processed in the same way to obtain the other comparison values;
the comparison values are combined into different number sequences, namely a reference sequence and sequences 1, 2, ..., n-1:
D11, D12, D13, D14, D15, ..., D1(m-1); (reference sequence)
D21, D22, D23, D24, D25, ..., D2(m-1); (sequence 1)
...
Dn1, Dn2, Dn3, Dn4, Dn5, ..., Dn(m-1); (sequence n-1)
The Dij in sequence 1, sequence 2, ..., sequence n-1 are compared with the Dij in the reference sequence respectively; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values lie within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned;
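A sketch of this matching rule, assuming "more than 10 consecutive consistent comparison values" means a run of 11 positions whose pairwise quotients fall within (0.95, 1.05); the exhaustive search over both offsets is also an assumption, since the patent does not spell out how the sequences are slid against each other:

```python
def find_alignment(ref: list[float], other: list[float],
                   run: int = 11, lo: float = 0.95, hi: float = 1.05):
    """Return (offset in ref, offset in other) of the first run of `run`
    consecutive positions whose quotients lie within (lo, hi), else None."""
    for a in range(len(ref) - run + 1):
        for b in range(len(other) - run + 1):
            if all(lo <= other[b + k] / ref[a + k] <= hi for k in range(run)):
                return a, b
    return None
```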
the voice analysis module sends the single-frame voice data to be aligned to the voice combination module; the voice combination module obtains the first comparison value of the run of more than 10 consecutive consistent comparison values (or of quotients within (0.95, 1.05)), and from it the position of the corresponding single frame of voice data; taking that single frame as the alignment standard, it performs voice combination frame by frame starting from that single frame, finally completing the voice alignment.
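A sketch of this final combination step, reusing the hypothetical frame_features and find_alignment helpers above. The patent only says that combination proceeds frame by frame from the matched single frame, so the sample-level trimming and the simple averaging used here are assumptions:

```python
import numpy as np

def align_and_combine(sig_ref: np.ndarray, sig_other: np.ndarray,
                      frame_ref: int, frame_other: int,
                      frame_len: int = 1024) -> np.ndarray:
    """Drop everything before the matched frames, then combine the overlap."""
    a = sig_ref[frame_ref * frame_len:]      # matched frame = common starting point
    b = sig_other[frame_other * frame_len:]
    n = min(len(a), len(b))
    return (a[:n] + b[:n]) / 2.0             # assumed combination rule: average
```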
The above formulas are all calculated with dimensions removed, using only their numerical values. The formula is the one closest to the real situation, obtained from a large amount of collected data by software simulation, and the preset parameters and preset thresholds in the formula are set by those skilled in the art according to the actual situation or obtained by simulating a large amount of data.
The working principle of the invention is as follows: the voice acquisition modules acquire voice data of the same sound source at different positions and send the acquired voice data to the voice processing module; the voice processing module processes the voice data sent by the voice acquisition modules and sends the processed voice data to the voice analysis module; the voice analysis module performs voice alignment on the processed voice data and sends the aligned voice data to the voice combination module. The voice analysis module arranges the data characteristic coefficients TZij of the acquired single-frame voice data by frame number and by voice acquisition module, and randomly selects the voice data acquired by one of the acquisition modules as the reference voice data; the data characteristic coefficient of each single frame is divided by that of the previous single frame, i.e., TZij/TZi(j-1), and the resulting quotient is taken as a comparison value and marked as Dij; the remaining single-frame voice data are processed in the same way to obtain the other comparison values; the comparison values are combined into different number sequences, and the Dij in each sequence are compared with the Dij in the reference sequence; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values lie within (0.95, 1.05), the single-frame voice data can be adopted and is marked as single-frame voice data to be aligned; finally, the voice combination module performs voice combination on the aligned voice data.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only one logical functional division, and other divisions are possible in actual implementation. Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the method of the embodiment.
It will also be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above examples are only intended to illustrate the technical process of the present invention and not to limit the same, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical process of the present invention without departing from the spirit and scope of the technical process of the present invention.
Claims (5)
1. A voice alignment method based on multi-source voice data is characterized by comprising the following steps:
step one: a plurality of voice acquisition modules acquire voice data of the same sound source at different positions and send the acquired voice data to a voice processing module;
step two: processing the voice data sent by the voice acquisition modules through the voice processing module; the processed voice data are sent to a voice analysis module;
step three: performing voice alignment on the processed voice data through a voice analysis module; sending the aligned voice data to a voice combination module;
the voice analysis module arranges the data characteristic coefficients TZij of the acquired single-frame voice data by frame number and by voice acquisition module, and randomly selects the voice data acquired by one of the acquisition modules as reference voice data; the data characteristic coefficient of each single frame of voice data is divided by that of the previous single frame, i.e., TZij/TZi(j-1); the resulting quotient is taken as a comparison value and marked as Dij;
the remaining single-frame voice data are processed in the same way to obtain the other comparison values;
the comparison values are combined into different number sequences, and the Dij in each sequence are compared with the Dij in the reference sequence respectively; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values lie within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned;
step four: and carrying out voice combination on the aligned voice data through a voice combination module.
2. The method according to claim 1, wherein the voice acquisition module is specifically a device with a recording function; the voice acquisition modules are distributed around the sound source at different spatial distances from it.
3. The method of claim 1, wherein the voice processing module numbers the plurality of voice acquisition modules as i, where i denotes the number of a voice acquisition module, i = 1, 2, ..., n;
the voice processing module acquires the space linear distance between the voice acquisition module and the sound source, and marks the space linear distance between the voice acquisition module and the sound source as Li;
the voice processing module acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame, obtains an amplitude value and a frequency value, and marks them as Zfij and Plij respectively, where j denotes the number of a single frame of voice data, j = 1, 2, ..., m;
the voice processing module calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula, not reproduced here, in which c is a proportionality coefficient related to the timbre of the sound source;
and the voice processing module sends the calculated data characteristic coefficient TZij of the single-frame voice data to the voice analysis module.
4. The method according to claim 3, wherein, when the voice data collected by different voice acquisition modules are processed into single-frame voice data, the total number of single frames may differ; that is, the value of m may differ from one voice acquisition module to another.
5. The method of claim 1, wherein the voice combination module obtains the first comparison value of the run of more than 10 consecutive consistent comparison values (or of quotients within (0.95, 1.05)), and from it the position of the corresponding single frame of voice data; taking that single frame as the alignment standard, it performs voice combination frame by frame starting from that single frame, finally completing the voice alignment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110591658.4A (CN113409815B) | 2021-05-28 | 2021-05-28 | Voice alignment method based on multi-source voice data
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110591658.4A (CN113409815B) | 2021-05-28 | 2021-05-28 | Voice alignment method based on multi-source voice data
Publications (2)
Publication Number | Publication Date |
---|---|
CN113409815A (en) | 2021-09-17
CN113409815B (en) | 2022-02-11
Family
ID=77674998
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110591658.4A (CN113409815B, Active) | Voice alignment method based on multi-source voice data | 2021-05-28 | 2021-05-28
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113409815B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030220789A1 (en) * | 2002-05-21 | 2003-11-27 | Kepuska Veton K. | Dynamic time warping of speech |
CN105989846A (en) * | 2015-06-12 | 2016-10-05 | 乐视致新电子科技(天津)有限公司 | Multi-channel speech signal synchronization method and device |
US9697849B1 (en) * | 2016-07-25 | 2017-07-04 | Gopro, Inc. | Systems and methods for audio based synchronization using energy vectors |
CN107657947A (en) * | 2017-09-20 | 2018-02-02 | 百度在线网络技术(北京)有限公司 | Method of speech processing and its device based on artificial intelligence |
CN108682436A (en) * | 2018-05-11 | 2018-10-19 | 北京海天瑞声科技股份有限公司 | Voice alignment schemes and device |
EP3573059A1 (en) * | 2018-05-25 | 2019-11-27 | Dolby Laboratories Licensing Corp. | Dialogue enhancement based on synthesized speech |
CN109192223A (en) * | 2018-09-20 | 2019-01-11 | 广州酷狗计算机科技有限公司 | The method and apparatus of audio alignment |
CN111383658A (en) * | 2018-12-29 | 2020-07-07 | 广州市百果园信息技术有限公司 | Method and device for aligning audio signals |
CN211628033U (en) * | 2019-07-15 | 2020-10-02 | 兰州工业学院 | Container anti-drop monitoring and transmission system |
US20210065676A1 (en) * | 2019-08-28 | 2021-03-04 | International Business Machines Corporation | Speech characterization using a synthesized reference audio signal |
CN111276156A (en) * | 2020-01-20 | 2020-06-12 | 深圳市数字星河科技有限公司 | Real-time voice stream monitoring method |
Non-Patent Citations (2)
Title |
---|
JENNIFER LISTGARTEN ET AL: "Multiple Alignment of Continuous Time Series", Advances in Neural Information Processing Systems *
LAI JIAHAO: "Research on Voice Conversion Based on Deep Learning", China Excellent Master's Theses Full-text Database, Information Science and Technology series *
Also Published As
Publication number | Publication date |
---|---|
CN113409815B (en) | 2022-02-11 |
Similar Documents
Publication | Title
---|---
ES2774018T3 (en) | Method and system for evaluating the sound quality of a human voice
CN109599093B (en) | Intelligent quality inspection keyword detection method, device and equipment and readable storage medium
CN108198562A (en) | A kind of method and system for abnormal sound in real-time positioning identification animal house
CN105469807B (en) | A kind of more fundamental frequency extracting methods and device
CN115358718A (en) | Noise pollution classification and real-time supervision method based on intelligent monitoring front end
CN106375780A (en) | Method and apparatus for generating multimedia file
CN117095694A (en) | Bird song recognition method based on tag hierarchical structure attribute relationship
CN106571146A (en) | Noise signal determining method, and voice de-noising method and apparatus
CN111508524A (en) | Method and system for identifying voice source equipment
CN105845126A (en) | Method for automatic English subtitle filling of English audio image data
CN113409815B (en) | Voice alignment method based on multi-source voice data
CN114157023B (en) | Distribution transformer early warning information acquisition method
CN102184733B (en) | Audio attention-based audio quality evaluation system and method
CN113270110A (en) | ZPW-2000A track circuit transmitter and receiver fault diagnosis method
CN109299312B (en) | Music rhythm analysis method based on big data
CN111179972A (en) | Human voice detection algorithm based on deep learning
CN111081222A (en) | Speech recognition method, speech recognition apparatus, storage medium, and electronic apparatus
CN108010533A (en) | The automatic identifying method and device of voice data code check
CN102820037A (en) | Chinese initial and final visualization method based on combination feature
CN108271017A (en) | The audio loudness measuring system and method for digital broadcast television
CN108769874B (en) | Method and device for separating audio in real time
CN107025902A (en) | Data processing method and device
Li et al. | Output-based objective speech quality measurement using continuous Hidden Markov Models
CN114372513A (en) | Training method, classification method, equipment and medium of bird sound recognition model
CN111341321A (en) | Matlab-based spectrogram generating and displaying method and device
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant