CN108182945A - A kind of more voice cents based on vocal print feature are from method and device - Google Patents
A kind of more voice cents based on vocal print feature are from method and device Download PDFInfo
- Publication number
- CN108182945A CN108182945A CN201810201281.5A CN201810201281A CN108182945A CN 108182945 A CN108182945 A CN 108182945A CN 201810201281 A CN201810201281 A CN 201810201281A CN 108182945 A CN108182945 A CN 108182945A
- Authority
- CN
- China
- Prior art keywords
- audio
- voice
- source file
- institute
- audio source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Abstract
The invention discloses a kind of more voice cents based on vocal print feature from method and device, wherein method includes:S1, acquisition include the audio source file of at least 2 voice sounds;S2, by the format conversion of audio source file be pcm forms audio file;S3, the audio file of pcm forms is cut into several voice units according to default step-length and default Cutting Length, wherein, default step-length is less than default Cutting Length;S4, speech characteristic parameter in each voice unit is extracted successively;S5, the speech characteristic parameter for comparing all voice units two-by-two successively, and calculate the matching value between the speech characteristic parameter of two voice units;S6, whether matching value between the speech characteristic parameters of two voice units is judged higher than predetermined threshold value, if so, preserving in order two voice units to same audio collection;S7, voice unit splicings all in same audio collection as single audio subfile and are preserved.
Description
Technical field
The present invention relates to voice separation technology field more particularly to a kind of more voice sound separation methods based on vocal print feature
And device.
Background technology
Present many momentous conferences are recorded, and have the record of many forms, such as voice, word etc., in this way
Meeting review or meeting playback can be carried out afterwards.But some scenes are frequently encountered, need everyone sound list
It solely out preserves, in this way convenient for preserving, accomplishes fluently label etc., can be played back later for individual.
Word can separate the record of each people with a part in a conference by the record of different people, but voice is but done not
It arrives, because at meeting scene, all speak, and proprietary sound can be all entered into a section audio, in this way in audio
Later stage is difficult people is marked, such as I wants to listen what someone has said at that time, we can only go to look by word
That section of voice is looked for, such processing is time-consuming and laborious, and cannot exclude the probability of error.
Present minutes, the record processing in later stage is more artificial treatment, and record when needs manually to go using text
Word record, even if having used recording or videograph, record later the later stage a large amount of manpower is still needed to go to handle, can just do
It is marked to a section audio by people, not only expends a large amount of manpower and materials in this way, since the resolution of human ear has error,
The receptible sound frequency of human ear is conditional, and the effect found out has certain subjective effect, can be to the result separated
Have an impact, result in the technical issues of voice separating resulting error is larger.
Invention content
The present invention provides a kind of more voice cents based on vocal print feature from method and device, solve at present to meeting
In the recording processing of record, need to expend a large amount of manpower and materials, and since the resolution of human ear has error, human ear is receptible
Sound frequency is conditional, and the effect found out has certain subjective effect, can be had an impact to the result separated, caused
The technical issues of voice separating resulting error is larger.
The present invention provides a kind of more voice sound separation methods based on vocal print feature, including:
S1, acquisition include the audio source file of at least 2 voice sounds;
S2, by the format conversion of the audio source file be pcm forms audio file;
S3, the audio file of the pcm forms is cut into several voices according to default step-length and default Cutting Length
Unit, wherein, the default step-length is less than the default Cutting Length;
S4, speech characteristic parameter in each institute's speech units is extracted successively;
S5, the speech characteristic parameter for comparing all institute's speech units two-by-two successively, and calculate two voices
Matching value between the speech characteristic parameter of unit;
S6, judge matching value between the speech characteristic parameters of two institute's speech units whether higher than default threshold
Value, if so, preserving in order two institute's speech units to same audio collection;
S7, institute's speech units all in same audio collection are spliced into as single audio subfile and preserved.
Optionally, the step S2 is specifically included:
It reads byte length, sample rate and the channel information of the audio source file and stores into information bank;
The byte length of the audio source file, sample rate and channel information are removed, and are converted to the audio of pcm forms
File.
Optionally, it after the step S2, is further included before the step S3:
Byte length, sample rate and the channel information of the audio source file in described information storehouse, described in removal
Blank parts in audio source file.
Optionally, the step S7 is specifically included:
Institute's speech units all in same audio collection are spliced into for single audio subfile, and according to described information storehouse
In the audio source file byte length, sample rate and channel information to the single audio subfile into row information add
After preserve.
Optionally, it after the step S1, is further included before the step S2:
The audio source file is carried out at sampling processing and/or preemphasis processing and/or pre-filtering processing and/or adding window
Reason and/or end-point detection processing.
The present invention provides a kind of more voice sound separators based on vocal print feature, including:
Acquiring unit, for obtaining the audio source file for including at least 2 voice sounds;
Format conversion unit, for by the format conversion of the audio source file be pcm forms audio file;
Cutter unit, for cutting into the audio file of the pcm forms according to default step-length and default Cutting Length
Several voice units, wherein, the default step-length is less than the default Cutting Length;
Feature extraction unit, for extracting the speech characteristic parameter in each institute's speech units successively;
Contrast conting unit for comparing the speech characteristic parameter of all institute's speech units two-by-two successively, and is counted
Calculate the matching value between the speech characteristic parameter of two institute's speech units;
Judging unit, for judging whether the matching value between the speech characteristic parameter of two institute's speech units is high
In predetermined threshold value, if so, preserving in order two institute's speech units to same audio collection;
Splice storage unit, for being spliced into institute's speech units all in same audio collection for single audio subfile
And it preserves.
Optionally, the format conversion unit specifically includes:
Reading subunit, for read byte length, sample rate and the channel information of the audio source file and store to
In information bank;
Conversion subunit for the byte length of the audio source file, sample rate and channel information to be removed, and is converted
Audio file for pcm forms.
Optionally, a kind of more voice sound separators based on vocal print feature provided by the invention further include:
Blank cell is removed, for byte length, the sample rate harmony of the audio source file in described information storehouse
Road information removes the blank parts in the audio source file.
Optionally, splicing storage unit is additionally operable to:
Institute's speech units all in same audio collection are spliced into for single audio subfile, and according to described information storehouse
In the audio source file byte length, sample rate and channel information to the single audio subfile into row information add
After preserve.
Optionally, a kind of more voice sound separators based on vocal print feature provided by the invention further include:
Pretreatment unit, for carrying out sampling processing and/or preemphasis processing and/or pre-filtering to the audio source file
Processing and/or windowing process and/or end-point detection processing.
As can be seen from the above technical solutions, the present invention has the following advantages:
The present invention provides a kind of more voice sound separation methods based on vocal print feature, including:S1, it obtains comprising at least 2
The audio source file of voice sound;S2, by the format conversion of the audio source file be pcm forms audio file;S3, according to pre-
If the audio file of the pcm forms is cut into several voice units by step-length and default Cutting Length, wherein, it is described default
Step-length is less than the default Cutting Length;S4, speech characteristic parameter in each institute's speech units is extracted successively;S5, successively
The speech characteristic parameter of all institute's speech units is compared two-by-two, and the voice for calculating two institute's speech units is special
Levy the matching value between parameter;S6, judge whether is matching value between the speech characteristic parameters of two institute's speech units
Higher than predetermined threshold value, if so, preserving in order two institute's speech units to same audio collection;S7, by same audio
All institute's speech units are spliced into as single audio subfile and preserve in collection.
In the present invention, by by audio-source file division into several voice units, and successively extract voice unit language
Sound characteristic parameter by comparing the speech characteristic parameter of voice unit two-by-two, and calculates matching value, judges whether matching value is higher than
Predetermined threshold value determines whether two sections of voice units belong to the voice of same person, is as a result, more people by audio-source file process
Single audio subfile, solve in handling at present the recording of minutes, need to expend a large amount of manpower and materials, and due to
The resolution of human ear has error, and the receptible sound frequency of human ear is conditional, and the effect found out has certain subjective effect
The technical issues of fruit can have an impact the result separated, and caused voice separating resulting error is larger.
Description of the drawings
In order to illustrate more clearly about the embodiment of the present invention or technical scheme of the prior art, to embodiment or will show below
There is attached drawing needed in technology description to be briefly described, it should be apparent that, the accompanying drawings in the following description is only this
Some embodiments of invention, for those of ordinary skill in the art, without having to pay creative labor, may be used also
To obtain other attached drawings according to these attached drawings.
Fig. 1 is a kind of flow of one embodiment of more voice sound separation methods based on vocal print feature provided by the invention
Schematic diagram;
Fig. 2 is a kind of stream of another embodiment of more voice sound separation methods based on vocal print feature provided by the invention
Journey schematic diagram;
Fig. 3 is a kind of structure of one embodiment of more voice sound separators based on vocal print feature provided by the invention
Schematic diagram;
Fig. 4 is a kind of structure of one embodiment of more voice sound separators based on vocal print feature provided by the invention
Schematic diagram.
Specific embodiment
An embodiment of the present invention provides a kind of more voice cents based on vocal print feature from method and device, solve at present
To in the recording processing of minutes, needing to expend a large amount of manpower and materials, and since the resolution of human ear has error, human ear energy
The sound frequency of receiving is conditional, and the effect found out has certain subjective effect, can be had an impact to the result separated,
The technical issues of caused voice separating resulting error is larger.
In order to make the invention's purpose, features and advantages of the invention more obvious and easy to understand, below in conjunction with the present invention
Attached drawing in embodiment is clearly and completely described the technical solution in the embodiment of the present invention, it is clear that disclosed below
Embodiment be only part of the embodiment of the present invention, and not all embodiment.Based on the embodiments of the present invention, this field
All other embodiment that those of ordinary skill is obtained without making creative work, belongs to protection of the present invention
Range.
Referring to Fig. 1, the present invention provides a kind of one embodiment of more voice sound separation methods based on vocal print feature,
Including:
101st, the audio source file for including at least 2 voice sounds is obtained;
102nd, by audio file of the format conversion of audio source file for pcm forms;
103rd, the audio file of pcm forms is cut into several voice lists according to default step-length and default Cutting Length
Member, wherein, default step-length is less than default Cutting Length;
104th, the speech characteristic parameter in each voice unit is extracted successively;
105th, the speech characteristic parameter of all voice units is compared two-by-two successively, and the voice for calculating two voice units is special
Levy the matching value between parameter;
106th, whether the matching value between the speech characteristic parameter of two voice units is judged higher than predetermined threshold value, if so,
Then two voice units are preserved in order to same audio collection;
107th, voice unit splicings all in same audio collection as single audio subfile and are preserved.
In the embodiment of the present invention, by by audio-source file division into several voice units, and successively extract voice list
The speech characteristic parameter of member, by comparing the speech characteristic parameter of voice unit two-by-two, and calculates matching value, judges that matching value is
It is no to determine whether two sections of voice units belong to the voice of same person higher than predetermined threshold value, as a result, by audio-source file process
For the single audio subfile of more people, solve in handling at present the recording of minutes, need to expend a large amount of manpower and materials,
And since the resolution of human ear has error, the receptible sound frequency of human ear is conditional, and the effect found out has centainly
The technical issues of subjective effect can have an impact the result separated, and caused voice separating resulting error is larger.
It is a kind of saying for one embodiment of more voice sound separation methods based on vocal print feature provided by the invention above
It is bright, a kind of another embodiment of more voice sound separation methods based on vocal print feature provided by the invention will be said below
It is bright.
Referring to Fig. 2, a kind of another implementation the present invention provides more voice sound separation methods based on vocal print feature
Example, including:
201st, the audio source file for including at least 2 voice sounds is obtained;
It should be noted that when the audio source file of processing is minutes or report recording, lead in audio source file
Often the sound including at least 2 people just needs to carry out voice separating treatment.
202nd, audio source file is carried out at sampling processing and/or preemphasis processing and/or pre-filtering processing and/or adding window
Reason and/or end-point detection processing;
It should be noted that it obtains comprising after at least audio source file of 2 voice sounds, needing to carry out audio source file
Sampling processing and/or preemphasis processing and/or pre-filtering processing and/or windowing process and/or end-point detection processing pretreatment.
203rd, byte length, sample rate and the channel information of audio source file are read and is stored into information bank;
It should be noted that after carrying out pretreatment operation to audio source file, read the byte length of audio source file, adopt
Sample rate and channel information, and all information are stored into information bank so as to subsequent processing.
204th, the byte length of audio source file, sample rate and channel information are removed, and is converted to the audio of pcm forms
File;
It should be noted that the byte length of audio source file, sample rate and channel information are got rid of, and be converted to
The audio file of pcm forms removes the audio file of the pcm forms of head.
205th, byte length, sample rate and the channel information of the audio source file in information bank removes audio source document
Blank parts in part;
It should be noted that byte length, sample rate and the channel information of the audio source file in information bank, to sound
Frequency source file carries out space management, removes the blank information part in audio source file.
206th, the audio file of pcm forms is cut into several voice lists according to default step-length and default Cutting Length
Member, wherein, default step-length is less than default Cutting Length;
It should be noted that after the blank parts in eliminating audio source file, according to default step-length and default cutting
The audio file of pcm forms is cut into several voice units by length, wherein, default step-length is less than default Cutting Length, i.e.,
Redundancy is cut, and is avoided in cutting process, and the sound of a people is cut off or a word cuts into two syllables.
207th, the speech characteristic parameter in each voice unit is extracted successively;
It should be noted that extract the speech characteristic parameter in each voice unit successively, speech characteristic parameter include but
It is not limited to mel-frequency cepstrum coefficient.
208th, the speech characteristic parameter of all voice units is compared two-by-two successively, and the voice for calculating two voice units is special
Levy the matching value between parameter;
It should be noted that after the speech characteristic parameter in being extracted each voice unit, compare two-by-two successively all
The speech characteristic parameter of voice unit, and the matching value between the speech characteristic parameter of two voice units is calculated, for example, separation
Go out 5 sections of voice units, be then compared successively, need to carry out 5+4+3+2+1 comparison, while two voices of calculating ratio centering
Matching value between the speech characteristic parameter of unit.
209th, whether the matching value between the speech characteristic parameter of two voice units is judged higher than predetermined threshold value, if so,
Then two voice units are preserved in order to same audio collection;
It should be noted that whether the matching value between judging the speech characteristic parameters of two voice units is higher than default threshold
Value, if so, meaning that two voice units belong to the sound of same person, two voice units are preserved in order to same
In audio collection, i.e., in the audio collection of one people.
210th, voice unit splicings all in same audio collection are become into single audio subfile, and according in information bank
Byte length, sample rate and the channel information of audio source file preserve after being added to single audio subfile into row information;
It should be noted that by voice unit splicings all in same audio collection as single audio subfile, and according to
After byte length, sample rate and the channel information of audio source file in information bank add single audio subfile into row information
It preserves.
It is to a kind of another embodiment of more voice sound separation methods based on vocal print feature provided by the invention above
The explanation of progress, below will be to a kind of one embodiment of more voice sound separators based on vocal print feature provided by the invention
It illustrates.
Referring to Fig. 3, the present invention provides a kind of one embodiment of more voice sound separators based on vocal print feature,
Including:
Acquiring unit 301, for obtaining the audio source file for including at least 2 voice sounds;
Format conversion unit 302, for by the format conversion of audio source file be pcm forms audio file;
Cutter unit 303, if for cutting into the audio file of pcm forms according to default step-length and default Cutting Length
Dry voice unit, wherein, default step-length is less than default Cutting Length;
Feature extraction unit 304, for extracting the speech characteristic parameter in each voice unit successively;
Contrast conting unit 305 for comparing the speech characteristic parameter of all voice units two-by-two successively, and calculates two
Matching value between the speech characteristic parameter of voice unit;
Judging unit 306, for judging the matching value between the speech characteristic parameter of two voice units whether higher than pre-
If threshold value, if so, preserving in order two voice units to same audio collection;
Splice storage unit 307, for voice unit splicings all in same audio collection to be become single audio subfile
And it preserves.
In the embodiment of the present invention, by cutter unit 303 by audio-source file division into several voice units, and pass through
Feature extraction unit 304 extracts the speech characteristic parameter of voice unit successively, compares voice two-by-two by contrast conting unit 305
The speech characteristic parameter of unit, and matching value is calculated, it is true that last judging unit 306 judges whether matching value comes higher than predetermined threshold value
Whether fixed two sections of voice units belong to the voice of same person, as a result, by single audio of the audio-source file process for more people
File solves in handling at present the recording of minutes, needs to expend a large amount of manpower and materials, and since the resolution of human ear is
There is error, the receptible sound frequency of human ear is conditional, and the effect found out has certain subjective effect, can be to isolating
The technical issues of result come has an impact, and caused voice separating resulting error is larger.
Be above to a kind of one embodiment of more voice sound separators based on vocal print feature provided by the invention into
Capable explanation, below will be to a kind of another embodiment of more voice sound separators based on vocal print feature provided by the invention
It illustrates.
Referring to Fig. 4, a kind of another implementation the present invention provides more voice sound separators based on vocal print feature
Example, including:
Acquiring unit 401, for obtaining the audio source file for including at least 2 voice sounds;
Format conversion unit 402, for by the format conversion of audio source file be pcm forms audio file;
Format conversion unit 402 specifically includes:
Reading subunit 4021, for read the byte length of audio source file, sample rate and channel information and store to
In information bank;
Conversion subunit 4022 for the byte length of audio source file, sample rate and channel information to be removed, and is converted
Audio file for pcm forms;
Blank cell 403 is removed, is believed for the byte length of the audio source file in information bank, sample rate and sound channel
Breath removes the blank parts in audio source file;
Cutter unit 404, if for cutting into the audio file of pcm forms according to default step-length and default Cutting Length
Dry voice unit, wherein, default step-length is less than default Cutting Length;
Feature extraction unit 405, for extracting the speech characteristic parameter in each voice unit successively;
Contrast conting unit 406 for comparing the speech characteristic parameter of all voice units two-by-two successively, and calculates two
Matching value between the speech characteristic parameter of voice unit;
Judging unit 407, for judging the matching value between the speech characteristic parameter of two voice units whether higher than pre-
If threshold value, if so, preserving in order two voice units to same audio collection;
Splice storage unit 408, for voice unit splicings all in same audio collection to be become single audio subfile,
And byte length, sample rate and the channel information of the audio source file in information bank to single audio subfile into row information
It is preserved after addition.
It is apparent to those skilled in the art that for convenience and simplicity of description, the system of foregoing description,
The specific work process of device and unit can refer to the corresponding process in preceding method embodiment, and details are not described herein.
In several embodiments provided herein, it should be understood that disclosed system, device and method can be with
It realizes by another way.For example, the apparatus embodiments described above are merely exemplary, for example, the unit
It divides, only a kind of division of logic function can have other dividing mode, such as multiple units or component in actual implementation
It may be combined or can be integrated into another system or some features can be ignored or does not perform.Another point, it is shown or
The mutual coupling, direct-coupling or communication connection discussed can be the indirect coupling by some interfaces, device or unit
It closes or communicates to connect, can be electrical, machinery or other forms.
The unit illustrated as separating component may or may not be physically separate, be shown as unit
The component shown may or may not be physical unit, you can be located at a place or can also be distributed to multiple
In network element.Some or all of unit therein can be selected according to the actual needs to realize the mesh of this embodiment scheme
's.
In addition, each functional unit in each embodiment of the present invention can be integrated in a processing unit, it can also
That each unit is individually physically present, can also two or more units integrate in a unit.Above-mentioned integrated list
The form that hardware had both may be used in member is realized, can also be realized in the form of SFU software functional unit.
If the integrated unit is realized in the form of SFU software functional unit and is independent product sale or uses
When, it can be stored in a computer read/write memory medium.Based on such understanding, technical scheme of the present invention is substantially
The part to contribute in other words to the prior art or all or part of the technical solution can be in the form of software products
It embodies, which is stored in a storage medium, is used including some instructions so that a computer
Equipment (can be personal computer, server or the network equipment etc.) performs the complete of each embodiment the method for the present invention
Portion or part steps.And aforementioned storage medium includes:USB flash disk, mobile hard disk, read-only memory (ROM, Read-Only
Memory), random access memory (RAM, Random Access Memory), magnetic disc or CD etc. are various can store journey
The medium of sequence code.
The above, the above embodiments are merely illustrative of the technical solutions of the present invention, rather than its limitations;Although with reference to before
Embodiment is stated the present invention is described in detail, it will be understood by those of ordinary skill in the art that:It still can be to preceding
The technical solution recorded in each embodiment is stated to modify or carry out equivalent replacement to which part technical characteristic;And these
Modification is replaced, the spirit and scope for various embodiments of the present invention technical solution that it does not separate the essence of the corresponding technical solution.
Claims (10)
1. a kind of more voice sound separation methods based on vocal print feature, which is characterized in that including:
S1, acquisition include the audio source file of at least 2 voice sounds;
S2, by the format conversion of the audio source file be pcm forms audio file;
S3, the audio file of the pcm forms is cut into several voice units according to default step-length and default Cutting Length,
Wherein, the default step-length is less than the default Cutting Length;
S4, speech characteristic parameter in each institute's speech units is extracted successively;
S5, the speech characteristic parameter for comparing all institute's speech units two-by-two successively, and calculate two institute's speech units
The speech characteristic parameter between matching value;
S6, whether matching value between the speech characteristic parameters of two institute's speech units is judged higher than predetermined threshold value, if
It is then to preserve two institute's speech units to same audio collection in order;
S7, institute's speech units all in same audio collection are spliced into as single audio subfile and preserved.
2. more voice sound separation methods according to claim 1 based on vocal print feature, which is characterized in that the step S2
It specifically includes:
It reads byte length, sample rate and the channel information of the audio source file and stores into information bank;
The byte length of the audio source file, sample rate and channel information are removed, and are converted to the audio text of pcm forms
Part.
3. more voice sound separation methods according to claim 2 based on vocal print feature, which is characterized in that the step S2
Later, it is further included before the step S3:
Byte length, sample rate and the channel information of the audio source file in described information storehouse, remove the audio
Blank parts in source file.
4. more voice sound separation methods according to claim 3 based on vocal print feature, which is characterized in that the step S7
It specifically includes:
Institute's speech units all in same audio collection are spliced into for single audio subfile, and according in described information storehouse
Byte length, sample rate and the channel information of the audio source file are protected after being added to the single audio subfile into row information
It deposits.
5. more voice sound separation methods according to claim 1 based on vocal print feature, which is characterized in that the step S1
Later, it is further included before the step S2:
Sampling processing and/or preemphasis processing and/or pre-filtering processing and/or windowing process are carried out to the audio source file
And/or end-point detection processing.
6. a kind of more voice sound separators based on vocal print feature, which is characterized in that including:
Acquiring unit, for obtaining the audio source file for including at least 2 voice sounds;
Format conversion unit, for by the format conversion of the audio source file be pcm forms audio file;
Cutter unit, it is several for cutting into the audio file of the pcm forms according to default step-length and default Cutting Length
A voice unit, wherein, the default step-length is less than the default Cutting Length;
Feature extraction unit, for extracting the speech characteristic parameter in each institute's speech units successively;
Contrast conting unit for comparing the speech characteristic parameter of all institute's speech units two-by-two successively, and calculates two
Matching value between the speech characteristic parameter of a institute's speech units;
Judging unit, for judging the matching value between the speech characteristic parameter of two institute's speech units whether higher than pre-
If threshold value, if so, preserving in order two institute's speech units to same audio collection;
Splice storage unit, for institute's speech units all in same audio collection to be spliced into as single audio subfile and protected
It deposits.
7. more voice sound separators according to claim 6 based on vocal print feature, which is characterized in that the form turns
Unit is changed to specifically include:
Reading subunit, for reading byte length, sample rate and the channel information of the audio source file and storing to information
In library;
Conversion subunit for the byte length of the audio source file, sample rate and channel information to be removed, and is converted to
The audio file of pcm forms.
8. more voice sound separators according to claim 7 based on vocal print feature, which is characterized in that further include:
Blank cell is removed, is believed for the byte length of the audio source file in described information storehouse, sample rate and sound channel
Breath, removes the blank parts in the audio source file.
9. more voice sound separators according to claim 8 based on vocal print feature, which is characterized in that splicing preserves single
Member is additionally operable to:
Institute's speech units all in same audio collection are spliced into for single audio subfile, and according in described information storehouse
Byte length, sample rate and the channel information of the audio source file are protected after being added to the single audio subfile into row information
It deposits.
10. more voice sound separators according to claim 6 based on vocal print feature, which is characterized in that further include:
Pretreatment unit is handled for carrying out sampling processing and/or preemphasis processing and/or pre-filtering to the audio source file
And/or windowing process and/or end-point detection are handled.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810201281.5A CN108182945A (en) | 2018-03-12 | 2018-03-12 | A kind of more voice cents based on vocal print feature are from method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810201281.5A CN108182945A (en) | 2018-03-12 | 2018-03-12 | A kind of more voice cents based on vocal print feature are from method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108182945A true CN108182945A (en) | 2018-06-19 |
Family
ID=62553436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810201281.5A Pending CN108182945A (en) | 2018-03-12 | 2018-03-12 | A kind of more voice cents based on vocal print feature are from method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108182945A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065023A (en) * | 2018-08-23 | 2018-12-21 | 广州势必可赢网络科技有限公司 | A kind of voice identification method, device, equipment and computer readable storage medium |
CN109147831A (en) * | 2018-09-26 | 2019-01-04 | 深圳壹账通智能科技有限公司 | A kind of voice connection playback method, terminal device and computer readable storage medium |
CN109346107A (en) * | 2018-10-10 | 2019-02-15 | 中山大学 | A method of independent speaker's sound pronunciation based on LSTM is inverse to be solved |
CN109410934A (en) * | 2018-10-19 | 2019-03-01 | 深圳魔听文化科技有限公司 | A kind of more voice sound separation methods, system and intelligent terminal based on vocal print feature |
CN110322872A (en) * | 2019-06-05 | 2019-10-11 | 平安科技(深圳)有限公司 | Conference voice data processing method, device, computer equipment and storage medium |
CN110827849A (en) * | 2019-11-11 | 2020-02-21 | 广州国音智能科技有限公司 | Human voice separation method and device for database building, terminal and readable storage medium |
CN110875036A (en) * | 2019-11-11 | 2020-03-10 | 广州国音智能科技有限公司 | Voice classification method, device, equipment and computer readable storage medium |
CN111105801A (en) * | 2019-12-03 | 2020-05-05 | 云知声智能科技股份有限公司 | Role voice separation method and device |
CN112863491A (en) * | 2021-03-12 | 2021-05-28 | 云知声智能科技股份有限公司 | Voice transcription method and device and electronic equipment |
CN113593578A (en) * | 2021-09-03 | 2021-11-02 | 北京紫涓科技有限公司 | Conference voice data acquisition method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105719659A (en) * | 2016-02-03 | 2016-06-29 | 努比亚技术有限公司 | Recording file separation method and device based on voiceprint identification |
CN106782565A (en) * | 2016-11-29 | 2017-05-31 | 重庆重智机器人研究院有限公司 | A kind of vocal print feature recognition methods and system |
CN107004427A (en) * | 2014-12-12 | 2017-08-01 | 华为技术有限公司 | Strengthen the signal processing apparatus of speech components in multi-channel audio signal |
-
2018
- 2018-03-12 CN CN201810201281.5A patent/CN108182945A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107004427A (en) * | 2014-12-12 | 2017-08-01 | 华为技术有限公司 | Strengthen the signal processing apparatus of speech components in multi-channel audio signal |
CN105719659A (en) * | 2016-02-03 | 2016-06-29 | 努比亚技术有限公司 | Recording file separation method and device based on voiceprint identification |
CN106782565A (en) * | 2016-11-29 | 2017-05-31 | 重庆重智机器人研究院有限公司 | A kind of vocal print feature recognition methods and system |
Non-Patent Citations (2)
Title |
---|
张效藩: "基于语音分离的声纹识别技术研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
郑燕琳,杨晓炯,许星宇: "电话语音中基于多说话人的声纹识别系统", 《电信科学》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065023A (en) * | 2018-08-23 | 2018-12-21 | 广州势必可赢网络科技有限公司 | A kind of voice identification method, device, equipment and computer readable storage medium |
CN109147831A (en) * | 2018-09-26 | 2019-01-04 | 深圳壹账通智能科技有限公司 | A kind of voice connection playback method, terminal device and computer readable storage medium |
CN109346107A (en) * | 2018-10-10 | 2019-02-15 | 中山大学 | A method of independent speaker's sound pronunciation based on LSTM is inverse to be solved |
CN109346107B (en) * | 2018-10-10 | 2022-09-30 | 中山大学 | LSTM-based method for inversely solving pronunciation of independent speaker |
CN109410934A (en) * | 2018-10-19 | 2019-03-01 | 深圳魔听文化科技有限公司 | A kind of more voice sound separation methods, system and intelligent terminal based on vocal print feature |
CN110322872A (en) * | 2019-06-05 | 2019-10-11 | 平安科技(深圳)有限公司 | Conference voice data processing method, device, computer equipment and storage medium |
CN110875036A (en) * | 2019-11-11 | 2020-03-10 | 广州国音智能科技有限公司 | Voice classification method, device, equipment and computer readable storage medium |
CN110827849B (en) * | 2019-11-11 | 2022-07-26 | 广州国音智能科技有限公司 | Human voice separation method and device for database building, terminal and readable storage medium |
CN110827849A (en) * | 2019-11-11 | 2020-02-21 | 广州国音智能科技有限公司 | Human voice separation method and device for database building, terminal and readable storage medium |
CN111105801A (en) * | 2019-12-03 | 2020-05-05 | 云知声智能科技股份有限公司 | Role voice separation method and device |
CN111105801B (en) * | 2019-12-03 | 2022-04-01 | 云知声智能科技股份有限公司 | Role voice separation method and device |
CN112863491A (en) * | 2021-03-12 | 2021-05-28 | 云知声智能科技股份有限公司 | Voice transcription method and device and electronic equipment |
CN113593578A (en) * | 2021-09-03 | 2021-11-02 | 北京紫涓科技有限公司 | Conference voice data acquisition method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108182945A (en) | A kind of more voice cents based on vocal print feature are from method and device | |
US10593332B2 (en) | Diarization using textual and audio speaker labeling | |
CN103035247B (en) | Based on the method and device that voiceprint is operated to audio/video file | |
US10026405B2 (en) | Method for speaker diarization | |
CN103500579B (en) | Audio recognition method, Apparatus and system | |
US6697564B1 (en) | Method and system for video browsing and editing by employing audio | |
CN105845129A (en) | Method and system for dividing sentences in audio and automatic caption generation method and system for video files | |
CN109065023A (en) | A kind of voice identification method, device, equipment and computer readable storage medium | |
CN111128223A (en) | Text information-based auxiliary speaker separation method and related device | |
CN108307250B (en) | Method and device for generating video abstract | |
CN108257592A (en) | A kind of voice dividing method and system based on shot and long term memory models | |
US20100057452A1 (en) | Speech interfaces | |
CN107967912A (en) | A kind of voice dividing method and device | |
CN104781862A (en) | Real-time traffic detection | |
CN106372653A (en) | Stack type automatic coder-based advertisement identification method | |
CN104410973A (en) | Recognition method and system for tape played phone fraud | |
US7349477B2 (en) | Audio-assisted video segmentation and summarization | |
CN109410934A (en) | A kind of more voice sound separation methods, system and intelligent terminal based on vocal print feature | |
Venkatesan et al. | Automatic language identification using machine learning techniques | |
US20070083367A1 (en) | Method and system for bandwidth efficient and enhanced concatenative synthesis based communication | |
EP1197952B1 (en) | Coding method of the prosody for a very low bit rate speech encoder | |
CN112579744A (en) | Method for controlling risk in online psychological consultation | |
US7571093B1 (en) | Method of identifying duplicate voice recording | |
CN111010484A (en) | Automatic quality inspection method for call recording | |
CN113921011A (en) | Audio processing method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180619 |
|
RJ01 | Rejection of invention patent application after publication |