CN108182945A - Multi-speaker voice separation method and device based on voiceprint features - Google Patents

Multi-speaker voice separation method and device based on voiceprint features

Info

Publication number
CN108182945A
Authority
CN
China
Prior art keywords
audio
voice
source file
institute
audio source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810201281.5A
Other languages
Chinese (zh)
Inventor
黎智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Network Technology Co Ltd
Original Assignee
Guangzhou Speakin Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Speakin Network Technology Co Ltd
Priority to CN201810201281.5A
Publication of CN108182945A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention discloses a multi-speaker voice separation method and device based on voiceprint features. The method includes: S1, obtaining an audio source file containing at least two human voices; S2, converting the audio source file into an audio file in PCM format; S3, cutting the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length; S4, extracting the speech feature parameters of each voice unit in turn; S5, comparing the speech feature parameters of all voice units pairwise in turn, and calculating a matching value between the speech feature parameters of each pair of voice units; S6, judging whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, saving the two voice units in order to the same audio set; S7, splicing all voice units in the same audio set into a single audio sub-file and saving it.

Description

Multi-speaker voice separation method and device based on voiceprint features
Technical field
The present invention relates to the technical field of voice separation, and in particular to a multi-speaker voice separation method and device based on voiceprint features.
Background
Many important meetings are now recorded, in forms such as audio and text, so that the meeting can be reviewed or played back afterwards. In some scenarios, however, each person's voice needs to be extracted and saved separately, so that it can be stored, labelled, and later played back for an individual speaker.
A text record can attribute each part of a meeting to the person who spoke it, but an audio record cannot: at the meeting everyone speaks, and all voices are recorded into a single piece of audio. It is therefore difficult to label the audio by speaker afterwards. For example, to hear what a particular person said at the time, one can only use the text record to locate the corresponding piece of audio. This is time-consuming and laborious, and errors cannot be ruled out.
At present, the later processing of meeting records is mostly manual. Text records have to be taken by hand during the meeting, and even when audio or video recording is used, a large amount of manual work is still needed afterwards before a piece of audio can be labelled by speaker. This not only consumes a great deal of manpower and material resources; because the resolving power of the human ear is subject to error, the range of sound frequencies the human ear can perceive is limited, and what a listener hears is partly subjective, the separated result is affected, leading to the technical problem that the voice separation result has a large error.
Summary of the invention
The present invention provides a multi-speaker voice separation method and device based on voiceprint features, which solve the technical problem that current processing of meeting recordings consumes a great deal of manpower and material resources and that, because the resolving power of the human ear is subject to error, the range of sound frequencies the human ear can perceive is limited, and what a listener hears is partly subjective, the separated result is affected and the voice separation result has a large error.
The present invention provides a multi-speaker voice separation method based on voiceprint features, including:
S1, obtaining an audio source file containing at least two human voices;
S2, converting the audio source file into an audio file in PCM format;
S3, cutting the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length;
S4, extracting the speech feature parameters of each voice unit in turn;
S5, comparing the speech feature parameters of all the voice units pairwise in turn, and calculating a matching value between the speech feature parameters of each pair of voice units;
S6, judging whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, saving the two voice units in order to the same audio set;
S7, splicing all the voice units in the same audio set into a single audio sub-file and saving it.
Optionally, step S2 specifically includes:
reading the byte length, sample rate and channel information of the audio source file and storing them in an information store;
stripping the byte length, sample rate and channel information from the audio source file, and converting it into an audio file in PCM format.
Optionally, after step S2 and before step S3, the method further includes:
removing the blank parts of the audio source file according to the byte length, sample rate and channel information of the audio source file held in the information store.
Optionally, step S7 specifically includes:
splicing all the voice units in the same audio set into a single audio sub-file, adding information to the single audio sub-file according to the byte length, sample rate and channel information of the audio source file held in the information store, and then saving it.
Optionally, after step S1 and before step S2, the method further includes:
performing sampling and/or pre-emphasis and/or pre-filtering and/or windowing and/or endpoint detection on the audio source file.
The present invention provides a multi-speaker voice separation device based on voiceprint features, including:
an acquiring unit, for obtaining an audio source file containing at least two human voices;
a format conversion unit, for converting the audio source file into an audio file in PCM format;
a cutting unit, for cutting the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length;
a feature extraction unit, for extracting the speech feature parameters of each voice unit in turn;
a comparison and calculation unit, for comparing the speech feature parameters of all the voice units pairwise in turn, and calculating a matching value between the speech feature parameters of each pair of voice units;
a judging unit, for judging whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, saving the two voice units in order to the same audio set;
a splicing and saving unit, for splicing all the voice units in the same audio set into a single audio sub-file and saving it.
Optionally, the format conversion unit specifically includes:
a reading subunit, for reading the byte length, sample rate and channel information of the audio source file and storing them in an information store;
a conversion subunit, for stripping the byte length, sample rate and channel information from the audio source file and converting it into an audio file in PCM format.
Optionally, the multi-speaker voice separation device based on voiceprint features provided by the present invention further includes:
a blank removal unit, for removing the blank parts of the audio source file according to the byte length, sample rate and channel information of the audio source file held in the information store.
Optionally, the splicing and saving unit is further used for:
splicing all the voice units in the same audio set into a single audio sub-file, adding information to the single audio sub-file according to the byte length, sample rate and channel information of the audio source file held in the information store, and then saving it.
Optionally, the multi-speaker voice separation device based on voiceprint features provided by the present invention further includes:
a preprocessing unit, for performing sampling and/or pre-emphasis and/or pre-filtering and/or windowing and/or endpoint detection on the audio source file.
As can be seen from the above technical solutions, the present invention has the following advantages:
The present invention provides a multi-speaker voice separation method based on voiceprint features, including: S1, obtaining an audio source file containing at least two human voices; S2, converting the audio source file into an audio file in PCM format; S3, cutting the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length; S4, extracting the speech feature parameters of each voice unit in turn; S5, comparing the speech feature parameters of all the voice units pairwise in turn, and calculating a matching value between the speech feature parameters of each pair of voice units; S6, judging whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, saving the two voice units in order to the same audio set; S7, splicing all the voice units in the same audio set into a single audio sub-file and saving it.
In the present invention, the audio source file is divided into several voice units, the speech feature parameters of each voice unit are extracted in turn, the speech feature parameters of the voice units are compared pairwise and a matching value is calculated, and whether two voice units belong to the voice of the same person is determined by judging whether the matching value is higher than a preset threshold. The audio source file is thereby processed into single audio sub-files for the individual speakers. This solves the technical problem that current processing of meeting recordings consumes a great deal of manpower and material resources and that, because the resolving power of the human ear is subject to error, the range of sound frequencies the human ear can perceive is limited, and what a listener hears is partly subjective, the separated result is affected and the voice separation result has a large error.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of the multi-speaker voice separation method based on voiceprint features provided by the present invention;
Fig. 2 is a schematic flowchart of another embodiment of the multi-speaker voice separation method based on voiceprint features provided by the present invention;
Fig. 3 is a schematic structural diagram of one embodiment of the multi-speaker voice separation device based on voiceprint features provided by the present invention;
Fig. 4 is a schematic structural diagram of another embodiment of the multi-speaker voice separation device based on voiceprint features provided by the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a multi-speaker voice separation method and device based on voiceprint features, which solve the technical problem that current processing of meeting recordings consumes a great deal of manpower and material resources and that, because the resolving power of the human ear is subject to error, the range of sound frequencies the human ear can perceive is limited, and what a listener hears is partly subjective, the separated result is affected and the voice separation result has a large error.
In order to make the purpose, features and advantages of the present invention more obvious and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the embodiments described below are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, the present invention provides one embodiment of a multi-speaker voice separation method based on voiceprint features, including:
101. Obtain an audio source file containing at least two human voices;
102. Convert the audio source file into an audio file in PCM format;
103. Cut the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length;
104. Extract the speech feature parameters of each voice unit in turn;
105. Compare the speech feature parameters of all voice units pairwise in turn, and calculate a matching value between the speech feature parameters of each pair of voice units;
106. Judge whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, save the two voice units in order to the same audio set;
107. Splice all voice units in the same audio set into a single audio sub-file and save it.
In this embodiment of the present invention, the audio source file is divided into several voice units, the speech feature parameters of each voice unit are extracted in turn, the speech feature parameters of the voice units are compared pairwise and a matching value is calculated, and whether two voice units belong to the voice of the same person is determined by judging whether the matching value is higher than a preset threshold. The audio source file is thereby processed into single audio sub-files for the individual speakers. This solves the technical problem that current processing of meeting recordings consumes a great deal of manpower and material resources and that, because the resolving power of the human ear is subject to error, the range of sound frequencies the human ear can perceive is limited, and what a listener hears is partly subjective, the separated result is affected and the voice separation result has a large error.
The above describes one embodiment of the multi-speaker voice separation method based on voiceprint features provided by the present invention. Another embodiment of the method is described below.
Referring to Fig. 2, the present invention provides another embodiment of a multi-speaker voice separation method based on voiceprint features, including:
201. Obtain an audio source file containing at least two human voices;
It should be noted that when the audio source file to be processed is a meeting recording or a report recording, it usually contains the voices of at least two people, so voice separation is needed.
202. Perform sampling and/or pre-emphasis and/or pre-filtering and/or windowing and/or endpoint detection on the audio source file;
It should be noted that, after the audio source file containing at least two human voices is obtained, it needs to be preprocessed by sampling and/or pre-emphasis and/or pre-filtering and/or windowing and/or endpoint detection.
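Two of the pre-processing options named in step 202, pre-emphasis and windowing, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the coefficient 0.97 and the choice of a Hamming window are common defaults and are not specified in the source.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a commonly used value, not one given in the patent.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def hamming_window(frame):
    """Multiply a frame by a Hamming window (assumed window type)."""
    n = len(frame)
    if n == 1:
        return [float(frame[0])]
    return [
        frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i in range(n)
    ]
```

Pre-emphasis boosts the higher frequencies before feature extraction, and the window tapers each frame towards its edges so that frame boundaries introduce less spectral leakage.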
203. Read the byte length, sample rate and channel information of the audio source file and store them in an information store;
It should be noted that, after the audio source file has been preprocessed, its byte length, sample rate and channel information are read, and all of this information is stored in the information store for subsequent processing.
204. Strip the byte length, sample rate and channel information from the audio source file, and convert it into an audio file in PCM format;
It should be noted that the byte length, sample rate and channel information of the audio source file are stripped and the file is converted into an audio file in PCM format, that is, a PCM audio file with the header removed.
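Steps 203 and 204 can be illustrated as below, assuming the audio source file is a WAV file (the source does not name the container format) and reading "byte length" as the sample width in bytes, which is an interpretation. The helper name `read_header_and_pcm` and the dictionary keys are hypothetical; only Python's standard `wave` module is used.

```python
import io
import wave


def read_header_and_pcm(wav_source):
    """Return (info, pcm_bytes) for a WAV path or binary file-like object.

    `info` plays the role of the patent's information store; its keys
    are illustrative names, not terms from the source.
    """
    with wave.open(wav_source, "rb") as wf:
        info = {
            "sample_width": wf.getsampwidth(),  # bytes per sample (assumed meaning of "byte length")
            "sample_rate": wf.getframerate(),
            "channels": wf.getnchannels(),
        }
        # Raw sample data without the header, i.e. headerless PCM.
        pcm = wf.readframes(wf.getnframes())
    return info, pcm
```

Keeping the header fields separately is what later allows the per-speaker sub-files to be written back in a playable format.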
205. Remove the blank parts of the audio source file according to the byte length, sample rate and channel information of the audio source file held in the information store;
It should be noted that blank-space processing is performed on the audio source file according to its byte length, sample rate and channel information held in the information store, removing the blank information parts of the audio source file.
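One simple way to remove blank parts, as in step 205, is to drop frames whose peak amplitude stays below a threshold. The source does not say how blank parts are detected, so the sketch below is an assumption throughout: it handles 16-bit little-endian mono PCM only, and the frame size, threshold and function name are illustrative values.

```python
import struct


def remove_silence(pcm, sample_width=2, frame_samples=160, threshold=500):
    """Drop frames whose peak absolute amplitude is below `threshold`.

    Assumes 16-bit little-endian mono PCM; all defaults are illustrative.
    """
    if sample_width != 2:
        raise ValueError("this sketch handles 16-bit samples only")
    n = len(pcm) // 2
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])
    kept = []
    for start in range(0, n, frame_samples):
        frame = samples[start:start + frame_samples]
        if max(abs(s) for s in frame) >= threshold:
            kept.extend(frame)
    return struct.pack("<%dh" % len(kept), *kept)
```

The sample width taken from the information store decides how the byte stream is unpacked into samples, which is why that header information is needed at this step.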
206. Cut the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length;
It should be noted that, after the blank parts of the audio source file have been removed, the PCM audio file is cut into several voice units according to the preset step size and the preset cutting length. Because the preset step size is smaller than the preset cutting length, the cutting is redundant, that is, adjacent units overlap, which avoids cutting off a person's voice or splitting one word into two syllables during cutting.
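The overlapping cut of step 206 amounts to a sliding window whose stride is shorter than its length. The sketch below works on any sequence of samples; the function name and the handling of the final, possibly shorter, unit are assumptions, since the source does not describe the edge case.

```python
def cut_into_units(samples, cut_length, step):
    """Cut `samples` into overlapping units of `cut_length`, advancing by `step`.

    step < cut_length is required, so consecutive units share
    cut_length - step samples.
    """
    if not 0 < step < cut_length:
        raise ValueError("step must be positive and smaller than cut_length")
    units = []
    start = 0
    while start < len(samples):
        units.append(samples[start:start + cut_length])
        if start + cut_length >= len(samples):
            break  # the last unit already reaches the end of the audio
        start += step
    return units
```

With a cutting length of 4 and a step of 2, for example, every sample except those at the very start and end appears in two units, so a word that straddles one boundary is still intact in the neighbouring unit.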
207. Extract the speech feature parameters of each voice unit in turn;
It should be noted that the speech feature parameters of each voice unit are extracted in turn; the speech feature parameters include, but are not limited to, mel-frequency cepstral coefficients (MFCC).
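The source names mel-frequency cepstral coefficients as one possible feature but gives no formula. As background only, the mel scale that underlies such features is commonly defined by the mapping below. This is not a full MFCC extractor; the function names and the filter-spacing helper are illustrative assumptions.

```python
import math


def hz_to_mel(hz):
    """Common mel-scale mapping: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz, high_hz):
    """Centre frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_filters)]
```

A complete MFCC computation would additionally frame the signal, take a power spectrum, apply triangular filters at these centre frequencies, take logarithms and apply a discrete cosine transform.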
208. Compare the speech feature parameters of all voice units pairwise in turn, and calculate a matching value between the speech feature parameters of each pair of voice units;
It should be noted that, after the speech feature parameters of each voice unit have been extracted, the speech feature parameters of all voice units are compared pairwise in turn, and a matching value between the speech feature parameters of each pair of voice units is calculated. For example, if 5 voice units have been separated, they are compared in turn, which requires 5+4+3+2+1 comparisons, and for each comparison the matching value between the speech feature parameters of the two voice units is calculated.
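The pairwise comparison of step 208 can be sketched as below. The source does not specify how the matching value is computed, so cosine similarity between feature vectors is used here purely as a stand-in, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Stand-in matching value: cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_matches(features):
    """Map each unordered pair of unit indices (i, j), i < j, to its matching value."""
    return {
        (i, j): cosine_similarity(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
    }
```

This sketch visits each unordered pair of distinct units once, which gives n(n-1)/2 comparisons, or 10 for 5 units. The figure 5+4+3+2+1 in the text corresponds to also counting each unit against itself.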
209. Judge whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, save the two voice units in order to the same audio set;
It should be noted that it is judged whether the matching value between the speech feature parameters of two voice units is higher than the preset threshold. If it is, the two voice units belong to the voice of the same person, and they are saved in order to the same audio set, that is, the audio set of one person.
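The thresholding and grouping of step 209 can be sketched with a small union-find structure. Treating matches as transitive (if A matches B and B matches C, all three share a set) is an assumption of this sketch; the source only says that two matching units are saved to the same audio set. The function name and the input format are hypothetical.

```python
def group_units(n_units, matches, threshold):
    """Group unit indices into audio sets, one list of indices per presumed speaker.

    `matches` maps (i, j) index pairs to matching values. Units whose
    matching value is higher than `threshold` end up in the same set,
    and each set lists its units in their original order.
    """
    parent = list(range(n_units))

    def find(i):
        # Follow parent links to the set representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in matches.items():
        if score > threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(n_units):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Iterating over the units in index order when building the sets keeps each speaker's units in chronological order, which matches the requirement that units be saved "in order".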
210. Splice all voice units in the same audio set into a single audio sub-file, add information to the single audio sub-file according to the byte length, sample rate and channel information of the audio source file held in the information store, and then save it;
It should be noted that all voice units in the same audio set are spliced into a single audio sub-file, and that the byte length, sample rate and channel information of the audio source file held in the information store are added back to the single audio sub-file before it is saved.
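Step 210 can be illustrated by concatenating the PCM bytes of one audio set and writing them out with the stored header fields. The sketch assumes WAV output and a dictionary with the hypothetical keys `channels`, `sample_width` and `sample_rate`; it also concatenates units as they are, without trimming the overlap between adjacent units, which the source does not address.

```python
import io
import wave


def save_speaker_file(target, units, info):
    """Splice PCM byte units and save them as one WAV file.

    `target` is a path or a writable, seekable binary file object;
    `info` holds the header fields kept from the audio source file.
    """
    with wave.open(target, "wb") as wf:
        wf.setnchannels(info["channels"])
        wf.setsampwidth(info["sample_width"])
        wf.setframerate(info["sample_rate"])
        wf.writeframes(b"".join(units))
```

Calling this once per audio set yields one playable sub-file per presumed speaker.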
The above describes another embodiment of the multi-speaker voice separation method based on voiceprint features provided by the present invention. One embodiment of the multi-speaker voice separation device based on voiceprint features provided by the present invention is described below.
Referring to Fig. 3, the present invention provides one embodiment of a multi-speaker voice separation device based on voiceprint features, including:
an acquiring unit 301, for obtaining an audio source file containing at least two human voices;
a format conversion unit 302, for converting the audio source file into an audio file in PCM format;
a cutting unit 303, for cutting the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length;
a feature extraction unit 304, for extracting the speech feature parameters of each voice unit in turn;
a comparison and calculation unit 305, for comparing the speech feature parameters of all voice units pairwise in turn, and calculating a matching value between the speech feature parameters of each pair of voice units;
a judging unit 306, for judging whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, saving the two voice units in order to the same audio set;
a splicing and saving unit 307, for splicing all voice units in the same audio set into a single audio sub-file and saving it.
In this embodiment of the present invention, the cutting unit 303 divides the audio source file into several voice units, the feature extraction unit 304 extracts the speech feature parameters of each voice unit in turn, the comparison and calculation unit 305 compares the speech feature parameters of the voice units pairwise and calculates a matching value, and finally the judging unit 306 determines whether two voice units belong to the voice of the same person by judging whether the matching value is higher than a preset threshold. The audio source file is thereby processed into single audio sub-files for the individual speakers. This solves the technical problem that current processing of meeting recordings consumes a great deal of manpower and material resources and that, because the resolving power of the human ear is subject to error, the range of sound frequencies the human ear can perceive is limited, and what a listener hears is partly subjective, the separated result is affected and the voice separation result has a large error.
The above describes one embodiment of the multi-speaker voice separation device based on voiceprint features provided by the present invention. Another embodiment of the device is described below.
Referring to Fig. 4, the present invention provides another embodiment of a multi-speaker voice separation device based on voiceprint features, including:
an acquiring unit 401, for obtaining an audio source file containing at least two human voices;
a format conversion unit 402, for converting the audio source file into an audio file in PCM format;
the format conversion unit 402 specifically includes:
a reading subunit 4021, for reading the byte length, sample rate and channel information of the audio source file and storing them in an information store;
a conversion subunit 4022, for stripping the byte length, sample rate and channel information from the audio source file and converting it into an audio file in PCM format;
a blank removal unit 403, for removing the blank parts of the audio source file according to the byte length, sample rate and channel information of the audio source file held in the information store;
a cutting unit 404, for cutting the PCM audio file into several voice units according to a preset step size and a preset cutting length, wherein the preset step size is smaller than the preset cutting length;
a feature extraction unit 405, for extracting the speech feature parameters of each voice unit in turn;
a comparison and calculation unit 406, for comparing the speech feature parameters of all voice units pairwise in turn, and calculating a matching value between the speech feature parameters of each pair of voice units;
a judging unit 407, for judging whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold and, if so, saving the two voice units in order to the same audio set;
a splicing and saving unit 408, for splicing all voice units in the same audio set into a single audio sub-file, adding information to the single audio sub-file according to the byte length, sample rate and channel information of the audio source file held in the information store, and then saving it.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-person voice separation method based on voiceprint features, characterized by comprising:
S1, obtaining an audio source file containing the voices of at least two people;
S2, converting the audio source file into an audio file in PCM format;
S3, cutting the PCM-format audio file into several voice units according to a preset step length and a preset cutting length, wherein the preset step length is less than the preset cutting length;
S4, extracting the speech feature parameters of each voice unit in turn;
S5, comparing the speech feature parameters of all the voice units pairwise in turn, and calculating the matching value between the speech feature parameters of each two voice units;
S6, judging whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold, and if so, saving the two voice units in order to the same audio set;
S7, splicing all the voice units in the same audio set into a single audio sub-file and saving it.
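Step S3 of claim 1 can be illustrated with a minimal sketch. Because the preset step length is smaller than the preset cutting length, consecutive voice units overlap. The function name `cut_into_units`, the use of a plain list of samples, and the handling of a trailing partial window (it is simply dropped) are assumptions made for illustration; the claim does not specify them.

```python
def cut_into_units(samples, cut_length, step):
    """Cut a sample sequence into overlapping voice units.

    Returns a list of (start_index, unit) pairs. The step must be
    smaller than the cutting length, so neighbouring units overlap.
    """
    if not 0 < step < cut_length:
        raise ValueError("step must be positive and smaller than cut_length")
    if not samples:
        return []
    units = []
    # The last full window starts at len(samples) - cut_length; a shorter
    # input yields one unit holding everything (an assumption of this sketch).
    last_start = max(len(samples) - cut_length, 0)
    for start in range(0, last_start + 1, step):
        units.append((start, samples[start:start + cut_length]))
    return units


# Ten samples, cutting length 4, step 2: each unit shares 2 samples
# with its neighbour.
units = cut_into_units(list(range(10)), cut_length=4, step=2)
```

With these values the units start at samples 0, 2, 4, and 6, so every boundary between speakers falls inside at least one unit rather than exactly on a cut.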
2. The multi-person voice separation method based on voiceprint features according to claim 1, characterized in that step S2 specifically comprises:
reading the byte length, sample rate, and channel information of the audio source file and storing them in an information base;
removing the byte length, sample rate, and channel information from the audio source file, and converting it into an audio file in PCM format.
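The two sub-steps of claim 2 can be sketched with Python's standard `wave` module, assuming the audio source file is a WAV file (the claim does not name a container format). Here "byte length" is read as the number of bytes per sample, which is one possible interpretation; the dictionary standing in for the information base and the helper names are hypothetical.

```python
import io
import wave


def split_header_and_pcm(wav_bytes):
    """Read the header fields into an info dict and return the raw PCM data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        info = {
            "sample_width": wav.getsampwidth(),  # bytes per sample
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
        }
        # readframes returns the sample data only, without any header.
        pcm = wav.readframes(wav.getnframes())
    return info, pcm


def make_demo_wav():
    """Build a tiny 16-bit mono WAV in memory so the sketch is self-contained."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(b"\x01\x00\x02\x00\x03\x00")
    return buf.getvalue()


info, pcm = split_header_and_pcm(make_demo_wav())
```

Keeping `info` separately is what later allows the header to be restored when a sub-file is saved.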
3. The multi-person voice separation method based on voiceprint features according to claim 2, characterized in that, after step S2 and before step S3, the method further comprises:
removing the blank parts of the audio source file according to the byte length, sample rate, and channel information of the audio source file in the information base.
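One simple way to remove blank parts, as in claim 3, is to drop frames whose average amplitude falls below a threshold. This is only a sketch of that idea: the claim does not say how blank parts are detected, and the function name, the 10 ms frame size, the threshold value, and the restriction to 16-bit mono little-endian PCM are all assumptions.

```python
import struct


def remove_blank_parts(pcm, info, frame_ms=10, threshold=100):
    """Drop frames of 16-bit mono PCM whose mean absolute amplitude is low."""
    if info["sample_width"] != 2 or info["channels"] != 1:
        raise ValueError("this sketch handles 16-bit mono PCM only")
    # The sample rate from the info dict fixes how many samples make a frame.
    frame_len = max(1, info["sample_rate"] * frame_ms // 1000)
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[:count * 2])
    kept = []
    for start in range(0, count, frame_len):
        frame = samples[start:start + frame_len]
        if sum(abs(s) for s in frame) / len(frame) >= threshold:
            kept.extend(frame)
    return struct.pack("<%dh" % len(kept), *kept)


# 10 ms of silence followed by 10 ms of loud samples at 8 kHz.
demo_info = {"sample_width": 2, "sample_rate": 8000, "channels": 1}
demo_pcm = struct.pack("<160h", *([0] * 80 + [1000] * 80))
trimmed = remove_blank_parts(demo_pcm, demo_info)
```

Only the loud second frame survives, so the later cutting and comparison steps do not spend time on silent voice units.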
4. The multi-person voice separation method based on voiceprint features according to claim 3, characterized in that step S7 specifically comprises:
splicing all the voice units in the same audio set into a single audio sub-file, adding header information to the single audio sub-file according to the byte length, sample rate, and channel information of the audio source file in the information base, and then saving it.
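Claim 4 restores the stored header fields when a sub-file is saved. A minimal sketch, again assuming WAV output via the standard `wave` module and a hypothetical info dictionary: the voice units are concatenated as raw bytes, and no attempt is made to de-duplicate the overlap between neighbouring units, since the claim does not describe how overlap is handled.

```python
import io
import wave


def save_sub_file(unit_bytes_list, info):
    """Splice PCM voice units and wrap them in a WAV header built from info."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(info["channels"])
        wav.setsampwidth(info["sample_width"])
        wav.setframerate(info["sample_rate"])
        # Units are written in the order they were saved to the audio set.
        wav.writeframes(b"".join(unit_bytes_list))
    return buf.getvalue()


stored_info = {"sample_width": 2, "sample_rate": 8000, "channels": 1}
audio_set = [b"\x01\x00\x02\x00", b"\x03\x00\x04\x00"]
sub_file = save_sub_file(audio_set, stored_info)
```

The returned bytes can be written to disk as one file per audio set, giving one playable file per separated voice.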
5. The multi-person voice separation method based on voiceprint features according to claim 1, characterized in that, after step S1 and before step S2, the method further comprises:
performing sampling and/or pre-emphasis and/or pre-filtering and/or windowing and/or endpoint detection on the audio source file.
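Two of the preprocessing options listed in claim 5, pre-emphasis and windowing, have common textbook forms that can be sketched briefly. The first-order filter y[n] = x[n] - a*x[n-1] with a = 0.97 and the Hamming window are conventional choices assumed here; the claim does not state which filter coefficient or window function is used.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to boost high frequencies."""
    if not samples:
        return []
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


def hamming_window(frame):
    """Multiply a frame by a Hamming window to taper its edges."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]


emphasized = pre_emphasis([1.0, 1.0, 1.0])
windowed = hamming_window([1.0, 1.0, 1.0])
```

A constant input is almost cancelled by pre-emphasis after the first sample, and the window leaves the centre of the frame untouched while scaling the edges down to 0.08.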
6. A multi-person voice separation device based on voiceprint features, characterized by comprising:
an acquiring unit, configured to obtain an audio source file containing the voices of at least two people;
a format conversion unit, configured to convert the audio source file into an audio file in PCM format;
a cutting unit, configured to cut the PCM-format audio file into several voice units according to a preset step length and a preset cutting length, wherein the preset step length is less than the preset cutting length;
a feature extraction unit, configured to extract the speech feature parameters of each voice unit in turn;
a comparison and calculation unit, configured to compare the speech feature parameters of all the voice units pairwise in turn, and to calculate the matching value between the speech feature parameters of each two voice units;
a judging unit, configured to judge whether the matching value between the speech feature parameters of two voice units is higher than a preset threshold, and if so, to save the two voice units in order to the same audio set;
a splicing and saving unit, configured to splice all the voice units in the same audio set into a single audio sub-file and save it.
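The work of the comparison and calculation unit and the judging unit can be sketched as a pairwise match followed by grouping. The claims do not name the matching measure or the feature type, so cosine similarity over hypothetical fixed-length feature vectors is assumed here, and units are merged transitively with a small union-find structure, which is also an assumption of this sketch.

```python
import math


def cosine_match(a, b):
    """Matching value between two feature vectors (cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_units(features, threshold):
    """Put units whose pairwise matching value exceeds threshold in one set."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine_match(features[i], features[j]) > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(features)):
        # Appending in index order keeps units in their original order.
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Four hypothetical unit features: units 0 and 2 resemble one voice,
# units 1 and 3 another.
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
audio_sets = group_units(features, threshold=0.9)
```

Each resulting list holds the indices of the voice units that would be spliced into one audio sub-file. Comparing every pair costs time quadratic in the number of units, which is a direct consequence of the pairwise comparison described in the claim.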
7. The multi-person voice separation device based on voiceprint features according to claim 6, characterized in that the format conversion unit specifically comprises:
a reading subunit, configured to read the byte length, sample rate, and channel information of the audio source file and store them in an information base;
a conversion subunit, configured to remove the byte length, sample rate, and channel information from the audio source file, and to convert it into an audio file in PCM format.
8. The multi-person voice separation device based on voiceprint features according to claim 7, characterized by further comprising:
a blank removal unit, configured to remove the blank parts of the audio source file according to the byte length, sample rate, and channel information of the audio source file in the information base.
9. The multi-person voice separation device based on voiceprint features according to claim 8, characterized in that the splicing and saving unit is further configured to:
splice all the voice units in the same audio set into a single audio sub-file, add header information to the single audio sub-file according to the byte length, sample rate, and channel information of the audio source file in the information base, and then save it.
10. The multi-person voice separation device based on voiceprint features according to claim 6, characterized by further comprising:
a preprocessing unit, configured to perform sampling and/or pre-emphasis and/or pre-filtering and/or windowing and/or endpoint detection on the audio source file.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810201281.5A CN108182945A (en) 2018-03-12 2018-03-12 A kind of more voice cents based on vocal print feature are from method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810201281.5A CN108182945A (en) 2018-03-12 2018-03-12 A kind of more voice cents based on vocal print feature are from method and device

Publications (1)

Publication Number Publication Date
CN108182945A true CN108182945A (en) 2018-06-19

Family

ID=62553436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810201281.5A Pending CN108182945A (en) 2018-03-12 2018-03-12 A kind of more voice cents based on vocal print feature are from method and device

Country Status (1)

Country Link
CN (1) CN108182945A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107004427A (en) * 2014-12-12 2017-08-01 华为技术有限公司 Strengthen the signal processing apparatus of speech components in multi-channel audio signal
CN105719659A (en) * 2016-02-03 2016-06-29 努比亚技术有限公司 Recording file separation method and device based on voiceprint identification
CN106782565A (en) * 2016-11-29 2017-05-31 重庆重智机器人研究院有限公司 A kind of vocal print feature recognition methods and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
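张效藩: "基于语音分离的声纹识别技术研究" [Research on voiceprint recognition technology based on speech separation], 《中国优秀硕士学位论文全文数据库信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *
郑燕琳,杨晓炯,许星宇: "电话语音中基于多说话人的声纹识别系统" [A multi-speaker voiceprint recognition system for telephone speech], 《电信科学》 [Telecommunications Science] *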

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065023A (en) * 2018-08-23 2018-12-21 广州势必可赢网络科技有限公司 A kind of voice identification method, device, equipment and computer readable storage medium
CN109147831A (en) * 2018-09-26 2019-01-04 深圳壹账通智能科技有限公司 A kind of voice connection playback method, terminal device and computer readable storage medium
CN109346107A (en) * 2018-10-10 2019-02-15 中山大学 A method of independent speaker's sound pronunciation based on LSTM is inverse to be solved
CN109346107B (en) * 2018-10-10 2022-09-30 中山大学 LSTM-based method for inversely solving pronunciation of independent speaker
CN109410934A (en) * 2018-10-19 2019-03-01 深圳魔听文化科技有限公司 A kind of more voice sound separation methods, system and intelligent terminal based on vocal print feature
CN110322872A (en) * 2019-06-05 2019-10-11 平安科技(深圳)有限公司 Conference voice data processing method, device, computer equipment and storage medium
CN110875036A (en) * 2019-11-11 2020-03-10 广州国音智能科技有限公司 Voice classification method, device, equipment and computer readable storage medium
CN110827849B (en) * 2019-11-11 2022-07-26 广州国音智能科技有限公司 Human voice separation method and device for database building, terminal and readable storage medium
CN110827849A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Human voice separation method and device for database building, terminal and readable storage medium
CN111105801A (en) * 2019-12-03 2020-05-05 云知声智能科技股份有限公司 Role voice separation method and device
CN111105801B (en) * 2019-12-03 2022-04-01 云知声智能科技股份有限公司 Role voice separation method and device
CN112863491A (en) * 2021-03-12 2021-05-28 云知声智能科技股份有限公司 Voice transcription method and device and electronic equipment
CN113593578A (en) * 2021-09-03 2021-11-02 北京紫涓科技有限公司 Conference voice data acquisition method and system

Similar Documents

Publication Publication Date Title
CN108182945A (en) A kind of more voice cents based on vocal print feature are from method and device
US10593332B2 (en) Diarization using textual and audio speaker labeling
CN103035247B (en) Based on the method and device that voiceprint is operated to audio/video file
US10026405B2 (en) Method for speaker diarization
CN103500579B (en) Audio recognition method, Apparatus and system
US6697564B1 (en) Method and system for video browsing and editing by employing audio
CN105845129A (en) Method and system for dividing sentences in audio and automatic caption generation method and system for video files
CN109065023A (en) A kind of voice identification method, device, equipment and computer readable storage medium
CN111128223A (en) Text information-based auxiliary speaker separation method and related device
CN108307250B (en) Method and device for generating video abstract
CN108257592A (en) A kind of voice dividing method and system based on shot and long term memory models
US20100057452A1 (en) Speech interfaces
CN107967912A (en) A kind of voice dividing method and device
CN104781862A (en) Real-time traffic detection
CN106372653A (en) Stack type automatic coder-based advertisement identification method
CN104410973A (en) Recognition method and system for tape played phone fraud
US7349477B2 (en) Audio-assisted video segmentation and summarization
CN109410934A (en) A kind of more voice sound separation methods, system and intelligent terminal based on vocal print feature
Venkatesan et al. Automatic language identification using machine learning techniques
US20070083367A1 (en) Method and system for bandwidth efficient and enhanced concatenative synthesis based communication
EP1197952B1 (en) Coding method of the prosody for a very low bit rate speech encoder
CN112579744A (en) Method for controlling risk in online psychological consultation
US7571093B1 (en) Method of identifying duplicate voice recording
CN111010484A (en) Automatic quality inspection method for call recording
CN113921011A (en) Audio processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180619