CN108021675B - Automatic segmentation and alignment method for multi-equipment recording - Google Patents


Info

Publication number
CN108021675B
CN108021675B (application CN201711284222.0A)
Authority
CN
China
Prior art keywords
recording
time
long
recordings
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711284222.0A
Other languages
Chinese (zh)
Other versions
CN108021675A (en)
Inventor
吴妍
郑羲光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huiting Technology Corp
Original Assignee
Beijing Huiting Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huiting Technology Corp filed Critical Beijing Huiting Technology Corp
Priority to CN201711284222.0A priority Critical patent/CN108021675B/en
Publication of CN108021675A publication Critical patent/CN108021675A/en
Application granted granted Critical
Publication of CN108021675B publication Critical patent/CN108021675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61: Indexing; Data structures therefor; Storage structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content

Abstract

The invention discloses an automatic segmentation and alignment method for multi-equipment sound recording, which comprises the following steps: processing a plurality of original recordings in different forms into a plurality of long-time recordings in the same format; associating the long-time recordings that contain the same content; and aligning each associated long-time recording with the short-time reference recordings, then cutting the long-time recordings into short-time recordings corresponding to the short-time reference recordings. The invention solves the problem of complex data processing when building a voice recognition database from multi-device recordings.

Description

Automatic segmentation and alignment method for multi-equipment recording
Technical Field
The invention relates to the technical field of voice recognition database manufacturing, in particular to an automatic segmentation and alignment method for multi-device recording.
Background
In the voice recognition database manufacturing process, collecting recordings with multiple devices simultaneously can greatly improve the efficiency and diversity of recording. For example, simultaneously collecting the signals of a head-mounted microphone, a mobile phone and a microphone array ensures channel diversity, which in turn improves the practicability of the recognition database and makes it usable in applications such as far-field recognition, wake-up and noise reduction. Because near-talk and far-talk data exist for the same moments in time, the performance of far-field recognition, wake-up and noise-reduction algorithms can also be conveniently evaluated.
However, in the process of acquiring multi-device recordings, the recording devices differ and therefore cannot start recording at exactly the same time (i.e., the recording switches cannot be pressed, or recording commands sent, simultaneously). In addition, frame loss on some recording devices and misoperation during recording bring certain challenges to the post-processing of voice recognition data.
Disclosure of Invention
Aiming at the technical defects in the prior art, the invention provides an automatic segmentation and alignment method of multi-equipment sound records for manufacturing a voice recognition database. The method automatically aligns the associated recordings among a plurality of target recordings, each against a short-time reference recording, and then segments them into corresponding short-time recordings that are stored in the voice recognition database, thereby converting different original recordings into short-time recordings usable by a voice recognition system.
The technical scheme adopted for realizing the purpose of the invention is as follows:
an automatic segmentation and alignment method for multi-device sound recording comprises the following steps:
correspondingly processing a plurality of original recordings in different forms into a plurality of long-time recordings in the same format;
associating the same long-term recordings included in the plurality of long-term recordings;
and respectively aligning the associated long-time records by using the short-time reference records, and then cutting the long-time records into the short-time records corresponding to the short-time reference records.
In the invention, a long-time recording refers to all audio continuously acquired by the different recording devices from the recording start time to the recording end time, including both valid and invalid recording; a short-time recording refers to a valid recording cut out of a long-time recording.
In the invention, the original recordings include original short-time recordings and original long-time recordings, and the long-time recordings are formed from them by the following steps:
for the original long-term recording, performing uniform format conversion after decompressing the original long-term recording, and resampling the original long-term recording according to a uniform sampling rate, thereby forming the long-term recording;
and for the original short-time recording, performing unified format conversion after decompressing the original short-time recording, resampling the original short-time recording according to a unified sampling rate, and splicing the original short-time recording into the long-time recording according to the timestamp.
The alignment of the plurality of associated long-time recordings with the short-time reference recordings may be implemented by searching each of the associated long-time recordings for the short-time reference recordings.
Further, the short-time reference recordings are used to align the plurality of associated long-time recordings respectively, and the following method can be adopted:
respectively intercepting the head and tail sections of the associated long-time recording and the short-time reference recording, and calculating the recording offset of the associated long-time recording and short-time reference recording at the starting stage and the ending stage of the recording;
and acquiring the position of the short-time reference recording in the associated long-time recording according to the recording offset, and then cutting out the corresponding short recording in the associated long-time recording by using the short-time reference recording.
Specifically, the recording offset may be calculated on the original time domain signal, or on the time domain signal after noise reduction, or on the domain of the signal characteristics.
The short-time reference recording can be formed by cutting a long-time reference recording recorded by a reference recording device or a short-time recording directly recorded by the reference recording device.
The segmentation of the long-time reference record recorded by the reference recording equipment is performed by using voice activity detection information.
In the invention, the same long-time recordings in the plurality of long-time recordings are associated by reading the content of the long-time recordings and calculating the correlation degree of the content of the plurality of long-time recordings.
The correlation includes the time-domain correlation of the recordings or the correlation of audio feature sequences.
According to the automatic segmentation and alignment method for multi-equipment sound recording, after the original recording formats of a plurality of different recording devices are unified, the target recording files are automatically associated; the target recordings are then aligned with the short-time reference recordings and segmented. Original recordings in different formats produced by multiple recording devices can thus be automatically converted into the short-time recordings used by a voice recognition system, solving the problem of complex data processing when building a multi-device voice recognition database.
Drawings
FIG. 1 is a process flow diagram of an automatic segmentation alignment method for multi-device audio recordings;
FIG. 2 is a flow diagram illustrating format unification processing of original audio records.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1-2, an automatic splitting and aligning method for multi-device sound recording includes the steps of:
correspondingly processing a plurality of original recordings in different forms into a plurality of long-time recordings in the same format;
associating the same long-term recordings included in the plurality of long-term recordings;
and respectively aligning the associated long-time records by using the short-time reference records, and then cutting the long-time records into the short-time records corresponding to the short-time reference records.
The long-time recordings are cut into short-time recordings corresponding to the short-time reference recordings, and these short-time recordings are stored in a voice recognition database for recognition, so that different original recordings are converted into short-time recordings that can be used by a voice recognition system.
As shown in fig. 1, a plurality of original recordings in different forms are input by different recording devices (recording device 1, recording device 2, ...). A format unification step processes the original recordings in different forms into a plurality of long-time recordings in the same format; the recording files containing the same content among these long-time recordings are then associated; the associated long-time recordings are aligned with the short-time reference recordings and segmented; finally, the resulting recordings of recording device 1, recording device 2, ... are output to the voice recognition database for storage.
The original recordings come from different recording devices, such as a head-mounted microphone, a mobile phone and a microphone array. Because the formats of the recordings acquired by these devices may be inconsistent, the invention first unifies them to facilitate the subsequent segmentation processing.
Because of device differences, the original recordings collected may be either original short-time recordings or original long-time recordings; the corresponding long-time recordings are therefore formed by the following steps:
for the original long-term recording, performing unified format conversion after decompressing (and decrypting) the original long-term recording, and resampling the original long-term recording according to a unified sampling rate, thereby forming the long-term recording;
and for the original short-time recording, performing unified format conversion after decompressing (and decrypting) the original short-time recording, resampling the original short-time recording according to a unified sampling rate, and splicing the original short-time recording into the long-time recording according to the timestamp information.
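As a minimal sketch of the resampling part of this step, the following Python brings a signal to the unified sampling rate by linear interpolation. This is an illustrative simplification: a production pipeline would use an anti-aliased polyphase resampler, and the decompression, decryption and format conversion are assumed to have happened beforehand.

```python
import numpy as np

def resample(signal, sr_in, sr_out):
    """Resample a 1-D signal from sr_in to sr_out by linear interpolation.

    Illustrative only: a real pipeline would use an anti-aliased
    polyphase resampler instead of np.interp.
    """
    n_out = int(round(len(signal) * sr_out / sr_in))
    t_in = np.arange(len(signal)) / sr_in    # original sample instants
    t_out = np.arange(n_out) / sr_out        # instants at the unified rate
    return np.interp(t_out, t_in, signal)

# one second of 8 kHz audio brought to a unified 16 kHz rate
x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
y = resample(x, 8000, 16000)
print(len(y))  # 16000
```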
The splicing of the original short-time recordings may specifically be as follows: let Sk be the k-th original short-time recording (1 ≤ k ≤ K, K a natural number) with corresponding timestamp tk = [tk_start, tk_end]. The long-time recording s(t) spliced according to the timestamps is then
s(t) = Sk(t), if t ∈ [tk_start, tk_end] for some k; s(t) = 0, otherwise,
where Sk(t) is the k-th original short-time recording at time t, and tk_start and tk_end are the start time and end time of the timestamp corresponding to Sk.
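The splicing rule above can be sketched as follows. The (t_start, t_end, samples) tuple format is an assumption for illustration, standing in for the timestamps carried by the original short-time recordings; samples outside every timestamp interval are left as silence.

```python
import numpy as np

def splice(short_recordings, sample_rate):
    """Splice short recordings into one long recording s(t).

    `short_recordings` is a list of (t_start, t_end, samples) tuples with
    times in seconds. Samples outside every [t_start, t_end] stay zero.
    """
    total = max(t_end for _, t_end, _ in short_recordings)
    s = np.zeros(int(round(total * sample_rate)))
    for t_start, t_end, samples in short_recordings:
        i = int(round(t_start * sample_rate))
        s[i:i + len(samples)] = samples    # s(t) = Sk(t) on [t_start, t_end]
    return s

sr = 1000
clips = [(0.0, 0.5, np.ones(500)), (1.0, 1.5, np.ones(500))]
long_rec = splice(clips, sr)
print(len(long_rec), long_rec[250], long_rec[750])  # 1500 1.0 0.0
```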
The short-time reference recording is provided by a recording reference device. The reference device may be chosen as the device whose recording files have the highest signal-to-noise ratio, or it may be chosen according to the requirements of the actual recording project.
Forming the long-time recordings with a unified file format and sampling rate facilitates the subsequent processing.
In the invention, the long-time recording refers to all audio continuously acquired by a recording device from the recording start time to the recording end time, including valid and invalid recording; since the start and/or end times of the individual recording devices are not necessarily the same, and re-recording, pausing and similar events may occur while capturing audio, all of these are included in the long-time recording.
The short-time recording refers to an effective recording cut from the long-time recording according to a cutting rule, and is usually a complete sentence or paragraph.
Because the start and stop times of different recording devices are different and some recording devices may have frame loss and pause in the recording process, when the recordings of other recording devices are divided, the short-time reference recording and the target long-time recording (i.e., the associated same long-time recording) need to be aligned first.
This can be realized by searching each of the associated long-time recordings for the short-time reference recordings; however, this approach must search for every short recorded sentence, has a large search range, and easily causes alignment errors.
Further, the short-time reference recordings are used to align the plurality of associated long-time recordings respectively, and the following method can be adopted:
respectively intercepting the head and tail sections of the associated long-time recording and the short-time reference recording, and calculating the recording offset of the associated long-time recording and short-time reference recording at the starting stage and the ending stage of the recording;
and acquiring the position of the short-time reference recording in the associated long-time recording according to the recording offset, and then cutting out the corresponding short recording in the associated long-time recording by using the short-time reference recording.
An improved method calculates the cross-correlation coefficient between corresponding signals intercepted at the beginning and ending stages of the target long-time recording and the reference long-time recording; it improves alignment accuracy while reducing the search range. The steps are as follows:
Step 1: intercept the head and tail sections of the target long-time recording S1 and the reference long-time recording S2, and calculate the recording offsets D1 and D2 of the target and reference recordings at the beginning stage and the end stage of the recording, respectively. The offset is an offset in time: for example, because the devices capturing S1 and S2 did not press the recording switch at the same moment, there may be a difference of D seconds between S1 and S2, and the recording offset is then D seconds. If the target long-time recording S1 and the reference long-time recording S2 both have length N and no time deviation occurs between them, the cross-correlation coefficient between the two signals has its maximum value at position N + 1; otherwise, D equals the position of the maximum of the cross-correlation coefficient minus (N + 1), where D is the recording offset.
If the head and tail offsets are equal (D1 = D2), the recording device worked well: the recording at time t1 on the reference device lies at position t1 + D on the target device, and the method proceeds directly to step 3. Otherwise, frame loss, pauses or similar phenomena occurred during recording, and the method proceeds to step 2.
Step 2: using the head and tail offsets D1 and D2, for a short recording on the reference device starting at time t1 and ending at time t2, search for the corresponding recording within the range [D1 + t1 - delta, D2 + t2 + delta] of the target long-time recording, thereby obtaining the position of the short recording on the target device, and proceed to step 3. Here delta is an extended search duration (e.g., 1 second).
Step 3: cut out the corresponding short recording from the target long-time recording according to the position of the short-time reference recording in the target long-time recording.
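The offset computation of step 1 can be sketched with NumPy's cross-correlation; the signal names and the synthetic 7-sample delay are illustrative. Note that the description's "maximum at N + 1" uses 1-based indexing; with 0-based arrays the aligned peak sits at index N - 1.

```python
import numpy as np

def recording_offset(target, reference):
    """Recording offset D (in samples) of `target` relative to `reference`.

    For equal-length signals, the full cross-correlation has length
    2N - 1 with its peak at index N - 1 (0-based) when no time deviation
    exists; the peak's shift from that centre is the offset D of step 1.
    """
    n = len(reference)
    xcorr = np.correlate(target, reference, mode="full")
    return int(np.argmax(xcorr)) - (n - 1)

rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)                   # reference device head section
tgt = np.concatenate([np.zeros(7), ref])[:1000]   # target device started 7 samples late
print(recording_offset(tgt, ref))  # 7
```

Computing this on the head and tail sections yields D1 and D2; step 2 then restricts the search for each short recording to [D1 + t1 - delta, D2 + t2 + delta].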
Specifically, the recording offset may be calculated on the original time domain signal, or on the time domain signal after noise reduction, or on the domain of the signal characteristics.
Wherein the short-time reference recording may be a short-time recording directly recorded by a reference recording device.
The original short-time recording can be directly used as the short-time reference recording to carry out alignment segmentation processing on the target long-time recording to be processed.
The short-time reference recordings can also be formed by segmenting a long-time reference recording recorded by the reference recording device; in that case, the segmentation can use voice activity detection information.
Segmentation with Voice Activity Detection (VAD) information: for a long-time original recording file, the VAD information of the voice signal can be analyzed and the long-time recording divided into short sentences according to a predefined criterion, for example the pause duration of the voice signal; generally, the pause at the end of a sentence is obviously longer than the pauses within it. The VAD information can thus be used to segment according to the pause length between two stretches for which the VAD decision is true: if a continuous pause exceeds, say, 2 seconds, a cut is made at that pause. When a conversation database is recorded, the energy of the head-mounted microphones of the two parties can also be combined to improve segmentation precision.
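A minimal sketch of the pause-based segmentation follows. The frame-energy threshold stands in for a real VAD decision (an assumption, since the text only specifies that VAD information and a roughly 2-second pause rule are used); the returned segments are (start, end) sample indices.

```python
import numpy as np

def split_on_pauses(signal, sr, frame_ms=20, energy_thresh=0.01, min_pause_s=2.0):
    """Split a long recording into sentence segments at long pauses.

    Frame-energy thresholding is a crude stand-in for a real VAD; a cut
    is made wherever the decision stays false for at least min_pause_s.
    """
    frame = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame
    energy = np.array([np.mean(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    voiced = energy > energy_thresh                  # per-frame VAD decision
    min_pause = int(min_pause_s * 1000 / frame_ms)   # pause length in frames

    segments, start, pause = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            pause = 0
        elif start is not None:
            pause += 1
            if pause >= min_pause:                   # long pause: close the sentence
                segments.append((start * frame, (i - pause + 1) * frame))
                start, pause = None, 0
    if start is not None:
        segments.append((start * frame, n_frames * frame))
    return segments

# 1 s of "speech", a 3 s pause, then 1 s of "speech" at 1 kHz
sig = np.concatenate([np.ones(1000), np.zeros(3000), np.ones(1000)])
print(split_on_pauses(sig, 1000))  # [(0, 1000), (4000, 5000)]
```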
During recording acquisition it is often necessary to process the recordings of multiple persons (sessions) simultaneously. In the multi-device recording process it is therefore necessary to associate the recordings of different recording devices, that is, to find the files that correspond to a given person (session) on the different devices; in other words, to associate the same long-time recordings included in the plurality of long-time recordings.
As described above, the same long-time recordings included in the plurality of long-time recordings may be associated, for example, according to information such as file names, recording durations and file sizes. Association can also be realized by reading the contents of the long-time recordings and calculating the correlation between them.
Association can be carried out by calculating the correlation between the contents of the recording files that have been read in. Suppose there are N recording devices, each having M recordings; a device may still hold multiple files after short-time recording splicing, because it may have participated in the recordings of several persons, with the files stored on the same storage device. Taking the reference recording as the baseline, the correlation between all files of the target recording and all files of the reference recording is calculated, giving an M × M recording correlation matrix T.
Consider two recording devices n1 and n2 (1 ≤ n1 ≤ N, 1 ≤ n2 ≤ N, n1 ≠ n2) and two of their recordings s1 and s2, numbered m1 and m2 (1 ≤ m1 ≤ M, 1 ≤ m2 ≤ M). Their correlation coefficient ρ12 is
ρ12 = E[(s1 - μ1)(s2 - μ2)] / (σ1 σ2),
where μi = E[si], σi = sqrt(E[(si - μi)²]) and E[·] denotes expectation. The correlation matrix T of the two recording devices n1 and n2 is then
T = [ρ(m1, m2)], with 1 ≤ m1 ≤ M and 1 ≤ m2 ≤ M.
Based on the correlation matrix T and a chosen selection criterion (such as maximizing the total correlation), the one-to-one correspondence between the target recording files and the reference recording files is obtained: each reference file m1 is associated with the target file m2 whose correlation T(m1, m2) is highest.
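The association step can be sketched as follows, using the Pearson correlation coefficient (np.corrcoef) between whole files and the simple highest-correlation selection criterion; the shuffled, noisy target files are synthetic illustration data.

```python
import numpy as np

def associate(reference_files, target_files):
    """Match each reference recording to a target recording.

    Builds the M x M correlation matrix T of Pearson correlation
    coefficients and pairs each reference file with its
    highest-correlation target file (a simple selection criterion).
    """
    M = len(reference_files)
    T = np.zeros((M, M))
    for i, r in enumerate(reference_files):
        for j, t in enumerate(target_files):
            n = min(len(r), len(t))
            T[i, j] = np.corrcoef(r[:n], t[:n])[0, 1]   # rho(m1, m2)
    return T.argmax(axis=1)        # best target index for each reference file

rng = np.random.default_rng(1)
refs = [rng.standard_normal(2000) for _ in range(3)]
# the target device recorded the same sessions, shuffled and with noise
targets = [refs[2] + 0.1 * rng.standard_normal(2000),
           refs[0] + 0.1 * rng.standard_normal(2000),
           refs[1] + 0.1 * rng.standard_normal(2000)]
print(associate(refs, targets))  # [1 2 0]
```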
The correlation degree can be the time domain correlation degree of the sound recording or the correlation degree of the audio characteristic sequence.
The method described above has the advantage of being directly applicable to all devices. In a practical system, computational complexity can be reduced by simplifying the correlation computation (e.g., by subsampling when computing the time-domain correlation).
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (9)

1. The method for automatically segmenting and aligning the multi-device sound records is characterized by comprising the following steps of:
correspondingly processing a plurality of original recordings in different forms into a plurality of long-time recordings in the same format;
associating the same long-term recordings included in the plurality of long-term recordings;
respectively aligning the associated long-time recordings by using short-time reference recordings, and then cutting the long-time recordings into short-time recordings corresponding to the short-time reference recordings;
aligning a plurality of associated long-time recordings respectively using a short-time reference recording, comprising the steps of:
respectively intercepting the head and tail sections of the associated long-time recording and the short-time reference recording, and calculating the recording offset of the associated long-time recording and short-time reference recording at the starting stage and the ending stage of the recording;
and acquiring the position of the short-time reference recording in the associated long-time recording according to the recording offset, and then cutting out the corresponding short recording in the associated long-time recording by using the short-time reference recording.
2. The method for automatically segmenting and aligning the multi-device recording according to claim 1, wherein the long-time recording refers to all recordings that are continuously acquired by different recording devices from the recording start time to the recording end time, and includes valid recordings and invalid recordings; the short-time recording refers to an effective recording cut out from the long-time recording.
3. The method for automatically segmenting and aligning multi-device recordings according to claim 1, wherein the original recordings include an original short-time recording and an original long-time recording, the long-time recording being formed by the following steps:
for the original long-term recording, performing uniform format conversion after decompressing the original long-term recording, and resampling the original long-term recording according to a uniform sampling rate, thereby forming the long-term recording;
and for the original short-time recording, performing unified format conversion after decompressing the original short-time recording, resampling the original short-time recording according to a unified sampling rate, and splicing the original short-time recording into the long-time recording according to the timestamp.
4. The method for automatically segmenting and aligning multiple device records according to claim 1, wherein the short-time reference records are used to align the multiple associated long-time records respectively by searching the multiple associated long-time records for the short-time reference records respectively.
5. The method of claim 1, wherein the recording offset is calculated in the original time domain signal, in the noise-reduced time domain signal, or in the signal feature domain.
6. The method for automatically slicing and aligning multiple device recordings according to claim 1, wherein the short time reference recording is formed by slicing a long time reference recording recorded by a reference recording device or is a short time recording directly recorded by a reference recording device.
7. The method for automatically segmenting and aligning multiple device recordings according to claim 6, wherein the segmentation of the long-term reference recordings recorded by the reference recording device is performed by using voice activity detection information.
8. The method for automatically segmenting and aligning the multi-device sound recordings according to claim 1, wherein the same long-time recordings included in the plurality of long-time recordings are associated by reading the contents of the long-time recordings and calculating the correlation degree of the contents of the plurality of long-time recordings.
9. The method of claim 8, wherein the correlation comprises the time-domain correlation of the recordings or the correlation of audio feature sequences.
CN201711284222.0A 2017-12-07 2017-12-07 Automatic segmentation and alignment method for multi-equipment recording Active CN108021675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711284222.0A CN108021675B (en) 2017-12-07 2017-12-07 Automatic segmentation and alignment method for multi-equipment recording

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711284222.0A CN108021675B (en) 2017-12-07 2017-12-07 Automatic segmentation and alignment method for multi-equipment recording

Publications (2)

Publication Number Publication Date
CN108021675A CN108021675A (en) 2018-05-11
CN108021675B true CN108021675B (en) 2021-11-09

Family

ID=62078879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711284222.0A Active CN108021675B (en) 2017-12-07 2017-12-07 Automatic segmentation and alignment method for multi-equipment recording

Country Status (1)

Country Link
CN (1) CN108021675B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108769559B (en) * 2018-05-25 2020-12-01 数据堂(北京)科技股份有限公司 Multimedia file synchronization method and device
CN109166570B (en) * 2018-07-24 2019-11-26 百度在线网络技术(北京)有限公司 A kind of method, apparatus of phonetic segmentation, equipment and computer storage medium
CN109151705A (en) * 2018-08-27 2019-01-04 北京爱数智慧科技有限公司 A kind of alignment schemes and relevant device of conferencing data
CN109195048B (en) * 2018-09-03 2020-05-08 中科探索创新(北京)科技院 Distortion-free recording earphone
CN110334240B (en) * 2019-07-08 2021-10-22 联想(北京)有限公司 Information processing method and system, first device and second device
CN116758939B (en) * 2023-08-21 2023-11-03 北京希尔贝壳科技有限公司 Multi-device audio data alignment method, device and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1612205A (en) * 2003-10-29 2005-05-04 雅马哈株式会社 Audio signal processor
CN1716380A (en) * 2005-07-26 2006-01-04 浙江大学 Audio frequency splitting method for changing detection based on decision tree and speaking person
CN101075183A (en) * 2007-06-29 2007-11-21 北京中星微电子有限公司 Multi-path audio-frequency data processing system
CN102364952A (en) * 2011-10-25 2012-02-29 浙江万朋网络技术有限公司 Method for processing audio and video synchronization in simultaneous playing of a plurality of paths of audio and video
CN103354588A (en) * 2013-06-28 2013-10-16 贵阳朗玛信息技术股份有限公司 Determination method, apparatus and system for recording and playing sampling rate
CN104347096A (en) * 2013-08-09 2015-02-11 上海证大喜马拉雅网络科技有限公司 Recording system and method integrating audio cutting, continuous recording and combination
CN104700839A (en) * 2015-02-26 2015-06-10 深圳市中兴移动通信有限公司 Method and device for collecting multichannel sound, cellphone and system
CN105989846A (en) * 2015-06-12 2016-10-05 乐视致新电子科技(天津)有限公司 Multi-channel speech signal synchronization method and device
CN106504777A (en) * 2016-11-25 2017-03-15 维沃移动通信有限公司 A kind of processing method of recording data and mobile terminal
CN106782508A (en) * 2016-12-20 2017-05-31 美的集团股份有限公司 The cutting method of speech audio and the cutting device of speech audio

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107195316B (en) * 2017-04-28 2019-11-08 北京声智科技有限公司 Training data preparation system and method for far field speech recognition


Also Published As

Publication number Publication date
CN108021675A (en) 2018-05-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant