CN112562712A - Recording data processing method and system, electronic equipment and storage medium - Google Patents
- Publication number
- CN112562712A CN112562712A CN202011549737.0A CN202011549737A CN112562712A CN 112562712 A CN112562712 A CN 112562712A CN 202011549737 A CN202011549737 A CN 202011549737A CN 112562712 A CN112562712 A CN 112562712A
- Authority
- CN
- China
- Prior art keywords
- sound
- audio
- track
- data processing
- speaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10546—Audio or video recording specifically adapted for audio data
Abstract
The invention provides a recording data processing method, a system, electronic equipment and a storage medium. The method comprises: a first sound pickup step of using a first pickup device worn by a first speaker to collect the first speaker's sound, transmit it to a second pickup device, and store it as a first audio track; a second sound pickup step of collecting the sound of a second speaker with the second pickup device and storing it as a second audio track; an audio generation step of processing the first and second audio tracks into an intermediate audio file with the second pickup device; and an audio separation step of performing voice role separation on the intermediate audio file. The invention addresses the high cost and poor results of existing recording processing methods.
Description
Technical Field
The invention belongs to the field of audio processing, and particularly relates to a recording data processing method and system applicable to on-site recording, as well as electronic equipment and a storage medium.
Background
In current commercial recording scenarios, multi-microphone recording devices (e.g., three- or four-microphone recorders) are used to distinguish the wearer from the other speaker during recording, and each role's voice is extracted through recording separation, so that the recording process achieves more effective recording and lower noise.
In such methods, non-target environmental sound is noise that easily causes heavy interference with the separation of audio roles, so errors readily occur after separating the wearer's and the interlocutor's recording roles. Existing solutions address this with more expensive and more complex technology, which greatly raises the cost of use in commercial scenarios.
Disclosure of Invention
The embodiment of the application provides a recording data processing method, a recording data processing system, electronic equipment and a storage medium, and aims to at least solve the problems of high cost and poor effect of the existing recording processing method.
In a first aspect, an embodiment of the present application provides a recording data processing method, including: a first sound pickup step of using a first pickup device worn by a first speaker to collect the first speaker's sound, transmit it to a second pickup device, and store it as a first audio track; a second sound pickup step of collecting the sound of a second speaker with the second pickup device and storing it as a second audio track; an audio generation step of processing the first audio track and the second audio track into an intermediate audio file with the second pickup device; and an audio separation step of performing voice role separation on the intermediate audio file.
Preferably, the intermediate audio file is one dual-track audio file or two single-track audio files.
Preferably, the audio separating step further comprises: and eliminating the sound of a person who is not a first speaker in the first audio track in the intermediate audio file, and eliminating the sound of a person who is not a second speaker in the second audio track.
Preferably, the audio separating step further comprises: and performing voice role separation on the intermediate audio file by using a voice separation algorithm.
In a second aspect, an embodiment of the present application provides a recording data processing system, which is suitable for the above recording data processing method, and includes: a first sound pickup unit: collecting the sound of a first speaker by using a first sound pickup device worn on the first speaker, transmitting the sound to a second sound pickup device, and storing the sound as a first sound track; the second sound pickup unit is used for collecting the sound of a second conversation party by using the second sound pickup equipment and storing the sound as a second sound track; an audio generation unit that processes the first audio track and the second audio track into an intermediate audio file using the second sound pickup apparatus; and the audio separation unit is used for separating voice roles of the intermediate audio file.
In some of these embodiments, the intermediate audio file is one dual-track audio file or two single-track audio files.
In some of these embodiments, the audio separation unit further comprises: and eliminating the sound of a person who is not a first speaker in the first audio track in the intermediate audio file, and eliminating the sound of a person who is not a second speaker in the second audio track.
In some of these embodiments, the audio separation unit further comprises: and performing voice role separation on the intermediate audio file by using a voice separation algorithm.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the sound recording data processing method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements a recording data processing method as described in the first aspect.
Compared with the related art, the recording data processing method provided by the embodiments of the present application can obtain more accurate recordings of each audio source at lower cost and with lower technical difficulty.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a recording data processing method according to the present invention;
FIG. 2 is a block diagram of a recorded data processing system of the present invention;
FIG. 3 is a block diagram of an electronic device of the present invention;
in the above figures:
1. a first sound pickup unit; 2. a second sound pickup unit; 3. an audio generation unit; 4. an audio separation unit; 60. a bus; 61. a processor; 62. a memory; 63. a communication interface.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In a real environment, the speech signal of interest is usually corrupted by noise, which severely degrades speech intelligibility and reduces speech recognition performance. Front-end speech separation is one of the most common countermeasures against noise. A good front-end speech separation module can greatly improve speech intelligibility and the recognition performance of an automatic speech recognition system.
From a signal processing point of view, many methods estimate the power spectrum of the noise or an ideal Wiener filter, such as spectral subtraction and Wiener filtering. The Wiener filter is the optimal filter for separating clean speech in the least-mean-square-error sense: given noisy speech and prior distributions of speech and noise, it infers the spectral coefficients of the speech. Signal-processing-based methods typically assume that the noise is stationary or slowly varying, and they achieve good separation performance when those assumptions hold. Compared with signal processing methods, model-based methods use the clean signals before mixing to build separate models of speech and noise, and achieve important performance improvements at low signal-to-noise ratios. Among model-based speech separation methods, non-negative matrix factorization is a common modeling technique that can mine local basis representations in non-negative data and is now widely applied to speech separation. Computational auditory scene analysis is another important speech separation technique that attempts to solve the problem by simulating how the human ear processes sound. Its basic computational goal is to estimate an ideal binary mask, which achieves speech separation based on the auditory masking properties of the human ear. Compared with other speech separation methods, computational auditory scene analysis makes no assumptions about the noise and generalizes better.
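As an illustrative sketch (not part of the patented method), the spectral subtraction approach mentioned above can be written in a few lines; the over-subtraction factor and spectral floor here are commonly used but assumed values:

```python
import numpy as np

def spectral_subtraction(noisy_stft, noise_mag, alpha=1.0, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from each frame.

    noisy_stft: complex STFT of the noisy signal, shape (frames, bins)
    noise_mag:  estimated noise magnitude spectrum, shape (bins,)
    alpha:      over-subtraction factor
    floor:      spectral floor, keeps magnitudes from going negative
    """
    mag = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    clean_mag = mag - alpha * noise_mag            # subtract the noise estimate
    clean_mag = np.maximum(clean_mag, floor * mag)  # apply the spectral floor
    return clean_mag * np.exp(1j * phase)           # reuse the noisy phase
```

The noisy phase is reused because spectral subtraction, like most magnitude-domain methods, does not attempt to estimate the clean phase.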
Speech separation aims to separate the useful signal from an interfered speech signal, a process that can naturally be cast as a supervised learning problem. A typical supervised speech separation system learns a mapping function from noisy features to a separation target, such as an ideal mask or the magnitude spectrum of the speech of interest, through a supervised learning algorithm such as a deep neural network.
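The ideal binary mask target mentioned above can be sketched as follows (a toy illustration; the 0 dB local-SNR threshold is a conventional but assumed choice, and real systems compute it per time-frequency unit of an STFT or gammatone decomposition):

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, snr_db_threshold=0.0):
    """1 where the local speech-to-noise ratio exceeds the threshold, else 0.

    speech_mag, noise_mag: magnitude spectrograms, shape (frames, bins)
    """
    eps = 1e-12  # avoid division by zero / log of zero
    local_snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (local_snr_db > snr_db_threshold).astype(np.float32)
```

Multiplying a noisy magnitude spectrogram by this mask keeps only the units dominated by speech, which is the separation target a supervised model is trained to predict.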
Embodiments of the invention are described in detail below with reference to the accompanying drawings:
fig. 1 is a flowchart of a recording data processing method according to the present invention, and referring to fig. 1, the recording data processing method according to the present invention includes the following steps:
s1: the first sound pickup equipment worn by the first speaker is used for collecting the sound of the first speaker, transmitting the sound to the second sound pickup equipment and storing the sound as the first sound track.
In a specific implementation, the recording participants are divided into a wearer and a corresponding interlocutor, and the first sound pickup device is worn on the wearer's body; optionally, the first sound pickup device may be an earphone. The first sound pickup device collects the wearer's sound within a certain distance; optionally, that distance may be 0.2 m.
In a specific implementation, after the sound of the wearer is collected, the recorded sound data is transmitted to a second sound pickup device, and optionally, the second sound pickup device may be a sound recorder; the second sound-collecting apparatus saves the received sound of the wearer in the form of one sound track.
In this step, a pickup device worn by the first speaker is used to achieve targeted close-range pickup.
S2: and collecting the sound of a second speaker by using the second sound pickup equipment, and storing the sound as a second sound track.
In a specific implementation, the second sound pickup device is used for collecting the sound of an interlocutor corresponding to the wearer, and optionally, the second sound pickup device is arranged within a certain distance radius of the interlocutor; alternatively, the certain distance may be 2 meters.
In an implementation, the second sound pickup device saves the collected sound of the interlocutor in the form of one audio track.
S3: processing the first audio track and the second audio track into an intermediate audio file using the second pickup apparatus.
Optionally, the intermediate audio file is one dual-track audio file or two single-track audio files.
In a specific implementation, the wearer's sound and the interlocutor's sound are stored as two separate audio tracks, and the second sound pickup device can process the two tracks in different forms: optionally, the two tracks may be combined into one audio file with two tracks; alternatively, the two tracks may be saved as two separate single-track audio files.
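The first of these two storage choices — merging the wearer's and interlocutor's tracks into one dual-track file — can be sketched with Python's standard-library wave module (the 16 kHz sample rate and 16-bit format are assumptions, not requirements of the method):

```python
import wave

def merge_to_stereo(track_a, track_b, path, rate=16000):
    """Interleave two equal-length 16-bit mono tracks into one dual-track WAV.

    track_a, track_b: sequences of signed 16-bit integer samples
    """
    assert len(track_a) == len(track_b)
    frames = bytearray()
    for left, right in zip(track_a, track_b):
        # one stereo frame = left sample then right sample, little-endian
        frames += left.to_bytes(2, "little", signed=True)
        frames += right.to_bytes(2, "little", signed=True)
    with wave.open(path, "wb") as f:
        f.setnchannels(2)   # dual track: wearer on left, interlocutor on right
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(rate)
        f.writeframes(bytes(frames))
```

The second choice is simpler still: write each track with `setnchannels(1)` into its own file.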
S4: and carrying out voice role separation on the intermediate audio file.
Optionally, the audio separating step further includes: and eliminating the sound of a person who is not a first speaker in the first audio track in the intermediate audio file, and eliminating the sound of a person who is not a second speaker in the second audio track.
Optionally, the audio separating step further includes: and performing voice role separation on the intermediate audio file by using a voice separation algorithm.
In a specific implementation, the intermediate audio file is separated by a voice separation algorithm: the audio of non-wearers is eliminated from the wearer's track, and the audio of non-interlocutors is eliminated from the interlocutor's track, yielding an audio file with less noise.
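One simple way to realize the per-track elimination described above (an illustrative heuristic, not the patent's algorithm): because the wearer's microphone is much closer to the wearer than to the interlocutor, frames where the first track carries more energy likely belong to the wearer, and frames where the second track dominates likely belong to the interlocutor. The frame length of 160 samples (10 ms at 16 kHz) is an assumed value:

```python
import numpy as np

def split_by_energy(track1, track2, frame=160):
    """Zero out frames that belong to the other speaker, per track."""
    n = min(len(track1), len(track2)) // frame * frame
    a = np.asarray(track1[:n], dtype=float).reshape(-1, frame)
    b = np.asarray(track2[:n], dtype=float).reshape(-1, frame)
    wearer_frames = (a ** 2).sum(axis=1) >= (b ** 2).sum(axis=1)
    a[~wearer_frames] = 0.0   # remove non-wearer audio from track 1
    b[wearer_frames] = 0.0    # remove non-interlocutor audio from track 2
    return a.ravel(), b.ravel()
```

A real system would combine such energy cues with a learned separation model rather than rely on them alone, since overlapping speech defeats a pure energy comparison.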
In a specific implementation, the embodiment of the present application takes a fixed human voice as the main separation object. Therefore, optionally, the voice separation algorithm may use a spectrum-mapping-based method, in which a model learns, through supervised learning, a mapping from an interfered spectrum to an interference-free (clean speech) spectrum; the model may be a DNN, CNN, LSTM, or even a GAN.
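The spectrum-mapping idea can be sketched with a toy linear model in plain NumPy (the patent does not specify an architecture, so this linear stand-in, the step count, and the learning rate are all assumptions; a DNN/CNN/LSTM would replace the single matrix):

```python
import numpy as np

def learn_spectrum_mapping(noisy, clean, steps=500, lr=0.5):
    """Learn a linear map W so that noisy @ W approximates clean (MSE).

    noisy, clean: paired magnitude spectra, shape (examples, bins)
    """
    rng = np.random.default_rng(0)
    W = rng.standard_normal((noisy.shape[1], clean.shape[1])) * 0.1
    for _ in range(steps):
        pred = noisy @ W
        # gradient of mean squared error with respect to W
        W -= lr * noisy.T @ (pred - clean) / len(noisy)
    return W
```

Training minimizes the distance between the predicted and the clean spectrum, which is exactly the supervised mapping from interfered to interference-free spectra that the method describes.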
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The embodiment of the application provides a recording data processing system, which is suitable for the recording data processing method. As used below, the terms "unit," "module," and the like may implement a combination of software and/or hardware of predetermined functions. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.
FIG. 2 is a block diagram of a recording data processing system according to the present invention, referring to FIG. 2, including:
first sound pickup unit 1: the first sound pickup equipment worn by the first speaker is used for collecting the sound of the first speaker, transmitting the sound to the second sound pickup equipment and storing the sound as the first sound track.
In a specific implementation, the recording participants are divided into a wearer and a corresponding interlocutor, and the first sound pickup device is worn on the wearer's body; optionally, the first sound pickup device may be an earphone. The first sound pickup device collects the wearer's sound within a certain distance; optionally, that distance may be 0.2 m.
In a specific implementation, after the sound of the wearer is collected, the recorded sound data is transmitted to a second sound pickup device, and optionally, the second sound pickup device may be a sound recorder; the second sound-collecting apparatus saves the received sound of the wearer in the form of one sound track.
In this unit, a pickup device worn by the first speaker is used to achieve targeted close-range pickup.
Second sound pickup unit 2: and collecting the sound of a second speaker by using the second sound pickup equipment, and storing the sound as a second sound track.
In a specific implementation, the second sound pickup device is used for collecting the sound of an interlocutor corresponding to the wearer, and optionally, the second sound pickup device is arranged within a certain distance radius of the interlocutor; alternatively, the certain distance may be 2 meters.
In an implementation, the second sound pickup device saves the collected sound of the interlocutor in the form of one audio track.
The audio generation unit 3: processing the first audio track and the second audio track into an intermediate audio file using the second pickup apparatus.
Optionally, the intermediate audio file is one dual-track audio file or two single-track audio files.
In a specific implementation, the wearer's sound and the interlocutor's sound are stored as two separate audio tracks, and the second sound pickup device can process the two tracks in different forms: optionally, the two tracks may be combined into one audio file with two tracks; alternatively, the two tracks may be saved as two separate single-track audio files.
The audio separation unit 4: and carrying out voice role separation on the intermediate audio file.
Optionally, the audio separation unit 4 further includes: and eliminating the sound of a person who is not a first speaker in the first audio track in the intermediate audio file, and eliminating the sound of a person who is not a second speaker in the second audio track.
Optionally, the audio separation unit 4 further includes: and performing voice role separation on the intermediate audio file by using a voice separation algorithm.
In a specific implementation, the intermediate audio file is separated by a voice separation algorithm: the audio of non-wearers is eliminated from the wearer's track, and the audio of non-interlocutors is eliminated from the interlocutor's track, yielding an audio file with less noise.
In a specific implementation, the embodiment of the present application takes a fixed human voice as the main separation object. Therefore, optionally, the voice separation algorithm may use a spectrum-mapping-based method, in which a model learns, through supervised learning, a mapping from an interfered spectrum to an interference-free (clean speech) spectrum; the model may be a DNN, CNN, LSTM, or even a GAN.
In addition, a recording data processing method described in conjunction with fig. 1 may be implemented by an electronic device. Fig. 3 is a block diagram of an electronic device of the present invention.
The electronic device may comprise a processor 61 and a memory 62 in which computer program instructions are stored.
Specifically, the processor 61 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 62 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions executed by the processor 61.
The processor 61 realizes any one of the sound recording data processing methods in the above-described embodiments by reading and executing computer program instructions stored in the memory 62.
In some of these embodiments, the electronic device may also include a communication interface 63 and a bus 60. As shown in fig. 3, the processor 61, the memory 62, and the communication interface 63 are connected via a bus 60 to complete communication therebetween.
The communication interface 63 enables data communication with external components such as external equipment, image/data acquisition equipment, databases, external storage, and image/data processing workstations.
The bus 60 includes hardware, software, or both, coupling the components of the electronic device to one another. Bus 60 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 60 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Bus 60 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device can execute the recording data processing method in the embodiment of the application.
In addition, in combination with the recording data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium to implement the recording data processing method. The computer readable storage medium having stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the sound recording data processing methods in the above embodiments.
The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A method for processing recorded data, comprising:
a first pickup step, using a first pickup device worn by a first speaker to pick up the sound of the first speaker, transmitting the sound to a second pickup device, and storing the sound as a first sound track;
a second sound pickup step of picking up the sound of a second party by using the second sound pickup apparatus and storing the sound as a second sound track;
an audio generating step of processing the first audio track and the second audio track into an intermediate audio file using the second sound pickup apparatus;
and an audio separation step of performing voice role separation on the intermediate audio file.
2. The recording data processing method of claim 1, wherein the intermediate audio file is one dual-track audio file or two single-track audio files.
3. The recorded sound data processing method of claim 1, wherein the audio separating step further comprises: and eliminating the sound of a person who is not a first speaker in the first audio track in the intermediate audio file, and eliminating the sound of a person who is not a second speaker in the second audio track.
4. The recording data processing method of claim 1 or 3, wherein the audio separation step further comprises: performing speaker role separation on the intermediate audio file using a speech separation algorithm.
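The method of claims 1 to 4 can be illustrated with a minimal sketch (an illustrative assumption, not the patented implementation): the two picked-up tracks are stacked into one dual-track intermediate, and speaker roles are then separated by muting, frame by frame, whichever track is clearly dominated by the other, treating it as cross-talk bleed in the sense of claim 3.

```python
import numpy as np

def make_intermediate(track1: np.ndarray, track2: np.ndarray) -> np.ndarray:
    """Stack the two picked-up mono tracks into one dual-track array of shape (2, n)."""
    n = max(len(track1), len(track2))
    t1 = np.pad(track1, (0, n - len(track1)))  # zero-pad the shorter track
    t2 = np.pad(track2, (0, n - len(track2)))
    return np.stack([t1, t2])

def separate_roles(audio: np.ndarray, frame: int = 1024) -> np.ndarray:
    """Crude speaker role separation: in each frame, mute the quieter track
    when it is clearly dominated by the other (assumed to be cross-talk bleed)."""
    out = audio.copy()
    n = audio.shape[1]
    for start in range(0, n, frame):
        seg = audio[:, start:start + frame]
        energy = (seg ** 2).sum(axis=1)           # per-track frame energy
        quiet = int(np.argmin(energy))
        if energy[quiet] < 0.25 * energy[1 - quiet]:
            out[quiet, start:start + frame] = 0.0  # treat as bleed and mute
    return out
```

A real implementation would apply a trained speech separation algorithm as in claim 4; the per-frame energy comparison here only shows why dedicating one track per speaker makes role separation tractable.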
5. A recording data processing system, comprising:
a first sound pickup unit configured to collect the sound of a first speaker with a first sound pickup device worn by the first speaker, transmit the sound to a second sound pickup device, and store it as a first audio track;
a second sound pickup unit configured to collect the sound of a second speaker with the second sound pickup device and store it as a second audio track;
an audio generation unit configured to process the first audio track and the second audio track into an intermediate audio file using the second sound pickup device;
and an audio separation unit configured to perform speaker role separation on the intermediate audio file.
6. The recording data processing system of claim 5, wherein the intermediate audio file is one dual-track audio file or two single-track audio files.
7. The recording data processing system of claim 5, wherein the audio separation unit is further configured to eliminate, in the first audio track of the intermediate audio file, the sound of any person other than the first speaker, and to eliminate, in the second audio track, the sound of any person other than the second speaker.
8. The recording data processing system of claim 5 or 7, wherein the audio separation unit is further configured to perform speaker role separation on the intermediate audio file using a speech separation algorithm.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the recording data processing method according to any one of claims 1 to 4.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the recording data processing method according to any one of claims 1 to 4.
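As an illustration of the alternative in claims 2 and 6, the intermediate audio file could be written either as one dual-track file or as two single-track files. The sketch below uses the standard-library wave module; the 16 kHz sample rate and 16-bit PCM format are assumptions for illustration, not requirements of the claims.

```python
import struct
import wave

def write_mono(path: str, samples: list, sample_rate: int = 16000) -> None:
    """Write one single-track (mono) 16-bit PCM WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

def write_dual_track(path: str, track1: list, track2: list,
                     sample_rate: int = 16000) -> None:
    """Write one dual-track (two-channel) WAV file: track1 left, track2 right."""
    assert len(track1) == len(track2)
    # Interleave the two tracks sample by sample, as WAV frames require
    interleaved = [s for pair in zip(track1, track2) for s in pair]
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))
```

Keeping the two tracks in one dual-track file preserves their sample alignment, which simplifies the per-track elimination of claims 3 and 7; two mono files achieve the same if their start times match.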
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011549737.0A CN112562712A (en) | 2020-12-24 | 2020-12-24 | Recording data processing method and system, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112562712A true CN112562712A (en) | 2021-03-26 |
Family
ID=75033282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011549737.0A Pending CN112562712A (en) | 2020-12-24 | 2020-12-24 | Recording data processing method and system, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112562712A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113470687A (en) * | 2021-06-29 | 2021-10-01 | 北京明略昭辉科技有限公司 | Audio acquisition and transmission device, audio processing system and audio acquisition and transmission method |
CN113706844A (en) * | 2021-08-31 | 2021-11-26 | 上海明略人工智能(集团)有限公司 | Method and device for early warning of voice acquisition equipment, voice acquisition equipment and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007322899A (en) * | 2006-06-02 | 2007-12-13 | D & M Holdings Inc | Sound recording device |
CN104123950A (en) * | 2014-07-17 | 2014-10-29 | 深圳市中兴移动通信有限公司 | Sound recording method and device |
JP2017011754A (en) * | 2016-09-14 | 2017-01-12 | ソニー株式会社 | Auricle mounted sound collecting apparatus, signal processing apparatus, and sound collecting method |
JP2018013742A (en) * | 2016-07-22 | 2018-01-25 | 富士通株式会社 | Speech summary creation assist device, speech summary creation assist method, and speech summary creation assist program |
US20180096705A1 (en) * | 2016-10-03 | 2018-04-05 | Nokia Technologies Oy | Method of Editing Audio Signals Using Separated Objects And Associated Apparatus |
US20190304437A1 (en) * | 2018-03-29 | 2019-10-03 | Tencent Technology (Shenzhen) Company Limited | Knowledge transfer in permutation invariant training for single-channel multi-talker speech recognition |
CN111128197A (en) * | 2019-12-25 | 2020-05-08 | 北京邮电大学 | Multi-speaker voice separation method based on voiceprint features and generation confrontation learning |
CN111243579A (en) * | 2020-01-19 | 2020-06-05 | 清华大学 | Time domain single-channel multi-speaker voice recognition method and system |
CN111586050A (en) * | 2020-05-08 | 2020-08-25 | 上海明略人工智能(集团)有限公司 | Audio file transmission method and device, storage medium and electronic equipment |
CN111833898A (en) * | 2020-07-24 | 2020-10-27 | 上海明略人工智能(集团)有限公司 | Multi-source data processing method and device and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device | |
CN107452389B (en) | Universal single-track real-time noise reduction method | |
CN110970057B (en) | Sound processing method, device and equipment | |
CN109788400B (en) | Neural network howling suppression method, system and storage medium for digital hearing aid | |
US9378754B1 (en) | Adaptive spatial classifier for multi-microphone systems | |
CN113129917A (en) | Speech processing method based on scene recognition, and apparatus, medium, and system thereof | |
CN111868823B (en) | Sound source separation method, device and equipment | |
CN112562712A (en) | Recording data processing method and system, electronic equipment and storage medium | |
CN104505099A (en) | Method and equipment for removing known interference in voice signal | |
CN116403592A (en) | Voice enhancement method and device, electronic equipment, chip and storage medium | |
TWI581255B (en) | Front-end audio processing system | |
CN113205803A (en) | Voice recognition method and device with adaptive noise reduction capability | |
CN112309417A (en) | Wind noise suppression audio signal processing method, device, system and readable medium | |
CN114333896A (en) | Voice separation method, electronic device, chip and computer readable storage medium | |
WO2017045512A1 (en) | Voice recognition method and apparatus, terminal, and voice recognition device | |
CN114302286A (en) | Method, device and equipment for reducing noise of call voice and storage medium | |
CN111933140B (en) | Method, device and storage medium for detecting voice of earphone wearer | |
CN113039601B (en) | Voice control method, device, chip, earphone and system | |
CN111009259B (en) | Audio processing method and device | |
CN108899041B (en) | Voice signal noise adding method, device and storage medium | |
CN105491336A (en) | Image identification module with low power consumption | |
TWI761018B (en) | Voice capturing method and voice capturing system | |
CN115293205A (en) | Anomaly detection method, self-encoder model training method and electronic equipment | |
Birnie et al. | Noise retf estimation and removal for low snr speech enhancement | |
CN111028851B (en) | Sound playing device and noise reducing method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||