CN112562712A - Recording data processing method and system, electronic equipment and storage medium - Google Patents

Recording data processing method and system, electronic equipment and storage medium

Info

Publication number
CN112562712A
Authority
CN
China
Prior art keywords
sound
audio
track
data processing
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011549737.0A
Other languages
Chinese (zh)
Inventor
吴光需
梁志婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202011549737.0A
Publication of CN112562712A
Current legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 - Digital recording or reproducing
    • G11B20/10527 - Audio or video recording; Data buffering arrangements
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 - Digital recording or reproducing
    • G11B20/10527 - Audio or video recording; Data buffering arrangements
    • G11B2020/10537 - Audio or video recording
    • G11B2020/10546 - Audio or video recording specifically adapted for audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a recording data processing method and system, an electronic device and a storage medium. The method comprises: a first sound pickup step of collecting the sound of a first speaker with a first sound pickup device worn by the first speaker, transmitting the sound to a second sound pickup device, and storing it as a first audio track; a second sound pickup step of collecting the sound of a second speaker with the second sound pickup device and storing it as a second audio track; an audio generation step of processing the first audio track and the second audio track into an intermediate audio file with the second sound pickup device; and an audio separation step of performing speaker-role separation on the intermediate audio file. The invention solves the problems of high cost and poor effect of existing recording processing methods.

Description

Recording data processing method and system, electronic equipment and storage medium
Technical Field
The invention belongs to the field of audio processing, and particularly relates to a recording data processing method and system, an electronic device, and a storage medium applicable to on-site recording.
Background
In current industry scenarios where conversations are recorded with recording equipment, multi-microphone devices such as three-microphone or four-microphone recorders are used to distinguish the wearer from the interlocutor during recording and to extract each speaker's voice by separating the recording, so as to obtain a more effective, lower-noise recording.
In such methods, the non-speech environmental recording is noise that strongly interferes with the separation of audio roles, so errors easily occur when the recordings of the wearer and the interlocutor are separated by role. Solving this requires higher cost and more complex technology, which greatly increases the cost of use in commercial scenarios.
Disclosure of Invention
The embodiment of the application provides a recording data processing method, a recording data processing system, electronic equipment and a storage medium, and aims to at least solve the problems of high cost and poor effect of the existing recording processing method.
In a first aspect, an embodiment of the present application provides a recording data processing method, comprising: a first sound pickup step of collecting the sound of a first speaker with a first sound pickup device worn by the first speaker, transmitting the sound to a second sound pickup device, and storing it as a first audio track; a second sound pickup step of collecting the sound of a second speaker with the second sound pickup device and storing it as a second audio track; an audio generation step of processing the first audio track and the second audio track into an intermediate audio file with the second sound pickup device; and an audio separation step of performing speaker-role separation on the intermediate audio file.
Preferably, the intermediate audio file is one dual-track audio file or two single-track audio files.
Preferably, the audio separation step further comprises: eliminating, from the intermediate audio file, sound in the first audio track that does not belong to the first speaker, and sound in the second audio track that does not belong to the second speaker.
Preferably, the audio separation step further comprises: performing speaker-role separation on the intermediate audio file using a speech separation algorithm.
In a second aspect, an embodiment of the present application provides a recording data processing system adapted to the above recording data processing method, comprising: a first sound pickup unit, which collects the sound of a first speaker with a first sound pickup device worn by the first speaker, transmits the sound to a second sound pickup device, and stores it as a first audio track; a second sound pickup unit, which collects the sound of a second speaker with the second sound pickup device and stores it as a second audio track; an audio generation unit, which processes the first audio track and the second audio track into an intermediate audio file with the second sound pickup device; and an audio separation unit, which performs speaker-role separation on the intermediate audio file.
In some of these embodiments, the intermediate audio file is one dual-track audio file or two single-track audio files.
In some of these embodiments, the audio separation unit further eliminates, from the intermediate audio file, sound in the first audio track that does not belong to the first speaker, and sound in the second audio track that does not belong to the second speaker.
In some of these embodiments, the audio separation unit further performs speaker-role separation on the intermediate audio file using a speech separation algorithm.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the sound recording data processing method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements a recording data processing method as described in the first aspect.
Compared with the related art, the recording data processing method provided by the embodiments of the present application can obtain a more accurate recording of each sound source at lower cost and with lower technical difficulty.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a recording data processing method according to the present invention;
FIG. 2 is a block diagram of a recording data processing system according to the present invention;
FIG. 3 is a block diagram of an electronic device of the present invention;
in the above figures:
1. a first sound pickup unit; 2. a second sound pickup unit; 3. an audio generation unit; 4. an audio separation unit; 60. a bus; 61. a processor; 62. a memory; 63. a communication interface.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In a real environment, the speech signal of interest is usually corrupted by noise, which severely damages speech intelligibility and degrades speech recognition performance. Front-end speech separation is one of the most common ways of dealing with noise. A good front-end speech separation module can greatly improve speech intelligibility and the recognition performance of an automatic speech recognition system.
From a signal processing point of view, many methods estimate the power spectrum of the noise or an ideal Wiener filter, such as spectral subtraction and Wiener filtering. Wiener filtering is the optimal filter for separating clean speech in the minimum mean-square-error sense: given noisy speech and prior distributions of speech and noise, it infers the spectral coefficients of the speech. Signal-processing-based methods typically assume that the noise is stationary or slowly varying, and they achieve good separation performance when this assumption holds. Compared with signal-processing methods, model-based methods use the pure signals before mixing to build separate models of speech and noise, and achieve important performance improvements at low signal-to-noise ratios. Among model-based speech separation methods, non-negative matrix factorization is a common modeling technique that can mine local basis representations in non-negative data and is now widely applied to speech separation. Computational auditory scene analysis is another important speech separation technique, which attempts to solve the separation problem by simulating the way the human ear processes sound. Its basic computational goal is to estimate an ideal binary mask, which achieves speech separation based on the auditory masking effect of the human ear. Compared with other speech separation methods, computational auditory scene analysis makes no assumptions about the noise and generalizes better.
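For illustration only (this code is not part of the patent), the following is a minimal spectral-subtraction sketch of the signal-processing approach described above. It assumes the numpy and scipy packages, a mono signal, and that the first half second of the recording contains noise only; all names are made up for the example.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs=16000, noise_seconds=0.5):
    """Sketch of spectral subtraction: estimate the noise power spectrum from
    an assumed speech-free leading segment and subtract it from every frame."""
    _, _, spec = stft(noisy, fs=fs, nperseg=512)          # hop size = 256
    mag, phase = np.abs(spec), np.angle(spec)

    noise_frames = max(1, int(noise_seconds * fs / 256))
    noise_power = np.mean(mag[:, :noise_frames] ** 2, axis=1, keepdims=True)

    clean_power = np.maximum(mag ** 2 - noise_power, 0.01 * noise_power)  # spectral floor
    clean_spec = np.sqrt(clean_power) * np.exp(1j * phase)

    _, enhanced = istft(clean_spec, fs=fs, nperseg=512)
    return enhanced
```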
Speech separation aims to separate the useful signal from an interfered speech signal, a process that can naturally be formulated as a supervised learning problem. A typical supervised speech separation system learns, through a supervised learning algorithm such as a deep neural network, a mapping function from noisy features to a separation target, such as an ideal mask or the magnitude spectrum of the speech of interest.
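To make the supervised formulation concrete, the sketch below computes an ideal-ratio-mask training target from a clean/noise pair; this is an illustrative assumption, not text from the patent, and it presumes aligned clean and noise signals of equal length.

```python
import numpy as np
from scipy.signal import stft

def ideal_ratio_mask(clean, noise, fs=16000, nperseg=512):
    """Ideal ratio mask: per time-frequency bin, the fraction of energy that
    belongs to the clean speech. A network is trained to predict this mask
    from the noisy spectrum."""
    _, _, s = stft(clean, fs=fs, nperseg=nperseg)
    _, _, n = stft(noise, fs=fs, nperseg=nperseg)
    speech_pow, noise_pow = np.abs(s) ** 2, np.abs(n) ** 2
    return speech_pow / (speech_pow + noise_pow + 1e-10)
```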
Embodiments of the invention are described in detail below with reference to the accompanying drawings:
fig. 1 is a flowchart of a recording data processing method according to the present invention, and referring to fig. 1, the recording data processing method according to the present invention includes the following steps:
s1: the first sound pickup equipment worn by the first speaker is used for collecting the sound of the first speaker, transmitting the sound to the second sound pickup equipment and storing the sound as the first sound track.
In a specific implementation, the recording objects are divided into a wearer and the corresponding interlocutor, and the first sound pickup device is worn on the wearer's body; optionally, the first sound pickup device may be an earphone. The first sound pickup device collects the wearer's sound within a certain distance; optionally, this distance may be 0.2 m.
In a specific implementation, after the wearer's sound is collected, the recorded data are transmitted to the second sound pickup device; optionally, the second sound pickup device may be a voice recorder. The second sound pickup device saves the received sound of the wearer as one audio track.
In this step, the sound pickup device worn by the first speaker is used to achieve targeted close-range sound pickup.
S2: collect the sound of the second speaker with the second sound pickup device, and store the sound as the second audio track.
In a specific implementation, the second sound pickup device collects the sound of the interlocutor corresponding to the wearer; optionally, the second sound pickup device is placed within a certain radius of the interlocutor, and optionally this distance may be 2 meters.
In a specific implementation, the second sound pickup device saves the collected sound of the interlocutor as one audio track.
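Purely as an illustration of steps S1 and S2 (the patent prescribes no file format, sampling rate, or library), the sketch below assumes that the wearer's signal has already been received from the first pickup device, that both signals are available as numpy arrays, and that the soundfile package is used; file names are made up.

```python
import numpy as np
import soundfile as sf

SAMPLE_RATE = 16000  # assumed sampling rate

def save_tracks(wearer_audio: np.ndarray, interlocutor_audio: np.ndarray) -> None:
    """Store the wearer's sound (S1) and the interlocutor's sound (S2)
    as two separate mono audio tracks on the second pickup device."""
    sf.write("track_wearer.wav", wearer_audio, SAMPLE_RATE)               # first audio track
    sf.write("track_interlocutor.wav", interlocutor_audio, SAMPLE_RATE)   # second audio track
```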
S3: processing the first audio track and the second audio track into an intermediate audio file using the second pickup apparatus.
Optionally, the intermediate audio file is one dual-track audio file or two single-track audio files.
In a specific implementation, the wearer's sound and the interlocutor's sound are stored as two separate audio tracks, and the second sound pickup device can process the two tracks into different forms: optionally, the two tracks may be combined into one audio file containing both tracks; alternatively, each track may be written out as its own single-track audio file.
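A minimal sketch of step S3 under the same assumptions (numpy, soundfile, made-up file names): it either stacks the two tracks into one dual-track file or keeps them as two single-track files.

```python
import numpy as np
import soundfile as sf

def make_intermediate_audio(track1, track2, sample_rate=16000, merge=True):
    """Step S3: build the intermediate audio file, either as one dual-track
    file or as two single-track files."""
    # pad the shorter track so both channels have equal length
    length = max(len(track1), len(track2))
    t1 = np.pad(track1, (0, length - len(track1)))
    t2 = np.pad(track2, (0, length - len(track2)))

    if merge:
        stereo = np.stack([t1, t2], axis=1)        # shape (samples, 2): two tracks
        sf.write("intermediate_dual_track.wav", stereo, sample_rate)
    else:
        sf.write("intermediate_track1.wav", t1, sample_rate)
        sf.write("intermediate_track2.wav", t2, sample_rate)
```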
S4: perform speaker-role separation on the intermediate audio file.
Optionally, the audio separation step further includes: eliminating, from the intermediate audio file, sound in the first audio track that does not belong to the first speaker, and sound in the second audio track that does not belong to the second speaker.
Optionally, the audio separation step further includes: performing speaker-role separation on the intermediate audio file using a speech separation algorithm.
In a specific implementation, the intermediate audio file is separated by a speech separation algorithm: audio not produced by the wearer is eliminated from the wearer's track, and audio not produced by the interlocutor is eliminated from the interlocutor's track, so that an audio file with less noise is obtained.
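The per-track elimination could be approached in many ways; the following energy-comparison heuristic is only an assumed illustration (it is not taken from the patent) that exploits the fact that the wearer is much closer to the first pickup device than to the second.

```python
import numpy as np

def suppress_crosstalk(track1, track2, frame=512, margin=2.0):
    """Assumed heuristic: frame-wise energy comparison between the wearer
    track and the interlocutor track. A frame is muted in a track when the
    other track is `margin` times more energetic there, i.e. the sound most
    likely did not come from that track's own speaker."""
    out1, out2 = track1.copy(), track2.copy()
    n = min(len(track1), len(track2))
    for start in range(0, n, frame):
        seg = slice(start, min(start + frame, n))
        e1 = np.mean(track1[seg] ** 2) + 1e-12
        e2 = np.mean(track2[seg] ** 2) + 1e-12
        if e2 > margin * e1:
            out1[seg] = 0.0   # likely not the wearer speaking: remove from track 1
        if e1 > margin * e2:
            out2[seg] = 0.0   # likely not the interlocutor speaking: remove from track 2
    return out1, out2
```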
In a specific implementation, the embodiments of the present application take a fixed human voice as the main separation object; therefore, optionally, a spectrum-mapping-based method may be used as the speech separation algorithm. Spectrum mapping lets a model learn, through supervised learning, the mapping from a spectrum with interference to the interference-free (clean speech) spectrum; the model may be a DNN, CNN, LSTM or even a GAN.
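The patent names only the model families (DNN, CNN, LSTM, GAN); as a hedged illustration of the spectrum-mapping idea, the minimal PyTorch LSTM mask estimator below is an assumption, not the patented implementation.

```python
import torch
import torch.nn as nn

class SpectrumMappingLSTM(nn.Module):
    """Minimal spectrum-mapping network: from a noisy magnitude spectrogram it
    predicts a mask in [0, 1] whose product with the input approximates the
    clean (interference-free) spectrum. It would be trained on supervised
    pairs, e.g. against the ideal ratio mask shown earlier."""

    def __init__(self, n_freq_bins: int = 257, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(n_freq_bins, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq_bins), nn.Sigmoid())

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, time_frames, freq_bins)
        h, _ = self.lstm(noisy_mag)
        return self.mask(h) * noisy_mag   # estimated clean magnitude spectrum

# example usage on random data: batch of 1, 100 frames, 257 frequency bins
model = SpectrumMappingLSTM()
clean_estimate = model(torch.rand(1, 100, 257))
```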
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The embodiment of the application provides a recording data processing system, which is suitable for the recording data processing method. As used below, the terms "unit," "module," and the like may implement a combination of software and/or hardware of predetermined functions. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.
FIG. 2 is a block diagram of a recording data processing system according to the present invention, referring to FIG. 2, including:
First sound pickup unit 1: collect the sound of the first speaker with the first sound pickup device worn by the first speaker, transmit the sound to the second sound pickup device, and store it as the first audio track.
In a specific implementation, the recording objects are divided into a wearer and the corresponding interlocutor, and the first sound pickup device is worn on the wearer's body; optionally, the first sound pickup device may be an earphone. The first sound pickup device collects the wearer's sound within a certain distance; optionally, this distance may be 0.2 m.
In a specific implementation, after the wearer's sound is collected, the recorded data are transmitted to the second sound pickup device; optionally, the second sound pickup device may be a voice recorder. The second sound pickup device saves the received sound of the wearer as one audio track.
In this unit, the sound pickup device worn by the first speaker is used to achieve targeted close-range sound pickup.
Second sound pickup unit 2: collect the sound of the second speaker with the second sound pickup device, and store the sound as the second audio track.
In a specific implementation, the second sound pickup device collects the sound of the interlocutor corresponding to the wearer; optionally, the second sound pickup device is placed within a certain radius of the interlocutor, and optionally this distance may be 2 meters.
In a specific implementation, the second sound pickup device saves the collected sound of the interlocutor as one audio track.
Audio generation unit 3: process the first audio track and the second audio track into an intermediate audio file using the second sound pickup device.
Optionally, the intermediate audio file is one dual-track audio file or two single-track audio files.
In a specific implementation, the wearer's sound and the interlocutor's sound are stored as two separate audio tracks, and the second sound pickup device can process the two tracks into different forms: optionally, the two tracks may be combined into one audio file containing both tracks; alternatively, each track may be written out as its own single-track audio file.
Audio separation unit 4: perform speaker-role separation on the intermediate audio file.
Optionally, the audio separation unit 4 further eliminates, from the intermediate audio file, sound in the first audio track that does not belong to the first speaker, and sound in the second audio track that does not belong to the second speaker.
Optionally, the audio separation unit 4 further performs speaker-role separation on the intermediate audio file using a speech separation algorithm.
In a specific implementation, the intermediate audio file is separated by a speech separation algorithm: audio not produced by the wearer is eliminated from the wearer's track, and audio not produced by the interlocutor is eliminated from the interlocutor's track, so that an audio file with less noise is obtained.
In a specific implementation, the embodiments of the present application take a fixed human voice as the main separation object; therefore, optionally, a spectrum-mapping-based method may be used as the speech separation algorithm. Spectrum mapping lets a model learn, through supervised learning, the mapping from a spectrum with interference to the interference-free (clean speech) spectrum; the model may be a DNN, CNN, LSTM or even a GAN.
In addition, a recording data processing method described in conjunction with fig. 1 may be implemented by an electronic device. Fig. 3 is a block diagram of an electronic device of the present invention.
The electronic device may comprise a processor 61 and a memory 62 in which computer program instructions are stored.
Specifically, the processor 61 may include a Central Processing Unit (CPU), or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 62 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 62 may include a Hard Disk Drive (Hard Disk Drive, abbreviated HDD), a floppy Disk Drive, a Solid State Drive (SSD), flash memory, an optical Disk, a magneto-optical Disk, tape, or a Universal Serial Bus (USB) Drive or a combination of two or more of these. Memory 62 may include removable or non-removable (or fixed) media, where appropriate. The memory 62 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 62 is a Non-Volatile (Non-Volatile) memory. In particular embodiments, Memory 62 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically rewritable ROM (EAROM), or FLASH Memory (FLASH), or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended data output Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory 62 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions executed by the processor 61.
The processor 61 realizes any one of the sound recording data processing methods in the above-described embodiments by reading and executing computer program instructions stored in the memory 62.
In some of these embodiments, the electronic device may also include a communication interface 63 and a bus 60. As shown in fig. 3, the processor 61, the memory 62, and the communication interface 63 are connected via a bus 60 to complete communication therebetween.
The communication interface 63 is used to implement communication with other components, for example data communication with external devices, image/data acquisition devices, databases, external storage, image/data processing workstations and the like.
The bus 60 includes hardware, software, or both, and couples the components of the electronic device to one another. The bus 60 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, the bus 60 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. The bus 60 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated by the present application.
The electronic device can execute the recording data processing method in the embodiment of the application.
In addition, in combination with the recording data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium to implement the recording data processing method. The computer readable storage medium having stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the sound recording data processing methods in the above embodiments.
The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for processing recorded data, comprising:
a first pickup step, using a first pickup device worn by a first speaker to pick up the sound of the first speaker, transmitting the sound to a second pickup device, and storing the sound as a first sound track;
a second sound pickup step of picking up the sound of a second party by using the second sound pickup apparatus and storing the sound as a second sound track;
an audio generating step of processing the first audio track and the second audio track into an intermediate audio file using the second sound pickup apparatus;
an audio separation step of performing speaker-role separation on the intermediate audio file.
2. The recording data processing method of claim 1, wherein the intermediate audio file is one dual-track audio file or two single-track audio files.
3. The recording data processing method of claim 1, wherein the audio separation step further comprises: eliminating, from the intermediate audio file, sound in the first audio track that does not belong to the first speaker, and sound in the second audio track that does not belong to the second speaker.
4. The recording data processing method of claim 1 or 3, wherein the audio separation step further comprises: performing speaker-role separation on the intermediate audio file using a speech separation algorithm.
5. A recorded sound data processing system, comprising:
a first sound pickup unit: collecting the sound of a first speaker by using a first sound pickup device worn on the first speaker, transmitting the sound to a second sound pickup device, and storing the sound as a first sound track;
the second sound pickup unit is used for collecting the sound of a second conversation party by using the second sound pickup equipment and storing the sound as a second sound track;
an audio generation unit that processes the first audio track and the second audio track into an intermediate audio file using the second sound pickup apparatus;
and the audio separation unit is used for separating voice roles of the intermediate audio file.
6. The recording data processing system of claim 5, wherein the intermediate audio file is one dual-track audio file or two single-track audio files.
7. The recording data processing system of claim 5, wherein the audio separation unit further eliminates, from the intermediate audio file, sound in the first audio track that does not belong to the first speaker, and sound in the second audio track that does not belong to the second speaker.
8. The recording data processing system of claim 5 or 7, wherein the audio separation unit further performs speaker-role separation on the intermediate audio file using a speech separation algorithm.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the sound recording data processing method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the sound recording data processing method according to any one of claims 1 to 4.
CN202011549737.0A 2020-12-24 2020-12-24 Recording data processing method and system, electronic equipment and storage medium Pending CN112562712A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011549737.0A CN112562712A (en) 2020-12-24 2020-12-24 Recording data processing method and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011549737.0A CN112562712A (en) 2020-12-24 2020-12-24 Recording data processing method and system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112562712A true CN112562712A (en) 2021-03-26

Family

ID=75033282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011549737.0A Pending CN112562712A (en) 2020-12-24 2020-12-24 Recording data processing method and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112562712A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470687A (en) * 2021-06-29 2021-10-01 北京明略昭辉科技有限公司 Audio acquisition and transmission device, audio processing system and audio acquisition and transmission method
CN113706844A (en) * 2021-08-31 2021-11-26 上海明略人工智能(集团)有限公司 Method and device for early warning of voice acquisition equipment, voice acquisition equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007322899A (en) * 2006-06-02 2007-12-13 D & M Holdings Inc Sound recording device
CN104123950A (en) * 2014-07-17 2014-10-29 深圳市中兴移动通信有限公司 Sound recording method and device
JP2017011754A (en) * 2016-09-14 2017-01-12 ソニー株式会社 Auricle mounted sound collecting apparatus, signal processing apparatus, and sound collecting method
JP2018013742A (en) * 2016-07-22 2018-01-25 富士通株式会社 Speech summary creation assist device, speech summary creation assist method, and speech summary creation assist program
US20180096705A1 (en) * 2016-10-03 2018-04-05 Nokia Technologies Oy Method of Editing Audio Signals Using Separated Objects And Associated Apparatus
US20190304437A1 (en) * 2018-03-29 2019-10-03 Tencent Technology (Shenzhen) Company Limited Knowledge transfer in permutation invariant training for single-channel multi-talker speech recognition
CN111128197A (en) * 2019-12-25 2020-05-08 北京邮电大学 Multi-speaker voice separation method based on voiceprint features and generation confrontation learning
CN111243579A (en) * 2020-01-19 2020-06-05 清华大学 Time domain single-channel multi-speaker voice recognition method and system
CN111586050A (en) * 2020-05-08 2020-08-25 上海明略人工智能(集团)有限公司 Audio file transmission method and device, storage medium and electronic equipment
CN111833898A (en) * 2020-07-24 2020-10-27 上海明略人工智能(集团)有限公司 Multi-source data processing method and device and readable storage medium


Similar Documents

Publication Publication Date Title
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
CN107452389B (en) Universal single-track real-time noise reduction method
CN110970057B (en) Sound processing method, device and equipment
CN109788400B (en) Neural network howling suppression method, system and storage medium for digital hearing aid
US9378754B1 (en) Adaptive spatial classifier for multi-microphone systems
CN113129917A (en) Speech processing method based on scene recognition, and apparatus, medium, and system thereof
CN111868823B (en) Sound source separation method, device and equipment
CN112562712A (en) Recording data processing method and system, electronic equipment and storage medium
CN104505099A (en) Method and equipment for removing known interference in voice signal
CN116403592A (en) Voice enhancement method and device, electronic equipment, chip and storage medium
TWI581255B (en) Front-end audio processing system
CN113205803A (en) Voice recognition method and device with adaptive noise reduction capability
CN112309417A (en) Wind noise suppression audio signal processing method, device, system and readable medium
CN114333896A (en) Voice separation method, electronic device, chip and computer readable storage medium
WO2017045512A1 (en) Voice recognition method and apparatus, terminal, and voice recognition device
CN114302286A (en) Method, device and equipment for reducing noise of call voice and storage medium
CN111933140B (en) Method, device and storage medium for detecting voice of earphone wearer
CN113039601B (en) Voice control method, device, chip, earphone and system
CN111009259B (en) Audio processing method and device
CN108899041B (en) Voice signal noise adding method, device and storage medium
CN105491336A (en) Image identification module with low power consumption
TWI761018B (en) Voice capturing method and voice capturing system
CN115293205A (en) Anomaly detection method, self-encoder model training method and electronic equipment
Birnie et al. Noise retf estimation and removal for low snr speech enhancement
CN111028851B (en) Sound playing device and noise reducing method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination