CN111128212A - Mixed voice separation method and device - Google Patents

Mixed voice separation method and device

Info

Publication number
CN111128212A
CN111128212A (application CN201911252348.9A)
Authority
CN
China
Prior art keywords
voice
audio track
mixed
far
recording
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911252348.9A
Other languages
Chinese (zh)
Inventor
李健
徐浩
梁志婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miaozhen Information Technology Co Ltd
Original Assignee
Miaozhen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Information Technology Co Ltd filed Critical Miaozhen Information Technology Co Ltd
Priority to CN201911252348.9A priority Critical patent/CN111128212A/en
Publication of CN111128212A publication Critical patent/CN111128212A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/26: Speech to text systems
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0272: Voice signal separating
    • G10L2021/02087: Noise filtering, the noise being separate speech, e.g. cocktail party

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a mixed voice separation method and device. The method comprises: acquiring a near audio track and a far audio track with a near-end recording device and a far-end recording device respectively, wherein the near audio track is a first mixed voice comprising a first voice and environmental noise, and the far audio track is a second mixed voice comprising the first voice, a second voice and the environmental noise; marking the start time and end time of each recording of the first voice in the near audio track to obtain a first mark file; and separating the mixed voice in the far audio track according to the first mark file. By marking the start and end times of the first voice, the first voice is recognized within the mixed voice so that the first voice and the second voice can be separated, which improves the voice separation effect in scenes with complex background sound.

Description

Mixed voice separation method and device
Technical Field
The invention relates to the field of voice processing and recognition, in particular to a mixed voice separation method and device.
Background
Currently, most recorders on the market that require voice separation are used in quiet environments, such as inside a car, or in environments with steady background sound, such as while watching television. The separation approach mostly relies on a one- or two-dimensional horizontal microphone arrangement: 2 to 6 microphones (MICs) judge the direction and type (voice or noise) of a sound from its propagation, so as to separate the voices (tracks) of people in different directions.
However, in some complex environments, for example at a service site where the background sound changes, this separation approach may fail to separate the human voice correctly from the admixed noise and environmental sound.
Disclosure of Invention
The embodiments of the invention provide a mixed voice separation method and device, which at least address the problem in the related art of unsatisfactory voice separation in scenes where the background sound changes.
According to an embodiment of the present invention, a mixed voice separation method is provided, comprising: acquiring a near audio track and a far audio track with a near-end recording device and a far-end recording device respectively, wherein the near audio track is a first mixed voice comprising a first voice and environmental noise, and the far audio track is a second mixed voice comprising the first voice, a second voice and the environmental noise; marking the start time and end time of each recording of the first voice in the near audio track to obtain a first mark file; and separating the mixed voice in the far audio track according to the first mark file.
Optionally, before marking the start time and end time of each recording of the first voice in the near audio track to obtain the first mark file, the method further includes: performing noise reduction processing and spatial-information-based enhancement processing on the first voice in the near audio track to obtain the first voice.
Optionally, before marking the start time and end time of each recording of the first voice in the near audio track to obtain the first mark file, the method further includes: performing voice signal processing on the first voice to determine the start time point and end time point of each recording of the first voice.
Optionally, separating the second mixed voice in the far audio track according to the first mark file comprises: recognizing the voice in the second mixed voice whose time matches the marks in the first mark file as the first voice, so as to segment the first voice and the second voice out of the second mixed voice.
According to another embodiment of the present invention, a mixed voice separation device is provided, comprising: a near-end recording module, configured to acquire a near audio track, wherein the near audio track is a first mixed voice comprising a first voice and environmental noise; a far-end recording module, configured to acquire a far audio track, wherein the far audio track is a second mixed voice comprising the first voice, a second voice and the environmental noise; a marking module, configured to mark the start time and end time of each recording of the first voice in the near audio track to obtain a first mark file; and a separation module, configured to separate the mixed voice in the far audio track according to the first mark file.
Optionally, the device further comprises: a preprocessing module, configured to perform noise reduction processing and spatial-information-based enhancement processing on the first voice in the near audio track to obtain the first voice.
Optionally, the device further comprises: a determining module, configured to perform voice signal processing on the first voice to determine the start time point and end time point of each recording of the first voice.
Optionally, the separation module comprises: a recognition unit, configured to recognize the voice in the second mixed voice whose time matches the marks in the first mark file as the first voice; and a segmentation unit, configured to segment the first voice and the second voice out of the second mixed voice.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
In the above-described embodiments of the present invention, the first voice is recognized from the mixed voice by marking the start and end times of the first voice to separate the first and second voices, thereby improving the voice separation effect in a complex background sound scene.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a mixed voice separation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a voice separation method in a service scenario according to an embodiment of the present invention;
FIG. 3 is a block diagram of a mixed voice separation device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a mixed voice separation device according to an alternative embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In the present embodiment, a mixed voice separation method is provided. FIG. 1 is a flowchart of the method according to an embodiment of the present invention; as shown in FIG. 1, the method includes the following steps:
Step S102, acquiring a near audio track and a far audio track with a near-end recording device and a far-end recording device respectively, wherein the near audio track is a first mixed voice comprising a first voice and environmental noise, and the far audio track is a second mixed voice comprising the first voice, a second voice and the environmental noise;
Step S104, marking the start time and end time of each recording of the first voice in the near audio track to obtain a first mark file;
Step S106, separating the mixed voice in the far audio track according to the first mark file.
Before step S104 in this embodiment, the method may further include: performing noise reduction processing and spatial-information-based enhancement processing on the first voice in the near audio track to obtain the first voice. Enhancing the first voice based on spatial information can be implemented by configuring the near-end recording device as a directional sound pickup device whose preset pickup direction points at the sound source of the first voice; audio arriving from the preset pickup direction is then enhanced to obtain a clear first voice.
In this embodiment, before separating the mixed voice in the far audio track, the non-ambient-noise voice in the far audio track (the not-yet-separated first and second voices) is likewise enhanced through noise reduction processing and spatial information; the processing may be the same as for the near audio track.
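The patent does not fix a particular enhancement algorithm. As one illustrative sketch (not the claimed method), a two-channel delay-and-sum beamformer reinforces sound arriving from a preset pickup direction, assuming the inter-channel sample delay for that direction is known; the delays and signal layout below are assumptions for the example:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each channel by its known integer sample delay, then average.
    Averaging reinforces the coherent target speech from the preset
    direction, while uncorrelated noise partially cancels."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    aligned = np.stack([np.asarray(ch[d:d + n]) for ch, d in zip(channels, delays)])
    return aligned.mean(axis=0)
```

With two noisy, differently delayed copies of the same speech, the beamformed output is closer to the clean signal than either channel alone, since averaging halves the variance of independent noise.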
In this embodiment, before marking the start time and end time of each recording of the first voice in the near audio track to obtain the first mark file, the method may further include: performing voice signal processing on the first voice to determine the start time point and end time point of each recording of the first voice.
Concretely, the voice signal processing may be voice endpoint detection. The first voice may consist of several discontinuous voice segments; voice endpoint detection marks the start endpoint and end endpoint of each segment, so the start time point and end time point of each segment of the first voice can be determined.
In addition, voice endpoint detection is also performed on the non-ambient-noise voice in the far audio track (the not-yet-separated first and second voices), marking the start time point and end time point of each of its voice segments.
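A minimal sketch of the endpoint detection described above, using a simple frame-energy threshold. Real VAD implementations are considerably more elaborate; the frame length, threshold, and minimum-duration values here are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def detect_segments(signal, rate, frame_ms=20, threshold=0.01, min_frames=3):
    """Return (start_sec, end_sec) pairs for runs of frames whose RMS
    energy exceeds `threshold` for at least `min_frames` frames."""
    hop = int(rate * frame_ms / 1000)
    n_frames = len(signal) // hop
    active = [np.sqrt(np.mean(signal[i * hop:(i + 1) * hop] ** 2)) > threshold
              for i in range(n_frames)]
    segments, start = [], None
    for i, a in enumerate(active + [False]):  # sentinel closes a trailing run
        if a and start is None:
            start = i
        elif not a and start is not None:
            if i - start >= min_frames:
                segments.append((start * hop / rate, i * hop / rate))
            start = None
    return segments
```

Applied to the near audio track, each returned pair is one start/end mark of the first voice, i.e. one entry of the first mark file.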
In step S106 of this embodiment, the voice in the second mixed voice whose time matches the marks in the first mark file is recognized as the first voice, so that the first voice and the second voice can be segmented out of the second mixed voice.
Based on the start and end time points of each segment of the first voice, and the start and end time points of each segment of the non-ambient-noise voice in the far audio track, it can be determined which segments in the far audio track are the first voice; once the first voice is identified, the remaining segments are the second voice.
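The matching just described can be sketched as interval bookkeeping: a far-track segment is attributed to the first voice when enough of it overlaps a marked first-voice interval. The 50% overlap ratio below is an assumption for the example, not a value from the patent:

```python
def overlap_ratio(segment, marks):
    """Fraction of `segment` covered by the marked first-voice intervals."""
    s, e = segment
    covered = sum(max(0.0, min(e, me) - max(s, ms)) for ms, me in marks)
    return covered / (e - s)

def split_far_track(far_segments, first_voice_marks, min_ratio=0.5):
    """Classify each far-track segment as first voice (matches the mark
    file) or second voice (everything else)."""
    first, second = [], []
    for seg in far_segments:
        target = first if overlap_ratio(seg, first_voice_marks) >= min_ratio else second
        target.append(seg)
    return first, second
```

For example, with the first voice marked at 1.0 s to 2.0 s, a far-track segment spanning 0.9 s to 2.1 s is attributed to the first voice, while a segment from 3.0 s to 4.0 s becomes the second voice.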
In order to facilitate understanding of the technical solutions provided by the present invention, the following description will be made with reference to an embodiment of a specific scenario.
This embodiment provides a mixed voice separation method that can be applied to various service scenarios: a one-to-one service scenario, i.e., a recording of one question and one answer during service, or a one-to-many service scenario, i.e., a recording of one question and several answers during service. The following takes the separation of the voices of an attendant and a customer at a service location as an example.
As shown in fig. 2, the method of the present embodiment mainly includes the following steps:
step S201, a near-end acquisition unit (first MIC) is arranged beside the mouth of the attendant to acquire a near-audio track, wherein the near-audio track is mixed audio comprising environmental noise and attendant voice, and the attendant voice can be obtained by performing noise reduction processing and enhancing processing on the attendant voice based on spatial information.
Step S202, a remote acquisition unit (second MIC) is set right in front of the attendant to acquire the remote track. For example, the second MIC may be worn in the form of a workcard on the chest of the attendant.
Step S203, the first MIC and the second MIC are recorded simultaneously, and recording time is marked respectively.
Step S204, processing the voice signal acquired by the first MIC by using, for example, VAD technology, determining a speaking start time point and an ending time point of the attendant, and adding a mark of the start time point and the ending time point to each voice fragment in the voice signal acquired by the first MIC to obtain a first mark file.
Step S205 separates the mixed voice in the far-end audio track according to the first markup file, and recognizes the voice identical to the time stamp in the first markup file as the voice of the attendant, thereby cutting the voice of the customer.
In this embodiment, the speech time of the attendant can be known by the marked recording time, and then it is inferred that the other time is that the customer is speaking, thereby realizing the separation of the voices of the attendant and the customer.
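The inference in this step (times not marked as attendant speech are attributed to the customer) amounts to taking the complement of the marked intervals over the whole recording. A small sketch, where the recording bounds are assumed to be known:

```python
def customer_intervals(attendant_marks, total_start, total_end):
    """Complement of the attendant's marked speech intervals over the
    recording: whatever time is left over is attributed to the customer."""
    intervals, cursor = [], total_start
    for start, end in sorted(attendant_marks):
        if start > cursor:
            intervals.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_end:
        intervals.append((cursor, total_end))
    return intervals
```

So with attendant marks at 1.0 s to 2.0 s and 3.0 s to 4.0 s in a 5-second recording, the customer intervals come out as 0 to 1.0, 2.0 to 3.0, and 4.0 to 5.0.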
The voice separation scheme provided in this embodiment is simple to implement and makes it easier to distinguish the attendant's speech in service scenes with complex background sound. Its simple algorithm gives the recorder low power consumption and saves storage space; the recording can be segmented directly, the application scenario is well defined, and residual mixed speech is simply discarded. Invalid audio is filtered out at the hardware end, saving recognition compute: for example, once the customer has finished speaking and the attendant is not speaking, the recording is invalid.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a mixed voice separation device is further provided. The device is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" or "unit" may be a combination of software and/or hardware implementing a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 3 is a block diagram of a mixed voice separation device according to an embodiment of the present invention; as shown in FIG. 3, the device includes a near-end recording module 10, a far-end recording module 20, a marking module 30, and a separation module 40.
The near-end recording module 10 is configured to acquire a near-audio track, where the near-audio track is a first mixed voice including a first voice and ambient noise.
And a far-end recording module 20, configured to acquire a far audio track, where the far audio track is a second mixed voice including the first voice, the second voice, and the ambient noise.
And the labeling module 30 is configured to label the start time and the end time of each recording of the first speech in the near-track to obtain a first label file.
A separating module 40, configured to separate the mixed speech in the far-audio track according to the first markup file.
FIG. 4 is a schematic structural diagram of a mixed voice separation device according to an alternative embodiment of the present invention; as shown in FIG. 4, the device may further include a preprocessing module 50 and a determining module 60 in addition to all the modules shown in FIG. 3.
And the preprocessing module 50, configured to perform noise reduction processing and spatial-information-based enhancement processing on the first voice in the near audio track to obtain the first voice.
The determining module 60 is configured to perform voice signal processing on the first voice to determine a start time point and an end time point of each recording of the first voice.
In this embodiment, the separation module 40 may further include a recognition unit 41 and a segmentation unit 42.
A recognition unit 41, configured to recognize a voice in the second mixed voice that is the same as the time stamp in the first markup file as the first voice.
A segmenting unit 42, configured to segment the first voice and the second voice from the second mixed voice.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented with a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described here. Alternatively, they may each be fabricated as an individual integrated circuit module, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A mixed voice separation method, comprising:
acquiring a near audio track and a far audio track with a near-end recording device and a far-end recording device respectively, wherein the near audio track is a first mixed voice comprising a first voice and environmental noise, and the far audio track is a second mixed voice comprising the first voice, a second voice and the environmental noise;
marking the start time and end time of each recording of the first voice in the near audio track to obtain a first mark file;
separating the mixed voice in the far audio track according to the first mark file.
2. The method of claim 1, wherein before marking the start time and end time of each recording of the first voice in the near audio track to obtain the first mark file, the method further comprises:
performing noise reduction processing and spatial-information-based enhancement processing on the first voice in the near audio track to obtain the first voice.
3. The method of claim 2, wherein before marking the start time and end time of each recording of the first voice in the near audio track to obtain the first mark file, the method further comprises:
performing voice signal processing on the first voice to determine the start time point and end time point of each recording of the first voice.
4. The method of claim 3, wherein separating the second mixed voice in the far audio track according to the first mark file comprises:
recognizing the voice in the second mixed voice whose time matches the marks in the first mark file as the first voice, and segmenting the first voice and the second voice out of the second mixed voice.
5. A mixed voice separation device, comprising:
a near-end recording module, configured to acquire a near audio track, wherein the near audio track is a first mixed voice comprising a first voice and environmental noise;
a far-end recording module, configured to acquire a far audio track, wherein the far audio track is a second mixed voice comprising the first voice, a second voice and the environmental noise;
a marking module, configured to mark the start time and end time of each recording of the first voice in the near audio track to obtain a first mark file;
and a separation module, configured to separate the mixed voice in the far audio track according to the first mark file.
6. The device of claim 5, further comprising:
a preprocessing module, configured to perform noise reduction processing and spatial-information-based enhancement processing on the first voice in the near audio track to obtain the first voice.
7. The device of claim 6, further comprising:
a determining module, configured to perform voice signal processing on the first voice to determine the start time point and end time point of each recording of the first voice.
8. The device of claim 7, wherein the separation module comprises:
a recognition unit, configured to recognize the voice in the second mixed voice whose time matches the marks in the first mark file as the first voice;
and a segmentation unit, configured to segment the first voice and the second voice out of the second mixed voice.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 4 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 4.
CN201911252348.9A 2019-12-09 2019-12-09 Mixed voice separation method and device Pending CN111128212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911252348.9A CN111128212A (en) 2019-12-09 2019-12-09 Mixed voice separation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911252348.9A CN111128212A (en) 2019-12-09 2019-12-09 Mixed voice separation method and device

Publications (1)

Publication Number Publication Date
CN111128212A true CN111128212A (en) 2020-05-08

Family

ID=70497960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911252348.9A Pending CN111128212A (en) 2019-12-09 2019-12-09 Mixed voice separation method and device

Country Status (1)

Country Link
CN (1) CN111128212A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104932665A (en) * 2014-03-19 2015-09-23 联想(北京)有限公司 Information processing method and electronic device
CN108198570A (en) * 2018-02-02 2018-06-22 北京云知声信息技术有限公司 The method and device of speech Separation during hearing
CN108725340A (en) * 2018-03-30 2018-11-02 斑马网络技术有限公司 Vehicle audio processing method and its system
CN109905764A (en) * 2019-03-21 2019-06-18 广州国音智能科技有限公司 Target person voice intercept method and device in a kind of video


Non-Patent Citations (1)

Title
谭华章 (Tan Huazhang): "Practical Otorhinolaryngology and Head & Neck Surgery, Vol. 1", Jilin Science and Technology Press, page 59 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN111833898A (en) * 2020-07-24 2020-10-27 上海明略人工智能(集团)有限公司 Multi-source data processing method and device and readable storage medium
CN111934705A (en) * 2020-07-29 2020-11-13 杭州叙简科技股份有限公司 Voice anti-loss method and device for interphone, electronic equipment and medium
CN112102825A (en) * 2020-08-11 2020-12-18 湖北亿咖通科技有限公司 Audio processing method and device based on vehicle-mounted machine voice recognition and computer equipment
CN111986657A (en) * 2020-08-21 2020-11-24 上海明略人工智能(集团)有限公司 Audio recognition method and device, recording terminal, server and storage medium
CN111986657B (en) * 2020-08-21 2023-08-25 上海明略人工智能(集团)有限公司 Audio identification method and device, recording terminal, server and storage medium
CN112887875A (en) * 2021-01-22 2021-06-01 平安科技(深圳)有限公司 Conference system voice data acquisition method and device, electronic equipment and storage medium
CN112887875B (en) * 2021-01-22 2022-10-18 平安科技(深圳)有限公司 Conference system voice data acquisition method and device, electronic equipment and storage medium
CN113840028A (en) * 2021-09-22 2021-12-24 Oppo广东移动通信有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN113840028B (en) * 2021-09-22 2022-12-02 Oppo广东移动通信有限公司 Audio processing method and device, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN111128212A (en) Mixed voice separation method and device
WO2020237855A1 (en) Sound separation method and apparatus, and computer readable storage medium
EP2922051A1 (en) Method, device, and system for classifying audio conference minutes
US11315366B2 (en) Conference recording method and data processing device employing the same
CN109361825A (en) Meeting summary recording method, terminal and computer storage medium
CN111739553A (en) Conference sound acquisition method, conference recording method, conference record presentation method and device
CN107564531A (en) Minutes method, apparatus and computer equipment based on vocal print feature
JP2013527947A5 (en)
CN109829691B (en) C/S card punching method and device based on position and deep learning multiple biological features
WO2016029806A1 (en) Sound image playing method and device
CN111243595B (en) Information processing method and device
CN107680584B (en) Method and device for segmenting audio
CN110289016A (en) A kind of voice quality detecting method, device and electronic equipment based on actual conversation
US20130246061A1 (en) Automatic realtime speech impairment correction
CN110619897A (en) Conference summary generation method and vehicle-mounted recording system
US20140358528A1 (en) Electronic Apparatus, Method for Outputting Data, and Computer Program Product
CN113315979A (en) Data processing method and device, electronic equipment and storage medium
CN111311774A (en) Sign-in method and system based on voice recognition
CN111145774A (en) Voice separation method and device
CN109981448A (en) Information processing method and electronic equipment
CN105979469B (en) recording processing method and terminal
CN108830980A (en) Security protection integral intelligent robot is received in Study of Intelligent Robot Control method, apparatus and attendance
CN110400560B (en) Data processing method and device, storage medium and electronic device
CN111161710A (en) Simultaneous interpretation method and device, electronic equipment and storage medium
CN103929532A (en) Information processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508
