CN116798442A - Multi-application voice acquisition method, pickup equipment and system - Google Patents

Multi-application voice acquisition method, pickup equipment and system

Info

Publication number
CN116798442A
CN116798442A CN202210258225.1A CN202210258225A CN116798442A CN 116798442 A CN116798442 A CN 116798442A CN 202210258225 A CN202210258225 A CN 202210258225A CN 116798442 A CN116798442 A CN 116798442A
Authority
CN
China
Prior art keywords
audio data
terminal
processing unit
communication unit
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210258225.1A
Other languages
Chinese (zh)
Inventor
王勇
黄朝敏
吴振志
吴涵渠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Otto Intelligent Technology Co ltd
Original Assignee
Wuhan Otto Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Otto Intelligent Technology Co ltd
Priority to CN202210258225.1A
Publication of CN116798442A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application relates to the field of audio data processing and discloses a multi-application voice acquisition method, a pickup device and a system. The method comprises: collecting, from a first loudspeaker of a first terminal, audio data sent by a second target application program of a second terminal; processing the audio data to obtain first audio data; sending the first audio data to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain second audio data; and sending the second audio data to a translation application program so that the translation application program translates the second audio data. By means of the low-latency audio transmission of the pickup device, the translation application program can successfully acquire and translate the audio data.

Description

Multi-application voice acquisition method, pickup equipment and system
Technical Field
The present application relates to the field of audio data processing, and in particular, to a method, a pickup device, and a system for multi-application voice acquisition.
Background
With the continuous development of internet technology, internet-based conversations are becoming more popular, and it is increasingly common for a user A and a user B to hold remote video conferences over the internet.
When user A and user B hold a remote video conference, howling can occur once a microphone picks up the sound of a loudspeaker, so current conference systems use technical means to cancel the loudspeaker sound collected by the microphone.
However, user A and user B speak different languages, so there is a language barrier, and the content of the conversation needs to be translated into each user's native language by translation software. Because the loudspeaker sound collected by the microphone is cancelled by active noise reduction in order to suppress howling, the translation software cannot collect the speech content and therefore cannot translate it.
Disclosure of Invention
Accordingly, in order to solve the above-mentioned problems, it is necessary to provide a method, a sound pickup device and a system for multi-application voice collection, which can enable a translation application to successfully collect audio data and translate the audio data.
In a first aspect, an embodiment of the present application provides a method for multi-application voice collection, applied to a sound pickup apparatus, where the method includes:
collecting, from a first loudspeaker of a first terminal, audio data sent by a second target application program of a second terminal;
processing the audio data to obtain first audio data;
sending the first audio data to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain second audio data;
and sending the second audio data to a translation application program so that the translation application program translates the second audio data.
In some embodiments, the audio data includes ambient speech and audio speech.
In some embodiments, processing the audio data to obtain the first audio data includes:
packetizing, encoding and compressing the audio data to obtain the first audio data.
In some embodiments, sending the first audio data to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain the second audio data, includes:
sending the first audio data to the first terminal so that the first terminal decompresses and decodes the first audio data to obtain the second audio data.
In a second aspect, an embodiment of the present application further provides a sound pickup apparatus including:
a third microphone;
a first processing unit, connected with the third microphone and configured to process audio data;
a first communication unit, connected with the first processing unit and configured to send the audio data to a second communication unit;
wherein the first processing unit includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
In a third aspect, the embodiment of the present application further provides a multi-application voice acquisition system, the system including the pickup device of the second aspect, a second communication unit, and a second processing unit, the second communication unit being connected to the pickup device and the second processing unit respectively,
the second communication unit is used for receiving the audio data sent by the pickup device and sending the audio data to the second processing unit for secondary processing.
In some embodiments, the pickup apparatus includes a third microphone, a first processing unit, and a first communication unit, the first processing unit being connected to the third microphone and the first communication unit, respectively.
In some embodiments, the sound pickup apparatus is a sound pickup.
In a fourth aspect, embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by a processor, cause the processor to perform the above-described method.
In a fifth aspect, embodiments of the present application also provide a computer program product having at least one computer instruction stored therein, the at least one computer instruction being loaded and executed by a processor to cause the computer to implement the above-described method.
Compared with the prior art, the application has the following beneficial effects: in the multi-application voice acquisition method provided by the embodiments of the application, audio data sent by the second target application program of the second terminal is collected from the first loudspeaker of the first terminal; the audio data is processed to obtain first audio data; the first audio data is sent to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain second audio data; and finally the second audio data is sent to the translation application program, so that the translation application program translates the second audio data. By means of the low-latency audio transmission of the pickup device, the translation application program can successfully acquire and translate the audio data.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
Fig. 1 is a schematic diagram of a multi-application voice acquisition system according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a multi-application voice acquisition method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the hardware structure of a first processing unit according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, if not in conflict, the features of the embodiments of the present application may be combined with each other, which is within the protection scope of the present application. In addition, while functional block division is performed in a device diagram and logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. Furthermore, the words "first," "second," "third," and the like as used herein do not limit the order of data and execution, but merely distinguish between identical or similar items that have substantially the same function and effect.
The multi-application voice acquisition method provided by the embodiments of the present application is suitable for the application scenario shown in fig. 1, which is a multi-application voice acquisition system. The multi-application voice acquisition system 100 comprises a pickup device 30, a second communication unit 13 and a second processing unit 14, and the second communication unit 13 is connected with the pickup device 30 and the second processing unit 14 respectively. The second communication unit 13 and the second processing unit 14 run in the first terminal 10, and the first terminal 10 is communicatively connected to the second terminal 20 and the pickup device 30 respectively.
The first terminal 10 is provided with a first microphone 11 and a first loudspeaker 12, and the second terminal 20 is provided with a second microphone 21 and a second loudspeaker 22. A first target application A and a translation application B are installed and run on the first terminal 10, and a second target application D is installed and run on the second terminal 20. The first target application A and the second target application D may be, for example, an instant messaging application, a social application, or the like. The first terminal 10 and the second terminal 20 may be, for example, terminal devices such as a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The pickup device 30 may be, for example, a sound pickup.
Further, the first terminal 10 includes the second communication unit 13 and the second processing unit 14, which are communicatively connected to each other. The pickup device 30 includes a third microphone 31, a first processing unit 32, and a first communication unit 33, and the first processing unit 32 is connected to the third microphone 31 and the first communication unit 33 respectively. The first processing unit 32 is configured to process the audio data, the first communication unit 33 is configured to send the processed audio data to the second communication unit 13, and the second communication unit 13 is configured to send the processed audio data to the second processing unit 14 for further processing, after which the second processing unit 14 sends the resulting audio data to the translation application B through an interface.
For ease of understanding, the multi-application voice acquisition system is described below with reference to the accompanying drawings. As shown in fig. 1, the first user and the second user are internet users; the first user is Chinese, the second user is a foreigner, and the two users need to hold a video conference. The first user uses the first terminal 10 and the second user uses the second terminal 20. Since the first user and the second user speak different languages, there may be a language barrier, so the content of the conversation needs to be translated into each user's native language by means of the translation application B. However, none of the popular conference systems on the market provides this function.
When the first user and the second user hold a conversation, the process by which the first user hears the second user is as follows: the second user provides voice input through the second microphone 21 of the second terminal 20; the second target application D running on the second terminal 20 collects the audio data input by the second user through the second microphone 21 and then transmits the audio data to the first terminal 10 over the internet. The audio data is played by the first loudspeaker 12 of the first terminal 10, at which point the first user can hear the voice content of the second user. However, because of the language barrier between the first user and the second user, the first user cannot understand the voice content of the second user.
Conversely, for the second user to hear the first user, the first user provides voice input through the first microphone 11 of the first terminal 10, and the first target application A running on the first terminal 10 collects the audio data input by the first user through the first microphone 11. The first target application A also collects the audio data of the second user played through the first loudspeaker 12. As the first user and the second user talk back and forth, the microphones and loudspeakers on both sides repeatedly collect and play the same audio data, and howling occurs.
In order to avoid howling, active noise reduction and cancellation is usually applied: the sound that the first microphone 11 collects from the first loudspeaker 12 is cancelled with an opposite-phase signal, so that the second user does not hear his or her own voice played back through the second loudspeaker 22, which eliminates the howling. As a result, however, the translation application B cannot collect the audio data of the second user from the first microphone 11.
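The effect of opposite-phase cancellation can be illustrated with a minimal Python sketch. It is an idealised model only: it assumes the loudspeaker signal reaches the first microphone 11 unchanged and perfectly aligned, whereas practical echo cancellers rely on adaptive filtering. The sample rate, tone frequencies and function names are hypothetical and do not come from the application.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz, for illustration only


def tone(freq_hz, amplitude, n_samples):
    """Generate a sine tone as a list of float samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def cancel_loudspeaker(mic, reference):
    """Add the phase-inverted loudspeaker reference to the microphone signal."""
    return [m + (-r) for m, r in zip(mic, reference)]


far_end = tone(440, 1.0, 80)    # stands in for the second user's speech from loudspeaker 12
near_end = tone(200, 0.3, 80)   # stands in for the first user's own speech
mic_11 = [f + n for f, n in zip(far_end, near_end)]  # what microphone 11 picks up

cleaned = cancel_loudspeaker(mic_11, far_end)
# Largest deviation of the cleaned signal from the near-end speech alone.
residual = max(abs(c - n) for c, n in zip(cleaned, near_end))
```

In this model the cleaned signal contains essentially only the first user's speech, which is why an application reading from the first microphone 11 no longer receives the second user's speech.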
In order to solve this technical problem, the present application introduces a pickup device 30. The pickup device 30 collects the audio data played by the first loudspeaker 12 through a third microphone 31, the audio data including ambient speech and audio speech, and then packetizes, encodes and compresses the audio data through a first processing unit 32 to obtain first audio data. The first audio data is then transmitted to the second communication unit 13 through the first communication unit 33; the second communication unit 13 sends the first audio data to the second processing unit 14 for decompression and decoding to obtain second audio data; and the second processing unit 14 transmits the second audio data to the translation application B through the interface, so that the translation application B translates the second audio data. Therefore, by means of the low-latency audio transmission of the pickup device, the translation application program can successfully collect and translate the audio data.
It should be noted that, the method for multi-application voice acquisition provided by the embodiment of the present application may also be applied to other suitable application scenarios, and may also include more terminals in the actual application process.
As shown in fig. 2, an embodiment of the present application provides a method of multi-application voice acquisition, the method being performed by a sound pickup apparatus, the method including:
Step 210, collecting, from a first loudspeaker of the first terminal, audio data sent by a second target application program of the second terminal.
In an embodiment of the present application, the audio data includes ambient speech and audio speech. The audio speech is the speech played by the first loudspeaker of the first terminal. Specifically, after the second target application program of the second terminal sends the audio data to the first terminal, the first loudspeaker of the first terminal plays the audio data, and the pickup device collects the audio data played by the first loudspeaker.
Step 220, processing the audio data to obtain first audio data.
Specifically, the first processing unit first packetizes the audio data, then encodes it, and finally compresses it, thereby obtaining the first audio data. It should be noted that the terms first audio data and second audio data (below) are used only for the purpose of describing the present application; they are relative concepts and do not limit the present application.
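The order of operations in step 220 (packetize, then encode, then compress) can be sketched as follows. This is only an illustration under stated assumptions: the application does not name a frame size, a packet layout or a codec, so the 160-sample frame, the header format and the use of Python's `struct` and `zlib` modules are all hypothetical choices made here.

```python
import struct
import zlib

FRAME_SAMPLES = 160            # assumed frame size; not specified in the application
HEADER = struct.Struct("<IH")  # hypothetical header: sequence number, sample count


def packetize(samples, frame_samples=FRAME_SAMPLES):
    """Split a list of 16-bit PCM samples into fixed-size frames."""
    return [samples[i:i + frame_samples]
            for i in range(0, len(samples), frame_samples)]


def encode(seq, frame):
    """Serialise one frame as header + little-endian int16 samples."""
    body = struct.pack(f"<{len(frame)}h", *frame)
    return HEADER.pack(seq, len(frame)) + body


def process_audio(samples):
    """Packetize, encode and compress; returns the 'first audio data' packets."""
    return [zlib.compress(encode(seq, frame))
            for seq, frame in enumerate(packetize(samples))]


pcm = list(range(-200, 200))   # 400 dummy samples standing in for captured audio
packets = process_audio(pcm)
```

Carrying a sequence number in each packet is a design choice of this sketch, so that a receiver could restore frame order; a real pickup device would more likely use a dedicated speech codec than general-purpose compression.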
Step 230, sending the first audio data to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain second audio data.
In the embodiment of the application, the secondary processing means that the second processing unit decompresses the first audio data and then decodes it, thereby obtaining the second audio data. Specifically, the first communication unit of the pickup device transmits the first audio data to the first terminal, and the second processing unit of the first terminal decompresses and decodes the first audio data to obtain the second audio data.
Step 240, sending the second audio data to a translation application program so that the translation application program translates the second audio data.
The second processing unit transmits the processed second audio data to the translation application program through the communication interface, and the translation application program translates the second audio data, so that the first user and the second user can smoothly communicate.
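The application does not describe the communication interface between the second processing unit and the translation application program. One possible shape, shown here purely as an assumption, is a bounded in-process queue that discards the oldest frame when the consumer falls behind, so that the delay seen by the translation application stays bounded. The class name `TranslationInterface` and its methods are hypothetical.

```python
import queue


class TranslationInterface:
    """Hypothetical hand-off point between the second processing unit
    and the translation application program."""

    def __init__(self, max_frames=50):
        self._frames = queue.Queue(maxsize=max_frames)

    def submit(self, samples):
        """Called by the second processing unit with one decoded frame."""
        while True:
            try:
                self._frames.put_nowait(samples)
                return
            except queue.Full:
                # Drop the oldest frame so that queued audio never grows stale.
                try:
                    self._frames.get_nowait()
                except queue.Empty:
                    pass

    def drain(self):
        """Called by the translation application; returns waiting frames, oldest first."""
        frames = []
        while True:
            try:
                frames.append(self._frames.get_nowait())
            except queue.Empty:
                return frames


interface = TranslationInterface(max_frames=2)
for frame in ([1], [2], [3]):
    interface.submit(frame)
pending = interface.drain()
```

Dropping old frames trades completeness for latency; a design that must translate every word would block or buffer instead.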
In the embodiment of the application, the audio data sent by the second target application program of the second terminal is collected from the first loudspeaker of the first terminal; the audio data is processed to obtain the first audio data; the first audio data is sent to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain the second audio data; and finally the second audio data is sent to the translation application program, so that the translation application program translates the second audio data. By means of the low-latency audio transmission of the pickup device, the translation application program can successfully acquire and translate the audio data.
Fig. 3 is a schematic hardware structure of a first processing unit of the sound pickup apparatus according to the embodiment of the present application, and as shown in fig. 3, the first processing unit 300 includes:
one or more processors 301, one processor being illustrated in fig. 3, and a memory 302.
The processor 301 and the memory 302 may be connected by a bus or in another manner; connection by a bus is taken as an example in fig. 3.
As a non-volatile computer-readable storage medium, the memory 302 may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the multi-application voice acquisition method in the embodiment of the present application. By running the non-volatile software programs, instructions and modules stored in the memory 302, the processor 301 performs the various functional applications and data processing of the pickup device, that is, implements the multi-application voice acquisition method of the above method embodiment.
The memory 302 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the multi-application voice acquisition device, and the like. In addition, the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 302 may optionally include memory located remotely from the processor 301, and such remote memory may be connected to the multi-application voice acquisition device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of multi-application speech acquisition in any of the method embodiments described above.
The embodiment of the application also provides a computer program or a computer program product, wherein at least one computer instruction is stored in the computer program or the computer program product, and the at least one computer instruction is loaded and executed by a processor, so that the computer realizes the multi-application voice acquisition method.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general-purpose hardware platform, or by hardware. Those skilled in the art will appreciate that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the application, the steps may be implemented in any order, and there are many other variations of the different aspects of the application as described above, which are not provided in detail for the sake of brevity; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. A method of multi-application speech acquisition for a sound pickup apparatus, the method comprising:
collecting, from a first loudspeaker of a first terminal, audio data sent by a second target application program of a second terminal;
processing the audio data to obtain first audio data;
sending the first audio data to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain second audio data;
and sending the second audio data to a translation application program so that the translation application program translates the second audio data.
2. The method of claim 1, wherein the audio data comprises ambient speech and audio speech.
3. The method of claim 2, wherein processing the audio data to obtain first audio data comprises:
packetizing, encoding and compressing the audio data to obtain the first audio data.
4. The method of claim 3, wherein sending the first audio data to the first terminal, so that the first terminal performs secondary processing on the first audio data to obtain second audio data, comprises:
sending the first audio data to the first terminal so that the first terminal decompresses and decodes the first audio data to obtain the second audio data.
5. A sound pickup apparatus, characterized by comprising:
a third microphone;
a first processing unit, connected with the third microphone and configured to process audio data;
a first communication unit, connected with the first processing unit and configured to send the audio data to a second communication unit;
wherein the first processing unit includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
6. A multi-application voice acquisition system, characterized in that the system comprises a sound pick-up device as claimed in claim 5, a second communication unit and a second processing unit, the second communication unit being connected to the sound pick-up device and the second processing unit, respectively,
the second communication unit is used for receiving the audio data sent by the pickup device and sending the audio data to the second processing unit for secondary processing.
7. The system of claim 6, wherein the sound pickup apparatus comprises a third microphone, a first processing unit, and a first communication unit, the first processing unit being coupled to the third microphone and the first communication unit, respectively.
8. The system of claim 7, wherein the sound pickup apparatus is a sound pickup.
9. A non-transitory computer readable storage medium storing computer executable instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-4.
10. A computer program product, characterized in that it has stored therein at least one computer instruction that is loaded and executed by a processor to cause the computer to implement the method according to any of claims 1-4.
CN202210258225.1A 2022-03-16 2022-03-16 Multi-application voice acquisition method, pickup equipment and system Pending CN116798442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210258225.1A CN116798442A (en) 2022-03-16 2022-03-16 Multi-application voice acquisition method, pickup equipment and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210258225.1A CN116798442A (en) 2022-03-16 2022-03-16 Multi-application voice acquisition method, pickup equipment and system

Publications (1)

Publication Number Publication Date
CN116798442A true CN116798442A (en) 2023-09-22

Family

ID=88040768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210258225.1A Pending CN116798442A (en) 2022-03-16 2022-03-16 Multi-application voice acquisition method, pickup equipment and system

Country Status (1)

Country Link
CN (1) CN116798442A (en)

Similar Documents

Publication Publication Date Title
EP3084633B1 (en) Attribute-based audio channel arbitration
WO2020063146A1 (en) Data transmission method and system, and bluetooth headphone
CN107995360B (en) Call processing method and related product
US11710488B2 (en) Transcription of communications using multiple speech recognition systems
CN110782907B (en) Voice signal transmitting method, device, equipment and readable storage medium
US9311920B2 (en) Voice processing method, apparatus, and system
AU2014357638B2 (en) Multi-path audio processing
CN113284500B (en) Audio processing method, device, electronic equipment and storage medium
CN112565668B (en) Method for sharing sound in network conference
US11580985B2 (en) Transcription of communications
CN112786070A (en) Audio data processing method and device, storage medium and electronic equipment
CN110224904B (en) Voice processing method, device, computer readable storage medium and computer equipment
US20230146871A1 (en) Audio data processing method and apparatus, device, and storage medium
CN116798442A (en) Multi-application voice acquisition method, pickup equipment and system
WO2013142705A1 (en) Voice communication method and apparatus and method and apparatus for operating jitter buffer
CN114979344A (en) Echo cancellation method, device, equipment and storage medium
US11321047B2 (en) Volume adjustments
US20200184973A1 (en) Transcription of communications
CN110730408A (en) Audio parameter switching method and device, electronic equipment and storage medium
CN114760389B (en) Voice communication method and device, computer storage medium and electronic equipment
CN112543202B (en) Method, system and readable storage medium for transmitting shared sound in network conference
CN110225364B (en) Video processing method, device, terminal, server and storage medium
JP7017755B2 (en) Broadcast wave receiver, broadcast reception method, and broadcast reception program
CN116036591A (en) Sound effect optimization method, device, equipment and storage medium
CN116939215A (en) Video frame coding transmitting method and device, storage medium, product and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination