CN112995838B - Sound pickup apparatus, sound pickup system, and audio processing method


Info

Publication number
CN112995838B
CN112995838B
Authority
CN
China
Prior art keywords
target
microphone
pickup
signal
determining
Prior art date
Legal status
Active
Application number
CN202110225053.3A
Other languages
Chinese (zh)
Other versions
CN112995838A (en)
Inventor
杜艳斌
郑伟军
陈仁武
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110225053.3A
Publication of CN112995838A
Application granted
Publication of CN112995838B
Legal status: Active
Anticipated expiration

Classifications

    • H04R 1/20 — Details of transducers, loudspeakers or microphones; arrangements for obtaining desired frequency or directional characteristics
    • G10L 21/0216 — Speech enhancement; noise filtering characterised by the method used for estimating noise
    • G10L 21/0264 — Speech enhancement; noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L 25/03 — Speech or voice analysis characterised by the type of extracted parameters
    • G10L 25/51 — Speech or voice analysis specially adapted for comparison or discrimination
    • G10L 2021/02161 — Number of inputs available containing the signal or the noise to be suppressed
    • G10L 2021/02166 — Microphone arrays; beamforming


Abstract

In the sound pickup apparatus, sound pickup system, and audio processing method provided in the present specification, the installation angle of each unidirectional microphone in a microphone array is adjusted so that its pickup direction is raised by a target angle, allowing the same array to satisfy both a near-field pickup mode and a far-field pickup mode. The control host can autonomously identify the target mode of the microphone signals output by the microphone array, that is, whether the signals were collected in the near-field pickup mode or the far-field pickup mode, and then select the target algorithm corresponding to that mode to perform signal processing on the signals, improving the voice quality of the output target audio. The sound pickup apparatus, sound pickup system, and audio processing method provided in this specification improve the convenience of the apparatus, expand its usage scenarios, reduce the difficulty of use, and improve the voice quality of the output target audio signal.

Description

Sound pickup apparatus, sound pickup system, and audio processing method
Technical Field
The present disclosure relates to the field of voice capture technologies, and in particular, to a sound pickup apparatus, a sound pickup system, and an audio processing method.
Background
Remote conferencing is indispensable to modern enterprises and institutions, and the voice quality of a call is a core concern that directly affects the effectiveness of a conference. Most existing sound pickup apparatuses support only a far-field pickup mode or only a near-field pickup mode. An apparatus that supports both modes must be equipped with both a near-field pickup module and a far-field pickup module, implementing each mode through its dedicated module; the two modes cannot be realized by the same group of pickup modules, so such an apparatus is costly and structurally complex.
Therefore, it is desirable to provide a sound pickup apparatus, a sound pickup system, and an audio processing method that have a simple structure, realize both the far-field and near-field pickup modes with the same group of pickup modules, and can automatically recognize which pickup mode is in use.
Disclosure of Invention
The present specification provides a sound pickup apparatus, a sound pickup system, and an audio processing method that support both near-field and far-field sound pickup with a single group of pickup modules, with a simple structure and reduced cost.
In a first aspect, the present specification provides a sound pickup apparatus that is communicatively connected to a control host during operation. The apparatus includes a housing and a microphone array mounted on the housing. The microphone array includes a plurality of microphones distributed in a preset array shape, each being a unidirectional microphone that, during operation, collects a sound signal in its corresponding pickup direction to generate a microphone signal, the plurality of microphones generating a plurality of microphone signals. The microphone array simultaneously supports a near-field pickup mode and a far-field pickup mode. During operation, the control host receives the plurality of microphone signals, determines a target mode of the plurality of microphone signals, and performs signal processing on the plurality of microphone signals based on a target algorithm corresponding to the target mode, where the target mode is one of the near-field pickup mode and the far-field pickup mode, and the target algorithm is one of a first algorithm corresponding to the near-field pickup mode and a second algorithm corresponding to the far-field pickup mode.
In some embodiments, the housing includes a bottom surface, and each pickup direction is raised by a target angle in a direction away from the bottom surface, the target angle being an acute angle.
In some embodiments, the sound pickup apparatus further includes an angle measuring device communicatively connected to the control host, and the control host receives measurement data from the angle measuring device.
In some embodiments, the angle measurement device comprises at least one of at least one acceleration sensor, at least one gyroscope, at least one optical sensor, at least one electromagnetic sensor, and a plurality of displacement sensors.
In some embodiments, the determining of the target mode of the plurality of microphone signals includes at least one of: determining the target mode based on intensity differences among the plurality of microphone signals; and determining the target mode based on a change in the measurement data.
In some embodiments, determining the target mode based on the change in the measurement data includes: determining angle change data of the sound pickup apparatus based on the change in the measurement data; determining a target state of the sound pickup apparatus based on the angle change data, the target state being one of a fixed state and a moving state, where the fixed state corresponds to the far-field pickup mode and the moving state corresponds to the near-field pickup mode; and determining the target mode based on the target state.
In a second aspect, the present specification provides a sound pickup system including the sound pickup apparatus of the first aspect and the control host. The control host includes at least one processor and is communicatively connected to the sound pickup apparatus during operation to receive the plurality of microphone signals, determine the target mode of the plurality of microphone signals, and perform the signal processing on the plurality of microphone signals based on the target algorithm corresponding to the target mode, the target mode being one of the near-field pickup mode and the far-field pickup mode, the target algorithm being one of the first algorithm corresponding to the near-field pickup mode and the second algorithm corresponding to the far-field pickup mode.
In some embodiments, the control host is mounted on the housing, and the communication connection comprises an electrical connection.
In some embodiments, the sound pickup apparatus and the control host are arranged separately, and the sound pickup apparatus serves as an external expansion device of the control host.
In some embodiments, the housing includes a bottom surface, and each pickup direction is raised by a target angle in a direction away from the bottom surface, the target angle being an acute angle.
In some embodiments, the sound pickup apparatus further includes an angle measuring device communicatively connected to the control host, and the control host receives measurement data from the angle measuring device.
In some embodiments, the angle measurement device comprises at least one of at least one acceleration sensor, at least one gyroscope, at least one optical sensor, at least one electromagnetic sensor, and a plurality of displacement sensors.
In a third aspect, the present specification further provides an audio processing method for the sound pickup system of the second aspect, the method including the following steps executed by the control host: receiving the plurality of microphone signals; determining the target mode of the plurality of microphone signals; and performing the signal processing on the plurality of microphone signals based on the target algorithm corresponding to the target mode to obtain a target audio.
In some embodiments, the sound pickup apparatus further includes an angle measuring device communicatively connected to the control host, and the control host receives measurement data from the angle measuring device.
In some embodiments, the angle measurement device comprises at least one of at least one acceleration sensor, at least one gyroscope, at least one optical sensor, at least one electromagnetic sensor, and a plurality of displacement sensors.
In some embodiments, the determining of the target mode of the plurality of microphone signals includes at least one of: determining the target mode based on intensity differences among the plurality of microphone signals; and determining the target mode based on a change in the measurement data.
In some embodiments, determining the target mode based on the change in the measurement data includes: determining angle change data of the sound pickup apparatus based on the change in the measurement data; determining a target state of the sound pickup apparatus based on the angle change data, the target state being one of a fixed state and a moving state, where the fixed state corresponds to the far-field pickup mode and the moving state corresponds to the near-field pickup mode; and determining the target mode based on the target state.
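The state-based detection described above can be sketched as follows. The tilt-change threshold and the use of a simple max-min spread over a window of angle samples are illustrative assumptions; the specification does not give concrete values or a specific statistic.

```python
# Hypothetical threshold: tilt change (degrees) above which the device is
# treated as moving (hand-held) rather than resting on a table.
ANGLE_CHANGE_THRESHOLD_DEG = 2.0

def detect_target_mode(angle_samples):
    """Classify the target mode from a window of tilt-angle samples (degrees).

    A large angle variation suggests the device is being held and moved
    (moving state -> near-field pickup mode); a near-constant angle suggests
    it rests on the table (fixed state -> far-field pickup mode).
    """
    if not angle_samples:
        return "far-field"  # no evidence of movement
    change = max(angle_samples) - min(angle_samples)
    return "near-field" if change > ANGLE_CHANGE_THRESHOLD_DEG else "far-field"
```

In practice the angle samples would come from the angle measuring device (e.g. an accelerometer or gyroscope) over a short sliding window.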
In some embodiments, determining the target mode based on the intensity differences among the plurality of microphone signals includes: determining a spectral characteristic of each of the plurality of microphone signals; determining a reference intensity of each microphone signal based on its spectral characteristic; determining at least one intensity difference among the plurality of microphone signals based on the reference intensity of each microphone signal; and comparing the at least one intensity difference with a preset difference threshold to determine the target mode.
In some embodiments, comparing the at least one intensity difference with the preset difference threshold to determine the target mode includes: determining that the target mode is the near-field pickup mode when at least one of the at least one intensity difference exceeds the difference threshold; or determining that the target mode is the far-field pickup mode when each of the at least one intensity difference is smaller than the difference threshold.
In some embodiments, determining the reference intensity of each microphone signal includes: performing a feature fusion calculation on a plurality of signal intensities corresponding to a plurality of frequencies in a high-frequency region of each microphone signal to obtain the reference intensity of the high-frequency region.
In some embodiments, the feature fusion calculation comprises a mean calculation.
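The intensity-difference pipeline of the embodiments above (spectral characteristic, mean-fused high-frequency reference intensity, threshold comparison) can be sketched as follows. The sampling rate, the lower bound of the "high-frequency region", and the difference threshold are illustrative assumptions not specified in this specification.

```python
import numpy as np

SAMPLE_RATE = 16000          # assumed sampling rate (Hz)
HIGH_FREQ_CUTOFF_HZ = 4000   # hypothetical lower bound of the high-frequency region
DIFF_THRESHOLD_DB = 6.0      # hypothetical preset difference threshold

def reference_intensity_db(signal):
    """Reference intensity of one microphone signal: the mean (the feature
    fusion calculation) of the spectral magnitudes in the high-frequency
    region, expressed in dB."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
    high = spectrum[freqs >= HIGH_FREQ_CUTOFF_HZ]
    return 20.0 * np.log10(np.mean(high) + 1e-12)

def detect_mode(mic_signals):
    """Near-field if any pairwise reference-intensity difference exceeds the
    threshold (one mic is much closer to the mouth); far-field otherwise."""
    refs = [reference_intensity_db(s) for s in mic_signals]
    max_diff = max(refs) - min(refs)
    return "near-field" if max_diff > DIFF_THRESHOLD_DB else "far-field"
```

High frequencies are used because they attenuate more strongly with distance, so a hand-held (near-field) source produces a large high-frequency intensity imbalance across the array.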
In some embodiments, determining the target mode of the plurality of microphone signals further includes: pre-processing each microphone signal, the pre-processing including at least one of echo cancellation and reverberation cancellation.
In some embodiments, performing the signal processing on the plurality of microphone signals based on the target algorithm corresponding to the target mode to obtain the target audio includes: performing the signal processing on the plurality of microphone signals based on the first algorithm to obtain a first target audio; or performing the signal processing on the plurality of microphone signals based on the second algorithm to obtain a second target audio, the target audio including the first target audio or the second target audio.
In some embodiments, performing the signal processing on the plurality of microphone signals based on the first algorithm to obtain the first target audio includes: determining a primary microphone signal and a secondary microphone signal among the plurality of microphone signals; performing noise reduction processing on each microphone signal based on the intensity difference between the primary microphone signal and the secondary microphone signal; performing signal fusion on the noise-reduced microphone signals to obtain a first fusion signal; and performing gain control on the first fusion signal to determine the first target audio.
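A minimal sketch of the first (near-field) algorithm follows. The specification only states that the primary/secondary signals are determined and that noise reduction uses their intensity difference; selecting the primary microphone by energy, suppressing secondary microphones in proportion to their energy deficit, averaging as the fusion, and peak-normalizing as the gain control are all illustrative choices.

```python
import numpy as np

def process_near_field(mic_signals, target_level=0.5):
    """Illustrative near-field pipeline: primary/secondary selection,
    intensity-difference-based noise reduction, signal fusion, gain control."""
    energies = [float(np.mean(s ** 2)) for s in mic_signals]
    primary = int(np.argmax(energies))  # loudest mic assumed closest to the mouth
    processed = []
    for i, s in enumerate(mic_signals):
        if i == primary:
            processed.append(s)
        else:
            # Crude noise reduction: attenuate secondary mics in proportion
            # to their energy deficit relative to the primary mic.
            ratio = energies[i] / (energies[primary] + 1e-12)
            processed.append(s * ratio)
    fused = np.mean(processed, axis=0)          # signal fusion
    peak = np.max(np.abs(fused)) + 1e-12
    return fused * (target_level / peak)        # simple gain control
```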
In some embodiments, performing the signal processing on the plurality of microphone signals based on the second algorithm to obtain the second target audio includes: determining the position of a target sound source based on a sound source localization algorithm; performing noise reduction processing on each microphone signal; performing signal fusion on the noise-reduced microphone signals to obtain a second fusion signal; and performing gain control on the second fusion signal to determine the second target audio.
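The specification names only "a sound source localization algorithm" for the second (far-field) branch. One common building block for such algorithms is GCC-PHAT, which estimates the time delay of arrival between two microphones; the delays across microphone pairs then yield the source angle given the array geometry. The following sketch is one illustrative choice, not the method mandated by the patent.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs):
    """Estimate the delay (seconds) of `sig` relative to `ref` using the
    Generalized Cross-Correlation with Phase Transform (GCC-PHAT)."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12              # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Re-center so negative lags precede positive lags.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

With the delay known for each microphone pair, the angle of the target speaker follows from simple geometry (delay × speed of sound vs. inter-microphone spacing).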
As can be seen from the above technical solutions, the sound pickup apparatus, sound pickup system, and audio processing method provided in this specification support both the far-field and near-field pickup modes with a single microphone array formed by a plurality of microphones. The control host receives the microphone signals collected by the microphone array, identifies their target mode, that is, determines whether the signals were collected in the near-field pickup mode or the far-field pickup mode, and then processes them with the corresponding target algorithm: signals collected in the near-field pickup mode are processed by the first algorithm, and signals collected in the far-field pickup mode are processed by the second algorithm. The control host can determine the pose state of the sound pickup apparatus from angle change data measured by an angle measuring device mounted on the apparatus, judging whether the apparatus is in a moving state or a fixed state and hence the target mode of the microphone signals: a moving apparatus corresponds to the near-field pickup mode, and a fixed apparatus corresponds to the far-field pickup mode. The control host can also determine the target mode from the signal intensity differences among the microphone signals collected by the plurality of microphones in the array.
When the intensity difference between the microphone signals exceeds a threshold, the signals were collected in the near-field pickup mode; when it does not exceed the threshold, the signals were collected in the far-field pickup mode. By realizing both pickup modes with the same group of microphone modules, the sound pickup apparatus, sound pickup system, and audio processing method provided in this specification reduce equipment cost and volume, automatically identify which pickup mode the microphone signals were collected in, and process the signals with the corresponding target algorithm to improve voice quality.
Other functions of the sound pickup apparatus, sound pickup system, and audio processing method provided in this specification will be set forth in part in the description that follows. The descriptions and examples below will make these contents apparent to those of ordinary skill in the art. The inventive aspects of the sound pickup apparatus, sound pickup system, and audio processing method can be fully explained by practicing or using the methods, apparatuses, and combinations described in the detailed examples below.
Drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a sound pickup system provided according to an embodiment of the present specification;
fig. 2 is a schematic structural diagram of a sound pickup apparatus provided according to an embodiment of the present specification;
fig. 3 is a top view of the sound pickup range of a sound pickup apparatus provided according to an embodiment of the present specification;
fig. 4 is a side view of the pickup range of a single microphone of a sound pickup apparatus provided according to an embodiment of the present specification;
fig. 5 is a schematic diagram of the operation of a sound pickup apparatus provided according to an embodiment of the present specification;
fig. 6 is a schematic device diagram of a control host provided according to an embodiment of the present specification;
fig. 7 is a flowchart of an audio processing method provided according to an embodiment of the present specification;
fig. 8 is a flowchart of a method for determining a target mode based on intensity differences among a plurality of microphone signals provided according to an embodiment of the present specification; and
fig. 9 is a flowchart of a method for determining a target mode based on a change in measurement data provided according to an embodiment of the present specification.
Detailed Description
The following description is presented to enable any person skilled in the art to make and use the present disclosure, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present description. Thus, the present description is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. For example, as used herein, the singular forms "a", "an" and "the" may include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, are intended to specify the presence of stated integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features of the present specification, as well as the operation and function of the related elements of structure and the combination of parts and economies of manufacture, may be significantly improved upon consideration of the following description. Reference is made to the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the specification. It should also be understood that the drawings are not drawn to scale.
The flow diagrams used in this specification illustrate the operation of system implementations according to some embodiments of the specification. It should be clearly understood that the operations of the flow diagrams need not be performed in the order shown; rather, they may be performed in reverse order or simultaneously. In addition, one or more other operations may be added to, or removed from, a flowchart.
First, some of the terms appearing in this specification are explained as follows:
Microphone Array: physically, a plurality of microphones arranged in order, that is, a system composed of a certain number of acoustic sensors (generally microphones) used to sample and process the spatial characteristics of a sound field; because the microphones occupy different positions, a sound signal arrives at them at different times. Algorithmically, this time-difference information is used to enhance sound signals from a specific direction. Once the microphones are arranged as required, adding the corresponding algorithms (arrangement + algorithm) can solve many room-acoustics problems, such as sound source localization, dereverberation, speech enhancement, and blind source separation. Speech enhancement is the process of extracting clean speech from a sound signal that is disturbed or even drowned out by various noises (including other speech). Sound source localization is an important pre-processing technique in fields such as human-computer interaction and audio/video conferencing; it uses the microphone array to calculate the angle and distance of a target speaker, enabling speaker tracking and subsequent directional voice pickup. Dereverberation adaptively estimates the reverberation of a room so that the clean signal can be restored, markedly improving speech audibility and recognition. Sound source signal extraction extracts a target signal from a plurality of sound signals, while sound source signal separation extracts all of the required signals from a mixture of sounds.
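The "arrangement + algorithm" idea above — using arrival-time differences to enhance a chosen direction — is commonly realized with a delay-and-sum beamformer. The sketch below assumes known integer-sample steering delays and omits fractional-delay interpolation; it is an illustration of the general technique, not an algorithm claimed by this patent.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Minimal delay-and-sum beamformer.

    Each microphone signal is advanced by its known arrival delay toward the
    target direction and the aligned signals are averaged. Sound from the
    steered direction adds coherently; off-axis sound adds incoherently and
    is attenuated.
    """
    aligned = []
    for s, d in zip(mic_signals, delays_samples):
        aligned.append(np.roll(s, -d))  # circular shift: advance by d samples
    return np.mean(aligned, axis=0)
```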
Fig. 1 shows a schematic diagram of a sound pickup system 001 (hereinafter, system 001) provided according to an embodiment of the present disclosure. The system 001 may be applied to a conference system. As shown in fig. 1, the system 001 may include a sound pickup apparatus 100 and a control host 200. The sound pickup apparatus 100 is communicatively connected to the control host 200 during operation. A communication connection is any form of connection capable of receiving information directly or indirectly. For example, the control host 200 may establish a wireless connection with the sound pickup apparatus 100 through wireless communication to exchange data; it may be directly connected to the sound pickup apparatus 100 through a wire to exchange data; or it may establish an indirect connection with the sound pickup apparatus 100 by wiring through other circuits.
The control host 200 may be integrated with the sound pickup apparatus 100: the control host 200 may be installed in the housing of the sound pickup apparatus 100 and electrically connected to it to implement the communication connection. The housing of the sound pickup apparatus 100 will be described in detail later. When the control host 200 is integrated with one sound pickup apparatus 100, other sound pickup apparatuses 100 may still be externally connected to the control host 200. Alternatively, the control host 200 may be provided separately from the sound pickup apparatus 100, with the sound pickup apparatus 100 serving as an external expansion device of the control host 200. In the system 001 shown in fig. 1, the control host 200 is provided separately from the sound pickup apparatus 100. The system 001 may include one sound pickup apparatus 100 or a plurality of sound pickup apparatuses 100. As an external expansion device of the control host 200, the sound pickup apparatus 100 is a portable device that can be moved freely according to the needs of the participants: a participant can place it on the conference table as a far-field pickup device, or hold it in the hand as a near-field pickup device. In the system 001 shown in fig. 1, two sound pickup apparatuses 100 are shown, distributed at the two ends of the conference table.
Fig. 2 shows a schematic structural diagram of a sound pickup apparatus 100 provided according to an embodiment of the present specification. As shown in fig. 2, the sound pickup apparatus 100 may include a housing 110 and a microphone array 120. In some embodiments, the sound pickup apparatus 100 may further include an angle measuring device 130. In some embodiments, the sound pickup apparatus 100 may further include a gain controller (not shown in fig. 2). In some embodiments, the sound pickup apparatus 100 may further include a plurality of pointing devices (not shown in fig. 2). In some examples, the sound pickup apparatus 100 may further include a communication interface (not shown in fig. 2).
The housing 110 may be the mounting and supporting member of the sound pickup apparatus 100. The microphone array 120, the angle measuring device 130, the gain controller 140, the plurality of indication devices 160, and the communication interface 170 may all be mounted on the housing 110. When the control host 200 is integrated with the sound pickup apparatus 100, the control host 200 may also be mounted on the housing 110. The housing 110 may include a bottom surface 111, which may be flat to facilitate placement of the sound pickup apparatus 100. The shape of the housing 110 shown in fig. 2 is merely an example; the appearance, shape, and material of the housing 110 are not limited in this specification.
As shown in fig. 2, the microphone array 120 may be mounted inside the housing 110 or on the outer surface of the housing 110. The microphone array 120 may include a plurality of microphones 122 distributed in a preset array shape. The number of microphones 122 in the microphone array 120 may be 2 or more. In some embodiments, the number of microphones 122 in the microphone array 120 may also be 1. For convenience of description, 3 microphones 122 are shown in fig. 2. The preset array shape may be a linear array, a circular array, a rectangular array, and the like. For convenience of description, the plurality of microphones 122 in the microphone array 120 shown in fig. 2 are distributed in a ring shape. A microphone array formed of omnidirectional microphones is suitable for far-field pickup but cannot meet the requirements of far-field pickup and near-field pickup at the same time. Therefore, each microphone 122 of the plurality of microphones 122 may be a unidirectional microphone that collects a sound signal in its corresponding pickup direction to generate a microphone signal during operation. In this way, the plurality of microphones 122 generate a plurality of microphone signals, and the sound pickup apparatus 100 collects the plurality of microphone signals. The plurality of microphones 122 may point in different pickup directions. A pickup direction may correspond to an angular range, i.e., a pickup range. When a sound source is within the pickup range of one of the microphones 122, the sound signal emitted or propagated by the sound source and entering the current microphone 122 within that angular range constitutes the microphone signal collected by the current microphone 122. The plurality of microphones 122 may thus pick up sound signals within different pickup ranges, and their combination may cover a wider pickup range.
The sound pickup range of the microphone array 120 may vary with the beam widths of the microphones 122 and the array shape of the microphones 122.
As shown in fig. 2, the sound pickup direction of the microphone 122 may be raised by a target angle a, which is acute, in a direction away from the bottom surface 111. That is, each microphone 122 of the microphone array 120 may have an upward elevation angle a that changes the sound pickup direction of the microphone array 120 to facilitate picking up sound in the vertical direction, so that the microphone array 120 meets both the near-field and far-field sound pickup requirements. In a conference scene, when the sound pickup apparatus 100 is placed on the desktop for far-field pickup, the participants are higher than the conference table, and the upward target angle a directs the pickup direction of the microphone 122 obliquely forward and upward, so that the sound of the participants at the conference table is picked up better and the pickup effect is improved. When a participant holds the sound pickup apparatus 100 for near-field pickup, the upward target angle makes it easier for the participant to aim the mouth at the pickup direction of the microphone 122, thereby improving the pickup effect, reducing the difficulty of using the sound pickup apparatus 100, and improving the user experience.
Fig. 3 is a top view illustrating a sound pickup range of the sound pickup apparatus 100 according to an embodiment of the present specification; fig. 4 illustrates a side view of the pickup range of a single microphone of the sound pickup apparatus 100 provided according to embodiments of the present description. The sound pickup apparatus 100 shown in fig. 3 and 4 may include 3 microphones 122, each corresponding to a sound pickup area. For ease of illustration, the 3 microphones 122 are labeled 122-1, 122-2, and 122-3, respectively, and the corresponding pickup areas are labeled 1, 2, and 3, respectively. The sound pickup apparatus 100 shown in fig. 3 can pick up sound signals in a range of 360 degrees. The sound pickup direction of the microphone 122 shown in fig. 4 has an upward elevation angle a, which makes it easier to pick up sound signals in the vertical direction, so that the sound pickup apparatus 100 satisfies both the near-field and far-field sound pickup requirements. The pickup range of the microphone array 120 in the sound pickup apparatus 100 covers a wide space, and the pickup direction better meets conference requirements; participants do not need to spend effort finding the position of a microphone 122, which improves the captured voice quality of nearby participants. Of course, the sound pickup range of the microphone array 120 may be set according to different use requirements.
Fig. 5 illustrates an operation diagram of a sound pickup apparatus 100 provided according to an embodiment of the present specification. As shown in fig. 5, the sound pickup apparatus 100 can pick up sound from a participant regardless of the change in the distance of the mouth of the participant from the sound pickup apparatus 100.
On the premise of retaining the advantage of directional pickup, the sound pickup apparatus 100 provided in this specification changes the installation angle of the microphones 122, thereby changing their pickup direction and enlarging the pickup range, so that the sound pickup apparatus 100 meets the near-field and far-field pickup requirements simultaneously, reduces the difficulty of use, improves the user experience, and improves the pickup effect. The sound pickup apparatus 100 provided by this specification supports both the near-field sound pickup mode and the far-field sound pickup mode; both modes can operate simultaneously and can be switched at any time. The user does not need to deliberately switch the pickup mode of the sound pickup apparatus 100. When the user needs the sound pickup apparatus 100 to operate in the far-field pickup mode, the user only needs to keep the sound pickup apparatus 100 away from the mouth, and the sound pickup apparatus 100 automatically operates in the far-field pickup mode for long-distance pickup. When the user needs the sound pickup apparatus 100 to operate in the near-field pickup mode, the user only needs to bring the sound pickup apparatus 100 close to the mouth, and the sound pickup apparatus 100 automatically operates in the near-field pickup mode for close-distance pickup. The sound pickup apparatus 100 provided in this specification therefore does not require the user to actively switch the pickup mode; it automatically performs near-field or far-field pickup according to the usage scenario, which reduces the difficulty of use and improves the experience. Moreover, the sound pickup range of the sound pickup apparatus 100 is not limited by distance or scene, making the sound pickup apparatus 100 suitable for more scenes.
In some embodiments, the sound pickup apparatus 100 may further include a windshield (not shown in fig. 2). The windshield is mounted inside the housing 110 above the microphone array 120 and prevents part of the wind noise generated by the mouth airflow when the handheld sound pickup apparatus 100 is in the near-field pickup mode. A conference scene is indoors, where strong airflow generally does not occur and wind noise is absent (airflow from air conditioners or windows is weak and does not cause wind noise). However, when the user holds the sound pickup apparatus 100, the user's mouth is close to the sound pickup apparatus 100 and the airflow generated by the mouth is strong, which may cause wind noise and plosive popping at the microphone. Mounting a layer of windshield over the microphone array 120 can address wind noise to some extent. The thickness of the windshield material should be within a preset range; it cannot be too thick, otherwise it would affect the pickup effect.
In some embodiments, the sound pickup apparatus 100 may further include an angle measuring device 130. The angle measuring device 130 may be mounted in the housing 110 and may be used to measure pose data of the sound pickup apparatus 100, i.e., the pose, or angle, of the sound pickup apparatus 100 in space. The angle measuring device 130 may be communicatively coupled to the control host 200. The control host 200 may receive the measurement data of the angle measuring device 130 and determine whether the sound pickup apparatus 100 is in a moving state based on the measurement data. When the measurement data changes within a preset time period, the sound pickup apparatus 100 is in a moving state; at this time, the sound pickup apparatus 100 may be in a participant's hand, that is, the sound pickup apparatus 100 is in the near-field sound pickup mode. When the measurement data does not change, or changes only slightly without exceeding a threshold, within the preset time period, the sound pickup apparatus 100 is in a fixed state; at this time, the sound pickup apparatus 100 may be placed on a conference table, that is, the sound pickup apparatus 100 is in the far-field sound pickup mode. The angle measuring device 130 may include at least one of at least one acceleration sensor, at least one gyroscope, at least one optical sensor, at least one electromagnetic sensor, and a plurality of displacement sensors.
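As a concrete illustration of the motion-state decision described above, the following sketch classifies the apparatus as moving or fixed from pose samples collected over the preset time period. This is an assumption for illustration, not the patent's actual implementation; the function name, sample layout, and 2-degree threshold are invented here.

```python
def detect_state(angle_samples, threshold_deg=2.0):
    """Return 'moving' if the pose change within the observation window
    exceeds the threshold, otherwise 'fixed'.

    angle_samples: list of (roll, pitch, yaw) tuples in degrees,
    collected by the angle measuring device over the preset time period.
    The threshold value is a hypothetical choice for this sketch.
    """
    if len(angle_samples) < 2:
        return "fixed"  # not enough data; assume the device is at rest
    # Largest excursion of any single axis over the window
    max_change = 0.0
    for axis in range(3):
        values = [sample[axis] for sample in angle_samples]
        max_change = max(max_change, max(values) - min(values))
    return "moving" if max_change > threshold_deg else "fixed"

# A device lying on the conference table drifts by fractions of a degree
# ("fixed" -> far-field mode); a handheld device swings by several
# degrees ("moving" -> near-field mode).
```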
In some embodiments, the sound pickup apparatus 100 may further include a gain controller. The gain controller may be in communication with the control host 200 during operation, and send gain control data to the control host 200 to control the gain of the sound signal collected by the corresponding microphone 122, so as to control the output intensity of the sound signal collected by the corresponding microphone 122.
In some embodiments, the sound pickup apparatus 100 may further include a plurality of pointing devices. The plurality of indication devices may be in communication with the control host 200 during operation, and display the state of the corresponding microphone 122 and the output intensity of the collected sound signal.
In some embodiments, the sound pickup apparatus 100 may further include a communication interface. The sound pickup apparatus 100 may be communicatively connected to the control host 200 through the communication interface. The communication interface may be a wireless communication interface, a wired communication interface, or a combination of both. The wireless communication interface may be at least one of a bluetooth module, a WiFi module, and an NFC module. The wired communication interface may be at least one of a USB interface, a TypeC interface, and a UART interface.
The control host 200 may store data or instructions for performing the audio processing methods described herein and may execute the data and/or instructions. The control host 200 may be in communication with the sound pickup apparatus 100 during operation, receive the plurality of microphone signals collected by the sound pickup apparatus 100, and perform signal processing on the plurality of microphone signals. The microphone signals acquired by the sound pickup apparatus 100 differ between sound pickup modes, and the method used for the signal processing differs accordingly. As described above, the sound pickup apparatus 100 can operate the near-field and far-field sound pickup modes simultaneously and automatically pick up sound according to the usage scenario. After receiving the plurality of microphone signals, the control host 200 needs to identify the plurality of microphone signals to determine a target mode of the plurality of microphone signals, and perform the signal processing on the plurality of microphone signals based on a target algorithm corresponding to the target mode. That is, the control host 200 needs to identify whether the plurality of microphone signals were collected in the near-field sound pickup mode or the far-field sound pickup mode. Specifically, the control host 200 needs to recognize whether a voice signal in the plurality of microphone signals was collected in the near-field sound pickup mode or the far-field sound pickup mode. The target mode may be one of the near-field pickup mode and the far-field pickup mode.
When the voice signal in the microphone signals is a signal collected in the near-field sound pickup mode, the target mode of the plurality of microphone signals is the near-field sound pickup mode; when the voice signal in the microphone signals is a signal collected in the far-field sound pickup mode, the target mode of the plurality of microphone signals is the far-field sound pickup mode.
The target algorithm may include one of a first algorithm corresponding to the near-field pickup mode and a second algorithm corresponding to the far-field pickup mode. When the target mode of the microphone signals is the near-field pickup mode, the control host 200 may perform the signal processing on the microphone signals using the first algorithm. When the target mode of the plurality of microphone signals is the far-field pickup mode, the control host 200 may perform the signal processing on the plurality of microphone signals using the second algorithm.
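The mode-to-algorithm mapping can be sketched as a simple dispatch. This is a minimal illustration only; the algorithm bodies below are placeholders, not the first and second algorithms of the specification, and the string labels are invented for this sketch.

```python
def first_algorithm(signals):
    """Placeholder for the near-field processing chain (step S282)."""
    return ("near-field", signals)

def second_algorithm(signals):
    """Placeholder for the far-field processing chain (step S284)."""
    return ("far-field", signals)

# Map each target mode to its corresponding target algorithm.
TARGET_ALGORITHMS = {
    "near-field": first_algorithm,
    "far-field": second_algorithm,
}

def process(signals, target_mode):
    """Route the received microphone signals to the algorithm matching
    the identified target mode."""
    try:
        algorithm = TARGET_ALGORITHMS[target_mode]
    except KeyError:
        raise ValueError("unknown target mode: %s" % target_mode)
    return algorithm(signals)
```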
In some embodiments, at least one microphone may be integrated with the control host 200 for collecting sound signals. After receiving the multiple microphone signals collected by the sound pickup apparatus 100, the control host 200 may perform echo cancellation and dereverberation processing on each signal and then identify the target mode of the plurality of microphone signals. In the near-field pickup mode, the microphone signals undergo, through the first algorithm, signal processing such as beam selection, background noise elimination, wind noise reduction, signal fusion, and gain control. In the far-field pickup mode, the microphone signals undergo, through the second algorithm, processing such as sound source localization, background noise elimination, signal fusion, and gain control. Finally, the processed signal is output to a modulation and demodulation module.
The control host 200 may include a hardware device having a data information processing function and the necessary programs for driving the hardware device to operate. Of course, the control host 200 may also be only a hardware device having data processing capability, or only a program running in a hardware device. The control host 200 may be a single device or a system composed of a plurality of devices. The control host 200 may be any electronic device having a control function. For example, the control host 200 may be an intelligent control host running an operating system such as Android or iOS, such as an intelligent set-top box in the related art, which is not limited in this specification.
Fig. 6 shows a schematic diagram of a device for controlling the host 200 according to an embodiment of the present disclosure. The control host 200 may perform the audio processing method described in this specification. The audio processing method is described elsewhere in this specification. For example, the audio processing method P200 is introduced in the description of fig. 7.
As shown in fig. 6, the control host 200 may include at least one storage medium 230 and at least one processor 220. In some embodiments, the control host 200 may also include a communication port 250 and an internal communication bus 210.
Internal communication bus 210 may connect the various system components including storage medium 230 and processor 220.
Storage medium 230 may include a data storage device. The data storage device may be a non-transitory storage medium or a transitory storage medium. For example, the data storage device may include one or more of a magnetic disk 232, a read only memory medium (ROM) 234, or a random access memory medium (RAM) 236. The storage medium 230 further includes at least one set of instructions stored in the data storage device. The instructions are computer program code that may include programs, routines, objects, components, data structures, procedures, modules, and the like that perform the audio processing methods provided herein.
The communication port 250 is used for data communication between the control host 200 and the outside. The communication port 250 may be a wireless communication port or a wired communication port. For example, the control host 200 may be connected to the sound pickup apparatus 100 through the wireless communication port to perform wireless communication and thereby data transmission with the sound pickup apparatus 100. The wireless communication port may be a WiFi port, a bluetooth port, an NFC port, or the like. For another example, the control host 200 may establish a wired connection with the sound pickup apparatus 100 through the wired communication port. The wired communication port may be a USB interface, a TypeC interface, a UART interface, etc. The control host 200 may also be connected to one or more cameras via the communication port 250. The cameras are used for image acquisition. When there are a plurality of cameras, the control host 200 may select the captured image of one camera from the plurality of cameras for conference playing; for example, the control host 200 may select the image captured by the camera able to capture a front image of the subject according to the real-time posture of the subject to be photographed, and when the posture of the subject changes, the control host 200 may switch in real time to the optimal camera for image capture and conference playing.
The at least one processor 220 is communicatively coupled to the at least one storage medium 230 via the internal communication bus 210. The at least one processor 220 is configured to execute the at least one instruction set. When the system 001 is operating, the at least one processor 220 reads the at least one instruction set and executes the audio processing method P200 provided herein according to the instructions of the at least one instruction set. The processor 220 may perform all the steps involved in the audio processing method P200. The processor 220 may be in the form of one or more processors; in some embodiments, the processor 220 may include one or more hardware processors, such as microcontrollers, microprocessors, Reduced Instruction Set Computers (RISC), Application-Specific Integrated Circuits (ASIC), Application-Specific Instruction-set Processors (ASIP), Central Processing Units (CPU), Graphics Processing Units (GPU), Physical Processing Units (PPU), microcontroller units, Digital Signal Processors (DSP), Field-Programmable Gate Arrays (FPGA), Advanced RISC Machines (ARM), Programmable Logic Devices (PLD), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, only one processor 220 is depicted in the control host 200 in this description. However, it should be noted that the control host 200 may also include multiple processors; thus, the operations and/or method steps disclosed in this specification may be performed by one processor as described in this specification, or by a combination of multiple processors.
For example, if the processor 220 of the control host 200 performs steps a and B in this specification, it should be understood that steps a and B may also be performed by two different processors 220 in combination or separately (e.g., a first processor performs step a, a second processor performs step B, or both a first and second processor perform steps a and B together).
Fig. 7 shows a flowchart of an audio processing method P200 provided according to an embodiment of the present description. As described above, the control host 200 may perform the audio processing method P200 provided in the present specification. Specifically, the processor 220 in the control host 200 may read an instruction set stored in its local storage medium and then execute the audio processing method P200 provided in the present specification according to the specification of the instruction set. The method P200 may comprise performing, by the at least one processor 220 of the controlling host 200, the steps of:
s220: the plurality of microphone signals output by the sound pickup apparatus 100 are received.
In some embodiments, the method P200 may further include:
s240: and preprocessing each microphone signal.
The pre-processing includes at least one of echo cancellation, reverberation cancellation, and noise suppression. To improve the voice quality, the control host 200 may perform echo cancellation and/or reverberation cancellation on the plurality of microphone signals to reduce echo and room reverberation problems caused by the audio signals sent by the control host 200 being picked up by the sound pickup apparatus 100.
S260: determining the target pattern of the plurality of microphone signals.
As mentioned above, the control host 200 needs to select a corresponding target algorithm according to the target mode of the microphone signals to perform the signal processing on the microphone signals. Step S260 may include at least one of the following cases:
s262: determining the target mode based on a difference in intensity between the plurality of microphone signals.
S264: determining the target pattern based on a change in the measurement data.
When the sound pickup apparatus 100 does not include the angle measuring device 130, the control host 200 may determine the target mode of the plurality of microphone signals according to the intensity differences of the microphone signals between the plurality of microphones 122 in the microphone array 120. When the sound pickup apparatus 100 includes the angle measuring device 130, the control host 200 may determine whether the sound pickup apparatus 100 is in a fixed state or a moving state according to the measurement data of the angle measuring device 130; the target mode is the far-field sound pickup mode when the sound pickup apparatus 100 is in the fixed state, and the near-field sound pickup mode when the sound pickup apparatus 100 is in the moving state. The control host 200 may also determine the target mode jointly from the intensity differences between the microphone signals and the measurement data of the angle measuring device 130. A priority may be preset in the control host 200. The control host 200 may set the priority of step S262 higher than that of step S264, that is, when the recognition result of step S262 conflicts with that of step S264, the control host 200 adopts the recognition result of step S262. The control host 200 may instead set the priority of step S264 higher than that of step S262, that is, when the two results conflict, the control host 200 adopts the recognition result of step S264.
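Where both recognition paths are available, their combination under a preset priority might look like the following sketch. The function name, argument labels, and default priority here are assumptions for illustration, not taken from the specification.

```python
def resolve_target_mode(intensity_mode, motion_mode, priority="S262"):
    """Combine the result of step S262 (intensity differences) with the
    result of step S264 (angle measurement) under a preset priority.

    Each mode argument is 'near-field' or 'far-field'; `priority` names
    the step whose result wins on conflict.
    """
    if intensity_mode == motion_mode:
        return intensity_mode  # no conflict: both paths agree
    # On conflict, the step with the preset higher priority wins.
    return intensity_mode if priority == "S262" else motion_mode
```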
Fig. 8 shows a flowchart of a method of step S262 provided according to an embodiment of the present description. Specifically, step S262 may include:
s262-2: spectral characteristics of each of the plurality of microphone signals are determined.
As previously mentioned, the target mode of the plurality of microphone signals may be the target mode of a speech signal in the plurality of microphone signals. In step S262, determining the target mode according to the intensity differences between the plurality of microphone signals may mean determining the target mode according to the intensity differences of the voice signals in the plurality of microphone signals. Therefore, the control host 200 extracts a voice signal from each of the plurality of microphone signals. Specifically, the control host 200 may extract the voice signal according to the spectral characteristics of each microphone signal. In step S262-2, the control host 200 may acquire a spectrogram of each microphone signal, perform voice detection on each microphone signal according to the spectrogram, and determine the frequencies and corresponding intensities contained in the voice signal of each microphone signal and those contained in the non-voice signal.
S262-4: determining a reference intensity for each microphone signal based on the spectral characteristics of each microphone signal.
The reference intensity of each microphone signal may include the reference intensity of the voice signal in that microphone signal. In some embodiments, the reference intensity of each microphone signal may further include the reference intensity of the non-voice signal in that microphone signal. The reference intensity may be the average intensity of the microphone signal over a frequency interval; specifically, the average intensity of the voice signal, or of the non-voice signal, in the microphone signal over that interval. The frequency interval may be the whole frequency band, the low-frequency region, the high-frequency region, or the mid-frequency region.
As the distance between the user's mouth and the microphone 122 increases, the voice signal in the microphone signal received by the microphone 122 is attenuated more in the high-frequency region and less in the low-frequency region. In the near-field sound pickup mode, the distances between the plurality of microphones 122 and the sound source are all small, so the spacing between the microphones is significant relative to the source distance and the high-frequency attenuation differs markedly from microphone to microphone; the intensity difference of the high-frequency region of the voice signals in the plurality of microphone signals is therefore large. In the far-field pickup mode, the distances between the microphones 122 and the sound source are large, and the high-frequency regions of the voice signals in all the microphone signals are attenuated greatly and similarly, so the intensity difference of the high-frequency regions of the voice signals in the microphone signals is small. Specifically, step S262-4 may include:
s262-5: and performing feature fusion calculation on a plurality of signal intensities corresponding to a plurality of frequencies in the high-frequency region of each microphone signal to acquire the reference intensity of the high-frequency region. I.e. the average strength of the speech signal in each of said microphone signals in the high frequency region is obtained. In particular, the feature fusion calculation may comprise a mean value calculation. The average calculation may be an arithmetic average, a geometric average, a root mean square average, a harmonic average, a weighted average, and the like.
S262-6: determining at least one intensity difference of the plurality of microphone signals with respect to each other based on the reference intensity of each of the microphone signals.
When the number of microphones in the microphone array 120 is 2, the number of intensity differences is 1. When the number of microphones is 3, the number of intensity differences is 3. When the number of microphones is 4, the number of intensity differences is 6, and so on: for n microphones, there are n(n-1)/2 pairwise intensity differences.
S262-7: and comparing the at least one intensity difference with a preset difference threshold value to determine the target mode. Specifically, step S262-7 may include:
s262-8: determining that at least one of the at least one intensity difference exceeds the difference threshold, determining that the target mode is the near-field pickup mode; or
S262-9: determining that the at least one intensity difference is less than the difference threshold, and determining that the target pattern is the far-field pickup pattern.
When at least one value of the at least one intensity difference exceeds the preset difference threshold, the intensity difference between at least two of the microphone signals is large, that is, the high-frequency attenuation of at least two of the microphone signals differs greatly, which means the difference between the two microphones' distances to the sound source is large relative to those distances; therefore, the target mode is the near-field sound pickup mode. When none of the at least one intensity difference exceeds the preset difference threshold, the intensity differences among the microphone signals are all small, that is, the high-frequency regions of the microphone signals are attenuated similarly and all heavily; therefore, the target mode is the far-field pickup mode.
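Steps S262-6 through S262-9 can be sketched together as follows. This is a simplified illustration; the function name and threshold value are assumptions, and the per-microphone reference intensities are those of the high-frequency region of the speech signals.

```python
from itertools import combinations

def determine_target_mode(reference_intensities, diff_threshold):
    """Form the pairwise differences of the per-microphone reference
    intensities (n microphones yield n*(n-1)/2 differences) and compare
    them against the preset difference threshold."""
    diffs = [abs(a - b)
             for a, b in combinations(reference_intensities, 2)]
    if any(d > diff_threshold for d in diffs):
        # A large high-frequency gap between two microphones: the sound
        # source is close to one of them.
        return "near-field"
    # All gaps small: the high-frequency regions attenuate similarly.
    return "far-field"
```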
In some embodiments, the control host 200 may further divide the voice signal into a voiced signal and an unvoiced signal, and further compare the intensity difference of the voiced signal or the intensity difference of the unvoiced signal in the plurality of microphone signals, which is similar to the above steps and will not be described herein. In some embodiments, the control host 200 may also compare the intensity difference of the non-voice signals in the plurality of microphone signals, in a similar manner to the above steps, and this description is not repeated here.
Fig. 9 shows a flowchart of a method of step S264 provided according to an embodiment of the present specification. Step S264 may include:
s264-2: determining angle change data of the sound pickup apparatus based on the change of the measurement data.
The control host 200 can calculate the angle change data of the sound pickup apparatus 100 in the space, that is, the posture change data of the sound pickup apparatus 100 in the space according to the measurement data of the angle measurement device 130.
S264-4: and determining the target state of the pickup equipment based on the angle change data.
The control main unit 200 can determine the target state of the sound pickup apparatus 100, i.e., whether the sound pickup apparatus 100 is in a moving state or a fixed state, based on the angle change data. The target state includes one of a stationary state and a moving state.
S264-6: determining the target mode based on the target state. Specifically, step S264-6 may include:
s264-7: determining that the target pattern of the plurality of microphone signals is the far-field pickup pattern when the pickup apparatus 100 is in the stationary state; or alternatively
S264-8: when the sound pickup apparatus 100 is in the moving state, it is determined that the target pattern of the plurality of microphone signals is the near-field sound pickup pattern.
In some embodiments, the method P200 may further include:
s280: and performing the signal processing on the plurality of microphone signals based on a target algorithm corresponding to the target mode to acquire a target audio.
As described above, when the target mode is the near-field sound pickup mode, the control main unit 200 performs the signal processing on the microphone signals by using the first algorithm. Specifically, step S280 may include:
s282: performing the signal processing on the plurality of microphone signals based on the first algorithm to obtain a first target audio; or
S284 performs the signal processing on the plurality of microphone signals based on the second algorithm to obtain a second target audio.
Wherein the target audio comprises the first target audio or the second target audio.
Specifically, step S282 may include: determining a primary microphone signal and a secondary microphone signal of the plurality of microphone signals; performing noise reduction processing on each microphone signal based on an intensity difference between the primary microphone signal and the secondary microphone signal; performing signal fusion on the plurality of noise-reduced microphone signals to obtain a first fusion signal; and performing gain control on the first fusion signal to determine the first target audio.
That is, when the target pattern of the plurality of microphone signals is the near-field sound pickup pattern, the control host 200 may perform beam selection according to the signal strengths of the plurality of microphone signals: the microphone signal with the strongest overall signal strength is determined to be the primary microphone signal, and its corresponding microphone 122 is the primary sound pickup microphone; the other microphone signals are secondary microphone signals, and their corresponding microphones 122 are secondary sound pickup microphones. The control host 200 may then use the intensity difference between the primary microphone signal and the secondary microphone signals to perform noise reduction on each microphone signal, including background noise cancellation and wind noise reduction. The wind noise reduction algorithm mainly uses the longitudinal spectral gradient feature of a single microphone and the energy difference features among the multiple microphones, and is not described in detail here. The control host 200 may then fuse the plurality of noise-reduced microphone signals to obtain the first fusion signal, and perform gain control on the first fusion signal to prevent the voice quality from being degraded by signal energy that is too large or too small.
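The near-field chain above (primary-microphone selection by signal strength, intensity-difference-based noise reduction, signal fusion, gain control) might be sketched as below. The proportional attenuation of weaker channels and the target RMS level are stand-in assumptions for illustration only; they are not the first algorithm described in the patent.

```python
import numpy as np

def near_field_process(mic_signals, target_rms=0.1):
    """Minimal sketch of a near-field pipeline: pick the strongest channel
    as the primary microphone, down-weight the secondary channels by their
    intensity difference, fuse, and apply simple gain control."""
    sigs = np.asarray(mic_signals, dtype=float)       # shape: (n_mics, n_samples)
    rms = np.sqrt(np.mean(sigs ** 2, axis=1))         # per-channel signal strength
    primary = int(np.argmax(rms))                     # strongest channel = primary mic

    # Crude stand-in for intensity-difference noise reduction: attenuate
    # secondary channels in proportion to how much weaker they are.
    weights = rms / rms[primary]
    fused = np.average(sigs, axis=0, weights=weights) # signal fusion

    # Gain control: normalize the fused signal toward a target RMS level
    # so the energy is neither too large nor too small.
    fused_rms = np.sqrt(np.mean(fused ** 2))
    gain = target_rms / fused_rms if fused_rms > 0 else 1.0
    return primary, fused * gain
```

With two channels where the first is twice as strong, the first channel is selected as primary and the fused output comes out at the target RMS level.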
Step S284 may include: determining a target sound source position based on a sound source localization algorithm; performing noise reduction processing on each microphone signal; performing signal fusion on the plurality of noise-reduced microphone signals to obtain a second fusion signal; and performing gain control on the second fusion signal to determine the second target audio.
That is, when the target pattern of the plurality of microphone signals is the far-field sound pickup pattern, the control host 200 may determine the target sound source position according to a sound source localization algorithm (not described here); the control host 200 may then perform noise reduction on each microphone signal, including suppression of background noise and of interfering sounds from other directions; the control host 200 may then perform signal fusion on the plurality of noise-reduced microphone signals to obtain the second fusion signal, and perform gain control on the second fusion signal to prevent the voice quality from being degraded by signal energy that is too large or too small.
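The specification leaves the sound source localization algorithm unspecified. As one common illustration (an assumption, not the patent's method), a toy two-microphone time-difference-of-arrival estimate shows the idea; real arrays would typically use something like steered-response-power or GCC-PHAT across all microphones.

```python
import numpy as np

def localize_tdoa(sig_a, sig_b, fs, mic_distance, speed_of_sound=343.0):
    """Toy direction-of-arrival estimate: the inter-microphone time delay,
    found by cross-correlation, gives the source angle from broadside."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)     # delay in samples
    tdoa = lag / fs                                   # delay in seconds
    # Clamp to the physically possible range before taking arcsin.
    x = np.clip(tdoa * speed_of_sound / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(x)))            # angle in degrees
```

An identical signal on both microphones yields zero delay and an angle of zero degrees; a signal arriving later at the first microphone yields a negative angle.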
In summary, in the sound pickup apparatus 100, the sound pickup system 001, and the audio processing method P200 provided in the present specification, the mounting angle of each unidirectional microphone 122 in the microphone array 120 is raised by a target angle a, changing the sound pickup direction of the microphone 122 so that both the near-field sound pickup pattern and the far-field sound pickup pattern are satisfied. The control host 200 can autonomously identify the target pattern of the plurality of microphone signals output by the microphone array 120, that is, whether the plurality of microphone signals were collected in the near-field sound pickup pattern or the far-field sound pickup pattern, and select the target algorithm corresponding to the target pattern to perform the signal processing on the plurality of microphone signals, thereby improving the voice quality of the output target audio. The sound pickup apparatus 100, the sound pickup system 001, and the audio processing method P200 provided in this specification improve the convenience of use of the sound pickup apparatus 100, expand its use scenarios, reduce the difficulty of use for the user, and improve the voice quality of the output target audio signal.
Another aspect of the present specification provides a non-transitory storage medium storing at least one set of executable instructions for audio processing. When executed by a processor, the instructions direct the processor to perform the steps of the audio processing method P200 described herein. In some possible implementations, various aspects of the specification may also be implemented in the form of a program product including program code. When the program product is run on the control host 200, the program code causes the control host 200 to perform the steps of the audio processing method P200 described in this specification. A program product for implementing the above method may employ a portable compact disc read-only memory (CD-ROM) including program code and may be run on the control host 200. However, the program product of the present specification is not so limited; in this specification, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system (e.g., the processor 220). The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations of this specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the control host 200, partly on the control host 200 as a stand-alone software package, partly on the control host 200 and partly on a remote computing device, or entirely on the remote computing device.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In conclusion, upon reading the present detailed disclosure, those skilled in the art will appreciate that the foregoing detailed disclosure can be presented by way of example only, and not limitation. Those skilled in the art will appreciate that the present specification is susceptible to various reasonable variations, improvements and modifications of the embodiments, even if not explicitly described herein. Such alterations, improvements, and modifications are intended to be suggested by this specification, and are within the spirit and scope of the exemplary embodiments of this specification.
Furthermore, certain terminology has been used in this specification to describe embodiments of the specification. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the specification.
It should be appreciated that in the foregoing description of embodiments of the specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the specification and aiding in the understanding of one or more of the features. This is not to be taken as an admission that any of the features are required in combination, and it is fully possible for one skilled in the art, when reading this specification, to extract some of the features as separate embodiments. That is, embodiments in this specification may also be understood as an integration of a plurality of sub-embodiments, and each sub-embodiment remains valid with less than all features of a single foregoing disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, documents, and the like, cited herein is hereby incorporated by reference in its entirety, except for any prosecution file history associated with the same, any of the same that is inconsistent with or in conflict with this document, or any of the same that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated material and that associated with this document, the description, definition, and/or use of the term in this document shall prevail.
Finally, it should be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the present specification. Other modified embodiments are also within the scope of this description. Accordingly, the disclosed embodiments are to be considered in all respects as illustrative and not restrictive. Those skilled in the art may implement the applications in this specification in alternative configurations according to the embodiments in this specification. Therefore, the embodiments of the present description are not limited to the embodiments described precisely in the application.

Claims (25)

1. A pickup device, operable to communicate with a control host, comprising:
a housing; and
a microphone array installed on the housing, including a plurality of microphones distributed in a preset array shape, each of the plurality of microphones being a single directional microphone, the microphone array collecting sound signals in a corresponding pickup direction during operation to generate microphone signals, the plurality of microphones generating a plurality of microphone signals, the microphone array simultaneously supporting a near-field pickup mode and a far-field pickup mode to satisfy a near-field pickup requirement and a far-field pickup requirement,
wherein the control host is operative to receive the plurality of microphone signals, determine a target pattern of the plurality of microphone signals, and perform signal processing on the plurality of microphone signals based on a target algorithm corresponding to the target pattern, the target pattern including one of the near-field pickup pattern and the far-field pickup pattern, the target algorithm including one of a first algorithm corresponding to the near-field pickup pattern and a second algorithm corresponding to the far-field pickup pattern.
2. The pickup device of claim 1, wherein the housing includes a bottom surface, the pickup direction subtending a target angle in a direction away from the bottom surface, the target angle being an acute angle.
3. The sound pickup device as recited in claim 1, further comprising:
an angle measuring device in communication connection with the control host, wherein the control host receives the measurement data of the angle measuring device.
4. The pickup device of claim 3, wherein the angle measuring means includes at least one of at least one acceleration sensor, at least one gyroscope, at least one optical sensor, at least one electromagnetic sensor, and a plurality of displacement sensors.
5. The sound pickup device of claim 3, wherein the determining the target pattern of the plurality of microphone signals includes at least one of:
determining the target pattern based on intensity differences among the plurality of microphone signals; and
determining the target pattern based on a change in the measurement data.
6. The sound pickup device of claim 5, wherein the determining the target pattern based on the change in the measurement data comprises:
determining angle change data of the sound pickup apparatus based on the change of the measurement data;
determining a target state of the pickup device based on the angle change data, the target state including one of a fixed state and a moving state, wherein the fixed state corresponds to the far-field pickup mode and the moving state corresponds to the near-field pickup mode; and
determining the target mode based on the target state.
7. A pickup system, comprising:
the pickup device of claim 1; and
the control host, including at least one processor, operatively coupled in communication with the pickup, receiving the plurality of microphone signals, determining the target pattern of the plurality of microphone signals, the target pattern including one of the near-field pickup pattern and the far-field pickup pattern, and performing the signal processing on the plurality of microphone signals based on the target algorithm corresponding to the target pattern, the target algorithm including one of the first algorithm corresponding to the near-field pickup pattern and the second algorithm corresponding to the far-field pickup pattern.
8. The pickup system of claim 7, wherein the control master is mounted on the housing, the communication connection comprising an electrical connection.
9. The pickup system as claimed in claim 7, wherein the pickup device is separate from the control host, and the pickup device is an external expansion device of the control host.
10. The pickup system of claim 7, wherein the housing includes a bottom surface, the pickup direction subtending a target angle in a direction away from the bottom surface, the target angle being an acute angle.
11. The pickup system of claim 7, wherein the pickup apparatus further comprises:
an angle measuring device in communication connection with the control host, wherein the control host receives the measurement data of the angle measuring device.
12. The pickup system of claim 11, wherein the angle measurement device comprises at least one of at least one acceleration sensor, at least one gyroscope, at least one optical sensor, at least one electromagnetic sensor, and a plurality of displacement sensors.
13. An audio processing method for the pickup system of claim 7, the method comprising performing, by the control host:
receiving the plurality of microphone signals;
determining the target pattern of the plurality of microphone signals; and
performing the signal processing on the plurality of microphone signals based on the target algorithm corresponding to the target pattern to acquire target audio.
14. The audio processing method of claim 13, wherein the sound pickup apparatus further comprises:
an angle measuring device in communication connection with the control host, wherein the control host receives the measurement data of the angle measuring device.
15. The audio processing method of claim 14, wherein the angle measurement device comprises at least one of at least one acceleration sensor, at least one gyroscope, at least one optical sensor, at least one electromagnetic sensor, and a plurality of displacement sensors.
16. The audio processing method of claim 14, wherein the determining the target pattern of the plurality of microphone signals comprises at least one of:
determining the target pattern based on intensity differences among the plurality of microphone signals; and
determining the target pattern based on a change in the measurement data.
17. The audio processing method of claim 16, wherein the determining the target pattern based on the change in the measurement data comprises:
determining angle change data of the sound pickup apparatus based on the change of the measurement data;
determining a target state of the pickup device based on the angle change data, the target state including one of a fixed state and a moving state, wherein the fixed state corresponds to the far-field pickup mode and the moving state corresponds to the near-field pickup mode; and
determining the target mode based on the target state.
18. The audio processing method of claim 16, wherein the determining the target pattern based on intensity differences among the plurality of microphone signals comprises:
determining a spectral characteristic of each of the plurality of microphone signals;
determining a reference intensity of each of the microphone signals based on the spectral characteristics of each of the microphone signals;
determining at least one intensity difference of the plurality of microphone signals with respect to each other based on the reference intensity of each microphone signal; and
comparing the at least one intensity difference with a preset difference threshold to determine the target pattern.
19. The audio processing method of claim 18, wherein said comparing the at least one intensity difference to a preset difference threshold to determine the target pattern comprises:
determining that the target pattern is the near-field pickup pattern when at least one of the at least one intensity difference exceeds the difference threshold; or
determining that the target pattern is the far-field pickup pattern when the at least one intensity difference is less than the difference threshold.
20. The audio processing method of claim 18, wherein the determining the reference strength for each microphone signal comprises:
performing a feature fusion calculation on a plurality of signal intensities corresponding to a plurality of frequencies in a high-frequency region of each microphone signal to acquire the reference intensity of the high-frequency region.
21. The audio processing method of claim 20, wherein the feature fusion calculation comprises an average calculation.
22. The audio processing method of claim 13, wherein, in the determining the target pattern of the plurality of microphone signals, further comprising:
each microphone signal is pre-processed, the pre-processing including at least one of echo cancellation and reverberation cancellation.
23. The audio processing method of claim 13, wherein the signal processing the plurality of microphone signals based on a target algorithm corresponding to the target pattern to obtain target audio comprises:
performing the signal processing on the plurality of microphone signals based on the first algorithm to obtain a first target audio; or
performing the signal processing on the plurality of microphone signals based on the second algorithm to obtain a second target audio,
wherein the target audio comprises the first target audio or the second target audio.
24. The audio processing method of claim 23, wherein said performing said signal processing on said plurality of microphone signals based on said first algorithm to obtain a first target audio comprises:
determining a primary microphone signal and a secondary microphone signal of the plurality of microphone signals;
performing noise reduction processing on each microphone signal based on a difference in intensity between the primary microphone signal and the secondary microphone signal;
performing signal fusion on the plurality of microphone signals subjected to the noise reduction processing to obtain a first fusion signal; and
performing gain control on the first fusion signal to determine the first target audio.
25. The audio processing method of claim 23, wherein said performing said signal processing on said plurality of microphone signals based on said second algorithm to obtain a second target audio comprises:
determining the position of a target sound source based on a sound source positioning algorithm;
performing noise reduction processing on each microphone signal;
performing signal fusion on the plurality of microphone signals subjected to the noise reduction processing to obtain a second fusion signal; and
performing gain control on the second fusion signal to determine the second target audio.
CN202110225053.3A 2021-03-01 2021-03-01 Sound pickup apparatus, sound pickup system, and audio processing method Active CN112995838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110225053.3A CN112995838B (en) 2021-03-01 2021-03-01 Sound pickup apparatus, sound pickup system, and audio processing method


Publications (2)

Publication Number Publication Date
CN112995838A CN112995838A (en) 2021-06-18
CN112995838B true CN112995838B (en) 2022-10-25

Family

ID=76351561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110225053.3A Active CN112995838B (en) 2021-03-01 2021-03-01 Sound pickup apparatus, sound pickup system, and audio processing method

Country Status (1)

Country Link
CN (1) CN112995838B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI774490B (en) * 2021-07-28 2022-08-11 台灣立訊精密有限公司 Communication terminal, communication system and audio information processing method
CN113608167B (en) * 2021-10-09 2022-02-08 阿里巴巴达摩院(杭州)科技有限公司 Sound source positioning method, device and equipment
CN113938780A (en) * 2021-11-30 2022-01-14 联想(北京)有限公司 Sound pickup apparatus, electronic device, and control method
CN114827821B (en) * 2022-04-25 2024-06-11 世邦通信股份有限公司 Pickup control method and system, pickup device, and storage medium
CN115249476A (en) * 2022-07-15 2022-10-28 北京市燃气集团有限责任公司 Intelligent linkage gas cooker based on voice recognition and intelligent linkage method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106251857A (en) * 2016-08-16 2016-12-21 青岛歌尔声学科技有限公司 Sounnd source direction judgment means, method and mike directivity regulation system, method
GB201709855D0 (en) * 2017-05-15 2017-08-02 Cirrus Logic Int Semiconductor Ltd Dual microphone voice processing for headsets with variable microphone array orientation
WO2019112468A1 (en) * 2017-12-08 2019-06-13 Huawei Technologies Co., Ltd. Multi-microphone noise reduction method, apparatus and terminal device
CN111935593A (en) * 2020-08-09 2020-11-13 天津讯飞极智科技有限公司 Recording pen and recording control method

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7123727B2 (en) * 2001-07-18 2006-10-17 Agere Systems Inc. Adaptive close-talking differential microphone array
JP4247002B2 (en) * 2003-01-22 2009-04-02 富士通株式会社 Speaker distance detection apparatus and method using microphone array, and voice input / output apparatus using the apparatus
US9392353B2 (en) * 2013-10-18 2016-07-12 Plantronics, Inc. Headset interview mode
CN206490770U (en) * 2016-11-30 2017-09-12 深圳市岚正科技有限公司 There is the set top box and set-top-box system of far field and near field voice identification simultaneously
GB201709851D0 (en) * 2017-06-20 2017-08-02 Nokia Technologies Oy Processing audio signals
US10580411B2 (en) * 2017-09-25 2020-03-03 Cirrus Logic, Inc. Talker change detection
US10531222B2 (en) * 2017-10-18 2020-01-07 Dolby Laboratories Licensing Corporation Active acoustics control for near- and far-field sounds
CN107680594B (en) * 2017-10-18 2023-12-15 宁波翼动通讯科技有限公司 Distributed intelligent voice acquisition and recognition system and acquisition and recognition method thereof
US10885907B2 (en) * 2018-02-14 2021-01-05 Cirrus Logic, Inc. Noise reduction system and method for audio device with multiple microphones
CN111048104B (en) * 2020-01-16 2022-11-29 北京声智科技有限公司 Speech enhancement processing method, device and storage medium
CN111341314B (en) * 2020-03-05 2024-02-23 北京声智科技有限公司 Speech recognition method and device
CN111640437A (en) * 2020-05-25 2020-09-08 中国科学院空间应用工程与技术中心 Voiceprint recognition method and system based on deep learning
CN111741404B (en) * 2020-07-24 2021-01-22 支付宝(杭州)信息技术有限公司 Sound pickup equipment, sound pickup system and sound signal acquisition method



Similar Documents

Publication Publication Date Title
CN112995838B (en) Sound pickup apparatus, sound pickup system, and audio processing method
EP3480820B1 (en) Electronic device and method for processing audio signals
CN106653041B (en) Audio signal processing apparatus, method and electronic apparatus
CN105532017B (en) Device and method for Wave beam forming to obtain voice and noise signal
CN106782584B (en) Audio signal processing device, method and electronic device
WO2020108614A1 (en) Audio recognition method, and target audio positioning method, apparatus and device
CN106531179B (en) A kind of multi-channel speech enhancement method of the selective attention based on semantic priori
CN110970057B (en) Sound processing method, device and equipment
US9818403B2 (en) Speech recognition method and speech recognition device
US11856379B2 (en) Method, device and electronic device for controlling audio playback of multiple loudspeakers
US11435429B2 (en) Method and system of acoustic angle of arrival detection
CN111741404B (en) Sound pickup equipment, sound pickup system and sound signal acquisition method
JP7041157B6 (en) Audio capture using beamforming
US20130287224A1 (en) Noise suppression based on correlation of sound in a microphone array
CN110858488A (en) Voice activity detection method, device, equipment and storage medium
CN110085258A (en) A kind of method, system and readable storage medium storing program for executing improving far field phonetic recognization rate
CN111048104B (en) Speech enhancement processing method, device and storage medium
CN112233689B (en) Audio noise reduction method, device, equipment and medium
CN116158090A (en) Audio signal processing method and system for suppressing echo
CN112859000B (en) Sound source positioning method and device
WO2020064089A1 (en) Determining a room response of a desired source in a reverberant environment
CN108680902A (en) A kind of sonic location system based on multi-microphone array
CN114464184B (en) Method, apparatus and storage medium for speech recognition
CN114830686A (en) Improved localization of sound sources
Ba et al. Enhanced MVDR beamforming for arrays of directional microphones

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant