CN112639963A - Audio acquisition device, audio receiving device and audio processing method - Google Patents

Info

Publication number
CN112639963A
Authority
CN
China
Prior art keywords
audio
audio data
control instruction
data
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080004930.8A
Other languages
Chinese (zh)
Inventor
边云锋
莫品西
薛政
刘洋
吴俊峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • H04N23/661 Transmitting camera control signals through networks, e.g. control via the Internet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/667 Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/67 Focus control based on electronic image sensor signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the application provides an audio acquisition device including a microphone, a processor and a wireless transceiver. The processor performs instruction identification processing on the audio data collected by the microphone to obtain a control instruction; the wireless transceiver sends the audio data and the control instruction to an audio receiving device. The audio data is used by one or more electronic devices to perform media processing, and the control instruction is used by one or more electronic devices to perform control processing, where the electronic devices are the audio receiving device or other electronic devices communicatively connected to the audio receiving device. The audio acquisition device provided by the embodiment can collect clear audio data, which improves the accuracy of speech recognition. In addition, because the audio data is recognized on the audio acquisition device side before it is wirelessly transmitted, the accuracy of speech recognition is further ensured.

Description

Audio acquisition device, audio receiving device and audio processing method
Technical Field
The embodiment of the application relates to the technical field of information processing, in particular to an audio acquisition device, an audio receiving device, an audio processing method and an audio acquisition system.
Background
Voice interaction is a common mode of human-machine interaction. With voice interaction, a person can control a controlled device by voice, which frees the hands. However, in some scenarios, if the user is far from the controlled device, the signal-to-noise ratio of the audio data collected by the controlled device decreases, and the device may be unable to accurately recognize the user's command. For example, in one scenario, the user shoots with a motion camera held on a selfie stick. When the user controls the motion camera by voice, the selfie stick increases the control distance, so it is difficult for the motion camera to collect clear audio data, and the accuracy of speech recognition is greatly reduced.
Disclosure of Invention
In order to overcome the problems in the related art, embodiments of the present application provide an audio acquisition device, an audio receiving device, an audio processing method, and an audio acquisition system.
According to a first aspect of embodiments of the present application, there is provided an audio acquisition apparatus, including: a microphone, a processor and a wireless transceiver;
the processor is used for carrying out instruction identification processing on the audio data collected by the microphone to obtain a control instruction; the wireless transceiver is also used for sending the audio data and the control instruction to an audio receiving device;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a second aspect of embodiments of the present application, there is provided an audio acquisition apparatus, including: a microphone, a processor and a wireless transceiver;
the processor is used for identifying and processing audio data collected by the microphone to obtain auxiliary identification information; the wireless transceiver is further used for transmitting the audio data and the auxiliary identification information to an audio receiving device;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a third aspect of embodiments of the present application, there is provided an audio receiving apparatus including: a wireless transceiver and a processor;
the processor is used for receiving the audio data and the control instruction sent by the audio acquisition device through the wireless transceiver; the control instruction is obtained by performing instruction identification processing on the acquired audio data by the audio acquisition device;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a fourth aspect of embodiments of the present application, there is provided an audio receiving apparatus including: a wireless transceiver and a processor;
the processor is used for receiving the audio data and the auxiliary identification information sent by the audio acquisition device through the wireless transceiver; the auxiliary identification information is obtained by identifying and processing the acquired audio data by the audio acquisition device;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a fifth aspect of the embodiments of the present application, there is provided an audio processing method applied to an audio acquisition apparatus, the method including:
carrying out instruction identification processing on the collected audio data to obtain a control instruction;
sending the audio data and the control instruction to an audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a sixth aspect of the embodiments of the present application, there is provided an audio processing method applied to an audio acquisition apparatus, the method including:
carrying out identification processing on the collected audio data to obtain auxiliary identification information;
transmitting the audio data and the auxiliary identification information to an audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a seventh aspect of the embodiments of the present application, there is provided an audio processing method applied to an audio receiving apparatus, the method including:
receiving audio data and a control instruction sent by an audio acquisition device through a wireless network;
the control instruction is obtained by performing instruction identification processing on the acquired audio data by the audio acquisition device, the audio data is used for one or more electronic devices to execute media processing, the control instruction is used for one or more electronic devices to execute control processing, and the electronic devices are the audio receiving device or other electronic devices in communication connection with the audio receiving device.
According to an eighth aspect of the embodiments of the present application, there is provided an audio processing method applied to an audio receiving apparatus, the method including:
receiving audio data and auxiliary identification information sent by an audio acquisition device through a wireless network;
the auxiliary identification information is obtained by identifying and processing the acquired audio data by the audio acquisition device; the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a ninth aspect of embodiments of the present application, there is provided an audio acquisition system comprising:
the audio acquisition device and the audio receiving device;
the audio acquisition device is used for carrying out instruction identification processing on the acquired audio data to obtain a control instruction; sending the audio data and the control instruction to the audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
According to a tenth aspect of embodiments of the present application, there is provided an audio acquisition system, comprising:
the audio acquisition device and the audio receiving device;
the audio acquisition device is used for identifying the acquired audio data to obtain auxiliary identification information; transmitting the audio data and the auxiliary identification information to the audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
The audio acquisition device provided by the embodiments of the present application is communicatively connected to the audio receiving device through a wireless network. Because the audio acquisition device can be located very close to the user, it can collect clear audio data, which greatly improves the accuracy of speech recognition. In addition, signals suffer a certain loss during wireless communication, and wireless communication is not fully stable. To ensure that the voice command issued by the user is accurately recognized, the embodiments of the present application recognize the audio data on the audio acquisition device side before the audio data is wirelessly transmitted, which further ensures the accuracy of speech recognition.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments consistent with the embodiments of the application and, together with the description, serve to explain the principles of the embodiments of the application.
Fig. 1 is a schematic structural diagram of a first audio capture device according to an exemplary embodiment of the present application.
Fig. 2a is a diagram illustrating an application scenario according to an exemplary embodiment of the present application.
Fig. 2b is a diagram of another application scenario illustrated in the present application according to an exemplary embodiment.
Fig. 2c is a diagram of yet another application scenario illustrated in the present application according to an exemplary embodiment.
Fig. 2d is a diagram of yet another application scenario illustrated in the present application according to an exemplary embodiment.
Fig. 3a is a flowchart illustrating the operation of a first audio capture device according to an exemplary embodiment of the present application.
Fig. 3b is a flowchart illustrating the operation of the first audio receiving apparatus according to an exemplary embodiment of the present application.
Fig. 3c is a schematic diagram illustrating division of labor between the first audio receiving device and the electronic device according to an exemplary embodiment of the present application.
Fig. 4a is a flowchart illustrating the operation of a second audio capture device according to an exemplary embodiment of the present application.
Fig. 4b is a flowchart illustrating the operation of a second audio receiving device according to an exemplary embodiment of the present application.
Fig. 5 is a schematic structural diagram of an audio acquisition system according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the examples of the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the embodiments of the application, as detailed in the appended claims.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the embodiments of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
Voice interaction is a common mode of human-machine interaction. With voice interaction, a person can control a controlled device by voice, which frees the hands. However, in some scenarios, if the user is far from the controlled device, the signal-to-noise ratio of the audio data collected by the controlled device decreases, and the device may be unable to accurately recognize the user's command. For example, in one scenario, the user shoots with a motion camera held on a selfie stick. When the user controls the motion camera by voice, the selfie stick increases the control distance, so it is difficult for the motion camera to collect clear audio data, and the accuracy of speech recognition is greatly reduced.
In order to solve the above problem, an embodiment of the present application provides an audio acquisition device, where the audio acquisition device is provided with a wireless transceiver, and can perform wireless communication with an audio receiving device. In one implementation, the wireless transceiver may be a wireless network card, and in another implementation, the wireless transceiver may also be an integrated module, and of course, other hardware forms may also be available, which is not limited in this application.
Because it has a wireless transceiver, the audio acquisition device can communicate remotely with the controlled device, and the user can conveniently place it nearby or wear it. The audio acquisition device can therefore collect clear audio data with a sufficiently high signal-to-noise ratio, which greatly improves the accuracy of speech recognition.
It will be appreciated that the audio capture device may be presented in different product configurations in different scenarios. For example, in one scenario, the audio capture device may be a wireless microphone that is held by the user for voice input. In another scenario, the audio capture device may also be a smart speaker, placed by the user in close proximity to himself. In yet another scenario, the audio capture device may be a device that can be worn by the user, such as a wireless headset, a wearable speaker, and so forth.
In one implementation, the audio data collected by the audio acquisition device can be wirelessly transmitted to the controlled device, which performs instruction identification processing on the received audio data to obtain a control instruction and respond to it. In this implementation, however, the audio data is wirelessly transmitted before it is used for instruction identification. The signal suffers a certain loss during wireless transmission, and wireless transmission also has stability problems (such as packet loss). As a result, the audio data received by the controlled device may be defective, and the controlled device may still be unable to accurately identify the control instruction.
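The failure mode described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: words stand in for audio frames, `drop_frames` and `spot_command` are hypothetical helpers, and the contiguous-match rule is a deliberately simple stand-in for a real recognizer, none of which is specified by the application.

```python
def drop_frames(frames, lost):
    """Simulate wireless packet loss by removing the frames at the given indices."""
    return [f for i, f in enumerate(frames) if i not in lost]


def spot_command(frames, command):
    """Toy recognizer: the command is found only if its words appear contiguously."""
    n = len(command)
    return any(frames[i:i + n] == command for i in range(len(frames) - n + 1))


frames = ["please", "start", "recording", "now"]  # what the microphone captured
command = ["start", "recording"]
received = drop_frames(frames, lost={2})          # frame 2 is lost in transmission

print(spot_command(frames, command))    # True: recognition before transmission
print(spot_command(received, command))  # False: recognition after a lossy link
```

Recognition on the intact frames succeeds, while the same recognizer fails once a single frame is lost.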
In view of the above problem, reference may be made to fig. 1, where fig. 1 is a schematic structural diagram of a first audio capture device according to an exemplary embodiment of the present application. The audio capture device 10 includes:
a microphone 101, a processor 102, and a wireless transceiver 103.
The processor 102 is configured to perform instruction identification processing on the audio data collected by the microphone 101 to obtain a control instruction, and is further configured to send the audio data and the control instruction to an audio receiving device through the wireless transceiver 103. Wherein the audio data is used for media processing.
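As a minimal sketch of this division of labor, the following code recognizes an instruction on the acquisition side and bundles it with the unmodified audio data. The phrase table, the `Message` structure, and the use of a text transcript in place of a real speech recognizer are all assumptions made for illustration; the application does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrase-to-instruction table; the application does not define one.
COMMANDS = {
    "start recording": "START_RECORDING",
    "stop recording": "STOP_RECORDING",
    "take photo": "TAKE_PHOTO",
}


@dataclass
class Message:
    audio: bytes                # raw audio data, kept for media processing
    instruction: Optional[str]  # control instruction, or None if none was recognized


def recognize_instruction(transcript: str) -> Optional[str]:
    """Stand-in for instruction identification: match a known phrase in the text."""
    text = transcript.lower()
    for phrase, instruction in COMMANDS.items():
        if phrase in text:
            return instruction
    return None


def build_message(audio: bytes, transcript: str) -> Message:
    """Recognize on the acquisition side, then bundle audio and instruction for sending."""
    return Message(audio=audio, instruction=recognize_instruction(transcript))


msg = build_message(b"\x00\x01\x02\x03", "Please start recording now")
print(msg.instruction)  # START_RECORDING
```

The audio bytes pass through unchanged, so the receiving side can still use them for media processing regardless of whether an instruction was recognized.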
It should be noted that the audio receiving device is any device that receives audio data and control commands, and a wireless transceiver is disposed therein, and a wireless connection channel can be established with the wireless transceiver of the audio collecting device. As to the hardware form of the audio receiving apparatus, in one implementation, the audio receiving apparatus itself may be an electronic device that needs to use audio data and respond to control instructions. For example, in one scenario, as shown in fig. 2a, the audio receiving device itself may be a motion camera, and the motion camera may wirelessly communicate with the audio capturing device 10, receive audio data and control instructions sent by the audio capturing device 10, and perform media processing using the received audio data, or may respond to the received control instructions.
In another implementation, the audio receiving device may be electrically connected to the electronic device, the audio data and the control command received by the audio receiving device are sent to the electronic device connected to the audio receiving device, and the electronic device performs media processing by using the audio data and executes an operation corresponding to the control command. For example, in one scenario, as shown in fig. 2b, the electronic device may be a motion camera, and the audio receiving apparatus may be an external plug-in of the motion camera.
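The two hardware forms of the audio receiving device can be sketched as follows. The class names `ElectronicDevice` and `AudioReceiver` and the instruction string are hypothetical, and the in-memory method call merely stands in for the electrical connection between the plug-in and the electronic device.

```python
from typing import List, Optional


class ElectronicDevice:
    """Hypothetical device that uses audio for media processing and obeys instructions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.audio_log: List[bytes] = []
        self.executed: List[str] = []

    def handle(self, audio: bytes, instruction: Optional[str]) -> None:
        self.audio_log.append(audio)           # audio data goes to media processing
        if instruction is not None:
            self.executed.append(instruction)  # control instruction goes to control processing


class AudioReceiver:
    """Receives (audio, instruction) pairs and handles them locally or forwards them."""

    def __init__(self, name: str, attached: Optional[ElectronicDevice] = None) -> None:
        self.local = ElectronicDevice(name)  # used when the receiver is itself the device
        self.attached = attached             # used when the receiver is an external plug-in

    def on_message(self, audio: bytes, instruction: Optional[str]) -> str:
        target = self.attached if self.attached is not None else self.local
        target.handle(audio, instruction)
        return target.name


camera = ElectronicDevice("motion camera")
plug_in = AudioReceiver("plug-in receiver", attached=camera)
standalone = AudioReceiver("camera with built-in receiver")

print(plug_in.on_message(b"\x01\x02", "TAKE_PHOTO"))  # motion camera
print(standalone.on_message(b"\x03\x04", None))       # camera with built-in receiver
```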
It should also be noted that there may be a plurality of audio receiving devices and, correspondingly, a plurality of electronic devices. For ease of understanding, reference may be made to fig. 2c, which shows a scenario comprising two audio receiving devices that are themselves electronic devices: the first audio receiving device is a pan-tilt head, and the second is a motion camera. Both audio receiving devices can receive the audio data and the control instructions sent by the audio acquisition device 10, where the audio data can be used by the motion camera to perform media processing, and a control instruction can be directed to the motion camera or to the pan-tilt head. For example, in a more specific scenario, the user may issue the following instruction while shooting: "the pan-tilt head rotates 45 degrees to the left, and the camera focuses on the face". In this case, from the collected audio data corresponding to "the pan-tilt head rotates 45 degrees to the left", the audio acquisition device 10 can recognize a control instruction for controlling the pan-tilt head, and the pan-tilt head responds accordingly after receiving it. From the collected audio data corresponding to "the camera focuses on the face", the audio acquisition device 10 can recognize a control instruction for controlling the motion camera, and the motion camera responds accordingly after receiving it.
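One way to picture how a compound spoken instruction is split between two devices is the sketch below. It operates on a text transcript rather than audio, and the device keywords, target names and clause-splitting rule are assumptions chosen for illustration, not the method of the application.

```python
import re
from typing import List, Tuple

# Hypothetical device keywords and target names, for illustration only.
TARGETS = {"pan-tilt head": "pan_tilt", "camera": "camera"}


def route(utterance: str) -> List[Tuple[str, str]]:
    """Split an utterance into clauses and pair each clause with its target device."""
    routed = []
    for clause in re.split(r"[,;]", utterance.lower()):
        clause = clause.strip()
        for keyword, target in TARGETS.items():
            if clause.startswith(keyword):
                # Everything after the device keyword is treated as the action.
                routed.append((target, clause[len(keyword):].strip()))
                break
    return routed


print(route("Pan-tilt head rotate 45 degrees to the left, camera focus on the face"))
# [('pan_tilt', 'rotate 45 degrees to the left'), ('camera', 'focus on the face')]
```

Each resulting pair could then be sent to the matching audio receiving device; clauses that name no known device are simply ignored in this sketch.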
In the above scenario, the audio receiving device is itself an electronic device that needs to use the audio data and respond to the control instruction. In another possible scenario, the audio receiving device may be merely a repeater, which is electrically connected to the electronic device and communicates wirelessly with the audio acquisition device 10. Fig. 2d shows such a scenario, which includes two audio receiving devices and two electronic devices. The two electronic devices are a motion camera and a pan-tilt head; the motion camera is equipped with one audio receiving device in plug-in form, and the pan-tilt head is likewise equipped with one audio receiving device in plug-in form. For other details of this scenario, reference may be made to the description of the scenario shown in fig. 2c, which is not repeated here.
Specifically, the audio acquisition device and the audio receiving device may each be provided with its own independent power supply system.
The media processing of the audio data may specifically include audio editing and/or audio-video editing. Audio editing may consist of clipping the audio data; changing its audio effect, including but not limited to enhancement, polishing, and voice changing; performing noise reduction and filtering on the audio data; or performing operations such as audio recording and audio playback. There are also various forms of audio-video editing; one example is packaging the audio data together with captured video data to generate a video file. Other forms are not listed here.
Besides a motion camera or a pan-tilt head, the electronic device may also be a video camera, an unmanned aerial vehicle, an unmanned vehicle, or a robot. The electronic device may include one or more cameras capable of capturing images.
In the audio acquisition device provided by the embodiment of the present application, the instruction recognition processing is moved to the audio acquisition device and is no longer performed by the audio receiving device or by the electronic device connected to it. The audio data used for instruction recognition therefore does not pass through wireless transmission and does not suffer the loss that wireless transmission introduces, so the accuracy of instruction recognition can be further improved.
Although there are currently uncompressed transmission standards on wireless transmission, such as WHDI technology, uncompressed transmission places extremely high bandwidth requirements on wireless transmission, and therefore, to relieve the bandwidth pressure of wireless transmission, audio data may be encoded prior to transmission. In one embodiment, the processor of the audio capture device may encode the audio data prior to sending the audio data to the audio receiving device via the wireless transceiver.
Further, considering that the process of encoding the audio data is actually compressing the audio data to a certain degree, if the encoded audio data is used for instruction recognition, the recognition accuracy is also reduced to a certain degree. Therefore, in order to ensure the accuracy of the instruction recognition, a preferred embodiment is to perform the instruction recognition processing by using the audio data before encoding, that is, the instruction recognition processing is performed by using the original audio data collected by the microphone, so that the accuracy of the recognition can be ensured to be maintained at a high level.
There are various ways to implement the instruction recognition processing, the main one being the use of a recognition model. The embodiment of the present application provides an optional way of performing instruction recognition with a speech recognition model. Specifically, it comprises the following steps:
S1, intercepting the audio segment containing speech from the audio data.
S2, extracting the audio features of the audio segment.
S3, inputting the extracted audio features into the specified speech recognition model to recognize the control instruction.
In step S1, an audio segment that may include speech needs to be detected first, and in a specific implementation, the detection may be performed by a speech activity detection algorithm, or may be performed by a sliding window.
It should be noted that an audio segment that may contain speech does not necessarily refer only to a segment containing human speech; it may also be a segment containing non-speech sound. For example, the audio corresponding to a control instruction may be non-speech audio such as three taps in quick succession or two hand claps.
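As an illustration of the sliding-window detection mentioned for step S1, the following is a minimal sketch that marks frames whose short-time energy exceeds a threshold and merges consecutive active frames into segments. The function name, frame length, threshold, and minimum run length are assumptions chosen for the example; the embodiment does not prescribe any particular detector, and an energy detector also responds to non-speech sounds such as taps or claps.

```python
from typing import List, Tuple


def find_active_segments(samples: List[float],
                         frame_len: int = 160,
                         threshold: float = 0.01,
                         min_frames: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames whose mean
    energy exceeds `threshold`. All parameters are illustrative."""
    segments: List[Tuple[int, int]] = []
    run_start = None                      # index of first active frame in the current run
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            if run_start is None:
                run_start = f
        elif run_start is not None:
            # The run just ended; keep it only if it is long enough.
            if f - run_start >= min_frames:
                segments.append((run_start * frame_len, f * frame_len))
            run_start = None
    # Close a run that extends to the end of the signal.
    if run_start is not None and n_frames - run_start >= min_frames:
        segments.append((run_start * frame_len, n_frames * frame_len))
    return segments
```

The returned index pairs can be used to slice the audio data, and the resulting segments are what step S2 would take as input.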
In step S2, it should be noted that there are many options for the extracted audio features. For example, the feature may be an MFCC (mel frequency cepstrum coefficient), an LPC feature, an Fbank feature, a bottleneck feature, etc., and those skilled in the art may select the feature according to actual needs.
In step S3, the specified speech recognition model is a model trained in advance. There are many choices for the speech recognition model, such as GMM-HMM (Gaussian Mixture Model-Hidden Markov Model), DNN (Deep Neural Network), LSTM (Long Short-Term Memory network), CNN (Convolutional Neural Network), and so on.
An example of training a speech recognition model is provided below. For example, if the GMM-HMM model described above is selected as the model and MFCC features are selected as the audio features, the training process may include the following steps.
Step A, using a plurality of instruction voices and negative samples as training data.
Step B, performing MFCC feature extraction on the training data.
Step C, training a GMM-HMM speech recognition model with the extracted MFCC features to obtain the required GMM-HMM model. In the training process, the HMM parameters can be estimated with the Baum-Welch algorithm, several Gaussians are used, and the GMM can be trained with the EM (Expectation Maximization) method.
The recognized control instruction needs to be sent to the audio receiving device together with the audio data. In one embodiment, the control instruction may be encapsulated with the audio data into data packets that are transmitted by the wireless transceiver to the audio receiving device. It should be noted that the protocol used for the wireless transmission may be a public protocol or a private protocol, and the wireless communication band may be, but is not limited to, 2.4 GHz or 1.9 GHz.
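The encapsulation of a control instruction together with audio data can be sketched as follows. The packet layout (a 2-byte magic value, a 1-byte control length, and a 2-byte audio length, all little-endian) is purely hypothetical and is not the protocol of the embodiment, which may be public or private.

```python
import struct
from typing import Tuple

# Hypothetical header layout: magic (uint16), control length (uint8),
# audio length (uint16), little-endian without padding.
HEADER = struct.Struct("<HBH")
MAGIC = 0xA55A  # illustrative marker value


def encapsulate(control: bytes, audio: bytes) -> bytes:
    """Pack a control instruction and an audio payload into one packet."""
    if len(control) > 0xFF or len(audio) > 0xFFFF:
        raise ValueError("payload too large for this illustrative header")
    return HEADER.pack(MAGIC, len(control), len(audio)) + control + audio


def decapsulate(packet: bytes) -> Tuple[bytes, bytes]:
    """Recover (control, audio) from a packet built by encapsulate()."""
    magic, control_len, audio_len = HEADER.unpack_from(packet, 0)
    if magic != MAGIC:
        raise ValueError("unexpected magic value")
    body = packet[HEADER.size:]
    if len(body) != control_len + audio_len:
        raise ValueError("length fields do not match the payload")
    return body[:control_len], body[control_len:]
```

On the receiving side, `decapsulate` yields the control instruction and the audio payload separately, which corresponds to the decapsulation step performed by the audio receiving device.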
Directly encapsulating the control instruction and the audio data into a data packet requires a large transmission bandwidth, and the control instruction is not delivered in a sufficiently timely manner. Therefore, in a preferred embodiment, the control instruction may be embedded into the audio data before encapsulation; the audio data in which the control instruction is embedded is then encapsulated, and the resulting data packet is sent to the audio receiving device. This implementation reduces the bandwidth required for wireless transmission and improves the real-time performance of control instruction transmission, so that the voice instruction issued by the user can be responded to quickly.
Further, the audio data needs to be used for media processing after being transmitted to the audio receiving device. Therefore, when the control instruction is embedded in the audio data, the audio data itself should be affected as little as possible. For this reason, in a preferred embodiment, the control instruction may be converted into an audio digital watermark and then embedded into the audio data, so that the audio data is not affected.
There are many ways to convert the control instruction into an audio digital watermark, including amplitude modulation, phase modulation, transform-domain (frequency-domain) watermarking, and so on. The embodiment of the present application provides an implementation in which frequency modulation is used, that is, the control instruction is converted into an audio digital watermark whose frequency lies within a specified frequency range, where the specified frequency range is outside the auditory frequency range of the human ear. The specific implementation steps are as follows:
Step X, converting the control instruction c(t) into a binary control instruction cb of length M, where cb(i) is the bit value of the i-th bit.
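[Formula image BDA0002960797010000121 of the original publication: expression for the binary control instruction cb; not reproduced in this text.]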
Step Y, in order not to affect the content of the audio data itself, a frequency range outside the auditory frequency range of the human ear may be selected as the transmission band of the control instruction; for example, the range of 20 kHz to 24 kHz may be selected. In the conversion, the binary control instruction cb is used to generate the audio digital watermark S(t) in the 20 kHz to 24 kHz range.
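[Formula image BDA0002960797010000122 of the original publication: expression for the audio digital watermark S(t); not reproduced in this text.]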
Through the above steps, the audio digital watermark S(t) converted from the control instruction is obtained.
Next, when embedding the audio digital watermark S(t) into the audio data x(t), the high-frequency signals in x(t) may first be filtered out, for example by passing x(t) through a low-pass filter. In this way, the original signal in the 20 kHz to 24 kHz range of x(t) is removed, so that the audio digital watermark S(t) is not distorted by superposition when it is combined with the audio data. The low-pass filtered audio data is then combined with the audio digital watermark S(t). The above process is expressed mathematically as follows:
z(t) = S(t) + lpf(x(t))
where lpf(·) denotes low-pass filtering and z(t) represents the audio data in which the control instruction is embedded.
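The embedding z(t) = S(t) + lpf(x(t)) can be sketched as follows. Because the exact watermark formula is not reproduced in this text, the modulation used here is an assumption: each bit occupies a fixed-length frame and is sent as a 21 kHz tone for 0 or a 23 kHz tone for 1, at a 48 kHz sampling rate. The windowed-sinc low-pass filter, its 19 kHz cutoff, the tone amplitude, and all names are likewise illustrative choices.

```python
import math
from typing import List

FS = 48000                        # assumed sampling rate (Hz); Nyquist is 24 kHz
BIT_LEN = 480                     # assumed samples per bit (10 ms)
F_ZERO, F_ONE = 21000.0, 23000.0  # assumed tones for bit 0 / bit 1


def to_bits(code: int, m: int) -> List[int]:
    """Binary control instruction cb of length m, most significant bit first."""
    return [(code >> (m - 1 - i)) & 1 for i in range(m)]


def make_watermark(bits: List[int], amp: float = 0.05) -> List[float]:
    """Watermark S(t): one inaudible tone per bit (illustrative modulation)."""
    s = []
    for i, b in enumerate(bits):
        f = F_ONE if b else F_ZERO
        for n in range(BIT_LEN):
            t = (i * BIT_LEN + n) / FS
            s.append(amp * math.sin(2.0 * math.pi * f * t))
    return s


def lowpass_taps(cutoff_hz: float = 19000.0, num_taps: int = 101) -> List[float]:
    """Hamming-windowed sinc FIR low-pass taps, normalized to unity DC gain."""
    fc = cutoff_hz / FS
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def lowpass(x: List[float], taps: List[float]) -> List[float]:
    """Zero-phase FIR filtering by direct convolution (zero-padded edges)."""
    half = len(taps) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = n + half - k
            if 0 <= idx < len(x):
                acc += h * x[idx]
        out.append(acc)
    return out


def embed(x: List[float], code: int, m: int = 8) -> List[float]:
    """z(t) = S(t) + lpf(x(t)); the watermark occupies the first m*BIT_LEN samples."""
    s = make_watermark(to_bits(code, m))
    if len(x) < len(s):
        raise ValueError("audio is shorter than the watermark")
    filtered = lowpass(x, lowpass_taps())
    return [filtered[n] + (s[n] if n < len(s) else 0.0) for n in range(len(x))]
```

Low-pass filtering before the addition ensures that nothing in x(t) remains in the band used by S(t), which is what keeps the later separation unambiguous.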
In one implementation, since the audio data needs to be encoded, the control instruction may be embedded either in the encoded audio data or in the audio data before encoding. Preferably, the control instruction is embedded in the audio data before encoding, so that the audio data embedded with the control instruction can then be encoded and compressed as a whole. Compared with compressing only the audio data without the control instruction, more of the content is compressible, so the bandwidth required for wireless transmission can be reduced.
To illustrate the processing of the audio data by the audio acquisition device more intuitively, reference may be made to fig. 3a, which is a flowchart of the operation of the first audio acquisition device according to an exemplary embodiment of the present application. As shown in fig. 3a, the audio acquisition device converts the control instruction into an audio digital watermark and embeds it into the audio data; the audio data embedded with the control instruction is then encoded, encapsulated into a data packet, and sent to the audio receiving device.
It will be appreciated that the audio receiving device needs to perform processing that corresponds to the processing on the audio acquisition device side. For example, in one embodiment, if the audio acquisition device processes the audio data as shown in fig. 3a, the workflow of the audio receiving device can be seen in fig. 3b. In fig. 3b, the audio receiving device 20 includes a wireless transceiver 201 and a processor 202. The processor 202 may receive the data packet sent by the audio acquisition device through the wireless transceiver 201 and decapsulate it to obtain the encoded audio data embedded with the control instruction. Further, the processor 202 may decode the decapsulated audio data to obtain the audio data embedded with the control instruction. Finally, the processor 202 separates the audio data and the control instruction from the audio data embedded with the control instruction.
The specific implementation of the separation operation is described below, taking as an example the foregoing case in which the control instruction is converted into an audio digital watermark in the 20 kHz to 24 kHz range. Separating the audio data in which the control instruction is embedded may comprise the following steps:
Step a), filtering out the signals above 20 kHz from the audio data z(t) embedded with the control instruction by means of a low-pass filter to obtain the audio data x(t).
Step b), extracting the signal in the 20 kHz to 24 kHz range from the audio data z(t) embedded with the control instruction to obtain the audio digital watermark S(t); analyzing the frequency-domain information of S(t) to obtain the corresponding binary control instruction cb; and converting the binary control instruction cb to obtain the control instruction c(t).
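The frequency-domain analysis in step b) can be sketched with the Goertzel algorithm, which measures the signal power at a single frequency. The sketch assumes a hypothetical modulation in which each bit occupies a 480-sample frame at 48 kHz and is carried by a 21 kHz tone for 0 or a 23 kHz tone for 1; these values and all function names are assumptions, since the actual watermark formula is not reproduced in this text.

```python
import math
from typing import List, Sequence

FS = 48000                        # assumed sampling rate (Hz)
BIT_LEN = 480                     # assumed samples per bit
F_ZERO, F_ONE = 21000.0, 23000.0  # assumed tones for bit 0 / bit 1


def goertzel_power(frame: Sequence[float], freq_hz: float) -> float:
    """Power of `frame` at `freq_hz`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / FS)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def extract_bits(z: Sequence[float], m: int) -> List[int]:
    """Recover the binary control instruction cb from the first m bit frames of z."""
    bits = []
    for i in range(m):
        frame = z[i * BIT_LEN:(i + 1) * BIT_LEN]
        one = goertzel_power(frame, F_ONE)
        zero = goertzel_power(frame, F_ZERO)
        bits.append(1 if one > zero else 0)
    return bits


def bits_to_code(bits: Sequence[int]) -> int:
    """Convert cb (most significant bit first) back to an integer instruction code."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code
```

Because only the two candidate frequencies are compared in each frame, audible content below 20 kHz has little influence on the decision, so the watermark can be read directly from z(t) without first isolating the high band.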
When the audio receiving device serves as a repeater and the audio data and the control instruction need to be provided to the electronic device connected to it for processing, operations such as decapsulating the data packet, decoding the audio, and extracting the watermark can be flexibly distributed between the audio receiving device and the electronic device. For example, in one implementation, the audio receiving device may simply forward the received data packet to the connected electronic device, and the electronic device performs the decapsulation, audio decoding, and other operations. In another implementation, the audio receiving device may complete the decapsulation, audio decoding, watermark extraction, and other operations itself, and then send the separated, directly usable control instruction and audio data to the electronic device. With the latter division of labor, however, two hardware links need to be configured between the audio receiving device and the electronic device to transmit the audio data and the control instruction respectively, which increases the hardware cost.
Therefore, the embodiment of the present application provides a preferred implementation manner, and reference may be made to fig. 3c, where fig. 3c is a schematic diagram illustrating division of labor between an audio receiving apparatus and an electronic device according to an exemplary embodiment of the present application. In the embodiment shown in fig. 3c, unlike the steps executed by the audio receiving apparatus in fig. 3b, in fig. 3c, after the audio receiving apparatus 20 decodes the decapsulated audio data to obtain the audio data embedded with the control command, the audio data embedded with the control command may be transmitted to the electronic device 30, and the electronic device 30 may perform the separation operation of the audio data embedded with the control command. In this way, only one hardware link for transmitting the audio data embedded with the control command needs to be configured between the audio receiving apparatus 20 and the electronic device 30, which can save one hardware link and reduce cost compared to the aforementioned separation of the audio data and the control command at the audio receiving apparatus 20 side.
Further, the audio acquisition device may also be provided with a control sensor. The control sensor may be a push button or a touch sensing module, and can generate a corresponding control instruction when triggered by the user. The control sensor gives the user more diverse ways of control. In one scenario, a user who wants to make a distant motion camera press the shutter can, besides using voice control through the audio acquisition device, also use a button of the control sensor, which is very convenient.
The control instruction generated by the control sensor can also be encapsulated into a data packet together with the audio data and sent to the audio receiving device. Of course, it may also be embedded in the audio data directly, or be converted into an audio digital watermark and then embedded in the audio data. For this part, reference may be made to the related content on the control instruction recognized from speech, which is not repeated here.
Considering some special needs of the user, the audio segment from which the control instruction is recognized (hereinafter referred to as the target audio segment) may be further processed. The processing of the target audio segment may include audio effect processing such as muting, enhancement, and voice changing. For example, in one possible scenario, the electronic device is an unmanned aerial vehicle that is shooting the ground, and the user is dubbing the video shot by the unmanned aerial vehicle using the audio acquisition device provided by the embodiment of the present application. If the user then feels that the angle of the unmanned aerial vehicle needs to be changed and issues the voice instruction "turn the lens 5 degrees upwards", this voice instruction will be recorded in the shot video, which is obviously not a sound the user wishes to record. For this scenario, the audio acquisition device can mute the target audio segment after determining it, so as to eliminate the voice instruction that the user does not want recorded.
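Muting the target audio segment can be sketched as follows. The short linear fade on either side of the muted range is an added assumption intended to avoid audible clicks at the boundaries; the embodiment only states that the segment is muted, and the function name and parameters are illustrative.

```python
from typing import List


def mute_segment(samples: List[float], start: int, end: int,
                 fade: int = 0) -> List[float]:
    """Return a copy of `samples` with [start, end) set to zero.

    `fade` samples before `start` ramp down and `fade` samples from `end`
    onward ramp back up (illustrative click suppression).
    """
    out = list(samples)
    for n in range(len(out)):
        if start <= n < end:
            gain = 0.0
        elif fade > 0 and start - fade <= n < start:
            gain = (start - n) / fade      # 1.0 at start - fade, down to 1/fade
        elif fade > 0 and end <= n < end + fade:
            gain = (n - end + 1) / fade    # 1/fade at end, up to 1.0
        else:
            gain = 1.0
        out[n] *= gain
    return out
```

The `start` and `end` indices would come from the segment in which the control instruction was recognized.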
In another possible scenario, the target audio segment may instead be enhanced or voice-changed to highlight the voice instruction. In one implementation, highlighting the voice instruction allows the electronic device to perform secondary recognition and recognize the voice instruction more accurately; in yet another implementation, it allows the voice instruction to be eliminated when the video is edited later. Of course, voice changing may also make the video more entertaining.
The above is a detailed description of the first audio acquisition device provided in the embodiments of the present application. This first audio acquisition device is communicatively connected to the audio receiving device through a wireless network. Because the audio acquisition device can be placed very close to the user, it can collect clear audio data, which greatly improves the accuracy of speech recognition. In addition, considering that signals suffer a certain loss during wireless communication and that the stability of wireless communication is not high enough, the embodiment of the present application performs recognition on the audio data on the audio acquisition device side before the audio data is wirelessly transmitted, so that the voice instruction issued by the user can be accurately recognized and the accuracy of speech recognition is further ensured.
As can be seen from the foregoing, if the audio receiving apparatus performs the command identification processing, since the audio data needs to be wirelessly transmitted before being used for the command identification processing, and there is a certain loss in the wireless transmission process of the signal and the wireless transmission also has a stability problem (such as packet loss), the audio data received by the audio receiving apparatus may have some defects, so that the control command may still not be accurately identified when the audio receiving apparatus performs the command identification processing.
In order to solve the problem, the embodiment of the present application provides a second audio capture device. The audio acquisition device also includes a microphone, a processor, and a wireless transceiver. The difference from the first audio acquisition device provided by the embodiment of the present application is that the processor performs a preliminary identification process on the audio data acquired by the microphone, and the preliminary identification process obtains auxiliary identification information instead of a control instruction.
The auxiliary identification information obtained by recognition may be transmitted to the audio receiving device together with the audio data. The audio data is still used for media processing, while the auxiliary identification information is used to assist the electronic device in performing secondary recognition. Specifically, the electronic device may perform secondary recognition on the received audio data according to the auxiliary identification information, so as to obtain the control instruction. The electronic device may be the audio receiving device itself or another electronic device communicatively connected to the audio receiving device; that is, the audio receiving device may itself be the electronic device, or it may act as a relay connecting the electronic device and the audio acquisition device. For this part, reference may be made to the related description of the first audio acquisition device provided in the embodiment of the present application.
Since the auxiliary identification information is obtained by performing recognition processing on the audio data before wireless transmission, its accuracy is ensured. When the electronic device performs secondary recognition, although the instruction recognition processing is performed on audio data that has been wirelessly transmitted, the auxiliary identification information can be used in the recognition process, so the recognition accuracy is improved. In addition, the audio acquisition device only needs to obtain the auxiliary identification information, so less computing power is required than for directly recognizing the control instruction.
The auxiliary identification information is information for assisting the electronic device in performing secondary recognition. Specifically, it may be one or more of the following: segment identification information indicating the audio segment corresponding to the control instruction, the type of the audio data corresponding to the control instruction, and the control content information corresponding to the control instruction.
The segment identification information identifies the audio segment in the audio data that corresponds to the control instruction, so that during secondary recognition the electronic device can locate that audio segment according to the segment identification information, which reduces the probability of missing a control instruction. As for the type of the audio data corresponding to the control instruction, in one embodiment it may indicate whether that audio data is of a speech type (such as spoken human language) or a non-speech type (such as clapping or tapping); in another embodiment, it may indicate the language of that audio data, such as Chinese, English, or Japanese. Indicating the type of the audio data helps the electronic device extract audio features more accurately during secondary recognition, which makes instruction recognition more accurate. The control content information corresponding to the control instruction is the specific content of the control instruction; for example, it may be "adjust the focal length of the lens to 50 mm" or "switch to the anti-shake mode". The control content information helps the electronic device compare and correct its result during secondary recognition, so that recognizing a wrong control instruction is avoided.
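The three kinds of auxiliary identification information described above can be represented as a small record, as in the following sketch. The field names, the JSON serialization, and the correction policy (preferring the transmitted control content when the secondary result disagrees with it) are assumptions made for illustration; the embodiment only states that the information assists comparison and correction.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional, Sequence


@dataclass
class AuxInfo:
    """Hypothetical container for auxiliary identification information."""
    segment_start: int                     # first sample of the instruction segment
    segment_end: int                       # one past the last sample of the segment
    audio_type: str                        # e.g. "speech" or "non-speech"
    control_content: Optional[str] = None  # e.g. "switch to anti-shake mode"


def to_json(info: AuxInfo) -> str:
    """Serialize the record for transmission (format is illustrative)."""
    return json.dumps(asdict(info))


def from_json(text: str) -> AuxInfo:
    """Rebuild the record on the receiving side."""
    return AuxInfo(**json.loads(text))


def secondary_recognition(samples: Sequence[float], info: AuxInfo,
                          recognize: Callable[[Sequence[float], str], str]) -> str:
    """Run a caller-supplied recognizer on the indicated segment only, then
    compare its result with the transmitted control content, if any."""
    segment = samples[info.segment_start:info.segment_end]
    result = recognize(segment, info.audio_type)
    if info.control_content is not None and result != info.control_content:
        # Assumed policy: trust the content recognized before wireless transmission.
        return info.control_content
    return result
```

Here `recognize` stands for whatever recognizer the electronic device uses; passing the audio type lets it choose suitable feature extraction for the segment.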
As in the first audio acquisition apparatus provided in the embodiments of the present application, in terms of hardware form of the receiving side, one or more electronic devices may be provided, and one or more audio receiving apparatuses may also be provided.
For the processing of the auxiliary identification information on the audio acquisition device side, reference may also be made to the processing of the control instruction in the first audio acquisition device provided in the embodiment of the present application; for example, the auxiliary identification information may be encapsulated into a data packet together with the audio data, or may be converted into an audio digital watermark and embedded into the audio data, and so on. Referring to fig. 4a and 4b, fig. 4a is a flowchart illustrating the operation of a second audio acquisition device according to an exemplary embodiment of the present application, and fig. 4b is a flowchart illustrating the operation of a second audio receiving device according to an exemplary embodiment of the present application.
It should be noted that, in the second audio acquisition device provided in the embodiment of the present application, the processing performed by the processor on the audio segment corresponding to the control instruction includes, but is not limited to, one or more of the following: enhancement, noise reduction, and polishing. Processing the audio segment can make the voice instruction in the audio data more prominent, so that the secondary recognition by the electronic device is more accurate.
For other contents of the second audio acquisition device provided in the embodiment of the present application, reference may be made to corresponding descriptions in the first audio acquisition device provided in the embodiment of the present application, and details are not repeated herein.
The second audio acquisition device provided by the embodiment of the present application performs recognition processing on the audio data before wireless transmission to obtain the auxiliary identification information. After the auxiliary identification information is sent to the audio receiving device side, it can assist the electronic device in performing secondary recognition on the audio data, which improves the accuracy of the secondary recognition. In addition, compared with the first audio acquisition device, the second audio acquisition device only needs to obtain the auxiliary identification information and does not need to recognize the control instruction, so it requires less computing power and is easier to implement.
Corresponding to the first audio acquisition device provided in the foregoing embodiment of the present application, the embodiment of the present application further provides a first audio receiving device. The audio receiving apparatus includes:
a wireless transceiver and a processor;
the processor is used for receiving the audio data and the control instruction sent by the audio acquisition device through the wireless transceiver; the control instruction is obtained by performing instruction identification processing on the acquired audio data by the audio acquisition device;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
In an optional embodiment, the processor is further configured to decode the received audio data.
In an optional embodiment, the control instruction is obtained by performing instruction recognition processing on the audio data before encoding by the audio acquisition device.
In an optional embodiment, the processor is further configured to decapsulate the data packet received by the wireless transceiver to obtain the audio data and the control instruction.
In an alternative embodiment, the decapsulating of the data packet results in audio data in which the control instruction is embedded.
In an optional embodiment, the processor is further configured to separate the audio data embedded with the control instruction, so as to obtain the audio data and the audio digital watermark converted from the control instruction.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
In an alternative embodiment, the audio data obtained by separating the audio data embedded with the control instruction is pre-coding audio data.
In an optional embodiment, the control instruction is obtained by intercepting an audio segment containing voice in audio data by the audio acquisition device, extracting audio features of the audio segment, and inputting the audio features into a specified voice recognition model.
In an alternative embodiment, the received control instruction further comprises another control instruction generated by a control sensor of the audio capture device in response to a user trigger.
In an optional embodiment, a target audio segment in the received audio data is processed by the audio acquisition device, and the target audio segment is an audio segment corresponding to the control instruction.
In an alternative embodiment, the processing of the target audio segment includes one or more of the following: muting, enhancement, voice changing.
In an optional embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the processor is further configured to send the received audio data and the control instruction to the electronic device.
In an alternative embodiment, the electronic device is the audio receiving apparatus;
the processor is further configured to perform media processing using the audio data, and perform an operation corresponding to the control instruction.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a pan-tilt head, an unmanned vehicle.
It can be understood that, the specific implementation of the functions of the first audio receiving apparatus provided in the foregoing embodiments of the present application has been described in the foregoing text, and is not described herein again.
Corresponding to the second audio acquisition device provided in the foregoing embodiment of the present application, a second audio receiving device is also provided in the embodiment of the present application. The audio receiving apparatus includes:
a wireless transceiver and a processor;
the processor is used for receiving the audio data and the auxiliary identification information sent by the audio acquisition device through the wireless transceiver; the auxiliary identification information is obtained by identifying and processing the acquired audio data by the audio acquisition device;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
In an optional embodiment, the auxiliary identification information includes one or more of the following information: the audio clip identification information is used for indicating the audio clip corresponding to the control instruction, the type of the audio data corresponding to the control instruction and the control content information corresponding to the control instruction.
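As an illustration only, the three kinds of auxiliary identification information listed above could be carried in a small record such as the following Python sketch. The class name `AuxiliaryInfo`, its field names, the use of sample indices to identify the audio clip, and the JSON serialization are all assumptions made for this example; the present application does not specify a concrete format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AuxiliaryInfo:
    """Hypothetical container for auxiliary identification information."""
    # Audio clip identification: sample range of the clip carrying the instruction.
    segment_start: Optional[int] = None
    segment_end: Optional[int] = None
    # Type of the audio data corresponding to the control instruction.
    audio_type: Optional[str] = None        # "speech" or "non-speech"
    # Control content information corresponding to the control instruction.
    control_content: Optional[str] = None

    def to_bytes(self) -> bytes:
        """Serialize only the fields that are present ("one or more of")."""
        present = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(present, sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "AuxiliaryInfo":
        """Rebuild the record; fields missing from the payload stay None."""
        return cls(**json.loads(raw.decode("utf-8")))


info = AuxiliaryInfo(segment_start=16000, segment_end=32000, audio_type="speech")
restored = AuxiliaryInfo.from_bytes(info.to_bytes())
```

Leaving absent fields out of the serialized form mirrors the "one or more of" wording: the audio acquisition device only sends what it actually determined.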
In an alternative embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, the processor is further configured to decode the received audio data.
In an optional embodiment, the auxiliary identification information is obtained by performing identification processing on the audio data before encoding by the audio acquisition device.
In an optional embodiment, the processor is further configured to decapsulate the data packet received through the wireless transceiver to obtain the audio data and the auxiliary identification information.
In an alternative embodiment, decapsulating the data packet yields audio data in which the auxiliary identification information is embedded.
In an optional embodiment, the processor is further configured to separate the audio data embedded with the auxiliary identification information, to obtain the audio data and the audio digital watermark into which the auxiliary identification information was converted.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
In an alternative embodiment, the audio data obtained by separating the audio data embedded with the auxiliary identification information is pre-encoding audio data.
In an optional embodiment, the processor is further configured to receive, through the wireless transceiver, a control instruction sent by the audio acquisition device; the received control instruction is another control instruction generated by a control sensor of the audio acquisition device in response to a trigger of a user.
In an optional embodiment, a target audio segment in the received audio data is processed by the audio acquisition device, and the target audio segment is an audio segment corresponding to the control instruction.
In an alternative embodiment, the processing applied to the target audio segment includes one or more of the following: enhancement, noise reduction, and polishing.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the processor is further configured to send the received audio data and the auxiliary identification information to the electronic device.
In an alternative embodiment, the electronic device is the audio receiving apparatus;
the processor is further configured to perform media processing using the audio data, and identify a control instruction from the audio data based on the auxiliary identification information.
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
It can be understood that the specific implementation of the functions of the second audio receiving apparatus provided in the foregoing embodiments of the present application has been described in the foregoing text, and is not described herein again.
It should be noted that, in practical implementation, the functions performed by the processor may be distributed among a plurality of different modules. For example, in one embodiment, referring to fig. 5, fig. 5 shows an audio acquisition system implemented with modules, which includes an audio acquisition device 10, an audio receiving device 20, and an electronic device 30. In the embodiment shown in fig. 5, on the audio acquisition device 10 side, a microphone collects audio data, and the collected audio data is supplied to the instruction identification module. The instruction identification module identifies the control instruction and provides the control instruction to the watermark embedding module. In the watermark embedding module, the control instruction is converted into an audio digital watermark and embedded into the audio data. The audio data embedded with the control instruction is encoded by the audio encoding module, encapsulated into a data packet by the data encapsulation module, and sent to the opposite end through the wireless transceiver.
At the opposite end, the audio receiving apparatus 20 receives the data packet through the wireless transceiver, decapsulates the data packet through the data decapsulation module, and sends the audio data embedded with the control instruction obtained by the decapsulation to the audio decoding module for decoding. After decoding, the audio receiving apparatus 20 sends the decoded audio data embedded with the control instruction to the electronic device 30 through the hardware link, and the watermark extraction module of the electronic device 30 performs operations of separating and converting the audio data embedded with the control instruction, so as to obtain the audio data and the control instruction.
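The order of the modules in fig. 5 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the watermark is a least-significant-bit stand-in on 16-bit PCM samples, the codec is the lossless `zlib` compressor standing in for an audio codec, and the packet is a bare length-prefixed frame. None of these choices, and none of the function names, are taken from the present application.

```python
import struct
import zlib
from typing import List, Tuple


def embed_watermark(pcm: List[int], instruction: bytes) -> List[int]:
    """Stand-in watermark: a length byte plus the instruction, one bit per sample LSB."""
    data = bytes([len(instruction)]) + instruction
    bits = [(b >> s) & 1 for b in data for s in range(7, -1, -1)]
    if len(bits) > len(pcm):
        raise ValueError("audio too short to carry the instruction")
    return [((s & ~1) | bits[i]) if i < len(bits) else s for i, s in enumerate(pcm)]


def extract_watermark(pcm: List[int]) -> bytes:
    """Read the length byte, then that many instruction bytes, from the sample LSBs."""
    def read_byte(offset: int) -> int:
        value = 0
        for s in pcm[offset:offset + 8]:
            value = (value << 1) | (s & 1)
        return value
    length = read_byte(0)
    return bytes(read_byte(8 * (j + 1)) for j in range(length))


def encode(pcm: List[int]) -> bytes:
    """Audio encoding module (lossless stand-in)."""
    return zlib.compress(struct.pack(f"<{len(pcm)}h", *pcm))


def decode(blob: bytes) -> List[int]:
    """Audio decoding module (inverse of encode)."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def encapsulate(payload: bytes) -> bytes:
    """Data encapsulation module: 4-byte length prefix plus payload."""
    return struct.pack("!I", len(payload)) + payload


def decapsulate(packet: bytes) -> bytes:
    """Data decapsulation module."""
    (n,) = struct.unpack_from("!I", packet)
    return packet[4:4 + n]


def acquisition_device(pcm: List[int], instruction: bytes) -> bytes:
    # Embed before encoding, then encapsulate, as on the device 10 side.
    return encapsulate(encode(embed_watermark(pcm, instruction)))


def receiving_device(packet: bytes) -> List[int]:
    # Device 20 only decapsulates and decodes; it does not extract the watermark.
    return decode(decapsulate(packet))


def electronic_device(pcm: List[int]) -> Tuple[List[int], bytes]:
    # Device 30 separates the watermark and converts it back to the instruction.
    return pcm, extract_watermark(pcm)
```

In this sketch the receiving side simply forwards the decoded samples, so the instruction survives only because the stand-in codec is lossless; a lossy codec would require a watermark designed to survive it.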
The specific implementation of the above process has been described in detail in the foregoing, and is not described herein again.
It is understood that fig. 5 is only an alternative embodiment, and in practical implementation, other modules may be used to implement the technical solution of the present application. Whatever modules are actually used, they are substantially equivalent to the processor described in the embodiments of the present application; in other words, the processor described in the embodiments of the present application may in practice refer to the various modules that perform the corresponding functions.
Corresponding to the first audio acquisition device provided in the embodiments of the present application, a first audio processing method is further provided in the embodiments of the present application, applied to an audio acquisition device, and the method includes:
carrying out instruction identification processing on the collected audio data to obtain a control instruction;
sending the audio data and the control instruction to an audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
In an optional embodiment, before sending the audio data to the audio receiving apparatus, the method further comprises:
encoding the audio data.
In an alternative embodiment, the audio data subjected to the instruction recognition processing is audio data before encoding.
In an optional embodiment, sending the audio data and the control instruction to an audio receiving apparatus includes:
encapsulating the audio data and the control instruction into a data packet, and sending the data packet to the audio receiving device.
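For the case in which the audio data and the control instruction travel side by side in one data packet, a possible layout is sketched below in Python. The marker `AUD1`, the header fields, and the function names are hypothetical and chosen only for this example; the present application does not define a packet format.

```python
import struct
from typing import Tuple

MAGIC = b"AUD1"                   # hypothetical packet marker
HEADER = struct.Struct("!4sHI")   # marker, instruction length, audio length


def encapsulate(audio: bytes, instruction: bytes) -> bytes:
    """Pack the control instruction and the audio data into one packet."""
    return HEADER.pack(MAGIC, len(instruction), len(audio)) + instruction + audio


def decapsulate(packet: bytes) -> Tuple[bytes, bytes]:
    """Recover (audio, instruction) from a packet built by encapsulate()."""
    magic, n_instr, n_audio = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not an audio packet")
    body = packet[HEADER.size:]
    if len(body) != n_instr + n_audio:
        raise ValueError("packet length does not match its header")
    return body[n_instr:], body[:n_instr]
```

Carrying both lengths in the header lets the audio receiving device split the packet without inspecting the audio payload itself.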
In an optional embodiment, encapsulating the audio data and the control instruction into a data packet includes:
embedding the control instruction into the audio data;
and encapsulating the audio data embedded with the control instruction into a data packet.
In an optional embodiment, before embedding the control instruction in the audio data, the method further comprises:
and converting the control instruction into an audio digital watermark.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
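One way to place a watermark outside the human auditory frequency range is to key the instruction bits onto two tones above roughly 20 kHz. The Python sketch below is only an illustration: the 48 kHz sample rate, the 21 kHz and 22 kHz tones, the 10 ms bit length, the amplitude, and the length-byte framing are all assumptions, and the step of removing the tones again to recover clean audio is omitted.

```python
import math
from typing import List, Sequence

FS = 48_000               # sample rate in Hz (assumed)
F0, F1 = 21_000, 22_000   # tones for bit 0 and bit 1, above ~20 kHz (assumed)
BIT_LEN = 480             # samples per bit: 10 ms, whole cycles of both tones
AMP = 0.01                # watermark amplitude relative to full scale (assumed)


def _bits(payload: bytes) -> List[int]:
    """One length byte followed by the payload, most significant bit first."""
    data = bytes([len(payload)]) + payload
    return [(byte >> s) & 1 for byte in data for s in range(7, -1, -1)]


def embed(audio: Sequence[float], payload: bytes) -> List[float]:
    """Add one high-frequency tone burst per bit on top of the host audio."""
    bits = _bits(payload)
    if len(bits) * BIT_LEN > len(audio):
        raise ValueError("audio too short for payload")
    out = list(audio)
    for i, bit in enumerate(bits):
        f = F1 if bit else F0
        for n in range(BIT_LEN):
            out[i * BIT_LEN + n] += AMP * math.sin(2 * math.pi * f * n / FS)
    return out


def _power(frame: Sequence[float], f: int) -> float:
    """Energy of the frame at frequency f (single-bin Fourier correlation)."""
    re = sum(x * math.cos(2 * math.pi * f * n / FS) for n, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * f * n / FS) for n, x in enumerate(frame))
    return re * re + im * im


def _read_byte(audio: Sequence[float], bit_index: int) -> int:
    value = 0
    for i in range(bit_index, bit_index + 8):
        frame = audio[i * BIT_LEN:(i + 1) * BIT_LEN]
        value = (value << 1) | (1 if _power(frame, F1) > _power(frame, F0) else 0)
    return value


def extract(audio: Sequence[float]) -> bytes:
    """Decode the length byte, then that many payload bytes."""
    length = _read_byte(audio, 0)
    return bytes(_read_byte(audio, 8 * (j + 1)) for j in range(length))
```

The sketch assumes that whatever lies between `embed` and `extract` preserves content near 21 to 22 kHz; embedding before encoding, as in the embodiment that follows, makes that a requirement on the codec.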
In an alternative embodiment, the audio data embedded by the control instruction is the audio data before encoding.
In an alternative embodiment, performing instruction recognition processing on the collected audio data includes:
intercepting an audio clip containing voice from the audio data;
extracting audio features of the audio clip;
inputting the audio features into a specified voice recognition model to recognize the control instruction.
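The three steps above can be sketched in Python as follows. The sketch makes several assumptions that are not part of the present application: voice is located with a simple frame-energy threshold, the audio features are per-frame log energy and zero-crossing rate, and the "specified voice recognition model" is any callable supplied by the caller.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

FRAME = 160  # samples per frame (10 ms at an assumed 16 kHz sample rate)
Features = List[Tuple[float, float]]


def frame_energy(frame: Sequence[float]) -> float:
    return sum(x * x for x in frame) / len(frame)


def intercept_voice(audio: Sequence[float],
                    threshold: float = 1e-3) -> Optional[Tuple[int, int]]:
    """Step 1: return (start, end) spanning the first to last active frame."""
    active = [i for i in range(0, len(audio) - FRAME + 1, FRAME)
              if frame_energy(audio[i:i + FRAME]) > threshold]
    if not active:
        return None
    return active[0], active[-1] + FRAME


def extract_features(segment: Sequence[float]) -> Features:
    """Step 2: per-frame (log energy, zero-crossing rate) pairs."""
    feats = []
    for i in range(0, len(segment) - FRAME + 1, FRAME):
        frame = segment[i:i + FRAME]
        log_e = math.log(frame_energy(frame) + 1e-12)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((log_e, crossings / (FRAME - 1)))
    return feats


def recognize(audio: Sequence[float],
              model: Callable[[Features], Optional[str]]) -> Optional[str]:
    """Step 3: feed the features of the voiced clip to the supplied model."""
    span = intercept_voice(audio)
    if span is None:
        return None
    start, end = span
    return model(extract_features(audio[start:end]))
```

Returning the clip boundaries from the first step is convenient because the same boundaries identify the target audio clip in which the control instruction was recognized.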
In an alternative embodiment, the control instruction further comprises another control instruction generated in response to a user trigger.
In an optional embodiment, the method further comprises:
processing the target audio clip in which the control instruction is identified.
In an alternative embodiment, the processing of the target audio clip includes one or more of: muting, enhancement, and voice changing.
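Two of the listed operations can be illustrated with the short Python sketch below, assuming the audio is a list of floating-point samples in the range -1.0 to 1.0 and the target clip is given as a sample range. The function names and the fixed gain are assumptions for this example, and voice changing is not shown.

```python
from typing import List


def mute_segment(audio: List[float], start: int, end: int) -> List[float]:
    """Replace the target clip with silence, e.g. to keep a spoken command out of the recording."""
    start, end = max(0, start), min(len(audio), end)
    if end <= start:
        return list(audio)
    return audio[:start] + [0.0] * (end - start) + audio[end:]


def enhance_segment(audio: List[float], start: int, end: int,
                    gain: float = 2.0) -> List[float]:
    """Amplify the target clip, clipping the result to the range [-1.0, 1.0]."""
    out = list(audio)
    for i in range(max(0, start), min(len(audio), end)):
        out[i] = max(-1.0, min(1.0, out[i] * gain))
    return out
```

Both functions return a new list, so the unprocessed audio remains available for instruction recognition.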
In an optional embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the audio data is used for the audio receiving device to send the audio data to the electronic equipment to execute media processing;
the control instruction is used for the audio receiving device to send the control instruction to the electronic equipment so as to execute control processing.
In an alternative embodiment, the electronic device is the audio receiving apparatus;
the audio data is used for the audio receiving device to execute media processing;
the control instruction is used for the audio receiving device to execute control processing.
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
It can be understood that, the specific implementation of the functions of the first audio processing method provided in the foregoing embodiments of the present application has been described in the foregoing text, and is not described herein again.
Corresponding to the second audio acquisition device provided in the embodiment of the present application, a second audio processing method is further provided in the embodiment of the present application, and is applied to an audio acquisition device, where the method includes:
carrying out identification processing on the collected audio data to obtain auxiliary identification information;
transmitting the audio data and the auxiliary identification information to an audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
In an optional embodiment, the auxiliary identification information includes one or more of the following information: the audio clip identification information is used for indicating the audio clip corresponding to the control instruction, the type of the audio data corresponding to the control instruction and the control content information corresponding to the control instruction.
In an alternative embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, before sending the audio data to the audio receiving apparatus, the method further comprises:
encoding the audio data.
In an alternative embodiment, the audio data subjected to the identification process is audio data before encoding.
In an alternative embodiment, transmitting the audio data and the auxiliary identification information to an audio receiving apparatus includes:
and encapsulating the audio data and the auxiliary identification information into a data packet and sending the data packet to the audio receiving device.
In an optional embodiment, encapsulating the audio data and the auxiliary identification information into a data packet includes:
embedding the auxiliary identification information into the audio data;
and encapsulating the audio data embedded with the auxiliary identification information into a data packet.
In an optional embodiment, before embedding the auxiliary identification information in the audio data, the method further comprises:
and converting the auxiliary identification information into an audio digital watermark.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
In an alternative embodiment, the audio data embedded with the auxiliary identification information is the audio data before encoding.
In an alternative embodiment, the control instruction further comprises another control instruction generated in response to a user trigger.
In an optional embodiment, the method further comprises:
processing the target audio clip corresponding to the control instruction.
In an optional embodiment, the processing of the audio clip corresponding to the control instruction includes one or more of the following: enhancement, noise reduction, and polishing.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the audio data is used for the audio receiving device to send the audio data to the electronic equipment to execute media processing;
the auxiliary identification information is used for the audio receiving device to send the auxiliary identification information to the electronic equipment so as to identify a control instruction from the audio data according to the auxiliary identification information.
In an alternative embodiment, the electronic device is the audio receiving apparatus;
the audio data is used for the audio receiving device to execute media processing;
the auxiliary identification information is used for the audio receiving device to identify the control instruction from the audio data according to the auxiliary identification information.
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
It can be understood that, the specific implementation of the function of the second audio processing method provided in the foregoing embodiment of the present application has been described in the foregoing text, and is not described herein again.
Corresponding to the first audio receiving apparatus provided in the foregoing embodiments of the present application, a third audio processing method is further provided in the embodiments of the present application, applied to an audio receiving apparatus, and includes:
receiving audio data and a control instruction sent by an audio acquisition device through a wireless network;
the control instruction is obtained by performing instruction identification processing on the acquired audio data by the audio acquisition device, the audio data is used for one or more electronic devices to execute media processing, the control instruction is used for one or more electronic devices to execute control processing, and the electronic devices are the audio receiving device or other electronic devices in communication connection with the audio receiving device.
In an optional embodiment, after receiving the audio data, the method further comprises:
decoding the received audio data.
In an optional embodiment, the control instruction is obtained by performing instruction recognition processing on the audio data before encoding by the audio acquisition device.
In an optional embodiment, the receiving, by the wireless network, the audio data and the control instruction sent by the audio acquisition device includes:
and receiving a data packet through a wireless network, and decapsulating the data packet to obtain the audio data and the control instruction.
In an optional embodiment, the decapsulating the data packet to obtain the audio data and the control instruction includes:
decapsulating the data packet to obtain audio data embedded with the control instruction;
and separating the audio data embedded with the control instruction to obtain the audio data and the control instruction.
In an optional embodiment, the separating the audio data embedded with the control instruction to obtain the audio data and the control instruction includes:
separating the audio data embedded with the control instruction to obtain an audio digital watermark and audio data, and converting the audio digital watermark to obtain the control instruction.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
In an alternative embodiment, the audio data obtained by separating the audio data embedded with the control instruction is pre-encoding audio data.
In an optional embodiment, the control instruction is obtained by intercepting an audio segment containing voice in audio data by the audio acquisition device, extracting audio features of the audio segment, and inputting the audio features into a specified voice recognition model.
In an alternative embodiment, the received control instruction further comprises another control instruction generated by a control sensor of the audio acquisition device in response to a user trigger.
In an optional embodiment, a target audio segment in the received audio data is processed by the audio acquisition device, and the target audio segment is an audio segment corresponding to the control instruction.
In an alternative embodiment, the processing applied to the target audio segment includes one or more of the following: muting, enhancement, and voice changing.
In an optional embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected; the method further comprises the following steps:
and sending the received audio data and the control instruction to the electronic equipment.
In an alternative embodiment, the electronic device is the audio receiving apparatus; the method further comprises the following steps:
and executing media processing by using the audio data, and executing the operation corresponding to the control instruction.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
It can be understood that the specific implementation of the functions of the third audio processing method provided in the foregoing embodiments of the present application has been described in the foregoing text, and is not described herein again.
Corresponding to the second audio receiving apparatus provided in the foregoing embodiments of the present application, a fourth audio processing method is further provided in the embodiments of the present application, applied to an audio receiving apparatus, and includes:
receiving audio data and auxiliary identification information sent by an audio acquisition device through a wireless network;
the auxiliary identification information is obtained by identifying and processing the acquired audio data by the audio acquisition device; the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
In an optional embodiment, the auxiliary identification information includes one or more of the following information: the audio clip identification information is used for indicating the audio clip corresponding to the control instruction, the type of the audio data corresponding to the control instruction and the control content information corresponding to the control instruction.
In an alternative embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, after receiving the audio data, the method further comprises:
decoding the received audio data.
In an optional embodiment, the auxiliary identification information is obtained by performing identification processing on the audio data before encoding by the audio acquisition device.
In an optional embodiment, the receiving, by the wireless network, the audio data and the auxiliary identification information sent by the audio acquisition device includes:
and receiving a data packet through a wireless network, and decapsulating the data packet to obtain the audio data and the auxiliary identification information.
In an optional embodiment, the decapsulating the data packet to obtain the audio data and the auxiliary identification information includes:
decapsulating the data packet to obtain audio data in which the auxiliary identification information is embedded;
and separating the audio data embedded with the auxiliary identification information to obtain the audio data and the auxiliary identification information.
In an optional embodiment, separating the audio data embedded with the auxiliary identification information to obtain the audio data and the auxiliary identification information includes:
separating the audio data embedded with the auxiliary identification information to obtain an audio digital watermark and audio data, and converting the audio digital watermark to obtain the auxiliary identification information.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
In an alternative embodiment, the audio data obtained by separating the audio data embedded with the auxiliary identification information is pre-encoding audio data.
In an optional embodiment, the method further comprises:
receiving a control instruction sent by the audio acquisition device through a wireless network; the received control instruction is another control instruction generated by a control sensor of the audio acquisition device in response to a trigger of a user.
In an optional embodiment, a target audio segment in the received audio data is processed by the audio acquisition device, and the target audio segment is an audio segment corresponding to the control instruction.
In an alternative embodiment, the processing applied to the target audio segment includes one or more of the following: enhancement, noise reduction, and polishing.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected; the method further comprises the following steps:
and sending the received audio data and the auxiliary identification information to the electronic equipment.
In an alternative embodiment, the electronic device is the audio receiving apparatus; the method further comprises the following steps:
and executing media processing by using the audio data, and identifying a control instruction from the audio data according to the auxiliary identification information.
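On the receiving side, the auxiliary identification information can narrow down where and how the control instruction is identified. The Python sketch below is only one possible reading: it assumes the auxiliary identification information arrives as a dictionary with the hypothetical keys `segment_start`, `segment_end`, and `audio_type`, and that the caller supplies one recognizer per audio type.

```python
from typing import Callable, Dict, Optional, Sequence

Recognizer = Callable[[Sequence[float]], Optional[str]]


def identify_instruction(audio: Sequence[float],
                         aux: Dict[str, object],
                         recognizers: Dict[str, Recognizer]) -> Optional[str]:
    """Run recognition only on the indicated clip, with the recognizer for its type."""
    start = int(aux.get("segment_start", 0))
    end = int(aux.get("segment_end", len(audio)))
    # Fall back to the speech recognizer when no type is given (assumption).
    recognizer = recognizers.get(str(aux.get("audio_type", "speech")))
    if recognizer is None:
        return None
    return recognizer(audio[start:end])
```

Because the audio acquisition device has already located the clip, the receiving side in this sketch does not need to scan the whole recording.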
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
It can be understood that the specific implementation of the functions of the fourth audio processing method provided in the foregoing embodiment of the present application has been described in the foregoing text, and is not described herein again.
The embodiment of the present application further provides a first audio acquisition system, including:
the audio acquisition device and the audio receiving device;
the audio acquisition device is used for carrying out instruction identification processing on the acquired audio data to obtain a control instruction, and for transmitting the audio data and the control instruction to the audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
In an optional embodiment, the audio acquisition device is configured to encode the acquired audio data and send the encoded audio data to the audio receiving device;
the audio receiving device is used for decoding the received audio data after receiving the audio data.
In an optional embodiment, the audio data that the audio acquisition device performs the instruction recognition processing is the audio data before encoding.
In an optional embodiment, the audio acquisition device is configured to encapsulate the audio data and the control instruction into a data packet, and send the data packet to the audio receiving device;
and the audio receiving device is used for de-encapsulating the data packet after receiving the data packet to obtain the audio data and the control instruction.
In an optional embodiment, the audio acquisition device is configured to embed the control instruction into the audio data, encapsulate the audio data embedded with the control instruction into a data packet, and send the data packet to the audio receiving device;
and the audio receiving device is used for de-encapsulating the data packet after receiving the data packet, separating the audio data embedded with the control instruction and obtained by de-encapsulation, and obtaining the audio data and the control instruction.
In an optional embodiment, the audio acquisition device is configured to convert the control instruction into an audio digital watermark to be embedded in the audio data, and encapsulate the audio data embedded with the control instruction into a data packet and send the data packet to the audio receiving device;
and the audio receiving device is used for decapsulating the data packet after receiving the data packet, separating the audio data embedded with the control instruction obtained by decapsulation to obtain an audio digital watermark and audio data, and converting the audio digital watermark to obtain the control instruction.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
In an optional embodiment, the audio acquisition device embeds the control instruction into the audio data before encoding.
In an optional embodiment, the manner in which the audio acquisition device performs instruction recognition processing on the acquired audio data includes:
intercepting an audio clip containing voice in the audio data;
extracting audio features of the audio segments;
inputting the audio features into a specified voice recognition model to recognize the control instruction.
In an alternative embodiment, the control instruction further comprises another control instruction generated by a control sensor of the audio acquisition device in response to a user trigger.
In an optional embodiment, the audio acquisition device is further configured to process the target audio segment in which the control instruction is identified.
In an optional embodiment, the processing of the target audio segment by the audio acquisition device includes one or more of: muting, enhancement, and voice changing.
In an optional embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the audio receiving device is further configured to send the received audio data and the control instruction to the electronic device.
In an alternative embodiment, the electronic device is the audio receiving apparatus;
the audio receiving device is further configured to execute media processing by using the audio data, and execute an operation corresponding to the control instruction.
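The two alternatives above, forwarding to a connected electronic device or acting locally, can be sketched in Python as follows. The class `AudioReceiver`, its constructor arguments, and the use of a simple sample buffer as the "media processing" step are hypothetical and serve only to illustrate the branching.

```python
from typing import Callable, Dict, List, Optional


class AudioReceiver:
    """Hypothetical audio receiving device that either forwards or acts locally."""

    def __init__(self,
                 handlers: Optional[Dict[str, Callable[[], None]]] = None,
                 forward: Optional[Callable[[List[float], str], None]] = None):
        self.handlers = handlers or {}   # operations the receiver can execute itself
        self.forward = forward           # link to a connected electronic device, if any
        self.recorded: List[float] = []  # stand-in for local media processing

    def on_receive(self, audio: List[float], instruction: str) -> None:
        if self.forward is not None:
            # The electronic device is another device: pass both items on unchanged.
            self.forward(audio, instruction)
            return
        # The electronic device is the receiver itself: process media, then execute.
        self.recorded.extend(audio)
        handler = self.handlers.get(instruction)
        if handler is not None:
            handler()
```

A receiver built with `forward` set behaves as a relay, while one built with `handlers` only behaves as the electronic device itself.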
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
It can be understood that, the specific implementation of the function of the first audio acquisition system provided in the foregoing embodiments of the present application has been described in the foregoing text, and is not described herein again.
The embodiment of the present application further provides a second audio acquisition system, including:
the audio acquisition device and the audio receiving device;
the audio acquisition device is used for identifying the acquired audio data to obtain auxiliary identification information; transmitting the audio data and the auxiliary identification information to the audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
In an optional embodiment, the auxiliary identification information includes one or more of the following information: the audio clip identification information is used for indicating the audio clip corresponding to the control instruction, the type of the audio data corresponding to the control instruction and the control content information corresponding to the control instruction.
In an alternative embodiment, the type of the audio data corresponding to the control instruction includes: speech type and/or non-speech type.
In an optional embodiment, the audio acquisition device is configured to encode the acquired audio data and send the encoded audio data to the audio receiving device;
the audio receiving device is used for decoding the received audio data after receiving the audio data.
In an optional embodiment, the audio data on which the audio acquisition device performs the identification processing is the audio data before encoding.
In an optional embodiment, the audio acquisition device is configured to encapsulate the audio data and the auxiliary identification information into a data packet, and send the data packet to the audio receiving device;
and after receiving the data packet, the audio receiving device decapsulates the data packet to obtain the audio data and the auxiliary identification information.
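The encapsulation and decapsulation described above can be sketched as follows. The packet layout (a two-byte magic value, a 16-bit length for the auxiliary identification information, and a 32-bit length for the audio data) and the JSON serialization are assumptions chosen for this example; the present application does not specify a packet format.

```python
import json
import struct
from typing import Tuple

# Hypothetical header: magic (2 bytes), info length (uint16), audio length (uint32),
# all big-endian with no padding.
_HEADER = struct.Struct(">2sHI")
_MAGIC = b"AQ"


def encapsulate(audio: bytes, info: dict) -> bytes:
    """Pack audio data and auxiliary identification information into one packet."""
    info_bytes = json.dumps(info, separators=(",", ":")).encode("utf-8")
    return _HEADER.pack(_MAGIC, len(info_bytes), len(audio)) + info_bytes + audio


def decapsulate(packet: bytes) -> Tuple[bytes, dict]:
    """Recover the audio data and the auxiliary identification information."""
    magic, info_len, audio_len = _HEADER.unpack_from(packet)
    if magic != _MAGIC:
        raise ValueError("not an audio packet")
    body = packet[_HEADER.size:]
    if len(body) != info_len + audio_len:
        raise ValueError("packet length does not match header")
    info = json.loads(body[:info_len].decode("utf-8"))
    return body[info_len:], info
```

Keeping the two payloads in separate, length-prefixed fields lets the receiving side recover the audio data unchanged, in contrast to the embedded variants described below.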
In an optional embodiment, the audio acquisition device is configured to embed the auxiliary identification information into the audio data, encapsulate the audio data embedded with the auxiliary identification information into a data packet, and send the data packet to the audio receiving device;
and the audio receiving device is configured to decapsulate the data packet after receiving it, and to separate the decapsulated audio data embedded with the auxiliary identification information to obtain the audio data and the auxiliary identification information.
In an optional embodiment, the audio acquisition device is configured to convert the auxiliary identification information into an audio digital watermark, embed the audio digital watermark in the audio data, encapsulate the audio data embedded with the auxiliary identification information into a data packet, and send the data packet to the audio receiving device;
and the audio receiving device is configured to decapsulate the data packet after receiving it, separate the decapsulated audio data embedded with the auxiliary identification information to obtain the audio digital watermark and the audio data, and convert the audio digital watermark to obtain the auxiliary identification information.
In an alternative embodiment, the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
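One possible way to place a watermark outside the human auditory frequency range is sketched below. This is a simplified illustration and not the watermarking scheme of the present application: it assumes a 48 kHz sample rate, takes 20 kHz as the upper limit of human hearing, and encodes each bit as a short low-amplitude tone at 21 kHz or 22 kHz (binary frequency-shift keying), which the receiving side detects with the Goertzel algorithm. All constants and function names are hypothetical. Only embedding and recovery of the bits are shown; removing the tones from the audio afterwards is omitted.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 48000   # assumed sample rate in Hz
BIT_SAMPLES = 480     # 10 ms per bit, so both tones fall on exact 100 Hz bins
FREQ_ZERO = 22000.0   # assumed tone for bit 0, above the nominal 20 kHz hearing limit
FREQ_ONE = 21000.0    # assumed tone for bit 1


def embed_bits(audio: Sequence[float], bits: Sequence[int], amplitude: float = 0.02) -> List[float]:
    """Add one ultrasonic tone per bit to the start of the audio."""
    out = list(audio)
    if len(out) < len(bits) * BIT_SAMPLES:
        raise ValueError("audio too short for the requested number of bits")
    for i, bit in enumerate(bits):
        freq = FREQ_ONE if bit else FREQ_ZERO
        for n in range(BIT_SAMPLES):
            out[i * BIT_SAMPLES + n] += amplitude * math.sin(2.0 * math.pi * freq * n / SAMPLE_RATE)
    return out


def goertzel_power(samples: Sequence[float], freq_hz: float) -> float:
    """Squared magnitude of a single frequency component (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def extract_bits(audio: Sequence[float], n_bits: int) -> List[int]:
    """Decide each bit by comparing the energy at the two tone frequencies."""
    bits = []
    for i in range(n_bits):
        frame = audio[i * BIT_SAMPLES:(i + 1) * BIT_SAMPLES]
        bits.append(1 if goertzel_power(frame, FREQ_ONE) > goertzel_power(frame, FREQ_ZERO) else 0)
    return bits
```

In this sketch the bits would carry a serialized form of the auxiliary identification information. A scheme of this kind only survives transmission if any encoder applied afterwards preserves the band above 20 kHz, which is an additional assumption.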
In an optional embodiment, the audio acquisition device embeds the auxiliary identification information into the audio data before encoding.
In an alternative embodiment, the control instruction further comprises another control instruction generated by a control sensor of the audio acquisition device in response to a user trigger.
In an optional embodiment, the audio acquisition device is further configured to process a target audio segment corresponding to the control instruction.
In an optional embodiment, the processing of the target audio segment by the audio acquisition device includes one or more of the following: enhancement, noise reduction, and polishing.
In an optional embodiment, the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the audio receiving device is further configured to send the received audio data and the auxiliary identification information to the electronic device.
In an alternative embodiment, the electronic device is the audio receiving apparatus;
the audio receiving device is further configured to perform media processing using the audio data, and identify a control instruction from the audio data according to the auxiliary identification information.
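The following sketch illustrates one way a receiving side might use the auxiliary identification information to identify a control instruction from the audio data. The dictionary keys `control_content` and `clip_ms`, the function name, and the `recognize` callback are hypothetical; the sketch assumes that control content information, when present, can be used directly, and that otherwise recognition is restricted to the indicated audio clip.

```python
from typing import Callable, Mapping, Optional, Sequence


def identify_control_instruction(
    audio: Sequence[float],
    info: Mapping[str, object],
    recognize: Callable[[Sequence[float]], Optional[str]],
    sample_rate_hz: int = 16000,
) -> Optional[str]:
    """Identify a control instruction using the auxiliary identification information."""
    # Assumed case 1: the acquisition device already supplied the control content.
    content = info.get("control_content")
    if isinstance(content, str):
        return content
    # Assumed case 2: only the clip position is known, so recognition runs on that clip alone.
    clip = info.get("clip_ms")
    if clip is None:
        return None
    start_ms, end_ms = clip
    start = start_ms * sample_rate_hz // 1000
    end = end_ms * sample_rate_hz // 1000
    return recognize(audio[start:end])
```

Restricting recognition to the indicated clip is what would save work on the receiving side, since the remaining audio data is used only for media processing.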
In an alternative embodiment, the electronic device includes one or more cameras.
In an alternative embodiment, the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, or an unmanned vehicle.
In an alternative embodiment, the media processing comprises: audio editing and/or audio-video editing.
It can be understood that the specific implementation of the functions of the second audio acquisition system provided in the foregoing embodiments of the present application has been described in the foregoing text, and is not described herein again.
The technical features provided in the above embodiments may be combined with each other in any manner to form various embodiments, as long as the combinations involve no conflict or contradiction, as will be apparent to those skilled in the art. For brevity, the combinations of the technical features are not described one by one, but such combinations also fall within the scope of the disclosure of this specification.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the embodiments of the application following, in general, the principles of the embodiments of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the embodiments of the application pertain. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the embodiments of the application being indicated by the following claims.
It is to be understood that the embodiments of the present application are not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the embodiments of the present application is limited only by the following claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present application and is not intended to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present application shall be included in the scope of the present application.

Claims (180)

1. An audio acquisition device, comprising: a microphone, a processor and a wireless transceiver;
the processor is used for carrying out instruction identification processing on the audio data collected by the microphone to obtain a control instruction; the wireless transceiver is also used for sending the audio data and the control instruction to an audio receiving device;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
2. The audio capture device of claim 1, wherein the processor is further configured to encode the audio data prior to sending the audio data to the audio receiving device.
3. The audio acquisition device as claimed in claim 2, wherein the audio data on which the processor performs the instruction recognition processing is the audio data before encoding.
4. The audio acquisition device as claimed in claim 1, wherein the audio data and the control command are encapsulated into a data packet and sent to the audio receiving device.
5. The audio capture device of claim 4, wherein the control instructions are embedded in the audio data prior to encapsulating the data packet.
6. The audio capture device of claim 5, wherein the control instructions are converted into an audio digital watermark and embedded into the audio data.
7. The audio acquisition device of claim 6, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside the human auditory frequency range.
8. The audio acquisition device as claimed in claim 5, wherein the audio data into which the control instruction is embedded is the audio data before encoding.
9. The audio acquisition device according to claim 1, wherein the manner of performing the instruction recognition processing on the audio data acquired by the microphone by the processor specifically comprises:
intercepting an audio clip containing voice in the audio data;
extracting audio features of the audio segments;
and inputting the audio features into a specified voice recognition model, and recognizing the control command.
10. The audio acquisition device of claim 1, further comprising: a control sensor;
the control instructions may also include another control instruction generated by the control sensor in response to a user trigger.
11. The audio capture device of claim 1, wherein the processor is further configured to process a target audio segment for which the control instruction is identified.
12. The audio capture device of claim 11, wherein the processing of the target audio segment comprises one or more of: muting, enhancement, and voice changing.
13. The audio acquisition device according to claim 1, wherein the type of the audio data corresponding to the control instruction comprises: speech type and/or non-speech type.
14. The audio acquisition device according to claim 1, wherein the electronic equipment is other electronic equipment to which the audio receiving device is communicatively connected;
the audio data is used for the audio receiving device to send the audio data to the electronic equipment to execute media processing;
the control instruction is used for the audio receiving device to send the control instruction to the electronic equipment so as to execute control processing.
15. The audio acquisition device according to claim 1, wherein the electronic apparatus is the audio receiving device;
the audio data is used for the audio receiving device to execute media processing;
the control instruction is used for the audio receiving device to execute control processing.
16. The audio acquisition device of claim 1, wherein the electronic equipment comprises one or more cameras.
17. The audio acquisition device as recited in claim 1, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, or an unmanned vehicle.
18. The audio capture device of claim 1, wherein the media processing comprises: audio editing and/or audio-video editing.
19. An audio acquisition device, comprising: a microphone, a processor and a wireless transceiver;
the processor is used for identifying and processing audio data collected by the microphone to obtain auxiliary identification information; the wireless transceiver is used for transmitting the audio data and the auxiliary identification information to an audio receiving device;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
20. The audio capture device of claim 19, wherein the auxiliary identification information comprises one or more of: audio clip identification information indicating the audio clip corresponding to the control instruction, the type of the audio data corresponding to the control instruction, and control content information corresponding to the control instruction.
21. The audio capturing apparatus according to claim 20, wherein the type of the audio data corresponding to the control command includes: speech type and/or non-speech type.
22. The audio capture device of claim 19, wherein the processor is further configured to encode the audio data prior to sending the audio data to the audio receiving device.
23. The audio acquisition device as recited in claim 22, wherein the audio data processed by the processor is the audio data before encoding.
24. The audio acquisition device as claimed in claim 19, wherein the audio data and the auxiliary identification information are encapsulated into a data packet and transmitted to the audio receiving device.
25. The audio capture device of claim 24, wherein the auxiliary identification information is embedded in the audio data prior to encapsulating the data packet.
26. The audio capture device of claim 25, wherein the auxiliary identification information is embedded in the audio data after being converted into an audio digital watermark.
27. The audio capture device of claim 26, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside of a human auditory frequency range.
28. The audio acquisition device as claimed in claim 25, wherein the audio data embedded with the auxiliary identification information is pre-encoded audio data.
29. The audio acquisition device of claim 19, further comprising: a control sensor;
the control instructions may also include another control instruction generated by the control sensor in response to a user trigger.
30. The audio capture device of claim 19, wherein the processor is further configured to process an audio clip corresponding to the control instruction.
31. The audio capturing apparatus according to claim 30, wherein the processing of the audio segment corresponding to the control instruction includes one or more of: enhancement, noise reduction, and polishing.
32. The audio acquisition device as recited in claim 19, wherein the electronic equipment is other electronic equipment to which the audio receiving device is communicatively connected;
the audio data is used for the audio receiving device to send the audio data to the electronic equipment to execute media processing;
the auxiliary identification information is used for the audio receiving device to send the auxiliary identification information to the electronic equipment so as to identify a control instruction from the audio data according to the auxiliary identification information.
33. The audio acquisition device as recited in claim 19, wherein the electronic equipment is the audio receiving device;
the audio data is used for the audio receiving device to execute media processing;
the auxiliary identification information is used for the audio receiving device to identify the control instruction from the audio data according to the auxiliary identification information.
34. The audio capture device of claim 19, wherein the electronic equipment comprises one or more cameras.
35. The audio capture device of claim 19, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, or an unmanned vehicle.
36. The audio capture device of claim 19, wherein the media processing comprises: audio editing and/or audio-video editing.
37. An audio receiving apparatus, comprising: a wireless transceiver and a processor;
the processor is used for receiving the audio data and the control instruction sent by the audio acquisition device through the wireless transceiver; the control instruction is obtained by performing instruction identification processing on the acquired audio data by the audio acquisition device;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
38. The audio receiving device of claim 37, wherein the processor is further configured to decode the received audio data.
39. The audio receiving apparatus according to claim 38, wherein the control instruction is obtained by performing instruction recognition processing on the audio data before encoding by the audio capturing apparatus.
40. The audio receiving device of claim 39, wherein the processor is further configured to decapsulate the data packet received by the wireless transceiver to obtain the audio data and the control command.
41. The audio receiving apparatus according to claim 40, wherein what is obtained by decapsulating the data packet is audio data in which the control instruction is embedded.
42. The audio receiving apparatus according to claim 41, wherein the processor is further configured to separate the audio data embedded with the control instruction to obtain the audio data and an audio digital watermark converted from the control instruction.
43. The audio receiving device of claim 42, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside of a human auditory frequency range.
44. The audio receiving apparatus according to claim 41, wherein the audio data obtained by separating the audio data in which the control instruction is embedded is pre-encoding audio data.
45. The audio receiving apparatus according to claim 37, wherein the control instruction is obtained by the audio capturing apparatus intercepting an audio segment containing speech in the audio data, extracting an audio feature of the audio segment, and inputting the audio feature into a specified speech recognition model.
46. The audio receiving device of claim 37, wherein the received control instruction further comprises another control instruction generated by a control sensor of the audio capture device in response to a user trigger.
47. The audio receiving apparatus according to claim 37, wherein a target audio segment in the received audio data is processed by the audio capturing apparatus, and the target audio segment is an audio segment corresponding to the control instruction.
48. The audio receiving apparatus according to claim 47, wherein the target audio segment is processed by one or more of: muting, enhancement, and voice changing.
49. The audio receiving apparatus according to claim 37, wherein the type of the audio data corresponding to the control instruction comprises: speech type and/or non-speech type.
50. The audio receiving apparatus according to claim 37, wherein the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the processor is further configured to send the received audio data and the control instruction to the electronic device.
51. The audio receiving apparatus according to claim 37, wherein the electronic device is the audio receiving apparatus;
the processor is further configured to perform media processing using the audio data, and perform an operation corresponding to the control instruction.
52. The audio receiving apparatus according to claim 37, wherein the media processing comprises: audio editing and/or audio-video editing.
53. The audio receiving apparatus of claim 37, wherein the electronic device comprises one or more cameras.
54. The audio receiving apparatus according to claim 37, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, or an unmanned vehicle.
55. An audio receiving apparatus, comprising: a wireless transceiver and a processor;
the processor is used for receiving the audio data and the auxiliary identification information sent by the audio acquisition device through the wireless transceiver; the auxiliary identification information is obtained by identifying and processing the acquired audio data by the audio acquisition device;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
56. The audio receiving device of claim 55, wherein the auxiliary identification information comprises one or more of the following: audio clip identification information indicating the audio clip corresponding to the control instruction, the type of the audio data corresponding to the control instruction, and control content information corresponding to the control instruction.
57. The audio receiving apparatus according to claim 56, wherein the type of the audio data corresponding to the control instruction comprises: speech type and/or non-speech type.
58. The audio receiving device of claim 55, wherein the processor is further configured to decode the received audio data.
59. The audio receiving apparatus according to claim 58, wherein the auxiliary identification information is obtained by the audio capturing apparatus performing identification processing on the audio data before encoding.
60. The audio receiving device of claim 55, wherein the processor is further configured to decapsulate the data packet received via the wireless transceiver to obtain the audio data and the auxiliary identification information.
61. The audio receiving apparatus according to claim 60, wherein what is obtained by decapsulating the data packet is audio data in which the auxiliary identification information is embedded.
62. The audio receiving apparatus of claim 61, wherein the processor is further configured to separate the audio data embedded with the auxiliary identification information to obtain the audio data and an audio digital watermark converted from the auxiliary identification information.
63. The audio receiving device of claim 62, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside of a human auditory frequency range.
64. The audio receiving apparatus according to claim 61, wherein the audio data obtained by separating the audio data in which the auxiliary identification information is embedded is pre-encoding audio data.
65. The audio receiving apparatus according to claim 55, wherein the processor is further configured to receive, through the wireless transceiver, a control instruction sent by the audio capturing apparatus; the received control instruction is another control instruction generated by a control sensor of the audio acquisition device in response to a trigger of a user.
66. The audio receiving apparatus according to claim 55, wherein a target audio segment in the received audio data is processed by the audio capturing apparatus, and the target audio segment is an audio segment corresponding to the control instruction.
67. The audio receiving apparatus according to claim 66, wherein the target audio segment is processed by one or more of: enhancement, noise reduction, and polishing.
68. The audio receiving device of claim 55, wherein the media processing comprises: audio editing and/or audio-video editing.
69. The audio receiving apparatus according to claim 55, wherein the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the processor is further configured to send the received audio data and the auxiliary identification information to the electronic device.
70. The audio receiving apparatus according to claim 55, wherein the electronic device is the audio receiving apparatus;
the processor is further configured to perform media processing using the audio data, and identify a control instruction from the audio data based on the auxiliary identification information.
71. The audio receiving device of claim 55, wherein the electronic equipment comprises one or more cameras.
72. The audio receiving apparatus according to claim 55, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, or an unmanned vehicle.
73. An audio processing method applied to an audio acquisition device, the method comprising:
carrying out instruction identification processing on the collected audio data to obtain a control instruction;
sending the audio data and the control instruction to an audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
74. The audio processing method of claim 73, wherein before sending the audio data to the audio receiving device, the method further comprises:
encoding the audio data.
75. The audio processing method according to claim 74, wherein the audio data subjected to the instruction recognition processing is audio data before encoding.
76. The audio processing method of claim 73, wherein sending the audio data and the control instruction to an audio receiving device comprises:
and encapsulating the audio data and the control command into a data packet and sending the data packet to the audio receiving device.
77. The audio processing method of claim 76, wherein encapsulating the audio data and the control command into a data packet comprises:
embedding the control instruction into the audio data;
and encapsulating the audio data embedded with the control instruction into a data packet.
78. The audio processing method according to claim 77, wherein before embedding the control instruction in the audio data, the method further comprises:
and converting the control instruction into an audio digital watermark.
79. The audio processing method of claim 78, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside of a human auditory frequency range.
80. The audio processing method according to claim 77, wherein the audio data in which the control instruction is embedded is audio data before encoding.
81. The audio processing method according to claim 73, wherein the performing of the instruction recognition process on the collected audio data comprises:
intercepting an audio clip containing voice in the audio data;
extracting audio features of the audio segments;
and inputting the audio features into a specified voice recognition model, and recognizing the control command.
82. The audio processing method of claim 73, wherein the control instruction further comprises another control instruction generated in response to a user trigger.
83. The audio processing method of claim 73, further comprising:
and processing the target audio clip with the control instruction identified.
84. The audio processing method according to claim 83, wherein the processing of the target audio segment comprises one or more of: muting, enhancement, and voice changing.
85. The audio processing method of claim 73, wherein the type of audio data corresponding to the control instruction comprises: speech type and/or non-speech type.
86. The audio processing method according to claim 73, wherein the electronic device is another electronic device to which the audio receiving apparatus is communicatively connected;
the audio data is used for the audio receiving device to send the audio data to the electronic equipment to execute media processing;
the control instruction is used for the audio receiving device to send the control instruction to the electronic equipment so as to execute control processing.
87. The audio processing method according to claim 73, wherein the electronic device is the audio receiving apparatus;
the audio data is used for the audio receiving device to execute media processing;
the control instruction is used for the audio receiving device to execute control processing.
88. The audio processing method of claim 73, wherein the electronic device comprises one or more cameras.
89. The audio processing method of claim 73, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, or an unmanned vehicle.
90. The audio processing method of claim 73, wherein the media processing comprises: audio editing and/or audio-video editing.
91. An audio processing method applied to an audio acquisition device, the method comprising:
carrying out identification processing on the collected audio data to obtain auxiliary identification information;
transmitting the audio data and the auxiliary identification information to an audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
92. The audio processing method of claim 91, wherein the auxiliary identification information comprises one or more of: audio clip identification information indicating the audio clip corresponding to the control instruction, the type of the audio data corresponding to the control instruction, and control content information corresponding to the control instruction.
93. The audio processing method of claim 92, wherein the type of audio data corresponding to the control command comprises: speech type and/or non-speech type.
94. The audio processing method of claim 91, wherein before sending the audio data to the audio receiving device, the method further comprises:
encoding the audio data.
95. The audio processing method of claim 94, wherein the audio data subjected to the identification process is audio data before encoding.
96. The audio processing method of claim 91, wherein sending the audio data and the auxiliary identification information to an audio receiving device comprises:
and encapsulating the audio data and the auxiliary identification information into a data packet and sending the data packet to the audio receiving device.
97. The audio processing method of claim 96, wherein encapsulating the audio data and the auxiliary identification information into a data packet comprises:
embedding the auxiliary identification information into the audio data;
and encapsulating the audio data embedded with the auxiliary identification information into a data packet.
98. The audio processing method of claim 97, wherein prior to embedding the auxiliary identification information in the audio data, the method further comprises:
and converting the auxiliary identification information into an audio digital watermark.
99. The audio processing method of claim 98, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside of a human auditory frequency range.
100. The audio processing method of claim 97, wherein the audio data embedded with the auxiliary identification information is pre-encoded audio data.
101. The audio processing method of claim 91, wherein the control instruction further comprises another control instruction generated in response to a user trigger.
102. The audio processing method of claim 91, further comprising:
and processing the target audio clip corresponding to the control instruction.
103. The audio processing method according to claim 102, wherein the processing of the audio segment corresponding to the control instruction comprises one or more of: enhancement, noise reduction, and polishing.
104. The audio processing method according to claim 91, wherein the electronic device is another electronic device communicatively connected to the audio receiving device;
the audio data is to be sent by the audio receiving device to the electronic device for media processing; and
the auxiliary identification information is to be sent by the audio receiving device to the electronic device, so that a control instruction is identified from the audio data according to the auxiliary identification information.
105. The audio processing method according to claim 91, wherein the electronic device is the audio receiving device;
the audio data is used by the audio receiving device to execute media processing; and
the auxiliary identification information is used by the audio receiving device to identify the control instruction from the audio data.
106. The audio processing method of claim 91, wherein the electronic device comprises one or more cameras.
107. The audio processing method of claim 91, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
108. The audio processing method according to claim 91, wherein the media processing comprises: audio editing and/or audio-video editing.
109. An audio processing method applied to an audio receiving device, the method comprising:
receiving audio data and a control instruction sent by an audio acquisition device through a wireless network;
the control instruction is obtained by performing instruction identification processing on the acquired audio data by the audio acquisition device, the audio data is used for one or more electronic devices to execute media processing, the control instruction is used for one or more electronic devices to execute control processing, and the electronic devices are the audio receiving device or other electronic devices in communication connection with the audio receiving device.
110. The audio processing method of claim 109, wherein after receiving the audio data, the method further comprises:
decoding the received audio data.
111. The audio processing method according to claim 110, wherein the control instruction is obtained by the audio acquisition device performing instruction recognition processing on the audio data before encoding.
112. The audio processing method of claim 109, wherein receiving the audio data and the control instruction sent by the audio acquisition device through the wireless network comprises:
receiving a data packet through the wireless network, and decapsulating the data packet to obtain the audio data and the control instruction.
113. The audio processing method of claim 112, wherein decapsulating the data packet to obtain the audio data and the control instruction comprises:
decapsulating the data packet to obtain audio data embedded with the control instruction; and
separating the audio data embedded with the control instruction to obtain the audio data and the control instruction.
114. The audio processing method of claim 113, wherein separating the audio data embedded with the control instruction to obtain the audio data and the control instruction comprises:
separating the audio data embedded with the control instruction to obtain an audio digital watermark and the audio data, and converting the audio digital watermark to obtain the control instruction.
115. The audio processing method of claim 114, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside of a human auditory frequency range.
116. The audio processing method of claim 113, wherein the audio data obtained by separating the audio data embedded with the control instruction is pre-encoding audio data.
117. The audio processing method of claim 109, wherein the control instruction is obtained by the audio acquisition device intercepting an audio segment containing speech from the audio data, extracting audio features of the audio segment, and inputting the audio features into a specified speech recognition model.
118. The audio processing method of claim 109, wherein the received control instruction further comprises another control instruction generated by a control sensor of the audio acquisition device in response to a user trigger.
119. The audio processing method according to claim 109, wherein a target audio segment in the received audio data has been processed by the audio acquisition device, and the target audio segment is the audio segment corresponding to the control instruction.
120. The audio processing method of claim 119, wherein the processing of the target audio segment comprises one or more of: muting, enhancement, and voice changing.
121. The audio processing method of claim 120, wherein the type of audio data corresponding to the control instruction comprises: speech type and/or non-speech type.
122. The audio processing method of claim 109, wherein the electronic device is another electronic device communicatively connected to the audio receiving device; the method further comprises:
sending the received audio data and the control instruction to the electronic device.
123. The audio processing method of claim 109, wherein the electronic device is the audio receiving device; the method further comprises:
executing media processing by using the audio data, and executing the operation corresponding to the control instruction.
124. The audio processing method of claim 109, wherein the media processing comprises: audio editing and/or audio-video editing.
125. The audio processing method of claim 109, wherein the electronic device comprises one or more cameras.
126. The audio processing method of claim 109, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
127. An audio processing method applied to an audio receiving device, the method comprising:
receiving audio data and auxiliary identification information sent by an audio acquisition device through a wireless network;
the auxiliary identification information is obtained by identifying and processing the acquired audio data by the audio acquisition device; the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
128. The audio processing method of claim 127, wherein the auxiliary identification information comprises one or more of: audio segment identification information indicating the audio segment corresponding to the control instruction, the type of the audio data corresponding to the control instruction, and control content information corresponding to the control instruction.
129. The audio processing method of claim 128, wherein the type of audio data corresponding to the control instruction comprises: speech type and/or non-speech type.
130. The audio processing method of claim 127, wherein after receiving the audio data, the method further comprises:
decoding the received audio data.
131. The audio processing method of claim 130, wherein the auxiliary identification information is obtained by the audio acquisition device performing identification processing on the audio data before encoding.
132. The audio processing method of claim 127, wherein receiving the audio data and the auxiliary identification information sent by the audio acquisition device through the wireless network comprises:
receiving a data packet through the wireless network, and decapsulating the data packet to obtain the audio data and the auxiliary identification information.
133. The audio processing method of claim 132, wherein decapsulating the data packet to obtain the audio data and the auxiliary identification information comprises:
decapsulating the data packet to obtain audio data in which the auxiliary identification information is embedded; and
separating the audio data embedded with the auxiliary identification information to obtain the audio data and the auxiliary identification information.
134. The audio processing method of claim 133, wherein separating the audio data embedded with the auxiliary identification information to obtain the audio data and the auxiliary identification information comprises:
separating the audio data embedded with the auxiliary identification information to obtain an audio digital watermark and the audio data, and converting the audio digital watermark to obtain the auxiliary identification information.
135. The audio processing method of claim 134, wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is a frequency range outside of a human auditory frequency range.
136. The audio processing method of claim 133, wherein the audio data obtained by separating the audio data embedded with the auxiliary identification information is pre-encoding audio data.
137. The audio processing method of claim 127, wherein the method further comprises:
receiving a control instruction sent by the audio acquisition device through a wireless network; the received control instruction is another control instruction generated by a control sensor of the audio acquisition device in response to a trigger of a user.
138. The audio processing method according to claim 127, wherein a target audio segment in the received audio data has been processed by the audio acquisition device, and the target audio segment is the audio segment corresponding to the control instruction.
139. The audio processing method according to claim 138, wherein the processing of the target audio segment comprises one or more of: enhancement, noise reduction, and polishing.
140. The audio processing method of claim 127, wherein the media processing comprises: audio editing and/or audio-video editing.
141. The audio processing method according to claim 127, wherein the electronic device is another electronic device communicatively connected to the audio receiving device; the method further comprises:
sending the received audio data and the auxiliary identification information to the electronic device.
142. The audio processing method of claim 127, wherein the electronic device is the audio receiving device; the method further comprises:
executing media processing by using the audio data, and identifying a control instruction from the audio data according to the auxiliary identification information.
143. The audio processing method of claim 127, wherein the electronic device comprises one or more cameras.
144. The audio processing method of claim 127, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
145. An audio acquisition system, comprising:
an audio acquisition device and an audio receiving device;
the audio acquisition device is configured to perform instruction recognition processing on acquired audio data to obtain a control instruction, and to send the audio data and the control instruction to the audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, and the control instruction is used for one or more electronic devices to execute control processing, wherein the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
146. The audio acquisition system of claim 145 wherein the audio acquisition device is configured to encode the acquired audio data and send the encoded audio data to the audio receiving device;
the audio receiving device is used for decoding the received audio data after receiving the audio data.
147. The audio acquisition system of claim 146, wherein the audio data on which the audio acquisition device performs the instruction recognition processing is audio data before encoding.
148. The audio acquisition system of claim 145, wherein the audio acquisition device is configured to encapsulate the audio data and the control instruction into a data packet and send the data packet to the audio receiving device; and
the audio receiving device is configured to decapsulate the data packet after receiving it, to obtain the audio data and the control instruction.
149. The audio acquisition system of claim 148, wherein the audio acquisition device is configured to embed the control instruction into the audio data, encapsulate the audio data embedded with the control instruction into a data packet, and send the data packet to the audio receiving device; and
the audio receiving device is configured to decapsulate the data packet after receiving it, and to separate the audio data embedded with the control instruction obtained by decapsulation, to obtain the audio data and the control instruction.
150. The audio acquisition system of claim 149, wherein the audio acquisition device is configured to convert the control instruction into an audio digital watermark, embed the audio digital watermark into the audio data, encapsulate the audio data embedded with the control instruction into a data packet, and send the data packet to the audio receiving device; and
the audio receiving device is configured to decapsulate the data packet after receiving it, separate the audio data embedded with the control instruction obtained by decapsulation to obtain the audio digital watermark and the audio data, and convert the audio digital watermark to obtain the control instruction.
151. The audio acquisition system of claim 150 wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is outside the human auditory frequency range.
152. The audio acquisition system of claim 149, wherein the audio data into which the audio acquisition device embeds the control instruction is audio data before encoding.
153. The audio acquisition system of claim 145, wherein the audio acquisition device performing instruction recognition processing on the acquired audio data comprises:
intercepting an audio segment containing speech from the audio data;
extracting audio features of the audio segment; and
inputting the audio features into a specified speech recognition model to recognize the control instruction.
154. The audio acquisition system of claim 145, wherein the control instruction further comprises another control instruction generated by a control sensor of the audio acquisition device in response to a user trigger.
155. The audio acquisition system of claim 145, wherein the audio acquisition device is further configured to process a target audio segment from which the control instruction is identified.
156. The audio acquisition system of claim 155, wherein the processing of the target audio segment by the audio acquisition device comprises one or more of: muting, enhancement, and voice changing.
157. The audio acquisition system of claim 145 wherein the type of audio data to which the control instructions correspond comprises: speech type and/or non-speech type.
158. The audio acquisition system of claim 145 wherein the electronic device is another electronic device to which the audio receiving device is communicatively coupled;
the audio receiving device is further configured to send the received audio data and the control instruction to the electronic device.
159. The audio acquisition system of claim 145, wherein the electronic device is the audio receiving device;
the audio receiving device is further configured to execute media processing by using the audio data, and execute an operation corresponding to the control instruction.
160. The audio acquisition system of claim 145 wherein the electronic device comprises one or more cameras.
161. The audio acquisition system of claim 145, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
162. The audio acquisition system of claim 145 wherein the media processing comprises: audio editing and/or audio-video editing.
163. An audio acquisition system, comprising:
an audio acquisition device and an audio receiving device;
the audio acquisition device is configured to perform identification processing on acquired audio data to obtain auxiliary identification information, and to send the audio data and the auxiliary identification information to the audio receiving device through a wireless network;
the audio data is used for one or more electronic devices to execute media processing, the auxiliary identification information is used for one or more electronic devices to identify a control instruction from the audio data according to the auxiliary identification information, and the electronic devices are the audio receiving device or other electronic devices connected with the audio receiving device in a communication mode.
164. The audio acquisition system of claim 163, wherein the auxiliary identification information comprises one or more of: audio segment identification information indicating the audio segment corresponding to the control instruction, the type of the audio data corresponding to the control instruction, and control content information corresponding to the control instruction.
165. The audio acquisition system of claim 164 wherein the type of audio data to which the control instructions correspond comprises: speech type and/or non-speech type.
166. The audio acquisition system of claim 163 wherein the audio acquisition device is configured to encode the acquired audio data and send the encoded audio data to the audio receiving device;
the audio receiving device is used for decoding the received audio data after receiving the audio data.
167. The audio acquisition system of claim 166, wherein the audio data on which the audio acquisition device performs the identification processing is audio data before encoding.
168. The audio acquisition system of claim 163, wherein the audio acquisition device is configured to encapsulate the audio data and the auxiliary identification information into a data packet and send the data packet to the audio receiving device; and
the audio receiving device is configured to decapsulate the data packet after receiving it, to obtain the audio data and the auxiliary identification information.
169. The audio acquisition system of claim 168, wherein the audio acquisition device is configured to embed the auxiliary identification information into the audio data, encapsulate the audio data embedded with the auxiliary identification information into a data packet, and send the data packet to the audio receiving device; and
the audio receiving device is configured to decapsulate the data packet after receiving it, and to separate the audio data embedded with the auxiliary identification information obtained by decapsulation, to obtain the audio data and the auxiliary identification information.
170. The audio acquisition system of claim 169, wherein the audio acquisition device is configured to convert the auxiliary identification information into an audio digital watermark, embed the audio digital watermark into the audio data, encapsulate the audio data embedded with the auxiliary identification information into a data packet, and send the data packet to the audio receiving device; and
the audio receiving device is configured to decapsulate the data packet after receiving it, separate the audio data embedded with the auxiliary identification information obtained by decapsulation to obtain the audio digital watermark and the audio data, and convert the audio digital watermark to obtain the auxiliary identification information.
171. The audio acquisition system of claim 170 wherein the audio digital watermark has a frequency within a specified frequency range, wherein the specified frequency range is outside the human auditory frequency range.
172. The audio acquisition system of claim 169, wherein the audio data into which the audio acquisition device embeds the auxiliary identification information is audio data before encoding.
173. The audio acquisition system of claim 163 wherein the control instructions further comprise another control instruction generated by a control sensor of the audio acquisition device in response to a user trigger.
174. The audio acquisition system of claim 163, wherein the audio acquisition device is further configured to process a target audio segment corresponding to the control instruction.
175. The audio acquisition system of claim 174, wherein the processing of the target audio segment corresponding to the control instruction by the audio acquisition device comprises one or more of: enhancement, noise reduction, and polishing.
176. The audio acquisition system of claim 163, wherein the electronic device is another electronic device communicatively connected to the audio receiving device;
the audio receiving device is further configured to send the received audio data and the auxiliary identification information to the electronic device.
177. The audio acquisition system of claim 163, wherein the electronic device is the audio receiving device;
the audio receiving device is further configured to perform media processing using the audio data, and identify a control instruction from the audio data according to the auxiliary identification information.
178. The audio acquisition system of claim 163 wherein the electronic device comprises one or more cameras.
179. The audio acquisition system of claim 163, wherein the electronic device comprises any one of: an unmanned aerial vehicle, a camera, a gimbal, and an unmanned vehicle.
180. The audio acquisition system of claim 163 wherein the media processing comprises: audio editing and/or audio-video editing.
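As an illustration only, the encapsulation and decapsulation recited in claims 96, 132 and 168 could be sketched as below. The claims do not specify any packet layout, so the header format, the JSON encoding, and every name here (`AuxInfo`, `encapsulate`, `decapsulate`, the `AQ` marker) are hypothetical assumptions rather than the applicant's implementation. The sketch carries the auxiliary identification information alongside the audio payload; it does not cover the watermark-embedding variant of claims 97 to 99.

```python
import json
import struct
from dataclasses import dataclass, asdict
from typing import Tuple

# Hypothetical packet layout (not taken from the patent):
# 2-byte marker | uint16 metadata length | uint32 audio length | metadata | audio
_HEADER = struct.Struct("!2sHI")
_MAGIC = b"AQ"


@dataclass
class AuxInfo:
    """Assumed fields mirroring claim 128: segment location, type, content."""
    segment_start_ms: int
    segment_end_ms: int
    audio_type: str       # "speech" or "non-speech"
    control_content: str  # e.g. "start_recording"


def encapsulate(audio: bytes, aux: AuxInfo) -> bytes:
    """Acquisition side: pack audio data and auxiliary info into one packet."""
    meta = json.dumps(asdict(aux)).encode("utf-8")
    return _HEADER.pack(_MAGIC, len(meta), len(audio)) + meta + audio


def decapsulate(packet: bytes) -> Tuple[bytes, AuxInfo]:
    """Receiving side: recover the audio data and the auxiliary info."""
    magic, meta_len, audio_len = _HEADER.unpack_from(packet)
    if magic != _MAGIC:
        raise ValueError("unexpected packet marker")
    body = packet[_HEADER.size:]
    if len(body) != meta_len + audio_len:
        raise ValueError("packet length mismatch")
    meta = json.loads(body[:meta_len].decode("utf-8"))
    return body[meta_len:], AuxInfo(**meta)
```

A receiving device following claim 142 would then use the recovered segment boundaries to locate the part of the audio from which the control instruction is to be identified, and pass the audio data on to media processing.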
CN202080004930.8A 2020-03-19 2020-03-19 Audio acquisition device, audio receiving device and audio processing method Pending CN112639963A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080268 WO2021184315A1 (en) 2020-03-19 2020-03-19 Audio acquisition apparatus, audio receiving apparatus, and audio processing method

Publications (1)

Publication Number Publication Date
CN112639963A true CN112639963A (en) 2021-04-09

Family

ID=75291266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080004930.8A Pending CN112639963A (en) 2020-03-19 2020-03-19 Audio acquisition device, audio receiving device and audio processing method

Country Status (2)

Country Link
CN (1) CN112639963A (en)
WO (1) WO2021184315A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758669A (en) * 2022-06-13 2022-07-15 深圳比特微电子科技有限公司 Audio processing model training method and device, audio processing method and device and electronic equipment
WO2023004776A1 (en) * 2021-07-30 2023-02-02 深圳市大疆创新科技有限公司 Signal processing method for microphone array, microphone array, and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030016583A (en) * 2001-08-21 2003-03-03 (주)마크텍 Transmitting/receiving system using watermark as control signal and method thereof
CN102737629A (en) * 2011-11-11 2012-10-17 东南大学 Embedded type speech emotion recognition method and device
CN104008132A (en) * 2014-05-04 2014-08-27 深圳市北科瑞声科技有限公司 Voice map searching method and system
CN104010057A (en) * 2014-06-05 2014-08-27 深圳市易科泰科技有限公司 Voice recognition calling system
US20170019580A1 (en) * 2015-07-16 2017-01-19 Gopro, Inc. Camera Peripheral Device for Supplemental Audio Capture and Remote Control of Camera

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542956B1 (en) * 2012-01-09 2017-01-10 Interactive Voice, Inc. Systems and methods for responding to human spoken audio
JP2019087927A (en) * 2017-11-09 2019-06-06 東京瓦斯株式会社 Infrared operation system
MX2019005047A (en) * 2018-04-30 2019-11-01 Tti Macao Commercial Offshore Ltd Garage door opener system having an intelligent automated assistant and method of controlling the same.

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023004776A1 (en) * 2021-07-30 2023-02-02 深圳市大疆创新科技有限公司 Signal processing method for microphone array, microphone array, and system
CN114758669A (en) * 2022-06-13 2022-07-15 深圳比特微电子科技有限公司 Audio processing model training method and device, audio processing method and device and electronic equipment
CN114758669B (en) * 2022-06-13 2022-09-02 深圳比特微电子科技有限公司 Audio processing model training method and device, audio processing method and device and electronic equipment

Also Published As

Publication number Publication date
WO2021184315A1 (en) 2021-09-23

Similar Documents

Publication Publication Date Title
US20050070337A1 (en) Wireless headset for use in speech recognition environment
CN110691204B (en) Audio and video processing method and device, electronic equipment and storage medium
WO2003049003A3 (en) Systems and methods for tv navigation with compressed voice-activated commands
CN112639963A (en) Audio acquisition device, audio receiving device and audio processing method
CN112770212B (en) Wireless earphone, video recording system and method, and storage medium
CN110379439A (en) A kind of method and relevant apparatus of audio processing
CN105049802B (en) A kind of speech recognition law-enforcing recorder and its recognition methods
WO2016029393A1 (en) Earphone recognition method and apparatus, earphone control method and apparatus, and earphone
CN107071152A (en) Method and system based on verbal system state adjust automatically stereo set volume
CN114141230A (en) Electronic device, and voice recognition method and medium thereof
US6959095B2 (en) Method and apparatus for providing multiple output channels in a microphone
CN111976924A (en) Real-time information communication device for diving full mask
EP3552508A1 (en) Smart helmet having remote control, and remote control method thereof
US20220180886A1 (en) Methods for clear call under noisy conditions
CN111343022A (en) Method and system for realizing network configuration processing of intelligent equipment by directly interacting with user
US11842745B2 (en) Method, system, and computer-readable medium for purifying voice using depth information
CN107580785A (en) Wear-type audio collection module
US20170289712A1 (en) A method for operating a hearing system as well as a hearing system
JP2012151544A (en) Imaging apparatus and program
CN104754261A (en) Projection equipment and projection method
CN112104964B (en) Control method and control system of following type sound amplification robot
CN111713119A (en) Headset, headset system and method in headset system
CN111988705B (en) Audio processing method, device, terminal and storage medium
KR101892268B1 (en) method and apparatus for controlling mobile in video conference and recording medium thereof
CN109903767B (en) Voice processing method, device, equipment and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination