CN115019803B - Audio processing method, electronic device, and storage medium
- Publication number
- CN115019803B (application CN202111165223.XA / CN202111165223A)
- Authority
- CN
- China
- Prior art keywords
- audio data
- data
- electronic device
- layer
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Abstract
The embodiments of the present application provide an audio processing method, an electronic device, and a computer-readable storage medium. The audio processing method comprises the following steps: a hardware driver layer acquires audio data to be played; the hardware driver layer performs speech enhancement processing on the audio data to obtain enhanced data of the audio data; the hardware driver layer sends the audio data and the enhanced data of the audio data to an application layer of the electronic device; the application layer obtains, based on the enhanced data, a speech recognition result of performing speech recognition on the audio data; the application layer controls the electronic device to output the audio data, and controls the electronic device to output the speech recognition result while the audio data is being output. In the present application, the speech enhancement processing is performed at the hardware driver layer rather than at the application layer, which can significantly mitigate the desynchronization between subtitles and audio.
Description
Technical Field
The present application relates to the field of electronic device technologies, and in particular, to an audio processing method, an electronic device, and a computer-readable storage medium.
Background
At present, more and more smart voice devices are appearing on the market, for example in smart home, smart security, and smart in-vehicle scenarios. After a smart voice device collects a user's voice data, it can recognize the collected voice data and thereby provide its smart functions.
For example, in some video conferencing or live streaming scenarios, by recognizing the user's speech, the smart voice device can display what the user is saying in the form of subtitles while playing the user's audio.
In the prior art, speech recognition on a smart voice device takes a long time, so the subtitles lag noticeably behind the played audio, and viewers perceive the subtitle display as stuttering.
Disclosure of Invention
Some embodiments of the present application provide an audio processing method, an electronic device, and a computer-readable storage medium, which are described below in various aspects, and embodiments and advantages of the following aspects may be mutually referenced.
In a first aspect, an embodiment of the present application provides an audio processing method for an electronic device, the method comprising: a hardware driver layer of the electronic device acquires audio data to be played, where the audio data includes voice data; the hardware driver layer performs speech enhancement processing on the audio data to obtain enhanced data of the audio data; the hardware driver layer sends the audio data and the enhanced data of the audio data to an application layer of the electronic device; the application layer obtains, based on the enhanced data, a speech recognition result of performing speech recognition on the audio data; the application layer controls the electronic device to output the audio data (for example, to play the audio), and controls the electronic device to output the speech recognition result while the audio data is being output. The hardware driver layer performing speech enhancement processing on the audio data includes: the hardware driver layer performs the speech enhancement processing on the audio data through one or more speech enhancement algorithms, where the algorithm type of each of the one or more speech enhancement algorithms and the execution order of the algorithms are determined by the hardware driver layer according to an algorithm configuration issued by the application layer; the execution order of the speech enhancement algorithms includes: at least two speech enhancement algorithms are executed in series, and/or at least two speech enhancement algorithms are executed in parallel; the one or more speech enhancement algorithms include a speech noise reduction algorithm and a human voice enhancement algorithm. The hardware driver layer sending the audio data and the enhanced data of the audio data to the application layer of the electronic device includes: the hardware driver layer sends the audio data and the enhanced data of the audio data to the application layer through a plurality of different channels, respectively, where the plurality of different channels are determined by the hardware driver layer according to a channel configuration issued by the application layer.
According to this embodiment of the application, the speech enhancement processing is performed at the hardware driver layer rather than at the application layer. In this way, no application-layer speech enhancement step lies between the electronic device playing the audio and the electronic device outputting the speech recognition result (for example, displaying subtitles). Therefore, this embodiment can significantly reduce the delay of the speech recognition result output (such as subtitle display) relative to the audio playback.
In some embodiments, the speech noise reduction algorithm and the human voice enhancement algorithm are performed in series.
In some embodiments, the electronic device includes system-default recording data transmission channels; the hardware driver layer transmitting the audio data and the enhanced data of the audio data to the application layer through a plurality of different channels includes: the hardware driver layer transmits the audio data through the recording data transmission channels, and transmits the enhanced data of the audio data through channels other than the recording data transmission channels.
In some embodiments, the hardware driver layer sends the original data and the enhanced data of the audio data to the application layer of the electronic device in a frame-aligned manner.
In some embodiments, the application layer obtaining the speech recognition result of the audio data based on the enhanced data includes: the application layer sends a speech recognition request carrying the enhanced data to a cloud server, so as to obtain the speech recognition result of the audio data from the cloud server.
In some embodiments, the algorithm configuration issued by the application layer to the hardware driver layer is associated with the cloud server.
In some embodiments, the application layer controlling the electronic device to output the speech recognition result includes: the application layer controls a display screen of the electronic device to output the speech recognition result in the form of subtitles.
In a second aspect, an embodiment of the present application provides an electronic device, including: a memory storing instructions for execution by one or more processors of the electronic device; and the processor, which, when executing the instructions in the memory, causes the electronic device to perform the audio processing method provided in any embodiment of the first aspect of the present application. For the beneficial effects achievable by the second aspect, reference may be made to the beneficial effects of any implementation of the first aspect of the present application, which are not repeated here.
In a third aspect, the present application provides a computer-readable storage medium having instructions stored thereon which, when executed on a computer, cause the computer to perform the audio processing method provided in any embodiment of the first aspect of the present application. For the beneficial effects achievable by the third aspect, reference may be made to the beneficial effects of any embodiment of the first aspect of the present application, which are not repeated here.
Drawings
FIG. 1 illustrates an exemplary application scenario of an embodiment of the present application;
FIG. 2 illustrates an audio processing method in some implementations;
FIG. 3 illustrates an audio processing method provided by an embodiment of the present application;
fig. 4 shows an exemplary flowchart of an audio processing method provided by an embodiment of the present application;
fig. 5 illustrates a transmission manner of audio data inside an electronic device according to an embodiment of the present application;
FIG. 6 is a diagram illustrating an exemplary architecture of an electronic device provided by an embodiment of the application;
FIG. 7 shows a block diagram of an electronic device provided by an embodiment of the application;
fig. 8 shows a schematic structural diagram of a System On Chip (SOC) provided in an embodiment of the present application.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings.
The embodiments of the present application provide an audio processing method for an electronic device. The embodiments can reduce the lag of the speech recognition result (such as subtitles) relative to the played audio, so as to improve the user experience.
The present embodiment does not limit the type of the electronic device 100. For example, the electronic device 100 may be a mobile phone, a laptop, a tablet, a wearable device (e.g., a watch, smart glasses, a helmet), an Augmented Reality (AR)/Virtual Reality (VR) device, a Personal Digital Assistant (PDA), and the like, and the present application is not limited thereto. A cellular phone is described as an example of the electronic device 100.
Fig. 1 illustrates an exemplary application scenario of an embodiment of the present application, specifically a live video scenario. Referring to fig. 1, when user A is streaming live video, the electronic device 100 (specifically, a mobile phone) may collect image data of the user through a camera and collect audio data through a microphone (the microphone referred to herein is the microphone of the electronic device 100). Then, the electronic device 100 outputs the captured image data (i.e., plays the video) through the display screen and outputs the captured audio data (i.e., plays the audio) through the speaker, so that viewers can watch the live content of user A. To improve the viewing experience, the electronic device 100 displays the speech content of user A in the form of subtitles 101 while playing the audio. The subtitles 101 are obtained by performing speech recognition on the audio data collected by the microphone.
Fig. 2 illustrates a process for speech recognition of audio data in some implementations. Referring to fig. 2, the microphone of the electronic device 100 transfers the audio data to the hardware driver layer of the electronic device 100 after acquiring the audio data. The hardware driver layer passes the audio data through the multimedia framework layer to the application layer. After receiving the audio data, the application layer sends the audio data to the cloud server 200, so as to perform voice recognition on the audio data through the cloud server 200.
Because the audio data collected by the microphone may include non-speech data such as noise and background sound in addition to the voice data of user A, the application layer performs speech enhancement processing on the audio data before sending it to the cloud server 200, and sends the enhanced audio data to the cloud server 200. The speech enhancement processing can remove non-speech components such as noise and background sound from the audio data, thereby improving the accuracy of speech recognition.
The timing of the steps in the method of fig. 2 is described below. At time T1, the application layer acquires the audio data uploaded by the hardware driver layer and plays it in real time; at time T2, the application layer finishes the speech enhancement processing on the audio data and sends the enhanced audio data to the cloud server 200; at time T3, the application layer receives the speech recognition result returned by the cloud server 200 and displays it in the form of subtitles. That is, in the implementation shown in fig. 2, between the electronic device 100 playing the audio data (at time T1) and the electronic device 100 displaying the subtitles (at time T3), the application layer of the electronic device 100 needs to perform the speech enhancement processing, upload the enhanced audio data to the cloud, and so on. As a result, there is a relatively obvious lag between the display of the subtitles (at time T3) and the playback of the audio data (at time T1), and the user perceives the subtitle display as stuttering.
In addition, before the audio data is passed to the application layer, the driver layer and/or the multimedia framework layer may process the audio data through a recording algorithm (e.g., reverberation removal, etc.) to improve the playing effect of the audio data. This process may cause loss of speech components in the audio data, thereby affecting the accuracy of speech recognition.
Therefore, the present embodiment provides an audio processing method to solve the above technical problems. Referring to fig. 3, in this embodiment, after the hardware driver layer acquires the audio data, it performs speech enhancement processing on the audio data at the hardware driver layer, and sends the enhanced audio data, together with the audio data that has not undergone speech enhancement, to the application layer. After receiving the audio data sent by the hardware driver layer, the application layer plays the audio data that has not undergone speech enhancement, and obtains a speech recognition result of the audio data based on the enhanced audio data. For example, the application layer sends the enhanced audio data to the cloud server 200 to obtain a speech recognition result of the audio data from the cloud server 200. After obtaining the speech recognition result, the application layer controls the electronic device 100 to output it, for example by displaying it as subtitles.
That is, in the present embodiment, the speech enhancement processing is performed at the hardware-driven layer, not at the application layer. In this way, between the electronic device 100 displaying the subtitles and the electronic device 100 playing the audio, the step of performing the speech enhancement processing on the audio data by the application layer is not included. Therefore, the embodiment can significantly improve the problem of delay of the speech recognition result output (e.g., subtitle display) relative to the audio playback compared to the implementation shown in fig. 2.
In addition, in this embodiment, the hardware driver layer passes the enhanced audio data to the application layer. The enhanced audio data is specially used for subsequent voice recognition and is data which is not processed by a recording algorithm. Therefore, compared to the method shown in fig. 2, in the present embodiment, the enhanced audio data does not lose the speech component, so that the accuracy of speech recognition can be improved.
In addition, in the present embodiment, the speech enhancement processing is performed in a hardware driver layer. Because the hardware driver layer is closer to the audio hardware than the application layer, the electronic device 100 may perform speech enhancement processing on the audio data with greater efficiency.
It should be noted that the scenario shown in fig. 1 is only an exemplary scenario of an application scenario of the present application. The method and the device can also be applied to other scenes besides the scene shown in figure 1, such as a video conference scene and a short video recording scene. In addition, the electronic device 100 may play the audio data and simultaneously play the video data collected by the camera; it is also possible to play only audio data without playing video data.
Specific steps of the audio processing method provided in this embodiment are described below with reference to the scenario shown in fig. 1, but it should be understood that the application is not limited thereto. Referring to fig. 4 in conjunction with fig. 3, the audio processing method provided in this embodiment includes:
s110: and the hardware driving layer acquires audio data to be played.
Referring to fig. 4, the hardware driver layer is located at the bottom of the software architecture of the electronic device 100 and isolates the multimedia framework layer from the hardware (e.g., the microphone). Illustratively, referring to fig. 3, for an Android device, the multimedia framework layer includes an AudioRecorder module (for audio compression, etc.) and an AudioFlinger module (for mixing, volume adjustment, etc.) arranged from top to bottom, and the hardware driver layer is located below the AudioFlinger module.
When a user clicks a start recording button in an audio application (in this embodiment, a video recorder application) of the application layer, the microphone starts to collect audio data and transmits the collected audio data to the hardware driver layer. Thus, the hardware driver layer can acquire the audio data to be played (hereinafter referred to as "audio data A"). In the present embodiment, a video recorder application is taken as an example of the audio application. In other embodiments, the audio application may be another application of the application layer, for example a third-party short-video application such as Douyin™ (TikTok) and the like. In addition, the audio application may be a system application or a third-party application.
The audio data A is the audio data captured by the microphone of the electronic device 100. Specifically, in addition to the voice data of user A, the audio data A may include background sound data (e.g., background music), noise data (e.g., ambient noise), and the like.
S120: the hardware driver layer copies the audio data a to obtain two pieces of audio data a. Herein, the two pieces of audio data a are referred to as audio data A1 and audio data A2, respectively.
S130: the hardware driver layer performs speech enhancement processing on the audio data A2 to obtain enhanced audio data (hereinafter, the enhanced audio data is referred to as "audio data B2").
The speech enhancement processing is used to weaken (e.g., remove) non-speech data (e.g., noise, background sound) in the audio data A2 and/or to strengthen the speech data (specifically, the speech components of user A) in the audio data A2, so as to improve the accuracy of the subsequent speech recognition step.
The speech enhancement processing on the audio data A2 is to perform one or more speech enhancement algorithms on the audio data A2. The one or more speech enhancement algorithms may include a number of speech noise reduction algorithms and may also include a number of human voice enhancement algorithms.
The speech noise reduction algorithm is used to remove noise data from the audio data A2. For example, noise data is removed from the audio data A2 according to preset template noise; different kinds of speech noise reduction algorithms correspond to different template noises. The human voice enhancement algorithm is used to weaken the background sound component in the audio data A2 and/or to strengthen the effective speech component (i.e., the speech component of user A) in the audio data A2. The background sound may include background music, background human voices, and the like. The human voice enhancement algorithm may adopt a spectral subtraction method, a minimum mean square error method, and the like, which is not limited in this embodiment.
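To make the spectral-subtraction idea concrete, a minimal per-frame sketch is given below. It operates on magnitude spectra assumed to have already been obtained by an FFT; the function name, the over-subtraction factor alpha, and the spectral floor beta are illustrative assumptions, not part of this embodiment.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Spectral subtraction on one frame of magnitude spectrum.
// noiseEstimate holds the average magnitude of frames judged to be pure noise
// (e.g. the first few frames); alpha is the over-subtraction factor and beta
// the spectral floor. FFT before and IFFT after this step are omitted here.
std::vector<float> SpectralSubtract(const std::vector<float>& frameMag,
                                    const std::vector<float>& noiseEstimate,
                                    float alpha = 1.0f, float beta = 0.02f) {
    std::vector<float> cleanMag(frameMag.size());
    for (size_t k = 0; k < frameMag.size(); ++k) {
        float sub = frameMag[k] - alpha * noiseEstimate[k];  // subtract estimated noise
        cleanMag[k] = std::max(sub, beta * frameMag[k]);     // keep a floor to limit musical noise
    }
    return cleanMag;
}
```

The floor term is a common practical refinement; the embodiment itself only requires that some noise reduction or voice enhancement algorithm be applied, not this particular variant.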
The hardware driver layer may perform at least one speech enhancement algorithm on the audio data A2. In this embodiment, the type of the speech enhancement algorithm executed by the hardware driver layer and the execution sequence among the algorithms are determined according to the algorithm configuration issued by the application layer.
The application layer may determine the kind of speech enhancement algorithm to be performed on the audio data A2 according to actual needs. In some examples, the application layer determines the speech enhancement algorithm according to the cloud server 200 that will subsequently perform the speech recognition. For example, when the cloud server 200 that subsequently performs speech recognition is a cloud server 200 provided by vendor A (e.g., Honor or Baidu), the application layer determines the speech enhancement algorithm provided by vendor A as the speech enhancement algorithm to be performed on the audio data A2, so as to improve the accuracy of the subsequent speech recognition. That is, in this example, the algorithm configuration issued by the application layer is associated with the cloud server 200 that subsequently performs the speech recognition. In other examples, the application layer may also determine the speech enhancement algorithm to be performed on the audio data A2 according to the hardware configuration of the electronic device 100, the computational cost of the speech enhancement algorithms, and the like. For example, when the hardware configuration of the electronic device 100 is high, the application layer may configure a speech enhancement algorithm that has a good enhancement effect but a slightly larger computational cost as the algorithm to be performed on the audio data A2.
The application layer may also configure the execution order among the algorithms: any two algorithms may be configured to execute in series or in parallel. Illustratively, when algorithm 1 and algorithm 2 are configured to execute serially, the two algorithms are executed one after the other. The serial execution mode can simplify the program logic of the hardware driver layer.
When algorithm 1 and algorithm 2 are configured to execute in parallel, the two algorithms are executed concurrently. For example, the audio data A2 is copied into two copies; algorithm 1 is performed on one copy while algorithm 2 is performed on the other, and after both algorithms have been executed the two copies are merged. Parallel execution may improve the efficiency of the speech enhancement step.
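The following sketch illustrates the two execution modes on an in-memory audio frame; the type aliases, thread usage, and the averaging merge rule are assumptions for illustration only — the actual merge rule would depend on the algorithms involved.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Frame = std::vector<float>;
using EnhanceAlgo = std::function<void(Frame&)>;  // in-place speech enhancement step

// Serial execution: algorithms run one after another on the same buffer.
void RunSerial(Frame& audio, const std::vector<EnhanceAlgo>& algos) {
    for (const auto& algo : algos) algo(audio);
}

// Parallel execution: each algorithm runs on its own copy, results are merged.
// Here the merge is a simple average purely for illustration.
void RunParallel(Frame& audio, const EnhanceAlgo& algo1, const EnhanceAlgo& algo2) {
    Frame copy1 = audio, copy2 = audio;
    std::thread t1([&] { algo1(copy1); });
    std::thread t2([&] { algo2(copy2); });
    t1.join();
    t2.join();
    for (size_t i = 0; i < audio.size(); ++i)
        audio[i] = 0.5f * (copy1[i] + copy2[i]);
}
```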
For consistency of description, other details of the application layer issuing algorithm configuration will be described later.
In this embodiment, according to the algorithm configuration issued by the application layer, the hardware driver layer executes two speech enhancement algorithms on the audio data A2: one is a speech noise reduction algorithm (referred to herein as "ReNoise A"), and the other is a human voice enhancement algorithm (referred to herein as "EnVoice A"). The ReNoise A algorithm and the EnVoice A algorithm are executed serially. After the ReNoise A algorithm and the EnVoice A algorithm have been executed, the hardware driver layer obtains the enhanced audio data B2 (also called the enhanced data of the audio data A2).
S140: the hardware driver layer sends the audio data A1 and the audio data B2 to the application layer. Specifically, the hardware driver layer sends the audio data A1 and B2 to the recorder application in the application layer via the multimedia framework layer.
The audio data A1 may include audio data of a plurality of channels. For example, for a 2.1 channel electronic device 100, the audio data A1 may include 3 channels of audio data; for a 5.1 channel electronic device 100, the audio data A1 may include 6 channels of audio data.
The audio data B2 is data obtained by performing speech enhancement processing on the audio data A2 (the same as the audio data A1), and the number of channels of the audio data B2 is determined by a speech enhancement algorithm that is performed on the audio data A2. Generally, the audio data B2 is audio data of a single channel (i.e., 1 channel), and may be audio data of multiple channels (e.g., 2 channels) in some cases. In this embodiment, the audio data B2 is taken as 2 channels of audio data for example.
In this embodiment, the hardware driver layer transmits the audio data A1 through the system-default recording data transmission channels, and transmits the audio data B2 through channels other than the recording data transmission channels.
The system-default recording data transmission channels are described below. Take as an example the case where the maximum number of channels supported by the software architecture (e.g., operating system) of the electronic device 100 is 24. In this embodiment, the channels supported by the hardware (e.g., the speaker) of the electronic device 100 are 5.1 channels (i.e., the hardware supports 6 channels), and the electronic device 100 may use channels 1 to 6 as the system-default recording data transmission channels. When the electronic device 100 plays audio, it may transmit the audio data to be played through the recording data transmission channels (i.e., channels 1 to 6).
In this embodiment, when the hardware driver layer transmits audio data to the recorder application, it transmits the audio data A1 through channels 1 to 6, and transmits the audio data B2 through channels other than channels 1 to 6 (specifically, channels 7 and 8). Since the audio data A1 is the audio subsequently played by the electronic device 100, transmitting it through the system-default recording data transmission channels leaves the system's original audio playback architecture unchanged, thereby reducing the amount of modification the method requires to the system.
The audio data B2, by contrast, is newly added audio data dedicated to subsequent speech recognition, so its transmission does not occupy the system-default recording data transmission channels; instead it is transmitted through other channels (for example, channels 7 and 8). In this embodiment, the channels used to transmit the audio data B2 are also referred to as "extension channels".
In this embodiment, the hardware driver layer determines the transmission channels of the audio data A1 and B2 according to the channel configuration issued by the application layer. For consistency of description, the details of how the application layer issues the channel configuration are described later.
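A minimal sketch of how the driver layer might record such a channel plan is given below; the struct and field names are assumptions for illustration and not part of this embodiment.

```cpp
#include <cstdint>

// Channel plan derived from the application-layer channel configuration.
// In this embodiment 6 channels carry the playback audio (A1) and
// 2 extension channels carry the enhanced audio (B2).
struct ChannelPlan {
    uint32_t playbackChannels;   // e.g. 6 for a 5.1-channel device
    uint32_t extensionChannels;  // e.g. 2 for the enhanced data

    uint32_t totalChannels() const { return playbackChannels + extensionChannels; }
    // Extension channels are appended after the default recording channels,
    // so indices [0, playbackChannels) carry A1 and
    // [playbackChannels, totalChannels()) carry B2.
    uint32_t firstExtensionChannel() const { return playbackChannels; }
};
```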
Further, the hardware driver layer sends the audio data A1 and the audio data B2 to the recorder application in a frame alignment manner. The frame alignment is performed in such a manner that the audio data A1 and the audio data B2 are aligned on the time axis. This is described in detail in connection with fig. 5.
Referring to fig. 5, when the hardware driver layer transmits audio data to the audio application, the hardware driver layer writes the audio data A1 and B2 to be transmitted into the buffer D1 of the memory. Then, the multimedia framework layer reads the audio data A1 and B2 from the buffer D1, and puts the read audio data A1 and B2 into the memory buffer D2. The recorder application then reads the audio data A1, B2 from the buffer D2. Through the above process, the hardware driver layer may transfer the audio data A1, B2 to the application layer.
The hardware driver layer writes the audio data A1 and B2 into the buffer D1 in a frame-aligned manner. Referring to fig. 5, the audio data A1 and the audio data B2 are each a data stream, each including a plurality of data at a plurality of time points. For example, the audio data A1 includes 6 data A1t1_1 to A1t1_6 at time point t1, 6 data A1t2_1 to A1t2_6 at time point t2, …, and 6 data A1tn_1 to A1tn_6 at time point tn, and so on. Similarly, the audio data B2 includes 2 data B2t1_1 and B2t1_2 at time point t1, 2 data B2t2_1 and B2t2_2 at time point t2, and so on, which are not described again here. In this embodiment, the specific value of n is not limited; for example, n = 15, 20, 30, etc.
When writing data into the buffer D1, the hardware driver layer writes the audio data A1 and the audio data B2 corresponding to the same time point into the buffer D1 at one time. For example, the hardware driver layer writes the 6 pieces of audio data A1 (specifically, A1t1_1 to A1t1_6) and the 2 pieces of audio data B2 (specifically, B2t1_1 and B2t1_2) corresponding to time point t1 into the buffer D1 at one time. In this way, the multimedia framework layer can read the audio data A1 and B2 corresponding to the same time point from the buffer D1 at one time and write them into the buffer D2. Further, the recorder application can read the audio data A1 and B2 corresponding to the same time point from the buffer D2 at one time. That is, the audio data A1 and B2 corresponding to the same time point reach the recorder application at the same time. Transmitting the audio data A1 and B2 in this frame-synchronized manner can further alleviate the problem of audio and subtitle desynchronization.
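A minimal sketch of such a frame-aligned write follows; the buffer representation and function name are simplified assumptions, and a real driver would write into a shared kernel or DMA buffer rather than a std::vector.

```cpp
#include <cstddef>
#include <vector>

// Frame-aligned write into buffer D1: for every time point, the 6 playback
// samples of A1 and the 2 enhanced samples of B2 are written together, so
// that both streams for the same time point reach the application at once.
void WriteFrameAligned(const std::vector<float>& a1,   // 6 channels, interleaved
                       const std::vector<float>& b2,   // 2 channels, interleaved
                       size_t numTimePoints,
                       std::vector<float>& bufferD1) {
    const size_t kA1Channels = 6;
    const size_t kB2Channels = 2;
    for (size_t t = 0; t < numTimePoints; ++t) {
        // One aligned frame = 6 samples of A1 followed by 2 samples of B2.
        for (size_t c = 0; c < kA1Channels; ++c)
            bufferD1.push_back(a1[t * kA1Channels + c]);
        for (size_t c = 0; c < kB2Channels; ++c)
            bufferD1.push_back(b2[t * kB2Channels + c]);
    }
}
```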
In this embodiment, the hardware driver layer may not perform enhancement processing on the audio data A1. That is, in the present embodiment, the recorder application receives audio data as audio data A1, B2. In other embodiments, during the process of sending the audio data A1 to the recorder application, the hardware driver layer and/or the multimedia framework layer may further perform a recording algorithm on the audio data A1 (e.g., remove reverberation components in the audio data A1) to improve the playing effect of the audio data A1. Herein, the audio data A1 after the recording algorithm is executed is referred to as "audio data B1". That is, in this embodiment, the recorder application receives audio data as audio data B1, B2.
S150: the application layer controls the electronic device 100 to play the audio data A1, and controls the electronic device 100 to output a voice recognition result when playing the audio data A1.
The recorder application, upon receiving the audio data A1, B2, controls the audio hardware (e.g., a speaker) of the electronic device 100 to play the audio data A1.
The recorder application also sends the audio data B2 to the cloud server 200 that provides a speech recognition service, so that the cloud server 200 performs speech recognition on the audio data B2. Specifically, the recorder application sends a speech recognition request carrying the audio data B2 to the cloud server 200; after receiving the request, the cloud server 200 performs speech recognition on the audio data B2 and returns the speech recognition result to the electronic device 100. In this embodiment, the electronic device 100 uses the computing power of the cloud server 200 for speech recognition, which reduces the computing power required of the electronic device itself. However, the application is not limited thereto. In other embodiments, the electronic device 100 may perform speech recognition on the audio data B2 locally, or may transmit the audio data B2 to another device in the local area network (e.g., a notebook computer in the local area network) so that the other device performs the speech recognition with its own computing power.
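The round trip with the cloud server 200 can be pictured with the following sketch; the interface, its method name, and the transport are purely hypothetical, since this embodiment does not specify the cloud server's API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical interface to the cloud server 200. The real transport
// (HTTP, gRPC, vendor SDK) and message format are not specified by this
// embodiment, so only the shape of the interaction is sketched.
class SpeechRecognitionService {
public:
    virtual ~SpeechRecognitionService() = default;
    // Sends a recognition request carrying the enhanced audio B2 and
    // returns the recognized text once the server replies.
    virtual std::string Recognize(const std::vector<int16_t>& enhancedAudioB2) = 0;
};

// Application-layer flow: only the enhanced stream B2 is sent for
// recognition; the un-enhanced stream A1 is played locally.
std::string ObtainSubtitle(SpeechRecognitionService& cloud,
                           const std::vector<int16_t>& enhancedAudioB2) {
    return cloud.Recognize(enhancedAudioB2);
}
```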
After the electronic device 100 receives the voice recognition result returned by the cloud server 200, the video recorder application controls the electronic device 100 to output the voice recognition result. For example, the recorder application controls the display screen of the electronic device 100 to display the speech recognition results in the form of subtitles. Specifically, the video recorder application may display the subtitles in the video preview area so that the viewer can simultaneously see the subtitles displayed on the display screen while watching the video live.
In summary, the present embodiment provides an audio processing method. In the audio processing method provided by the embodiment, the speech enhancement processing is performed at the hardware driver layer, but not at the application layer. In this way, between the electronic device 100 displaying the subtitles and the electronic device 100 playing the audio, the step of performing the speech enhancement processing on the audio data by the application layer is not included. Therefore, compared with the implementation manner shown in fig. 2, the embodiment can significantly improve the problem of delay of the speech recognition result output (e.g., subtitle display) with respect to audio playing, so as to improve the user experience.
This embodiment is an exemplary illustration of the technical solution of the present application. Those skilled in the art may make other variations as long as the purpose of the invention is satisfied. The sequence numbers of the steps are not intended to limit their order of execution; the order of execution of the steps can be adjusted by those skilled in the art.
The following describes an exemplary method for the application layer to issue algorithm configuration and channel configuration to the hardware-driven layer.
Before the hardware driver layer starts to acquire the audio data A, the recorder application issues the algorithm configuration and the channel configuration (hereinafter simply referred to as the "configuration") to the hardware driver layer. For example, when the recorder application starts, it issues the configuration to the hardware driver layer before the user clicks the start recording button.
The recorder application can issue the configuration to the multimedia framework layer through the interface function setParameter() provided by the multimedia framework layer, and the multimedia framework layer then issues the configuration to the hardware driver layer through the interface function setParameter() provided by the hardware driver layer. The following describes, as an example, the procedure by which the recorder application passes the configuration to the multimedia framework layer. The process of passing the configuration from the multimedia framework layer to the hardware driver layer is similar and is not described in detail.
Specifically, the recorder application passes the configuration to the multimedia framework layer by setting the parameter of the interface function setParameter(). Illustratively, the recorder application sets the parameter of the interface function setParameter() to setParameter("algorithm_1=ReNoiseA;algorithm_2=EnVoiceA;sequence=0;…").
That is, the parameter set in the interface function setParameter() by the recorder application is a sequence of key-value pairs, with adjacent key-value pairs separated by ";"; each key-value pair sets one configuration parameter. Each key-value pair consists of two character strings connected by "=": the string to the left of "=" is the parameter name, and the string to the right of "=" is the parameter value.
The first to third key-value pairs form the algorithm configuration. Taking the first key-value pair as an example, the parameter name algorithm_1 indicates that this key-value pair sets the first speech enhancement algorithm, and the parameter value ReNoiseA indicates that the algorithm named ReNoise A is set as the first speech enhancement algorithm. Similarly, the second key-value pair sets the algorithm named EnVoice A as the second speech enhancement algorithm.
The third key-value pair specifies the order in which the speech enhancement algorithms are executed. When the value of sequence is set to 0, the speech enhancement algorithms are executed in series; when the value of sequence is set to 1, the speech enhancement algorithms are executed in parallel. In other embodiments, when three or more speech enhancement algorithms are executed (e.g., algorithm_1, algorithm_2, and algorithm_3), additional key-value pairs may be added to setParameter() to specify which speech enhancement algorithms are executed in parallel. For example, a "sequence_parallel=2" key-value pair may be added to indicate that algorithm_2 and algorithm_3 are executed in parallel while the other algorithms are executed in series.
The fourth and fifth key-value pairs form the channel configuration. The fourth key-value pair indicates that the number of channels of the audio data A1 (the audio data for playback) is 6, and the fifth key-value pair indicates that the number of channels of the audio data B2 (the audio data for speech recognition) is 2. After acquiring the channel configuration, the hardware driver layer determines channels 1 to 6 as the channels for transmitting the audio data A1, and determines channels 7 and 8 as the channels for transmitting the audio data B2.
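A sketch of how the hardware driver layer might parse such a key-value string is given below; the helper name and the map-based representation are assumptions, and only key names that appear in this embodiment are used in the usage comment.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses a configuration string of the form "key1=value1;key2=value2;..."
// into a key -> value map. Fragments without '=' or with an empty key are skipped.
std::map<std::string, std::string> ParseConfig(const std::string& config) {
    std::map<std::string, std::string> result;
    std::stringstream ss(config);
    std::string pair;
    while (std::getline(ss, pair, ';')) {            // split on ';'
        const size_t eq = pair.find('=');
        if (eq == std::string::npos || eq == 0) continue;
        result[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    return result;
}

// Example: ParseConfig("algorithm_1=ReNoiseA;algorithm_2=EnVoiceA;sequence=0")
// yields {"algorithm_1":"ReNoiseA", "algorithm_2":"EnVoiceA", "sequence":"0"},
// from which the driver layer selects the algorithms and their execution order.
```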
Those skilled in the art may make other variations to the above exemplary implementation in which the application layer issues the algorithm configuration and the channel configuration to the hardware driver layer. For example, the parameter names in the key-value pairs may be expressed differently, or the parameters may be set by means other than key-value pairs. This embodiment is not limited in this respect, as long as the algorithm configuration and the channel configuration can be delivered to the hardware driver layer.
Fig. 6 shows a schematic structural diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and a subscriber identity module (SIM) interface.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of answering a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 110 with peripheral devices such as the display screen 194, the camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of electronic device 100. Processor 110 and display screen 194 communicate via a DSI interface to implement display functions of electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, and the like.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative, and is not limited to the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The USB connector 130 is a connector conforming to the USB standard specification and may be used to connect the electronic device 100 and a peripheral device. Specifically, it may be a standard USB connector (e.g., a Type-C connector), a Mini USB connector, a Micro USB connector, or the like. The USB connector 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transmit data between the electronic device 100 and a peripheral device. It can also be used to connect earphones and play audio through the earphones. The connector may also be used to connect other electronic devices, such as AR devices. In some embodiments, the processor 110 may support a universal serial bus (USB), whose standard specifications may be USB 1.x, USB 2.0, USB 3.x, or USB 4.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB connector 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device 100, including a wireless local area network (WLAN) (e.g., a wireless fidelity (Wi-Fi) network), Bluetooth (BT), a global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves for radiation through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or the satellite based augmentation system (SBAS).
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a user takes a picture, the shutter is opened, light is transmitted through the lens to the camera photosensitive element, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin tone of the image, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes the electrical signal to the ISP, where it is converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
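As a hedged illustration of the YUV-to-RGB conversion mentioned above (not code from this application), the following sketch converts a single full-range BT.601 YUV pixel to 8-bit RGB; the function name and the choice of conversion standard are assumptions made only for illustration.

```python
def yuv_to_rgb(y: float, u: float, v: float) -> tuple[int, int, int]:
    """Convert one full-range BT.601 YUV (YCbCr) pixel to 8-bit RGB.

    Illustrative sketch only: the exact conversion matrix an ISP/DSP uses
    is device-specific.
    """
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, round(x)))  # keep values in the 8-bit range
    return clamp(r), clamp(g), clamp(b)

# Example: a mid-gray pixel with neutral chroma maps to equal R, G, B.
print(yuv_to_rgb(128, 128, 128))  # -> (128, 128, 128)
```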
The digital signal processor is used to process digital signals, and can process digital image signals as well as other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform or the like on the frequency bin energy.
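As an assumed sketch of the kind of Fourier-transform operation described above (not taken from this application), the snippet below computes per-bin energy for one audio frame with NumPy; the frame length and sample rate are arbitrary illustrative choices.

```python
import numpy as np

def frequency_bin_energy(frame: np.ndarray) -> np.ndarray:
    """Return the energy in each frequency bin of one audio frame.

    Illustrative only: a real DSP would typically window the frame and use a
    fixed-point FFT, but the principle (FFT, then squared magnitude) is the same.
    """
    spectrum = np.fft.rfft(frame)      # one-sided FFT of the real-valued signal
    return np.abs(spectrum) ** 2       # energy per frequency bin

# Example: a 1 kHz tone sampled at 16 kHz concentrates its energy in the matching bin.
sample_rate, n = 16000, 512
t = np.arange(n) / sample_rate
bins = frequency_bin_energy(np.sin(2 * np.pi * 1000 * t))
print(np.argmax(bins) * sample_rate / n)   # ~1000 Hz
```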
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor. It processes input information rapidly by drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, and can also continuously perform self-learning. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, for example image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a Universal Flash Storage (UFS), and the like. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 100 may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert an audio electrical signal into a sound signal. The electronic device 100 can play music or take a hands-free call through the speaker 170A.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 receives a call or voice information, the user can hear the voice by placing the receiver 170B close to the ear.
The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a sound signal into the microphone 170C by speaking close to it. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and so on.
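The two-microphone noise reduction mentioned above can be illustrated, purely as an assumed sketch unrelated to the actual algorithm of this application, with a delay-and-sum beamformer: the two microphone signals are time-aligned toward the desired source and averaged, so the desired signal adds coherently while uncorrelated noise is partially suppressed.

```python
import numpy as np

def delay_and_sum(mic_a: np.ndarray, mic_b: np.ndarray, delay_samples: int) -> np.ndarray:
    """Align mic B to mic A by the inter-microphone delay of the desired source,
    then average; the desired source adds coherently while uncorrelated noise is
    attenuated by roughly 3 dB. Illustrative sketch only.
    """
    aligned_b = np.concatenate([mic_b[delay_samples:], np.zeros(delay_samples)])
    return 0.5 * (mic_a + aligned_b)

# Example: the same tone reaches mic B two samples later, each mic adds its own noise.
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)
mic_a = tone + 0.3 * rng.standard_normal(1600)
mic_b = np.roll(tone, 2) + 0.3 * rng.standard_normal(1600)
enhanced = delay_and_sum(mic_a, mic_b, delay_samples=2)
```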
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB connector 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensor 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of a conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A. The electronic device 100 may also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that are applied to the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold is applied to the SMS application icon, an instruction to view the SMS message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold is applied to the SMS application icon, an instruction to create a new SMS message is executed.
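The first-pressure-threshold behavior described above amounts to a small dispatch rule; the sketch below makes it explicit, with the threshold value and action names being hypothetical choices for illustration only.

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical, normalized pressure units

def dispatch_sms_icon_touch(pressure: float) -> str:
    """Map a touch on the SMS application icon to an operation instruction.

    A light press (below the first pressure threshold) views messages; a firm
    press (at or above the threshold) creates a new message.
    """
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_sms"
    return "create_new_sms"

print(dispatch_sms_icon_touch(0.2))  # view_sms
print(dispatch_sms_icon_touch(0.8))  # create_new_sms
```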
The gyro sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined through the gyro sensor 180B. The gyro sensor 180B may be used for image stabilization during shooting. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance the lens module needs to compensate for according to the shake angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement, thereby implementing image stabilization. The gyro sensor 180B may also be used in navigation and motion-sensing game scenarios.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device 100 calculates altitude from barometric pressure values measured by barometric pressure sensor 180C to assist in positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. Features such as automatic unlocking upon flipping open can then be set according to the detected open or closed state of the holster or the flip cover.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically along three axes). The magnitude and direction of gravity can be detected when the electronic device 100 is stationary. The acceleration sensor 180E can also be used to recognize the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some shooting scenarios, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 100 can determine that there is an object nearby; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear for a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. Electronic device 100 may adaptively adjust the brightness of display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-based photographing, fingerprint-based call answering, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 implements a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is below a further threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
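The temperature processing strategy above reduces to comparing the reported temperature against several thresholds; the sketch below spells that policy out, with every threshold value chosen arbitrarily for illustration rather than taken from this application.

```python
def thermal_policy(temp_c: float) -> list[str]:
    """Return the actions for a reported temperature under a threshold policy.

    The thresholds (45, 0, and -10 degrees Celsius) are illustrative assumptions.
    """
    actions = []
    if temp_c > 45:
        actions.append("throttle_nearby_processor")    # reduce performance and power
    if temp_c < 0:
        actions.append("heat_battery")                  # avoid cold shutdown
    if temp_c < -10:
        actions.append("boost_battery_output_voltage")  # avoid cold shutdown
    return actions

print(thermal_policy(50))   # ['throttle_nearby_processor']
print(thermal_policy(-15))  # ['heat_battery', 'boost_battery_output_voltage']
```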
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and together the touch sensor 180K and the display screen 194 form what is commonly called a "touchscreen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100 at a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of a bone mass vibrated by the human voice. The bone conduction sensor 180M may also contact the pulse of the human body to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal, acquired by the bone conduction sensor 180M, of the bone mass vibrated by the vocal part, so as to implement a voice function. The application processor may parse out heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.
The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration prompt. The motor 191 may be used for incoming-call vibration prompts as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing) may correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 194 may also correspond to different vibration feedback effects. Different application scenarios (e.g., time reminders, receiving messages, alarm clocks, games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light that can be used to indicate a charging state or a change in battery level, or to indicate a message, a missed call, a notification, and the like.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be connected to or disconnected from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a standard SIM card, and the like. Multiple cards can be inserted into the same SIM card interface 195 at the same time, and the types of the cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
Referring now to FIG. 7, shown is a block diagram of an electronic device 400 in accordance with an embodiment of the present application. The electronic device 400 may include one or more processors 401 coupled to a controller hub 403. In at least one embodiment, the controller hub 403 communicates with the processor 401 via a multi-drop bus such as a front side bus (FSB), a point-to-point interface such as a QuickPath Interconnect (QPI), or a similar connection 406. The processor 401 executes instructions that control general types of data processing operations. In one embodiment, the controller hub 403 includes, but is not limited to, a graphics memory controller hub (GMCH) (not shown) and an input/output hub (IOH) (which may be on a separate chip) (not shown), where the GMCH includes the memory and graphics controllers and is coupled to the IOH.
The electronic device 400 may also include a coprocessor 402 and a memory 404 coupled to the controller hub 403. Alternatively, one or both of the memory and the GMCH may be integrated within the processor (as described herein), in which case the memory 404 and the coprocessor 402 are coupled directly to the processor 401, and the controller hub 403 and the IOH are in a single chip.
The memory 404 may be, for example, a dynamic random access memory (DRAM), a phase change memory (PCM), or a combination of the two. The memory 404 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. Instructions, and in particular temporary and permanent copies of the instructions, are stored in the computer-readable storage medium. The instructions may include instructions that, when executed by at least one of the processors, cause the electronic device 400 to implement the method shown in FIG. 4. When executed on a computer, the instructions cause the computer to perform the methods disclosed in the embodiments of the present application.
In one embodiment, the coprocessor 402 is a special-purpose processor, such as a high-throughput many integrated core (MIC) processor, a network or communication processor, a compression engine, a graphics processor, a general-purpose computing on graphics processing unit (GPGPU), or an embedded processor. The optional nature of the coprocessor 402 is represented in FIG. 7 by dashed lines.
In one embodiment, electronic device 400 may further include a Network Interface Controller (NIC) 406. Network interface 406 may include a transceiver to provide a radio interface for electronic device 400 to communicate with any other suitable device (e.g., front end module, antenna, etc.). In various embodiments, the network interface 406 may be integrated with other components of the electronic device 400. The network interface 406 may implement the functions of the communication unit in the above-described embodiments.
The electronic device 400 may further include an input/output (I/O) device 405. The I/O device 405 may include: a user interface designed to enable a user to interact with the electronic device 400; a peripheral component interface designed to enable peripheral components to also interact with the electronic device 400; and/or sensors designed to determine environmental conditions and/or location information associated with the electronic device 400.
It should be noted that FIG. 7 is merely an example. That is, although FIG. 7 shows that the electronic device 400 includes multiple devices, such as the processor 401, the controller hub 403, and the memory 404, in practical applications a device using the methods of the present application may include only some of the devices of the electronic device 400, for example only the processor 401 and the network interface 406. The optional nature of devices in FIG. 7 is shown by dashed lines.
Referring now to FIG. 8, shown is a block diagram of a system on chip (SoC) 500 in accordance with an embodiment of the present application. In FIG. 8, similar parts have the same reference numerals, and the dashed boxes are optional features of more advanced SoCs. In FIG. 8, the SoC 500 includes: an interconnect unit 550 coupled to the processor 510; a system agent unit 580; a bus controller unit 590; an integrated memory controller unit 540; a set of one or more coprocessors 520, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a static random access memory (SRAM) unit 530; and a direct memory access (DMA) unit 560. In one embodiment, the coprocessor 520 includes a special-purpose processor, such as a network or communication processor, a compression engine, a general-purpose computing on graphics processing unit (GPGPU), a high-throughput MIC processor, or an embedded processor.
Static Random Access Memory (SRAM) unit 530 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. The computer readable storage medium has stored therein instructions, in particular, temporary and permanent copies of the instructions. The instructions may include: instructions that when executed by at least one of the processors cause the SoC to implement the method as shown in fig. 4. The instructions, when executed on a computer, cause the computer to perform the methods disclosed in the embodiments of the present application.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone.
The method embodiments of the present application may be implemented in software, magnetic components, firmware, etc.
Program code may be applied to input instructions to perform the functions described herein and to generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. Program code may also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described herein are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a computer-readable storage medium, which represent various logic in a processor and which, when read by a machine, cause the machine to fabricate logic to perform the techniques described herein. These representations, known as "intellectual property (IP) cores", may be stored on a tangible computer-readable storage medium and provided to customers or production facilities to be loaded into the manufacturing machines that actually manufacture the logic or the processors.
In some cases, an instruction converter may be used to convert instructions from a source instruction set to a target instruction set. For example, the instruction converter may transform (e.g., using a static binary transform, a dynamic binary transform including dynamic compilation), morph, emulate, or otherwise convert the instruction into one or more other instructions to be processed by the core. The instruction converter may be implemented in software, hardware, firmware, or a combination thereof. The instruction converter may be on the processor, off-processor, or partially on and partially off-processor.
Claims (9)
1. An audio processing method for an electronic device, the method comprising:
a hardware driver layer of the electronic device acquires audio data to be played, wherein the audio data comprises voice data;
the hardware driver layer performs speech enhancement processing on the audio data to obtain enhancement data of the audio data;
the hardware driver layer sends the audio data and the enhancement data of the audio data to an application layer of the electronic device;
the application layer obtains, based on the enhancement data, a speech recognition result of performing speech recognition on the audio data;
the application layer controls the electronic device to output the audio data, and controls the electronic device to output the speech recognition result while the audio data is being output; wherein
the hardware driver layer performing speech enhancement processing on the audio data comprises:
the hardware driver layer performs speech enhancement processing on the audio data through one or more speech enhancement algorithms, wherein the algorithm type and the execution order of each of the one or more speech enhancement algorithms are determined by the hardware driver layer according to an algorithm configuration issued by the application layer; the execution order of the speech enhancement algorithms comprises: at least two speech enhancement algorithms executed in series, and/or at least two speech enhancement algorithms executed in parallel; and the one or more speech enhancement algorithms comprise a speech noise reduction algorithm and a human voice enhancement algorithm;
the hardware driver layer sending the audio data and the enhancement data of the audio data to the application layer of the electronic device comprises:
the hardware driver layer sends the audio data and the enhancement data of the audio data to the application layer through a plurality of different channels, respectively, wherein the plurality of different channels are determined by the hardware driver layer according to a channel configuration issued by the application layer.
2. The method of claim 1, wherein the speech noise reduction algorithm and the human voice enhancement algorithm are performed in series.
3. The method of claim 1, wherein a system-default recording data transmission channel is included in the electronic device, and the hardware driver layer sending the audio data and the enhancement data of the audio data to the application layer through the plurality of different channels, respectively, comprises:
the hardware driver layer transmits the audio data through the recording data transmission channel, and transmits the enhancement data of the audio data through channels other than the recording data transmission channel.
4. The method of claim 1, wherein the hardware driver layer sends the raw data of the audio data and the enhancement data to the application layer of the electronic device, wherein:
the hardware driver layer sends the raw data of the audio data and the enhancement data to the application layer in a frame-aligned manner.
5. The method of claim 1, wherein the application layer obtaining the speech recognition result of the audio data based on the enhancement data comprises:
the application layer sends a speech recognition request carrying the enhancement data to a cloud server, so as to obtain the speech recognition result of the audio data from the cloud server.
6. The method of claim 5, wherein the algorithm configuration issued by the application layer to the hardware driver layer is associated with the cloud server.
7. The method of claim 1, wherein the application layer controlling the electronic device to output the speech recognition result comprises:
the application layer controls a display screen of the electronic device to output the speech recognition result in the form of subtitles.
8. An electronic device, comprising:
a memory to store instructions for execution by one or more processors of the electronic device;
a processor that, when executing the instructions in the memory, causes the electronic device to perform the method of any of claims 1-7.
9. A computer-readable storage medium having instructions stored thereon, which when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.
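To make the flow in claims 1 to 7 easier to follow, here is a minimal end-to-end sketch of the claimed pipeline: the driver layer applies a noise reduction step and a human voice enhancement step in the order given by an application-layer configuration, returns the raw audio and the enhancement data on separate channels, and the application layer passes the enhancement data to a (here mocked) recognizer and shows the result as a subtitle while the raw audio is played. Every function, the configuration format, and the channel names are assumptions made for illustration; this is not the implementation disclosed in this application.

```python
import numpy as np

# --- hardware driver layer (sketch) ------------------------------------------
def noise_reduction(audio: np.ndarray) -> np.ndarray:
    """Stand-in speech noise reduction: zero out very low-level samples."""
    return np.where(np.abs(audio) < 0.01, 0.0, audio)

def voice_enhancement(audio: np.ndarray) -> np.ndarray:
    """Stand-in human voice enhancement: apply a simple gain with clipping."""
    return np.clip(audio * 1.5, -1.0, 1.0)

ALGORITHMS = {"noise_reduction": noise_reduction, "voice_enhancement": voice_enhancement}

def driver_layer(algo_config: list, channel_config: dict) -> dict:
    """Acquire audio to be played, enhance it per the issued configuration,
    and return the raw audio and the enhancement data on the configured channels."""
    audio = np.random.uniform(-0.2, 0.2, 16000)   # simulated audio data to be played
    enhanced = audio.copy()
    for name in algo_config:                      # execution order set by the app layer
        enhanced = ALGORITHMS[name](enhanced)
    return {channel_config["raw"]: audio, channel_config["enhanced"]: enhanced}

# --- application layer (sketch) ------------------------------------------------
def cloud_speech_recognition(enhanced: np.ndarray) -> str:
    """Mocked cloud ASR request; a real client would send the enhancement data."""
    return "recognized text"

def application_layer() -> None:
    algo_config = ["noise_reduction", "voice_enhancement"]            # serial execution
    channel_config = {"raw": "record_channel", "enhanced": "aux_channel"}
    channels = driver_layer(algo_config, channel_config)
    subtitle = cloud_speech_recognition(channels["aux_channel"])
    print("playing raw audio:", channels["record_channel"].shape)    # output audio data
    print("subtitle:", subtitle)                                      # output recognition result

application_layer()
```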
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111165223.XA CN115019803B (en) | 2021-09-30 | 2021-09-30 | Audio processing method, electronic device, and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115019803A CN115019803A (en) | 2022-09-06 |
CN115019803B true CN115019803B (en) | 2023-01-10 |
Family
ID=83064404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111165223.XA Active CN115019803B (en) | 2021-09-30 | 2021-09-30 | Audio processing method, electronic device, and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115019803B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104575509A (en) * | 2014-12-29 | 2015-04-29 | 乐视致新电子科技(天津)有限公司 | Voice enhancement processing method and device |
CN108538305A (en) * | 2018-04-20 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | Audio recognition method, device, equipment and computer readable storage medium |
CN110060685A (en) * | 2019-04-15 | 2019-07-26 | 百度在线网络技术(北京)有限公司 | Voice awakening method and device |
CN110097891A (en) * | 2019-04-22 | 2019-08-06 | 广州视源电子科技股份有限公司 | Microphone signal processing method, device, equipment and storage medium |
CN110364154A (en) * | 2019-07-30 | 2019-10-22 | 深圳市沃特沃德股份有限公司 | Voice is converted into the method, apparatus, computer equipment and storage medium of text in real time |
CN113593567A (en) * | 2021-06-23 | 2021-11-02 | 荣耀终端有限公司 | Method for converting video and sound into text and related equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7110951B1 (en) * | 2000-03-03 | 2006-09-19 | Dorothy Lemelson, legal representative | System and method for enhancing speech intelligibility for the hearing impaired |
US20090076825A1 (en) * | 2007-09-13 | 2009-03-19 | Bionica Corporation | Method of enhancing sound for hearing impaired individuals |
WO2016033269A1 (en) * | 2014-08-28 | 2016-03-03 | Analog Devices, Inc. | Audio processing using an intelligent microphone |
2021-09-30: Application CN202111165223.XA filed in China; granted as CN115019803B (status: Active).
Also Published As
Publication number | Publication date |
---|---|
CN115019803A (en) | 2022-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113810601B (en) | Terminal image processing method and device and terminal equipment | |
CN112312366B (en) | Method, electronic equipment and system for realizing functions through NFC (near field communication) tag | |
CN112860428A (en) | High-energy-efficiency display processing method and equipment | |
WO2020056684A1 (en) | Method and device employing multiple tws earpieces connected in relay mode to realize automatic interpretation | |
CN113129202A (en) | Data transmission method, data transmission device, data processing system and storage medium | |
CN114398020A (en) | Audio playing method and related equipment | |
CN113467735A (en) | Image adjusting method, electronic device and storage medium | |
CN114339429A (en) | Audio and video playing control method, electronic equipment and storage medium | |
CN114880251A (en) | Access method and access device of storage unit and terminal equipment | |
CN113593567B (en) | Method for converting video and sound into text and related equipment | |
CN114257737A (en) | Camera shooting mode switching method and related equipment | |
CN114661258A (en) | Adaptive display method, electronic device, and storage medium | |
CN114827098A (en) | Method and device for close shooting, electronic equipment and readable storage medium | |
CN113473216A (en) | Data transmission method, chip system and related device | |
CN115119336B (en) | Earphone connection system, earphone connection method, earphone, electronic device and readable storage medium | |
CN109285563B (en) | Voice data processing method and device in online translation process | |
CN113496477A (en) | Screen detection method and electronic equipment | |
CN112532508A (en) | Video communication method and video communication device | |
WO2022095752A1 (en) | Frame demultiplexing method, electronic device and storage medium | |
WO2022033344A1 (en) | Video stabilization method, and terminal device and computer-readable storage medium | |
CN114554037B (en) | Data transmission method and electronic equipment | |
CN115019803B (en) | Audio processing method, electronic device, and storage medium | |
CN116939559A (en) | Bluetooth audio coding data distribution method, electronic equipment and storage medium | |
CN111026285B (en) | Method for adjusting pressure threshold and electronic equipment | |
CN113923351A (en) | Method, apparatus, storage medium, and program product for exiting multi-channel video shooting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||