CN110992953A - Voice data processing method, device, system and storage medium - Google Patents


Info

Publication number: CN110992953A
Authority: CN (China)
Prior art keywords: audio, audio data, voice, data, voice interaction
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201911293058.9A
Other languages: Chinese (zh)
Inventor: 李玉澄
Current Assignee: AI Speech Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: AI Speech Ltd
Application filed by AI Speech Ltd
Priority to CN201911293058.9A
Publication of CN110992953A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M 1/00 - Substation equipment, e.g. for use by subscribers
    • H04M 1/60 - Substation equipment, e.g. for use by subscribers, including speech amplifiers
    • H04M 1/6033 - Substation equipment including speech amplifiers, for providing handsfree use or a loudspeaker mode in telephone sets
    • H04M 1/6041 - Portable telephones adapted for handsfree use
    • H04M 1/6058 - Portable telephones adapted for handsfree use involving the use of a headset accessory device connected to the portable telephone
    • H04M 1/6066 - Portable telephones adapted for handsfree use involving the use of a headset accessory device connected to the portable telephone, including a wireless connection
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 - Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/80 - Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 76/00 - Connection management
    • H04W 76/10 - Connection setup
    • H04W 76/14 - Direct-mode setup
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M 2250/00 - Details of telephonic subscriber devices
    • H04M 2250/02 - Details of telephonic subscriber devices including a Bluetooth interface

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a voice data processing method, apparatus, system, and storage medium. The method is carried out by a voice data processing apparatus on a Bluetooth headset: first, audio data is written into an audio buffer at the same time as wake-up recognition is performed on it; then the recognition result is evaluated, and if it is wake-up, an audio transmission path to the voice interaction device is established and the audio data in the audio buffer is sent to the voice interaction device. By retaining data in the audio buffer, the loss of the user's speech while the Synchronous Connection-Oriented (SCO) link is being established is avoided, so the user can speak the wake-up word and the command in a single utterance and still be understood. Moreover, because the wake-up word is sent to the voice interaction device together with the speech that follows it, a secondary wake-up check can be performed there, preventing a falsely triggered wake-up from interrupting the user.

Description

Voice data processing method, device, system and storage medium
Technical Field
The invention relates to the field of artificial-intelligence voice interaction, and in particular to a method, apparatus, system, and storage medium for processing voice data with a Bluetooth headset.
Background
With the continuous development of artificial intelligence and electronic communication technology, intelligent voice interaction devices such as smart watches and smart speakers are increasingly popular. Recently, major vendors have released products that perform voice interaction through Bluetooth headsets.
At present, these products that perform voice interaction through a Bluetooth headset integrate a wake-up algorithm in the Bluetooth chip or on a digital signal processing (DSP) chip, and once voice wake-up is triggered, voice interaction is carried out over the standard Hands-Free Profile (HFP). In this scheme, the mobile terminal receives the wake-up command and requests establishment of a Synchronous Connection-Oriented (SCO) link, and several seconds may elapse before the SCO channel is actually established. The user's speech during that interval is lost, so the user cannot speak the wake-up word and the command word in one breath and still be understood in a single utterance.
In addition, in the existing approach, only the speech following the wake-up word is sent after the wake-up word is detected, so a deeper "secondary wake-up check" cannot be performed on the mobile device side. Consequently, when a wake-up word is triggered by mistake, the user's current activity is interrupted, resulting in a poor user experience.
Disclosure of Invention
In view of the above problems, the present inventors provide a method, apparatus, system, and storage medium for voice data processing.
According to a first aspect of the embodiments of the present invention, a voice data processing method is applied to a voice data processing apparatus on a Bluetooth headset, the method including: performing wake-up recognition on first audio data while writing the first audio data into an audio buffer to form second audio data, the result of the wake-up recognition being a recognition result; and evaluating the recognition result, and if the recognition result is wake-up, establishing an audio transmission path to the voice interaction device and sending the second audio data in the audio buffer to the voice interaction device.
According to an embodiment of the present invention, before the wake-up recognition is performed on the first audio data, the method further includes: collecting an original audio signal; and performing signal processing on the original audio signal to obtain the first audio data.
According to an embodiment of the present invention, collecting the original audio signal includes: acquiring at least two channels of original audio signals through a microphone array. Correspondingly, performing signal processing on the original audio signal to obtain the first audio data includes: performing signal processing on the at least two channels of original audio signals to obtain at least two processing results; and merging the at least two processing results to obtain the first audio data, the first audio data being a single channel of audio data.
According to an embodiment of the present invention, writing the first audio data into the audio buffer to form the second audio data includes: writing the first audio data into the audio buffer in time-ordered segments with overflow overwriting, to form the second audio data.
According to an embodiment of the present invention, performing the wake-up recognition on the first audio data includes: judging, according to a wake-up model, whether a wake-up word exists in the first audio data, and if so, determining the recognition result to be wake-up, recording the write point at which the second audio data is written in the audio buffer, and taking the write point as the wake-up point. Correspondingly, sending the second audio data in the audio buffer to the voice interaction device includes: sending the audio data within a preset period before the wake-up point, together with the audio after the wake-up, to the voice interaction device.
According to an embodiment of the present invention, sending the second audio data in the audio buffer to the voice interaction device includes: sending the entire content of the second audio data in the audio buffer, including the wake-up word, to the voice interaction device for a secondary wake-up check.
According to a second aspect of the embodiments of the present invention, a voice data processing apparatus includes: a wake-up recognition module for performing wake-up recognition on first audio data; an audio writing module for writing the first audio data into an audio buffer to form second audio data; a recognition result judging module for evaluating the recognition result; and a sending module for establishing an audio transmission path to the voice interaction device and sending the second audio data in the audio buffer to the voice interaction device if the recognition result is wake-up.
According to an embodiment of the present invention, the apparatus further includes: a signal collecting module for collecting an original audio signal; and a signal processing module for performing signal processing on the original audio signal to obtain the first audio data.
According to an embodiment of the present invention, the signal collecting module includes: a microphone array unit for acquiring at least two channels of original audio signals through a microphone array. Correspondingly, the signal processing module includes: a multi-channel signal processing unit for performing signal processing on the at least two channels of original audio signals to obtain at least two processing results; and a signal merging unit for merging the at least two processing results into the first audio data, the first audio data being a single channel of audio data.
According to an embodiment of the present invention, the audio writing module is specifically configured to write the first audio data into the audio buffer in time-ordered segments with overflow overwriting, to form the second audio data.
According to an embodiment of the present invention, the wake-up recognition module includes: a judging unit for judging, according to the wake-up model, whether a wake-up word exists in the first audio data; a recognition result determining unit for determining the recognition result to be wake-up if it does; and a wake-up point recording unit for recording the write point at which the second audio data is written in the audio buffer and taking the write point as the wake-up point. Correspondingly, the sending module is specifically configured to establish an audio transmission path to the voice interaction device and send the second audio data in the audio buffer to the voice interaction device if the recognition result is wake-up.
According to an embodiment of the present invention, the sending module is specifically configured to send the second audio data in the audio buffer, including the wake-up word, to the voice interaction device.
According to a third aspect of the embodiments of the present invention, there is provided a voice data processing system, including: a voice data processing apparatus, arranged on a Bluetooth headset, for executing any of the voice data processing methods described above; and a mobile device, connected to the Bluetooth headset, for receiving the second audio data sent by the voice data processing apparatus and processing it.
According to a fourth aspect of the embodiments of the present invention, there is provided a computer storage medium comprising a set of computer-executable instructions which, when executed, perform any of the voice data processing methods described above.
The embodiments of the present invention provide a voice data processing method, apparatus, system, and storage medium. First, the audio data is written into an audio buffer at the same time as wake-up recognition is performed on it; then the recognition result is evaluated, and if it is wake-up, an audio transmission path to the voice interaction device is established and the audio data in the audio buffer is sent to the voice interaction device. By retaining data in the audio buffer, the loss of the user's speech while the SCO channel is being established is avoided, and the user can speak the wake-up word and the command in a single utterance. Furthermore, because the wake-up word is sent to the mobile device along with the command, a secondary wake-up check can be performed on the voice interaction device side to prevent a falsely triggered wake-up from interrupting the user.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic flow chart of a voice data processing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a configuration of an audio data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The voice data processing method of the embodiments of the invention is applied to a voice data processing apparatus on a Bluetooth headset. The apparatus is usually a DSP chip, but may be any other hardware device capable of processing audio signals or audio data, and it can be built into the Bluetooth headset or attached to it as an external module.
Fig. 1 shows the implementation flow of a voice data processing method according to an embodiment of the present invention. Referring to fig. 1, the method includes: operation 110, performing wake-up recognition on first audio data while writing the first audio data into an audio buffer to form second audio data, the result of the wake-up recognition being a recognition result; and operation 120, evaluating the recognition result, and if the recognition result is wake-up, establishing an audio transmission path to the voice interaction device and sending the second audio data in the audio buffer to the voice interaction device.
In operation 110, the first audio data may be a directly captured audio signal or audio data that has already undergone signal processing; using processed audio data is recommended, as it improves the accuracy of wake-up recognition. Wake-up recognition means feeding the first audio data into a wake-up module, which runs a wake-up algorithm to produce a recognition result; its purpose is to determine whether the current audio input constitutes a wake-up. Existing wake-up algorithms rely on keyword-spotting techniques that search the input audio for keywords, namely wake-up words; if a wake-up word is recognized, the recognition result is set to wake-up. Meanwhile, the first audio data is written into the audio buffer to form the second audio data. If cost permits, the audio buffer can be made long: the longer the buffer, the more of the user's audio input can be stored, the less audio data is lost, and the better the result. Ideally, the second audio data retains all of the content of the first audio data, including the wake-up word.
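The dual path of operation 110, feeding each incoming frame to the wake-up module while also appending it to the buffer, can be sketched as follows. This is an illustrative sketch only: frames are stand-in strings, and `detect_wake_word`, `FRAME_MS`, and `BUFFER_FRAMES` are hypothetical names; a real DSP would run a keyword-spotting model on PCM frames.

```python
from collections import deque

FRAME_MS = 10        # assumed frame length, matching the 10 ms example later in the text
BUFFER_FRAMES = 300  # assumed capacity: 3 s of history

# The audio buffer holding the "second audio data"; deque(maxlen=...)
# silently drops the oldest frame when full (overflow overwriting).
audio_buffer = deque(maxlen=BUFFER_FRAMES)

def detect_wake_word(frame) -> bool:
    """Stand-in for the wake-up model; a real system runs keyword spotting here."""
    return frame == "wake"  # toy criterion for illustration

def process_frame(frame) -> bool:
    """Operation 110: write the frame into the buffer AND run wake-up recognition."""
    audio_buffer.append(frame)       # first audio data -> audio buffer (second audio data)
    return detect_wake_word(frame)   # recognition result for this frame

frames = ["noise", "noise", "wake", "play", "music"]
results = [process_frame(f) for f in frames]
```

Note that the buffer receives every frame regardless of the recognition result, which is what preserves the speech spoken before and during the wake-up word.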
In operation 120, if the recognition result is wake-up, an audio transmission path to the voice interaction device is established. The Bluetooth headset serves as the input and output device for audio data: its main function is to capture audio input, forward the useful audio to the voice interaction device, and play back the audio the voice interaction device returns. The method of this embodiment is only a preprocessing step for the audio data, so the audio suspected of containing the user's query is ultimately forwarded to the voice interaction device for processing, and the response audio returned by the voice interaction device is played back to the user. The voice interaction device can be any such device connected to the Bluetooth headset, for example a smart watch, a smart speaker, or a smart in-car device. The channel to the voice interaction device is typically established over an Inter-IC Sound (I2S) bus, and establishing the I2S channel can be accomplished by triggering a wake-up event on the Bluetooth chip. Upon receiving the wake-up event, the Bluetooth chip opens the I2S audio channel and sends the audio data held in the DSP chip for a specified period before the wake-up point, together with the audio after the wake-up point, to the voice interaction device.
According to an embodiment of the present invention, before the wake-up recognition is performed on the first audio data, the method further includes: collecting an original audio signal; and performing signal processing on the original audio signal to obtain the first audio data.
In the embodiment of the present invention, the voice data processing apparatus on the Bluetooth headset has some processing capability and independent memory, so it can not only collect the original audio signal but also process it, for example by receiving signals directionally using beamforming or by denoising the received signal.
According to an embodiment of the present invention, collecting the original audio signal includes: acquiring at least two channels of original audio signals through a microphone array. Correspondingly, performing signal processing on the original audio signal to obtain the first audio data includes: performing signal processing on the at least two channels of original audio signals to obtain at least two processing results; and merging the at least two processing results to obtain the first audio data, the first audio data being a single channel of audio data.
In the embodiment of the invention, at least two channels of original audio signals are captured by the microphone array, enabling better denoising and yielding the audio data actually wanted in the voice interaction. Audio data obtained by denoising the original signals is easier to recognize, which in turn improves the accuracy of wake-up recognition.
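The merge step above can be sketched as follows. This is a deliberately minimal illustration under stated assumptions: the per-channel beamforming/denoising stage is omitted, and the two processed channels are simply averaged sample-by-sample into one mono stream; `merge_channels` is a hypothetical name, not an API from the patent.

```python
def merge_channels(ch_a, ch_b):
    """Combine two processed microphone channels into one channel
    (the 'first audio data') by sample-wise averaging."""
    if len(ch_a) != len(ch_b):
        raise ValueError("channels must be aligned and equal in length")
    return [(a + b) / 2 for a, b in zip(ch_a, ch_b)]

# Two toy channels of already-processed samples (integer values avoid
# floating-point representation noise in this illustration).
mic1 = [0, 2, 4]
mic2 = [0, 4, 2]
mono = merge_channels(mic1, mic2)
```

A real implementation would typically weight or time-align the channels (delay-and-sum beamforming) rather than averaging them naively.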
According to an embodiment of the present invention, writing the first audio data into the audio buffer to form the second audio data includes: writing the first audio data into the audio buffer in time-ordered segments with overflow overwriting, to form the second audio data.
In the embodiment of the invention, the first audio data is sent simultaneously to the wake-up module and the audio buffer. Segmenting the first audio data by time makes it convenient to operate on small units, such as when recording or reading data, and easier to process; for example, the first audio data may be stored at 10 ms per frame. Overflow overwriting ensures that new data can still be written once a large amount of historical data has been saved: when the buffer is full, the oldest data is overwritten first.
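Time-ordered segmentation with overflow overwriting amounts to a ring (circular) buffer of fixed-size frames. The sketch below, with hypothetical names (`FrameRingBuffer`, `write_point`) and an assumed capacity, shows how the write point advances and how the oldest frames are overwritten while recent history is preserved:

```python
class FrameRingBuffer:
    """Fixed-capacity buffer of time-ordered frames. When full, each new
    write overwrites the oldest frame, so recent history is always kept."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.frames = [None] * capacity
        self.write_point = 0   # next slot to write; recordable as a wake-up point
        self.count = 0

    def write(self, frame):
        self.frames[self.write_point] = frame
        self.write_point = (self.write_point + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def history(self):
        """Return buffered frames in chronological order (oldest first)."""
        if self.count < self.capacity:
            return self.frames[:self.count]
        wp = self.write_point
        return self.frames[wp:] + self.frames[:wp]

buf = FrameRingBuffer(4)
for i in range(6):     # 6 writes into 4 slots: frames 0 and 1 are overwritten
    buf.write(i)
```

With 10 ms frames, a capacity of a few hundred frames would retain the last few seconds of speech, which is the history needed to recover the wake-up word and anything spoken before the SCO/I2S channel is ready.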
According to an embodiment of the present invention, performing the wake-up recognition on the first audio data includes: judging, according to the wake-up model, whether a wake-up word exists in the first audio data, and if so, determining the recognition result to be wake-up, recording the write point at which the second audio data is written in the audio buffer, and taking the write point as the wake-up point. Correspondingly, sending the second audio data in the audio buffer to the voice interaction device includes: sending the audio data within a preset period before the wake-up point, together with the audio after the wake-up, to the voice interaction device.
In the embodiment of the invention, a wake-up point is obtained while wake-up recognition is being performed: as soon as a wake-up word is detected, the write point at which the second audio data is being written into the audio buffer is recorded immediately; that point is the wake-up point. With this position recorded, the audio input by the user can be divided into a wake-up-word part and a suspected-query part, and the suspected-query part can then be split off to serve as the input to the voice interaction program.
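The split described above can be sketched as follows, assuming the buffer contents have already been read out in chronological order. The frame contents and the helper name `split_at_wake_point` are illustrative, not part of the patent:

```python
def split_at_wake_point(buffered_frames, wake_point):
    """Divide the buffered 'second audio data' at the recorded wake-up point:
    everything up to the wake point is the wake-up-word part, and everything
    after it is the suspected user-query part."""
    wake_part = buffered_frames[:wake_point]
    query_part = buffered_frames[wake_point:]
    return wake_part, query_part

# Hypothetical example: a 3-frame wake word followed by a 2-frame query.
frames = ["hel", "lo_", "bot", "play", "jazz"]
wake_part, query_part = split_at_wake_point(frames, wake_point=3)
```

The query part alone can feed the dialog system, while wake part plus query part together support the secondary wake-up check described next.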
According to an embodiment of the present invention, sending the second audio data in the audio buffer to the voice interaction device includes: sending the entire content of the second audio data in the audio buffer, including the wake-up word, to the voice interaction device for a secondary wake-up check.
In the embodiment of the present invention, besides sending the suspected-query part after the wake-up word to the voice interaction device for voice interaction processing, the full content of the second audio data, including the wake-up-word part, may also be sent to the voice interaction device. In this way, the voice interaction device can additionally perform a more complex secondary wake-up check on the audio data. During the secondary wake-up check, the context of the conversation is used to judge whether the user's real intention is a wake-up or a false trigger that merely happens to contain a wake-up word. Processing such context usually requires more complex recognition models, such as convolutional neural network models, demands greater processing and computing power, and is usually handled by the dialog processing module of the voice interaction device. Through this secondary wake-up check, the poor experience of the user's current activity being interrupted by a falsely triggered voice interaction can be greatly reduced.
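To make the idea of a context-based secondary check concrete, here is a toy stand-in. The patent envisions a larger model (e.g. a CNN) on the device side; this sketch substitutes a single hand-written context rule on a transcript, and `WAKE_WORD` and `secondary_wake_check` are hypothetical names:

```python
WAKE_WORD = "hello bot"  # hypothetical wake word for illustration

def secondary_wake_check(transcript: str) -> bool:
    """Toy context rule: treat the wake-up as genuine only if the wake word
    begins the utterance; a wake word buried mid-sentence (e.g. reported
    speech) is treated as a false trigger and rejected."""
    return transcript.lower().strip().startswith(WAKE_WORD)

genuine = secondary_wake_check("Hello bot play some jazz")
false_trigger = secondary_wake_check("I said hello bot yesterday")
```

Because the headset forwards the wake word together with the following speech, the device has the full utterance available and can apply far richer contextual models than this one-line rule.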
Furthermore, an embodiment of the present invention provides a voice data processing apparatus. As shown in fig. 2, the apparatus 20 includes: a wake-up recognition module 201 for performing wake-up recognition on first audio data; an audio writing module 202 for writing the first audio data into an audio buffer to form second audio data; a recognition result judging module 203 for evaluating the recognition result; and a sending module 204 for establishing an audio transmission path to the voice interaction device and sending the second audio data in the audio buffer to the voice interaction device if the recognition result is wake-up.
According to an embodiment of the present invention, the apparatus 20 further includes: a signal collecting module for collecting an original audio signal; and a signal processing module for performing signal processing on the original audio signal to obtain the first audio data.
According to an embodiment of the present invention, the signal collecting module includes: a microphone array unit for acquiring at least two channels of original audio signals through a microphone array. Correspondingly, the signal processing module includes: a multi-channel signal processing unit for performing signal processing on the at least two channels of original audio signals to obtain at least two processing results; and a signal merging unit for merging the at least two processing results into the first audio data, the first audio data being a single channel of audio data.
According to an embodiment of the present invention, the audio writing module 202 is specifically configured to write the first audio data into the audio buffer in time-ordered segments with overflow overwriting, to form the second audio data.
According to an embodiment of the present invention, the wake-up recognition module 201 includes: a judging unit for judging, according to the wake-up model, whether a wake-up word exists in the first audio data; a recognition result determining unit for determining the recognition result to be wake-up if it does; and a wake-up point recording unit for recording the write point at which the second audio data is written in the audio buffer and taking the write point as the wake-up point. Correspondingly, the sending module is specifically configured to establish an audio transmission path to the voice interaction device and send the second audio data in the audio buffer to the voice interaction device if the recognition result is wake-up.
According to an embodiment of the present invention, the sending module 204 is specifically configured to send the second audio data in the audio buffer, including the wake-up word, to the voice interaction device.
According to a third aspect of the embodiments of the present invention, there is provided a voice data processing system, including: a voice data processing apparatus, arranged on a Bluetooth headset, for executing any of the voice data processing methods described above; and a mobile device, connected to the Bluetooth headset, for receiving the second audio data sent by the voice data processing apparatus and processing it.
According to a fourth aspect of the embodiments of the present invention, there is provided a computer storage medium comprising a set of computer-executable instructions which, when executed, perform any of the voice data processing methods described above.
It should be noted here that the above descriptions of the voice data processing apparatus, voice data processing system, and computer storage medium embodiments are similar to the descriptions of the foregoing method embodiments and have similar beneficial effects, so they are not repeated. For technical details of the present invention not disclosed in the apparatus, system, and storage medium embodiments, please refer to the description of the foregoing method embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of a unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another device, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may exist separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware instructed by a program; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage medium, a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods of the embodiments of the present invention. The aforementioned storage medium includes media capable of storing program code, such as a removable storage medium, a ROM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A voice data processing method implemented by a voice data processing apparatus on a Bluetooth headset, the method comprising:
performing wake-up recognition on first audio data, and writing the first audio data into an audio buffer to form second audio data, wherein the result of the wake-up recognition is a recognition result;
and judging the recognition result, and if the recognition result is wake-up, establishing an audio transmission path with a voice interaction device and sending the second audio data in the audio buffer to the voice interaction device.
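As a non-limiting illustration of the flow of claim 1 (all class, method, and parameter names below are hypothetical and not part of the claim): the headset performs wake-up recognition on each incoming frame, writes the frame into the audio buffer, and only on a positive result establishes the transmission path and forwards the buffered second audio data.

```python
from collections import deque

class HeadsetWakeProcessor:
    """Minimal sketch of the claim-1 flow; names are illustrative only."""

    def __init__(self, wake_detector, transport, buffer_frames=100):
        self.wake_detector = wake_detector          # callable: frame -> bool
        self.transport = transport                  # object with connect()/send()
        self.buffer = deque(maxlen=buffer_frames)   # the "audio buffer" (oldest frames drop off)

    def process_frame(self, frame: bytes) -> None:
        woke = self.wake_detector(frame)            # wake-up recognition on first audio data
        self.buffer.append(frame)                   # write into buffer, forming second audio data
        if woke:                                    # recognition result is wake-up
            self.transport.connect()                # establish audio transmission path
            self.transport.send(b"".join(self.buffer))  # forward buffered audio
```

In practice the detector would be a small on-device wake-word model and the transport a Bluetooth link; both are stubbed here.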
2. The method of claim 1, wherein, prior to the wake-up recognition of the first audio data, the method further comprises:
collecting an original audio signal;
and performing signal processing on the original audio signal to obtain the first audio data.
3. The method of claim 2, wherein collecting the original audio signal comprises:
acquiring at least two channels of original audio signals through a microphone array;
correspondingly, performing signal processing on the original audio signal to obtain the first audio data comprises:
performing signal processing on the at least two channels of original audio signals to obtain at least two channels of processing results;
and merging the at least two channels of processing results to obtain the first audio data, wherein the first audio data is a single channel of audio data.
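The channel-merging step of claim 3 can be sketched as follows. The claim does not specify the merge operation; a simple per-sample mean is used here purely for illustration (real pipelines typically apply beamforming or another array-processing technique), and the function name is hypothetical.

```python
import numpy as np

def merge_channels(channels: np.ndarray) -> np.ndarray:
    """Merge two or more processed microphone channels into one channel.

    `channels` has shape (n_channels, n_samples). The per-sample mean
    shown here is only an illustrative stand-in for the unspecified
    merge operation of claim 3.
    """
    if channels.ndim != 2 or channels.shape[0] < 2:
        raise ValueError("expected a 2-D array with at least two channels")
    return channels.mean(axis=0)   # collapse channels into one path
```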
4. The method of claim 1, wherein writing the first audio data into the audio buffer to form the second audio data comprises:
writing the first audio data into the audio buffer in time-ordered segments, overwriting the oldest data on overflow, to form the second audio data.
5. The method of claim 4, wherein performing wake-up recognition on the first audio data comprises:
judging, according to a wake-up model, whether a wake-up word exists in the first audio data; if so, determining the recognition result as wake-up, and recording the write position of the second audio data in the audio buffer as a wake-up point;
correspondingly, sending the second audio data in the audio buffer to the voice interaction device comprises:
sending, to the voice interaction device, the audio data within a preset time period before the wake-up point together with the audio data after wake-up.
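The overflow-overwriting buffer of claim 4 and the wake-point extraction of claim 5 can be sketched together as a ring buffer that records the write position at the moment of wake-up and can then return the audio preceding that point. All names are hypothetical; the claims do not prescribe a byte-level layout.

```python
class AudioRingBuffer:
    """Time-ordered ring buffer that overwrites the oldest data on
    overflow, with a recorded wake-up point (sketch of claims 4-5)."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.write_pos = 0        # next write offset within the buffer
        self.total = 0            # total bytes written since start
        self.wake_point = None    # total-bytes count when wake-up fired

    def write(self, data: bytes) -> None:
        for b in data:            # byte-wise for clarity, not speed
            self.buf[self.write_pos] = b
            self.write_pos = (self.write_pos + 1) % self.capacity  # wrap: overwrite oldest
            self.total += 1

    def mark_wake_point(self) -> None:
        """Record the current write position as the wake-up point."""
        self.wake_point = self.total

    def read_before_wake(self, n: int) -> bytes:
        """Return up to n bytes immediately preceding the wake-up point
        (the 'preset time period before the wake-up point')."""
        if self.wake_point is None:
            return b""
        n = min(n, self.wake_point, self.capacity)  # cannot exceed what survives
        start = (self.wake_point - n) % self.capacity
        return bytes(self.buf[(start + i) % self.capacity] for i in range(n))
```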
6. The method of claim 1, wherein sending the second audio data in the audio buffer to the voice interaction device comprises:
sending the entire content of the second audio data in the audio buffer that contains the wake-up word to the voice interaction device for secondary wake-up verification.
7. A voice data processing apparatus arranged on a Bluetooth headset, the apparatus comprising:
a wake-up recognition module, configured to perform wake-up recognition on first audio data;
an audio writing module, configured to write the first audio data into an audio buffer to form second audio data;
a recognition result judging module, configured to judge the recognition result;
and a sending module, configured to, if the recognition result is wake-up, establish an audio transmission path with a voice interaction device and send the second audio data in the audio buffer to the voice interaction device.
8. The apparatus of claim 7, further comprising:
a signal collection module, configured to collect an original audio signal;
and a signal processing module, configured to perform signal processing on the original audio signal to obtain the first audio data.
9. A voice data processing system, characterized in that the system comprises:
a voice data processing apparatus, disposed on a Bluetooth headset and configured to perform the voice data processing method of any one of claims 1 to 6;
and a voice interaction device, connected to the Bluetooth headset and configured to receive the second audio data sent by the voice data processing apparatus and to process the second audio data.
10. A storage medium having program instructions stored thereon, wherein the program instructions, when executed, perform the voice data processing method of any one of claims 1 to 6.
CN201911293058.9A 2019-12-16 2019-12-16 Voice data processing method, device, system and storage medium Pending CN110992953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911293058.9A CN110992953A (en) 2019-12-16 2019-12-16 Voice data processing method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911293058.9A CN110992953A (en) 2019-12-16 2019-12-16 Voice data processing method, device, system and storage medium

Publications (1)

Publication Number Publication Date
CN110992953A true CN110992953A (en) 2020-04-10

Family

ID=70094009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911293058.9A Pending CN110992953A (en) 2019-12-16 2019-12-16 Voice data processing method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN110992953A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681675A (en) * 2020-06-03 2020-09-18 西安Tcl软件开发有限公司 Dynamic data transmission method, device, equipment and storage medium
CN112634897A (en) * 2020-12-31 2021-04-09 青岛海尔科技有限公司 Equipment awakening method and device, storage medium and electronic device
CN114363835A (en) * 2021-12-16 2022-04-15 四川腾盾科技有限公司 Automatic PTT method based on unmanned aerial vehicle data chain vocoded voice
CN111681675B (en) * 2020-06-03 2024-06-07 西安通立软件开发有限公司 Data dynamic transmission method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962240A (en) * 2018-06-14 2018-12-07 百度在线网络技术(北京)有限公司 A kind of sound control method and system based on earphone
CN109378000A (en) * 2018-12-19 2019-02-22 科大讯飞股份有限公司 Voice awakening method, device, system, equipment, server and storage medium
CN110097876A (en) * 2018-01-30 2019-08-06 阿里巴巴集团控股有限公司 Voice wakes up processing method and is waken up equipment
WO2019192250A1 (en) * 2018-04-04 2019-10-10 科大讯飞股份有限公司 Voice wake-up method and apparatus



Similar Documents

Publication Publication Date Title
CN107240405B (en) Sound box and alarm method
CN108154140A (en) Voice awakening method, device, equipment and computer-readable medium based on lip reading
CN111933112A (en) Awakening voice determination method, device, equipment and medium
CN108986833A (en) Sound pick-up method, system, electronic equipment and storage medium based on microphone array
WO2022033556A1 (en) Electronic device and speech recognition method therefor, and medium
CN115312068B (en) Voice control method, equipment and storage medium
CN110968353A (en) Central processing unit awakening method and device, voice processor and user equipment
CN110992953A (en) Voice data processing method, device, system and storage medium
CN111524513A (en) Wearable device and voice transmission control method, device and medium thereof
CN111370004A (en) Man-machine interaction method, voice processing method and equipment
US20200051548A1 (en) Method for updating a speech recognition model, electronic device and storage medium
CN111028838A (en) Voice wake-up method, device and computer readable storage medium
CN111739515B (en) Speech recognition method, equipment, electronic equipment, server and related system
CN115132212A (en) Voice control method and device
CN110197663B (en) Control method and device and electronic equipment
CN114360546A (en) Electronic equipment and awakening method thereof
WO2023124248A1 (en) Voiceprint recognition method and apparatus
CN109922397A (en) Audio intelligent processing method, storage medium, intelligent terminal and smart bluetooth earphone
CN114121042A (en) Voice detection method and device under wake-up-free scene and electronic equipment
CN111028846B (en) Method and device for registration of wake-up-free words
CN110941455B (en) Active wake-up method and device and electronic equipment
CN114333817A (en) Remote controller and remote controller voice recognition method
CN110083392A (en) Audio wakes up method, storage medium, terminal and its bluetooth headset pre-recorded
CN116030817B (en) Voice wakeup method, equipment and storage medium
CN115331672B (en) Device control method, device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Building 14, Tengfei science and Technology Park, 388 Xinping street, Suzhou Industrial Park, Suzhou area, China (Jiangsu) pilot Free Trade Zone, Suzhou, Jiangsu 215000

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215024 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200410