WO2019104889A1 - Sound processing system and method, voice recognition device, and sound receiving device - Google Patents
- Publication number
- WO2019104889A1 (PCT/CN2018/077237)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- audio file
- instruction
- voice
- code
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B11/00—Transmission systems employing sonic, ultrasonic or infrasonic waves
Definitions
- the present invention relates to the field of Internet communications, and in particular, to a sound processing system, method, and voice recognition device and sound receiving device.
- the technical problem to be solved by the present invention is to provide a sound processing system, method, voice recognition device and sound receiving device that overcome the limitations of existing interactive applications by transmitting information and control signaling through sound.
- a technical solution adopted by the present invention is a sound processing system comprising: a voice recognition device configured to, on receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction, process the control instruction and synthesize it with a first audio file to obtain a second audio file containing the corresponding sound code, and send the second audio file; and a sound receiving device configured to receive the second audio file, detect whether the second audio file includes an acoustic signal, parse the second audio file to obtain the corresponding sound code when the acoustic signal is confirmed to be present, and decode the sound code to obtain corresponding data information.
- the voice recognition device includes: an analysis unit configured to recognize the input content and generate the control instruction on receiving an instruction or human speech; an encoding unit configured to encode the control instruction to generate a corresponding sound code; a first conversion unit configured to perform a forward Fourier transform on the sound code to obtain an acoustic signal; an audio processing unit configured to synthesize the acoustic signal with the first audio file to obtain the second audio file containing the sound code, wherein the first audio file is a high-frequency file; and a sending unit configured to send the second audio file generated by the audio processing unit.
- the sound receiving device includes: a receiving unit configured to receive the second audio file sent by the voice recognition device; a detecting unit configured to analyze the second audio file and detect whether it includes an acoustic signal; a second transforming unit configured to parse the second audio file to obtain the corresponding acoustic signal when the detecting unit confirms that the acoustic signal is present, and to perform an inverse Fourier transform on the acoustic signal to obtain the corresponding sound code; and a decoding unit configured to decode the sound code to obtain corresponding data information.
- the sound receiving device further includes an instruction processing unit configured to determine whether the data information generated by the decoding unit is basic information or extended information: when the data information is basic information, the content of the data information is played or displayed; when the data information is extended information, the corresponding address is accessed and the corresponding instruction is executed; wherein the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
- the system further includes a server; when the instruction processing unit determines that the data information is extended information, it sends corresponding instruction information to the server; wherein the instruction information is an access instruction or a webpage link address;
- the server is configured to respond to the instruction information to perform a corresponding function or invoke a corresponding webpage to obtain a corresponding extended application;
- the sound receiving apparatus is further configured to receive an execution result of the server in response to the instruction information.
- another technical solution adopted by the present invention is to provide a sound processing method, the method comprising: when the voice recognition device receives an instruction or human speech, recognizing the input content to generate a corresponding control instruction, processing the control instruction and synthesizing it with a first audio file to obtain a second audio file containing the corresponding sound code, and sending the second audio file; and the sound receiving device receiving the second audio file, detecting whether the second audio file includes an acoustic signal, parsing the second audio file to obtain the corresponding sound code when the acoustic signal is confirmed to be present, and decoding the sound code to obtain corresponding data information.
- processing the control instruction and synthesizing it with the first audio file to obtain a second audio file containing the corresponding sound code includes: encoding the control instruction to generate a corresponding sound code; performing a forward Fourier transform on the sound code to obtain an acoustic signal; and synthesizing the acoustic signal with the first audio file to obtain a second audio file containing the sound code; wherein the first audio file is a high-frequency file.
- parsing the second audio file to obtain the corresponding sound code when the acoustic signal is confirmed to be present includes: when the acoustic signal is confirmed to be present, parsing the second audio file to obtain the corresponding acoustic signal; and performing an inverse Fourier transform on the acoustic signal to obtain the corresponding sound code.
- the method further includes: the sound receiving device determining whether the data information is basic information or extended information, wherein the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link; when the data information is basic information, the sound receiving device plays or displays the content of the data information; when the data information is extended information, the sound receiving device sends a corresponding access instruction to the server; the server responds to the access instruction by executing a corresponding function or invoking a corresponding webpage, and transmits the corresponding execution result to the sound receiving device.
- a voice recognition device comprising: an analysis unit configured to recognize the input content and generate a control instruction on receiving an instruction or human speech;
- an encoding unit configured to encode the control instruction to generate a corresponding sound code;
- a first converting unit configured to perform a forward Fourier transform on the sound code to obtain an acoustic signal;
- an audio processing unit configured to synthesize the acoustic signal with a first audio file to obtain a second audio file containing the sound code, wherein the first audio file is a high-frequency file;
- a sending unit configured to send the second audio file generated by the audio processing unit to a sound receiving device, so that the sound receiving device identifies the data information corresponding to the sound code contained in the second audio file.
- a sound receiving device comprising: a receiving unit configured to receive a second audio file sent by a voice recognition device;
- the second audio file is a file containing the corresponding sound code, generated by the voice recognition device according to a received instruction or human speech;
- a detecting unit configured to analyze the second audio file and detect whether it includes an acoustic signal;
- a transforming unit configured to, when the detecting unit confirms that an acoustic signal is present, parse the second audio file to obtain the acoustic signal and perform an inverse Fourier transform on the acoustic signal to obtain the corresponding sound code; and a decoding unit configured to decode the sound code to obtain corresponding data information.
- another technical solution adopted by the present invention is to provide a sound processing method, the method comprising: on receiving an instruction or human speech, recognizing the input content to generate a corresponding control instruction; encoding the control instruction to generate a corresponding sound code; performing a forward Fourier transform on the sound code to obtain an acoustic signal; synthesizing the acoustic signal with a first audio file to obtain a second audio file containing the sound code, wherein the first audio file is a high-frequency file; and sending the second audio file so that a sound receiving device can identify the data information corresponding to the sound code contained in the second audio file.
- another technical solution adopted by the present invention is to provide a sound processing method, the method comprising: receiving a second audio file sent by a voice recognition device, and detecting whether the second audio file includes an acoustic signal, wherein the second audio file is a file containing the corresponding sound code, generated by the voice recognition device according to a received instruction or human speech; when the acoustic signal is confirmed to be present, parsing the second audio file to obtain the corresponding acoustic signal; performing an inverse Fourier transform on the acoustic signal to obtain the corresponding sound code; and decoding the sound code to obtain corresponding data information.
- the method further includes: determining whether the data information is basic information or extended information; wherein the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
- when the data information is basic information, playing or displaying the content of the data information;
- when the data information is extended information, sending a corresponding access instruction to a server, so that the server, in response to the instruction, executes the corresponding function or calls the corresponding web page and feeds back the corresponding execution result.
- the voice recognition device encodes the received instruction or human speech as sound and embeds it in a high-frequency sound file, so that the sound receiving device, on receiving the sound file, can recognize the sound code it contains and decode it to obtain the corresponding information or instructions. Transmitting instructions and information through a high-frequency sound file avoids interference from other factors.
- FIG. 1 is a schematic structural view of a sound processing system in a first embodiment of the present invention
- FIG. 2 is a schematic structural diagram of a voice recognition device in an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a sound receiving device in an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of a sound processing system in a second embodiment of the present invention.
- FIG. 5 is a schematic flow chart of a sound processing method in a first embodiment of the present invention.
- FIG. 6 is a schematic flow chart of a sound processing method in a second embodiment of the present invention.
- FIG. 7 is a schematic flow chart of a sound processing method in a third embodiment of the present invention.
- FIG. 8 is a schematic flow chart of a sound processing method in a fourth embodiment of the present invention.
- FIG. 9 is a schematic flow chart of a sound processing method in a fifth embodiment of the present invention.
- FIG. 10 is a schematic flow chart of a sound processing method in a sixth embodiment of the present invention.
- FIG. 11 is a schematic flow chart of a sound processing method in a seventh embodiment of the present invention.
- Artificial intelligence: a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that respond in a manner similar to human intelligence. Research in this area includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
- Intelligent recognition: based on artificial intelligence computation, the input content is transformed, identified, analyzed, and judged in a manner resembling human intelligence, and an output result is generated.
- Internal control command: a command signal within a computer or computer device that is intended to trigger a function of the device.
- External control signaling: external communication signals defined by common communication protocols that can be received by other receiving devices, such as WIFI-based, Bluetooth-based, or iBeacon-based control signaling, or control signaling carried by other electromagnetic waves.
- Trigger response content: the output that corresponds to intelligently recognized input content, such as text, sound, voice, image, web page link, service link, control signal, device internal control command, or external control signaling.
- Audio: the frequency of the sound.
- Acoustic signal: the original signal obtained by parsing received audio.
- High-frequency sound: sound at frequencies above what most normal human ears can hear.
- Vocal code (sound code): the meaningful data obtained by applying an inverse Fourier transform to the acoustic signal.
- Audio document: a document or storage format that can play or record sound, such as a wav or mp3 file.
- FIG. 1 is a schematic structural diagram of a sound processing system according to an embodiment of the present invention.
- the system 100 includes a voice recognition device 10 and a sound receiving device 20; that is, the voice recognition device 10 and the sound receiving device 20 are in a one-to-one network connection relationship.
- the voice recognition device 10 is configured to recognize the input content and generate a corresponding control instruction on receiving an instruction or human speech, process the control instruction and synthesize it with the first audio file to obtain a second audio file containing the corresponding sound code, and send the second audio file.
- the sound receiving device 20 is configured to receive the second audio file, detect whether the second audio file includes an acoustic signal, parse the second audio file to obtain the corresponding sound code when the acoustic signal is present, and decode the sound code to obtain corresponding data information.
- FIG. 2 is a schematic structural diagram of a voice recognition apparatus according to an embodiment of the present invention.
- the voice recognition device 10 may be a smart mobile device, a computer, or the like, having a microphone, for acquiring audio information such as voice, music, and the like.
- the voice recognition device 10 includes an analysis unit 11, an encoding unit 12, a first conversion unit 13, an audio processing unit 14, and a transmission unit 15.
- the analyzing unit 11 is configured to recognize the input content to generate the control instruction when receiving the instruction or the vocal voice.
- the instruction may be an operation command generated by the voice recognition device 10 in response to a user operation, or an operation command, control command, or the like that the voice recognition device 10 receives from other terminals and devices.
- the encoding unit 12 is configured to encode the control command to generate a corresponding vocoding.
- the first converting unit 13 is configured to perform Fourier forward transform on the vocoding generated by the encoding unit 12 to obtain an acoustic signal.
- the audio processing unit 14 is configured to synthesize the sound wave signal generated by the first converting unit 13 with the first audio file to obtain a second audio file containing the sound code.
- the first audio file is a high frequency file.
- the audio processing unit 14 synthesizes the acoustic signal with the first audio file over a plurality of consecutive intervals to form the second audio file. Because the human ear cannot hear acoustic signals above a certain frequency range, the first audio file is a high-frequency file. When the second audio file is transmitted, people cannot hear the high-frequency file carrying the sound code, so the transmission is effectively silent and does not disturb the user or the environment.
- the transmitting unit 15 is configured to send the second audio file generated by the audio processing unit 14.
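The sender-side units described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the text does not specify the encoding scheme or how the acoustic signal is combined with the first audio file, so the sketch assumes a plain UTF-8 bit encoding and swaps in simple on/off keying of a high-frequency tone in place of the Fourier step. The names `encode_command`, `make_first_audio`, `synthesize`, and all constants are hypothetical.

```python
import math

SAMPLE_RATE = 44_100    # assumed sampling rate in Hz
CARRIER_HZ = 19_000     # assumed "high frequency", near the upper limit of hearing
SAMPLES_PER_BIT = 441   # assumed interval length: 10 ms, exactly 190 carrier cycles


def encode_command(command: str) -> list[int]:
    """Encoding unit (sketch): the command's UTF-8 bytes as bits, MSB first."""
    return [(byte >> shift) & 1
            for byte in command.encode("utf-8")
            for shift in range(7, -1, -1)]


def make_first_audio(num_samples: int, amplitude: float = 0.2) -> list[float]:
    """Stand-in for the high-frequency first audio file: a pure carrier tone."""
    return [amplitude * math.sin(2 * math.pi * CARRIER_HZ * n / SAMPLE_RATE)
            for n in range(num_samples)]


def synthesize(first_audio: list[float], bits: list[int]) -> list[float]:
    """Audio processing unit (sketch): keep or mute the carrier in each
    consecutive interval according to the corresponding bit."""
    needed = len(bits) * SAMPLES_PER_BIT
    if len(first_audio) < needed:
        raise ValueError("first audio file is too short for this sound code")
    return [first_audio[n] * bits[n // SAMPLES_PER_BIT] for n in range(needed)]


# Example: build a second audio file carrying the command "open".
bits = encode_command("open")
second_audio = synthesize(make_first_audio(len(bits) * SAMPLES_PER_BIT), bits)
```

With these assumed constants each bit occupies 10 ms, so a four-character command takes 0.32 s of audio. A real system would also need framing and error checking, which the text does not describe.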
- the sound receiving device 20 may be a smart mobile device, a computer, or the like having a microphone for acquiring an audio file.
- the sound receiving device 20 includes a receiving unit 21, a detecting unit 22, a second transform unit 23, and a decoding unit 24.
- the receiving unit 21 is for receiving a second audio file transmitted by the voice recognition device 10.
- the detecting unit 22 is configured to analyze and detect whether the sound signal is included in the second audio file. Specifically, the detecting unit 22 determines whether the sound signal is included in the second audio file by performing spectrum analysis on the second audio file.
- the second transforming unit 23 is configured to parse the second audio file to obtain the corresponding acoustic signal when the detecting unit 22 confirms that the acoustic signal is present, and to perform an inverse Fourier transform on the acoustic signal to obtain the corresponding sound code.
- the decoding unit 24 is configured to decode the voice code generated by the second transform unit 23 to obtain corresponding data information.
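The detecting unit's spectrum analysis and the subsequent parsing can be illustrated as follows. This is a sketch under assumptions: the text does not say which spectral method is used, so the example uses the Goertzel algorithm to measure energy at a single assumed carrier frequency (19 kHz at a 44.1 kHz sampling rate) and assumes the on/off interval scheme with 441 samples per bit. The function names, thresholds, and defaults are hypothetical.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared DFT magnitude of `samples` at the bin nearest `target_hz`."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def carrier_level(samples, sample_rate, carrier_hz):
    """Power normalised so a full-length sine of amplitude A gives about A**2."""
    if not samples:
        return 0.0
    half = len(samples) / 2.0
    return goertzel_power(samples, sample_rate, carrier_hz) / (half * half)


def contains_acoustic_signal(samples, sample_rate=44_100, carrier_hz=19_000,
                             threshold=1e-3):
    """Detecting unit (sketch): is there energy at the assumed carrier?"""
    return carrier_level(samples, sample_rate, carrier_hz) > threshold


def demodulate(samples, samples_per_bit=441, sample_rate=44_100,
               carrier_hz=19_000, threshold=0.01):
    """Parsing step (sketch): one bit per interval, 1 if the carrier is present."""
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        window = samples[start:start + samples_per_bit]
        level = carrier_level(window, sample_rate, carrier_hz)
        bits.append(1 if level > threshold else 0)
    return bits
```

Checking only one frequency bin keeps the detector cheap enough to run continuously on a microphone stream. The thresholds here suit a clean signal of amplitude 0.2 and would need tuning for real recordings.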
- the data information includes basic information of the control instruction and extended information.
- the basic information includes at least the content of the instruction or human speech; for example, the content of the human speech is “open *** shopping webpage”.
- the extended information includes at least: a webpage link address, an execution instruction, and an instruction link.
- for example, the webpage link address is the address of the “*** shopping webpage”.
- the voice recognition device 10 may further include the functional units of the sound receiving device 20, and the sound receiving device 20 may further include the functional units of the voice recognition device 10. In that case, the voice recognition device 10 can also recognize and process a received audio file to obtain the corresponding data information, and the sound receiving device 20 can also recognize and process a received instruction or human speech to form an audio file carrying the sound code.
- the specific working principle is as described above, and will not be described here.
- the sound receiving device 20 further includes an instruction processing unit 25 for determining the type of data information generated by the decoding unit 24, and executing a corresponding instruction according to the determination result.
- the data information includes basic information and extended information.
- the instruction processing unit 25 determines whether the generated data information is basic information or extended information:
- when the data information is basic information, the sound receiving device 20 plays or displays the content of the data information;
- when the data information is extended information, the sound receiving device 20 accesses the corresponding address and executes the corresponding instruction. For example, when the data information is a webpage link address, the sound receiving device 20 accesses the corresponding webpage through the webpage link address.
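The branching performed by the instruction processing unit 25 can be sketched as a small dispatch function. The `DataInfo` structure and its field names are illustrative assumptions; the text only states that basic information is played or displayed and that extended information leads to an address being accessed or an instruction being executed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataInfo:
    """Decoded data information; the field names are illustrative only."""
    content: str                       # basic information: instruction or speech content
    link: Optional[str] = None         # extended information: webpage link address
    instruction: Optional[str] = None  # extended information: execution instruction


def is_extended(info: DataInfo) -> bool:
    """Treat the data as extended information if it carries a link or instruction."""
    return info.link is not None or info.instruction is not None


def process(info: DataInfo) -> tuple[str, str]:
    """Return the action to take and its target, without performing any I/O."""
    if is_extended(info):
        target = info.link if info.link is not None else info.instruction
        return ("access", target)
    return ("display", info.content)
```

Returning the decision as a tuple rather than acting on it directly keeps the sketch testable; a real device would hand the result to its player, display, or network layer.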
- FIG. 4 is a schematic structural diagram of a sound processing system according to a second embodiment of the present invention.
- the system 300 includes a voice recognition device 31, a voice receiving device 32, and a server 33.
- the instruction information is an access instruction or a webpage link address.
- the server 33 is configured to respond to the instruction information to perform a corresponding function or call a corresponding web page to obtain an extended application.
- server 33 is further configured to feed back the execution result to the sound receiving device 32.
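The server's role can be pictured as a lookup from instruction information to either a page or a function, with the result fed back to the sound receiving device. The sketch below is purely illustrative: the dictionary keys (`link`, `access_instruction`, `arguments`), the `pages` and `functions` tables, and the result format are all assumptions, since the text does not define a message format.

```python
def handle_instruction_info(info: dict, pages: dict, functions: dict) -> dict:
    """Server side (sketch): respond to instruction information and
    return an execution result for the sound receiving device."""
    if "link" in info:
        # Instruction information is a webpage link address: look up the page.
        page = pages.get(info["link"])
        if page is None:
            return {"ok": False, "error": "unknown address"}
        return {"ok": True, "result": page}
    if "access_instruction" in info:
        # Instruction information is an access instruction: run the function.
        handler = functions.get(info["access_instruction"])
        if handler is None:
            return {"ok": False, "error": "unknown instruction"}
        return {"ok": True, "result": handler(info.get("arguments", {}))}
    return {"ok": False, "error": "no instruction information"}


# Example tables; contents are made up for illustration.
example_pages = {"https://example.com/shop": "<html>shop</html>"}
example_functions = {"purchase": lambda args: "ordered " + args.get("item", "?")}
```

For instance, `handle_instruction_info({"access_instruction": "purchase", "arguments": {"item": "book"}}, example_pages, example_functions)` yields a result whose `result` field is `"ordered book"`.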
- the server 33 is a web server.
- When the voice recognition device 31 receives the instruction or human speech “open the *** shopping webpage”, it recognizes the input content to generate a corresponding “open *** shopping webpage” control instruction, processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
- After receiving the second audio file, the sound receiving device 32 detects that the audio file includes an acoustic signal, and parses and processes the second audio file to obtain corresponding data information, where the data information includes the link address of the shopping webpage and a control instruction to open the link address. Then, in response to the control instruction, the sound receiving device 32 sends the server 33 a request to access the *** shopping web page.
- the server 33 invokes the contents of the *** shopping web page in response to the access request, so that the sound receiving device 32 can display the contents of the *** shopping web page.
- When the voice recognition device 31 receives the "purchase **" instruction or human speech, it recognizes the input content to generate a corresponding "purchase **" control instruction, processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
- After receiving the second audio file, the sound receiving device 32 detects that the audio file contains an acoustic signal, and parses and processes the second audio file to obtain corresponding data information, where the data information includes a "purchase **" control instruction. Then, in response to the control instruction, the sound receiving device 32 sends the server 33 a request to purchase **.
- the server 33, in response to the request, processes the data it holds to perform the purchase function, that is, to complete the online order.
- When the voice recognition device 31 receives the “enter password ****” instruction or human speech, it recognizes the input content to generate a corresponding “enter password ****” control instruction, processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
- After receiving the second audio file, the sound receiving device 32 detects that the audio file contains an acoustic signal, and parses and processes the second audio file to obtain corresponding data information, where the data information includes a control instruction for entering the password and the password ******. Then, in response to the control instruction, the sound receiving device 32 sends the server 33 an instruction to enter a password, receives the payment instruction link fed back by the server 33, and enters the password ****** to complete the payment.
- Application scenario 2: in this application scenario, the server 33 is a banking system server.
- When the voice recognition device 31 receives the "transfer ** yuan to **" instruction or human speech, it recognizes the input content to generate a corresponding "transfer ** yuan to **" control instruction, processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
- After receiving the second audio file, the sound receiving device 32 detects that the audio file includes an acoustic signal, and parses and processes the second audio file to obtain corresponding data information, where the data information includes a transfer instruction, a transfer amount, and a transfer recipient. Then, in response to the transfer instruction, the sound receiving device 32 sends the request to the server 33.
- the server 33, in response to the request, processes the data it holds to perform the transfer, that is, fills in the electronic bank transfer information, and feeds back the corresponding transfer confirmation page to the sound receiving device 32.
- When the voice recognition device 31 receives the "input password ****" instruction or human speech, it recognizes the input content to generate a corresponding "input password ****" control instruction, processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
- After receiving the second audio file, the sound receiving device 32 detects that the audio file contains an acoustic signal, and parses and processes the second audio file to obtain corresponding data information, where the data information includes a password input instruction and the password content. Then, in response to the password input instruction, the sound receiving device 32 sends the request to the server 33, so that the server 33 completes the input of the transfer password and performs the transfer.
- When the voice recognition device 31 receives the "receive **PPT" instruction or human speech, it recognizes the input content to generate a corresponding "receive **PPT" control instruction, processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
- After receiving the second audio file, the sound receiving device 32 detects that the audio file contains an acoustic signal, and parses and processes the second audio file to obtain corresponding data information, where the data information includes an instruction to receive the PPT and the PPT file. Then, in response to the instruction, the sound receiving device 32 downloads and receives the **PPT, completing the sharing of the file.
- the system 100 may further include a voice recognition device 10 and a plurality of voice receiving devices 20, that is, the voice recognition device 10 and the voice receiving device 20 are in a one-to-many network connection relationship.
- the working principle is the same and will not be described here.
- system 100 may further include a plurality of voice recognition devices 10 and a plurality of voice receiving devices 20, that is, the voice recognition device 10 and the voice receiving device 20 are in a many-to-many network connection relationship.
- the working principle is the same and will not be described here.
- the system 100 may further include a voice recognition device 10, a plurality of sound receiving devices 20, and a server; that is, the voice recognition device 10 and the sound receiving devices 20 are in a one-to-many network connection relationship, and the sound receiving devices 20 and the server are in a many-to-one network connection relationship.
- the working principle is the same and will not be described here.
- the system 100 may further include a voice recognition device 10, a plurality of voice receiving devices 20, and a plurality of servers, that is, the voice recognition device 10 and the voice receiving device 20 have a one-to-many network connection relationship.
- Servers can be the same server or different servers.
- each of the voice receiving devices 20 can be in communication with one or more servers. The working principle is the same and will not be described here.
- FIG. 5 is a schematic flowchart of a sound processing method according to a first embodiment of the present invention.
- the method shown in this embodiment is applied to the sound processing system as described above.
- the method includes:
- Step S50: when the voice recognition device receives an instruction or human speech, it recognizes the input content to generate a corresponding control instruction, processes the control instruction and synthesizes it with the first audio file to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
- Step S501: on receiving an instruction or human speech, recognizing the input content to generate a corresponding control instruction;
- Step S502: encoding the control instruction to generate a corresponding sound code;
- Step S503: performing a forward Fourier transform on the sound code to obtain an acoustic signal;
- Step S504: synthesizing the acoustic signal with the first audio file to obtain a second audio file containing the sound code; wherein the first audio file is a high-frequency file.
- Step S51: the sound receiving device receives the second audio file, detects whether the second audio file contains an acoustic signal, parses the second audio file to obtain the corresponding sound code when the acoustic signal is present, and decodes the sound code to obtain the corresponding data information.
- Step S511: receiving the second audio file and detecting whether the second audio file includes an acoustic signal; if yes, proceeding to step S512; otherwise, the process ends.
- Step S512: parsing the second audio file to obtain the corresponding acoustic signal;
- Step S513: performing an inverse Fourier transform on the acoustic signal to obtain the corresponding sound code;
- Step S514 decoding the voice code to obtain corresponding data information.
- the data information includes at least basic information of the control instruction and extended information; the basic information includes at least the instruction or the vocal voice content, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
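The text does not say how the sound code is modulated onto the sound wave signal, so the following is only a minimal sketch of steps S502 to S504 and S512 to S513 under assumptions of its own. It substitutes simple two-tone keying (one tone per bit value) for the unspecified Fourier steps and reads the bits back with a single-bin discrete Fourier transform. The sample rate, symbol length, tone frequencies, and all function names are hypothetical.

```python
import math

SAMPLE_RATE = 44100   # samples per second (assumed)
SYMBOL_LEN = 441      # samples per bit, i.e. 10 ms (assumed)
FREQ_0 = 18000.0      # tone used for a 0 bit, in Hz (assumed)
FREQ_1 = 19000.0      # tone used for a 1 bit, in Hz (assumed)


def code_to_wave(sound_code):
    """Map each bit of the sound code (bytes) to one short high-frequency tone."""
    wave = []
    for byte in sound_code:
        for shift in range(7, -1, -1):  # most significant bit first
            freq = FREQ_1 if (byte >> shift) & 1 else FREQ_0
            wave.extend(
                math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                for n in range(SYMBOL_LEN)
            )
    return wave


def synthesize(first_audio, wave, gain=0.2):
    """Mix the tone sequence into the first audio file (both lists of floats)."""
    length = max(len(first_audio), len(wave))
    padded_audio = first_audio + [0.0] * (length - len(first_audio))
    padded_wave = wave + [0.0] * (length - len(wave))
    return [a + gain * w for a, w in zip(padded_audio, padded_wave)]


def tone_power(block, freq):
    """Squared magnitude of a single DFT bin at the given frequency."""
    re = sum(x * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, x in enumerate(block))
    im = sum(x * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, x in enumerate(block))
    return re * re + im * im


def wave_to_code(second_audio, n_bytes):
    """Recover n_bytes of sound code by comparing the two tone powers per bit."""
    code = bytearray()
    for i in range(n_bytes):
        value = 0
        for j in range(8):
            start = (i * 8 + j) * SYMBOL_LEN
            block = second_audio[start:start + SYMBOL_LEN]
            bit = tone_power(block, FREQ_1) > tone_power(block, FREQ_0)
            value = (value << 1) | int(bit)
        code.append(value)
    return bytes(code)


# Usage: a 20 kHz tone stands in for the high-frequency first audio file.
first_audio = [0.5 * math.sin(2 * math.pi * 20000 * n / SAMPLE_RATE)
               for n in range(SAMPLE_RATE // 2)]
second_audio = synthesize(first_audio, code_to_wave(b"open"))
recovered = wave_to_code(second_audio, 4)
```

The tone frequencies are chosen so that each completes a whole number of cycles per symbol, which keeps the two tones and the stand-in carrier from leaking into each other's bins. A real receiver would also need symbol synchronization, which this sketch skips by assuming the signal starts at sample 0.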
- FIG. 8 is a schematic flowchart of a sound processing method according to a fourth embodiment of the present invention. After the sound code is decoded to obtain the corresponding data information, the method further includes:
- Step S63: the sound receiving device determines whether the data information is basic information or extended information; if it is basic information, the process proceeds to step S64; if it is extended information, the process proceeds to step S65.
- The basic information includes at least the content of the instruction or human speech;
- the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
- Step S64: when the data information is determined to be basic information, the sound receiving device plays or displays the content of the data information; the process then ends.
- Step S65: when the data information is determined to be extended information, the sound receiving device sends a corresponding access instruction to a server.
- Step S66: the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage, and sends the corresponding execution result to the sound receiving device. The process then ends.
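The branch in steps S63 to S66 can be sketched as a small dispatcher. This is an illustration only: the `DataInfo` type, the `"basic"`/`"extended"` labels, and the two callbacks are hypothetical names, since the text does not say how the data information is represented.

```python
from dataclasses import dataclass


@dataclass
class DataInfo:
    kind: str      # "basic" or "extended" (labels assumed for this sketch)
    content: str   # text to show, or a link / instruction to forward


def handle_data_info(info, show, send_to_server):
    """Steps S63 to S66: show basic information, forward extended information."""
    if info.kind == "basic":
        show(info.content)                    # S64: play or display the content
        return None
    if info.kind == "extended":
        return send_to_server(info.content)   # S65/S66: server executes and replies
    raise ValueError(f"unknown data information kind: {info.kind!r}")
```

Passing the display and server functions in as parameters keeps the decision logic separate from the device's actual playback and network code.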
- FIG. 9 is a schematic flowchart of a music processing method according to a fifth embodiment of the present invention. The method shown in this embodiment is applied to the sound recognition device described above and includes:
- Step S70: upon receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction;
- Step S71: encode the control instruction to generate a corresponding sound code;
- Step S72: apply a forward Fourier transform to the sound code to obtain a sound wave signal;
- Step S73: synthesize the sound wave signal with the first audio file to obtain a second audio file containing the sound code, where the first audio file is a high-frequency file; and
- Step S74: send the second audio file, so that a sound receiving device identifies the data information corresponding to the sound code contained in the second audio file.
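Step S71 leaves the layout of the sound code open. As a minimal sketch, assuming a frame made of a 2-byte length, the UTF-8 text of the control instruction, and a CRC-32 checksum (a layout invented here for illustration, not taken from the text), encoding and the matching decode could look like this:

```python
import struct
import zlib


def build_sound_code(instruction):
    """Encode a control instruction as: length (2 bytes) + UTF-8 text + CRC-32."""
    payload = instruction.encode("utf-8")
    header = struct.pack(">H", len(payload))            # big-endian length
    checksum = struct.pack(">I", zlib.crc32(payload))   # integrity check
    return header + payload + checksum


def parse_sound_code(frame):
    """Decode a frame built by build_sound_code, rejecting damaged frames."""
    if len(frame) < 6:
        raise ValueError("frame too short")
    (length,) = struct.unpack(">H", frame[:2])
    if len(frame) < 6 + length:
        raise ValueError("frame truncated")
    payload = frame[2:2 + length]
    (expected,) = struct.unpack(">I", frame[2 + length:6 + length])
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload.decode("utf-8")
```

A checksum is useful here because an acoustic channel is noisy; the receiver can discard a damaged frame instead of acting on a corrupted instruction.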
- FIG. 10 is a schematic flowchart of a sound processing method according to a sixth embodiment of the present invention. The method shown in this embodiment is applied to the sound receiving device described above and includes:
- Step S81: receive a second audio file and determine whether the second audio file contains a sound wave signal. If so, proceed to step S82; otherwise, the process ends.
- The second audio file is a file carrying the corresponding sound code, generated by a sound recognition device according to the received instruction or human speech.
- Step S82: parse the second audio file to obtain the corresponding sound wave signal;
- Step S83: apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code;
- Step S84: decode the sound code to obtain the corresponding data information.
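The check in step S81 can be pictured as a spectrum test: look for energy at the frequencies where a sound wave signal would sit. The sketch below is a hedged illustration; the marker frequencies, the threshold, and the function names are assumptions, and the power of each frequency is computed with a single-bin discrete Fourier transform rather than any method named in the text.

```python
import math


def bin_power(samples, freq, sample_rate):
    """Normalized power of one frequency component (a single DFT bin)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(samples))
    return (re * re + im * im) / (len(samples) ** 2)


def contains_sound_wave_signal(samples, sample_rate=44100,
                               marker_freqs=(18000.0, 19000.0),
                               threshold=1e-3):
    """Return True if any assumed marker frequency stands out in the spectrum."""
    if not samples:
        return False
    return any(bin_power(samples, f, sample_rate) > threshold
               for f in marker_freqs)
```

In practice the threshold would have to be tuned against background noise; the value used here only separates the clean test tones in this sketch.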
- FIG. 11 is a schematic flowchart of a sound processing method according to a seventh embodiment of the present invention. The method shown in this embodiment is applied to the sound receiving device described above and includes:
- Step S91: receive a second audio file and determine whether the second audio file contains a sound wave signal. If so, proceed to step S92; otherwise, proceed to step S95.
- The second audio file is a file carrying the corresponding sound code, generated by a sound recognition device according to the received instruction or human speech.
- Step S92: parse the second audio file to obtain the corresponding sound wave signal;
- Step S93: apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code;
- Step S94: decode the sound code to obtain the corresponding data information.
- Step S95: determine whether the data information is basic information or extended information; if it is basic information, proceed to step S96; if it is extended information, proceed to step S97.
- The basic information includes at least the content of the instruction or human speech;
- the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
- Step S96: play or display the content of the data information; the process then ends.
- Step S97: send a corresponding access instruction to a server, so that the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage and feeds back the corresponding execution result. The process then ends.
- In the embodiments of the present invention, the sound recognition device sound-encodes the received instruction or human speech and embeds the result in a high-frequency sound file for output, so that the sound receiving device, on receiving the high-frequency sound file, can recognize the sound code it contains and decode it to obtain the corresponding information or instruction.
- Instructions and information are thereby transmitted through a high-frequency sound file, avoiding interference from other factors.
- In the embodiments provided by the present invention, the disclosed system, terminal, and method may be implemented in other ways.
- For example, the terminal embodiment described above is illustrative; the division into units is a division by logical function, and an actual implementation may divide them differently.
- The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
- The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
- If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium.
- On this understanding, all or part of the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium and includes a number of instructions causing a computer device (which may be a personal computer, a management server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention.
- The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
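The present invention discloses a sound processing system and method, a sound recognition device, and a sound receiving device. The system includes: a sound recognition device configured to, upon receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction, process the control instruction and synthesize it with a first audio file to obtain a second audio file containing a corresponding sound code, and send the second audio file; and a sound receiving device configured to receive the second audio file, detect whether the second audio file contains a sound wave signal, parse the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code, and decode the sound code to obtain corresponding data information. In this way, instructions and information are transmitted through a high-frequency sound file, avoiding interference from other factors.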
Description
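The present invention relates to the field of Internet communication, and in particular to a sound processing system and method, a sound recognition device, and a sound receiving device.
At present, common ways for different smart devices to interact with one another externally include WIFI-based control signaling, Bluetooth-based control signaling, and iBeacon-based control signaling. However, these three interaction methods have the following deficiencies:
1. For external control, the sending and receiving hardware is relatively costly;
2. The number of devices that can be connected for interaction at the same time is limited;
3. Receiving devices in an enclosed space cannot be targeted effectively, and the signal spills over;
4. When a functional module of the sending or receiving device fails, or in a weak network environment, security is prone to problems.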
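[Summary of the Invention]
The main technical problem solved by the present invention is to provide a sound processing system and method, a sound recognition device, and a sound receiving device that convey information and control signaling by sound, thereby easing the limitations of existing interactive applications.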
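To solve the above technical problem, one technical solution adopted by the present invention is a sound processing system, the system including: a sound recognition device configured to, upon receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction, process the control instruction and synthesize it with a first audio file to obtain a second audio file containing a corresponding sound code, and send the second audio file; and a sound receiving device configured to receive the second audio file, detect whether the second audio file contains a sound wave signal, parse the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code, and decode the sound code to obtain corresponding data information.
The sound recognition device includes: an analysis unit configured to, upon receiving an instruction or human speech, recognize the input content to generate the control instruction; an encoding unit configured to encode the control instruction to generate a corresponding sound code; a first conversion unit configured to apply a forward Fourier transform to the sound code to obtain a sound wave signal; an audio processing unit configured to synthesize the sound wave signal with the first audio file to obtain the second audio file containing the sound code, where the first audio file is a high-frequency file; and a sending unit configured to send the second audio file generated by the audio processing unit.
The sound receiving device includes: a receiving unit configured to receive the second audio file sent by the sound recognition device; a detection unit configured to analyze and detect whether the second audio file contains a sound wave signal; a second transform unit configured to, when the detection unit confirms that a sound wave signal is contained, parse the second audio file to obtain the corresponding sound wave signal and apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code; and a decoding unit configured to decode the sound code to obtain corresponding data information.
The sound receiving device further includes an instruction processing unit configured to determine whether the data information produced by the decoding unit is basic information or extended information: when the data information is determined to be basic information, the content of the data information is played or displayed; when the data information is determined to be extended information, the corresponding address is accessed and the corresponding instruction is executed. The basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
The system further includes a server. When the instruction processing unit determines that the data information is extended information, it sends corresponding instruction information to the server, where the instruction information is an access instruction or a webpage link address. The server is configured to respond to the instruction information by executing the corresponding function or invoking the corresponding webpage, so as to obtain the corresponding extended application. The sound receiving device is further configured to receive the result of the server's execution in response to the instruction information.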
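To solve the above technical problem, another technical solution adopted by the present invention is a sound processing method, the method including: upon receiving an instruction or human speech, a sound recognition device recognizes the input content to generate a corresponding control instruction, processes the control instruction and synthesizes it with a first audio file to obtain a second audio file containing a corresponding sound code, and sends the second audio file; and a sound receiving device receives the second audio file, detects whether the second audio file contains a sound wave signal, parses the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code, and decodes the sound code to obtain corresponding data information.
Processing the control instruction and synthesizing it with the first audio file to obtain the second audio file containing the corresponding sound code specifically includes: encoding the control instruction to generate a corresponding sound code; applying a forward Fourier transform to the sound code to obtain a sound wave signal; and synthesizing the sound wave signal with the first audio file to obtain the second audio file containing the sound code, where the first audio file is a high-frequency file.
Parsing the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code specifically includes: when the detection unit confirms that a sound wave signal is contained, parsing the second audio file to obtain the corresponding sound wave signal, and applying an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code.
The method further includes: the sound receiving device determines whether the data information is basic information or extended information, where the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link; when the data information is determined to be basic information, the sound receiving device plays or displays the content of the data information; when the data information is determined to be extended information, the sound receiving device sends a corresponding access instruction to a server; and the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage, and sends the corresponding execution result to the sound receiving device.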
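To solve the above technical problem, another technical solution adopted by the present invention is a sound recognition device, the device including: an analysis unit configured to, upon receiving an instruction or human speech, recognize the input content to generate the control instruction; an encoding unit configured to encode the control instruction to generate a corresponding sound code; a first conversion unit configured to apply a forward Fourier transform to the sound code to obtain a sound wave signal; an audio processing unit configured to synthesize the sound wave signal with the first audio file to obtain a second audio file containing the sound code, where the first audio file is a high-frequency file; and a sending unit configured to send the second audio file generated by the audio processing unit to a sound receiving device, so that the sound receiving device identifies the data information corresponding to the sound code contained in the second audio file.
To solve the above technical problem, another technical solution adopted by the present invention is a sound receiving device, the device including: a receiving unit configured to receive a second audio file sent by a sound recognition device, where the second audio file is a file containing a corresponding sound code, generated by the sound recognition device according to a received instruction or human speech; a detection unit configured to analyze and detect whether the second audio file contains a sound wave signal; a second transform unit configured to, when the detection unit confirms that a sound wave signal is contained, parse the second audio file to obtain the sound wave signal and apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code; and a decoding unit configured to decode the sound code to obtain corresponding data information.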
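To solve the above technical problem, another technical solution adopted by the present invention is a sound processing method, the method including: upon receiving an instruction or human speech, recognizing the input content to generate a corresponding control instruction; encoding the control instruction to generate a corresponding sound code; applying a forward Fourier transform to the sound code to obtain a sound wave signal; synthesizing the sound wave signal with a first audio file to obtain a second audio file containing the sound code, where the first audio file is a high-frequency file; and sending the second audio file, so that a sound receiving device identifies the data information corresponding to the sound code contained in the second audio file.
To solve the above technical problem, another technical solution adopted by the present invention is a sound processing method, the method including: receiving a second audio file sent by a sound recognition device and detecting whether the second audio file contains a sound wave signal, where the second audio file is a file containing a corresponding sound code, generated by the sound recognition device according to a received instruction or human speech; upon confirming that a sound wave signal is contained, parsing the second audio file to obtain the corresponding sound wave signal; applying an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code; and decoding the sound code to obtain corresponding data information.
The method further includes: determining whether the data information is basic information or extended information, where the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link; when the data information is determined to be basic information, playing or displaying the content of the data information; and when the data information is determined to be extended information, sending a corresponding access instruction to a server, so that the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage and feeds back the corresponding execution result.
In the above solutions, the sound recognition device sound-encodes the received instruction or human speech and embeds the result in a high-frequency sound file for output, so that the sound receiving device, on receiving the high-frequency sound file, can recognize the sound code it contains and decode it to obtain the corresponding information or instruction. Instructions and information are thereby transmitted through a high-frequency sound file, avoiding interference from other factors.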
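FIG. 1 is a schematic structural diagram of a sound processing system in a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a sound recognition device in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a sound receiving device in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a sound processing system in a second embodiment of the present invention;
FIG. 5 is a schematic flowchart of a sound processing method in the first embodiment of the present invention;
FIG. 6 is a schematic flowchart of a sound processing method in the second embodiment of the present invention;
FIG. 7 is a schematic flowchart of a sound processing method in a third embodiment of the present invention;
FIG. 8 is a schematic flowchart of a sound processing method in a fourth embodiment of the present invention;
FIG. 9 is a schematic flowchart of a music processing method in a fifth embodiment of the present invention;
FIG. 10 is a schematic flowchart of a music processing method in a sixth embodiment of the present invention;
FIG. 11 is a schematic flowchart of a music processing method in a seventh embodiment of the present invention.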
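First, the prior-art terms referred to in the embodiments of the present invention are explained.
Artificial intelligence: a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
Intelligent recognition: the output produced by converting, recognizing, analyzing, and judging input content in a manner similar to human intelligence, based on artificial intelligence computation.
Internal control instruction: an instruction signal inside a computer or computer device, whose purpose is to trigger a function of the device.
External control signaling: an external communication signal built on a common communication protocol that can be received by other receiving devices, such as WIFI-based control signaling, Bluetooth-based control signaling, iBeacon-based control signaling, or other control signaling sent in the form of electromagnetic waves.
Trigger response content: the output made in correspondence with intelligently recognized input content, such as text, sound, speech, images, web links, service links, control signals, internal control instructions of a device, or external control signaling.
Audio frequency: the frequency of sound.
Sound wave signal: the raw signal parsed out of audio that can be received and parsed.
High-frequency sound: sound whose frequency exceeds what most normal human ears can hear.
Sound code: meaningful data obtained after a sound wave signal undergoes an inverse Fourier transform.
Audio document: a document or storage format that can play or record sound, such as wav or mp3 files.
To explain the technical content, structural features, objectives, and effects of the present invention in detail, the present invention is described below with reference to the drawings and embodiments.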
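Refer to FIG. 1, a schematic structural diagram of a sound processing system according to an embodiment of the present invention. The system 100 includes a sound recognition device 10 and a sound receiving device 20. In this embodiment, the system 100 includes one sound recognition device 10 and one sound receiving device 20; that is, the sound recognition device 10 and the sound receiving device 20 are in a one-to-one network connection relationship.
The sound recognition device 10 is configured to, upon receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction, process the control instruction and synthesize it with a first audio file to obtain a second audio file containing the corresponding sound code, and send the second audio file.
The sound receiving device 20 is configured to receive the second audio file, detect whether the second audio file contains a sound wave signal, parse the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code, and decode the sound code to obtain the corresponding data information.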
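Specifically, refer also to FIG. 2, a schematic structural diagram of the sound recognition device in an embodiment of the present invention. In this embodiment, the sound recognition device 10 may be a smart mobile device, a computer, or the like, and has, for example, a microphone for capturing audio information such as human speech and music.
The sound recognition device 10 includes an analysis unit 11, an encoding unit 12, a first conversion unit 13, an audio processing unit 14, and a sending unit 15.
The analysis unit 11 is configured to, upon receiving an instruction or human speech, recognize the input content to generate the control instruction. The instruction may be an operation instruction generated by the sound recognition device 10 in response to a user operation, or an operation instruction, control instruction, or the like that the sound recognition device 10 receives from another terminal or device.
The encoding unit 12 is configured to encode the control instruction to generate a corresponding sound code.
The first conversion unit 13 is configured to apply a forward Fourier transform to the sound code generated by the encoding unit 12 to obtain a sound wave signal.
The audio processing unit 14 is configured to synthesize the sound wave signal generated by the first conversion unit 13 with the first audio file to obtain the second audio file containing the sound code.
The first audio file is a high-frequency file. In one embodiment, the audio processing unit 14 synthesizes the sound wave signal with the first audio file at multiple consecutive intervals to form the second audio file. The first audio file is a high-frequency file because the human ear cannot hear sound wave signals above a certain frequency range. When the audio file is transmitted, people cannot hear the high-frequency file carrying the sound code and do not perceive any sound at all, so the audio transmission does not affect the user or the environment.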
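One way to read "at multiple consecutive intervals" is that the same sound wave signal is repeated at regular offsets in the first audio file, so a receiver that starts listening late can still catch a full copy. The sketch below illustrates that reading only; the function name, the `period` parameter, and the additive mixing are assumptions rather than details from the text.

```python
def embed_repeated(first_audio, wave, period, gain=1.0):
    """Add a copy of wave into first_audio every `period` samples."""
    if period < len(wave):
        raise ValueError("period must be at least as long as the signal")
    second_audio = list(first_audio)   # leave the input untouched
    start = 0
    while start + len(wave) <= len(second_audio):
        for i, sample in enumerate(wave):
            second_audio[start + i] += gain * sample
        start += period
    return second_audio
```

For example, `embed_repeated([0] * 10, [1, 2], period=4)` places the two-sample signal at offsets 0, 4, and 8.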
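The sending unit 15 is configured to send the second audio file generated by the audio processing unit 14.
Refer also to FIG. 3, a schematic structural diagram of the sound receiving device 20 in an embodiment of the present invention. In this embodiment, the sound receiving device 20 may be a smart mobile device, a computer, or the like, and has, for example, a microphone for capturing audio files. Specifically, the sound receiving device 20 includes a receiving unit 21, a detection unit 22, a second transform unit 23, and a decoding unit 24.
The receiving unit 21 is configured to receive the second audio file sent by the sound recognition device 10.
The detection unit 22 is configured to analyze and detect whether the second audio file contains a sound wave signal. Specifically, the detection unit 22 performs spectrum analysis on the second audio file to determine whether it contains a sound wave signal.
The second transform unit 23 is configured to, when the detection unit 22 confirms that a sound wave signal is contained, parse the second audio file to obtain the corresponding sound wave signal and apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code.
The decoding unit 24 is configured to decode the sound code generated by the second transform unit 23 to obtain the corresponding data information.
In this embodiment, the data information contains basic information and extended information of the control instruction. The basic information includes at least the content of the instruction or human speech; for example, the content of the human speech is "open the *** shopping webpage". The extended information includes at least a webpage link address, an execution instruction, and an instruction link. For example, the webpage link address is the address of the "*** shopping webpage".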
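In other embodiments, the sound recognition device 10 also includes the functional units of the sound receiving device 20, and the sound receiving device 20 also includes the functional units of the sound recognition device 10. In this way, the sound recognition device 10, while recognizing and processing received instructions or human speech, can also recognize and process a received audio file to obtain the corresponding data information. Likewise, the sound receiving device 20, while recognizing and processing a received audio file to obtain the corresponding data information, can also recognize and process received instructions or human speech and form an audio file carrying a sound code. The working principle is as described above and is not repeated here.
Further, the sound receiving device 20 also includes an instruction processing unit 25 configured to determine the category of the data information produced by the decoding unit 24 and to execute the corresponding instruction according to the result. Specifically, the data information contains basic information and extended information. The instruction processing unit 25 determines whether the data information produced is basic information or extended information:
when the data information is determined to be basic information, the sound receiving device 20 plays or displays the content of the data information;
when the data information is determined to be extended information, the sound receiving device 20 accesses the corresponding address and executes the corresponding instruction. For example, when the data information is a webpage link address, the sound receiving device 20 accesses the corresponding webpage through that address.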
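Refer to FIG. 4, a schematic structural diagram of a sound processing system in a second embodiment of the present invention. The system 300 includes a sound recognition device 31, a sound receiving device 32, and a server 33.
When the sound receiving device 32 determines that the data information is extended information, it sends corresponding instruction information to the server 33. The instruction information is an access instruction or a webpage link address.
The server 33 is configured to respond to the instruction information by executing the corresponding function or invoking the corresponding webpage, so as to obtain the extended application.
Further, the server 33 is also configured to feed the execution result back to the sound receiving device 32.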
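Since the instruction information may be either an access instruction or a webpage link address, the server side needs some way to tell the two apart. A minimal sketch, assuming that a link is anything that parses as an `http` or `https` URL and that everything else is an access instruction (both the rule and the function names are hypothetical):

```python
from urllib.parse import urlparse


def classify_instruction_info(text):
    """Return "webpage_link" for http(s) URLs, otherwise "access_instruction"."""
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "webpage_link"
    return "access_instruction"


def server_respond(text, run_function, fetch_page):
    """Invoke the webpage for links, execute the function otherwise."""
    if classify_instruction_info(text) == "webpage_link":
        return fetch_page(text)
    return run_function(text)
```

The value returned by `server_respond` plays the role of the execution result that is fed back to the sound receiving device.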
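The present invention is illustrated below with specific application scenarios.
Application scenario one: in this scenario, the server 33 is a website server.
When the sound recognition device 31 receives the instruction or human speech "open the *** shopping webpage", it recognizes the input content to generate the corresponding control instruction "open the *** shopping webpage", processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
After receiving the second audio file and detecting that it contains a sound wave signal, the sound receiving device 32 parses and processes the second audio file to obtain the corresponding data information, which includes the link address of the *** shopping webpage and a control instruction to open that link address. The sound receiving device 32 then, in response to the control instruction, sends the server 33 a request to access the *** shopping webpage.
The server 33 responds to the access request by invoking the content of the *** shopping webpage, so that the sound receiving device 32 can display it.
Further, when the sound recognition device 31 receives the instruction or human speech "buy **", it recognizes the input content to generate the corresponding control instruction "buy **", processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
After receiving the second audio file and detecting that it contains a sound wave signal, the sound receiving device 32 parses and processes the second audio file to obtain the corresponding data information, which includes the "buy **" control instruction. The sound receiving device 32 then, in response to the control instruction, sends the server 33 a request to buy **.
The server 33 responds to the request by processing the data it holds accordingly so that ** is purchased, that is, the online order is placed.
When the sound recognition device 31 receives the instruction or human speech "enter password ******", it recognizes the input content to generate the corresponding control instruction "enter password ******", processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
After receiving the second audio file and detecting that it contains a sound wave signal, the sound receiving device 32 parses and processes the second audio file to obtain the corresponding data information, which includes the control instruction to enter a password and the password ******. The sound receiving device 32 then, in response to the control instruction, sends the server 33 the instruction to enter the password, receives the payment instruction link fed back by the server 33, and enters the password ****** to complete the payment.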
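Application scenario two: in this scenario, the server 33 is a bank system server.
When the sound recognition device 31 receives the instruction or human speech "transfer ** yuan to **", it recognizes the input content to generate the corresponding control instruction "transfer ** yuan to **", processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
After receiving the second audio file and detecting that it contains a sound wave signal, the sound receiving device 32 parses and processes the second audio file to obtain the corresponding data information, which includes the transfer instruction, the transfer amount, and the transfer recipient. The sound receiving device 32 then, in response to the transfer instruction, sends the request to the server 33.
The server 33 responds to the request by processing the data it holds accordingly to carry out the transfer, that is, it fills in the electronic bank transfer information and feeds the corresponding transfer confirmation page back to the sound receiving device 32.
When the sound recognition device 31 receives the instruction or human speech "enter password ****", it recognizes the input content to generate the corresponding control instruction "enter password ****", processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
After receiving the second audio file and detecting that it contains a sound wave signal, the sound receiving device 32 parses and processes the second audio file to obtain the corresponding data information, which includes the enter-password instruction and the password itself. The sound receiving device 32 then, in response to the enter-password instruction, sends the request to the server 33, so that the server 33 completes entry of the transfer password and executes the transfer.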
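Application scenario three
When the sound recognition device 31 receives the instruction or human speech "receive ** PPT", it recognizes the input content to generate the corresponding control instruction "receive ** PPT", processes the control instruction as described above to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
After receiving the second audio file and detecting that it contains a sound wave signal, the sound receiving device 32 parses and processes the second audio file to obtain the corresponding data information, which includes the receive-PPT instruction and the PPT file. The sound receiving device 32 then, in response to the instruction, downloads and receives the ** PPT, completing the file sharing.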
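In other embodiments, the system 100 may include one sound recognition device 10 and a plurality of sound receiving devices 20; that is, the sound recognition device 10 and the sound receiving devices 20 are in a one-to-many network connection relationship. The working principle is the same and is not repeated here.
In yet another embodiment, the system 100 may include a plurality of sound recognition devices 10 and a plurality of sound receiving devices 20; that is, the sound recognition devices 10 and the sound receiving devices 20 are in a many-to-many network connection relationship. The working principle is the same and is not repeated here.
In yet another embodiment, the system 100 may include one sound recognition device 10, a plurality of sound receiving devices 20, and one server; that is, the sound recognition device 10 and the sound receiving devices 20 are in a one-to-many network connection relationship, and the sound receiving devices 20 and the server are in a many-to-one network connection relationship. The working principle is the same and is not repeated here.
In yet another embodiment, the system 100 may include one sound recognition device 10, a plurality of sound receiving devices 20, and a plurality of servers; that is, the sound recognition device 10 and the sound receiving devices 20 are in a one-to-many network connection relationship, and the plurality of servers may be the same server or different servers. Likewise, when the system 100 includes a plurality of sound recognition devices 10 and a plurality of sound receiving devices 20, each sound receiving device 20 may be communicatively connected to one or more servers. The working principle is the same and is not repeated here.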
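Refer to FIG. 5, a schematic flowchart of a sound processing method in the first embodiment of the present invention. The method shown in this embodiment is applied to the sound processing system described above. The method includes:
Step S50: upon receiving an instruction or human speech, the sound recognition device recognizes the input content to generate a corresponding control instruction, processes the control instruction and synthesizes it with a first audio file to obtain a second audio file containing the corresponding sound code, and sends the second audio file.
Referring also to FIG. 6, specifically:
Step S501: upon receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction;
Step S502: encode the control instruction to generate a corresponding sound code;
Step S503: apply a forward Fourier transform to the sound code to obtain a sound wave signal.
Step S504: synthesize the sound wave signal with the first audio file to obtain a second audio file containing the sound code, where the first audio file is a high-frequency file.
Step S51: the sound receiving device receives the second audio file, detects whether the second audio file contains a sound wave signal, parses the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code, and decodes the sound code to obtain the corresponding data information.
Referring also to FIG. 7, specifically:
Step S511: receive the second audio file and detect whether it contains a sound wave signal; if so, proceed to step S512; otherwise, the process ends.
Step S512: parse the second audio file to obtain the corresponding sound wave signal;
Step S513: apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code;
Step S514: decode the sound code to obtain the corresponding data information.
The data information contains at least basic information and extended information of the control instruction; the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.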
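Refer to FIG. 8, a schematic flowchart of a sound processing method in the fourth embodiment of the present invention. After the sound code is decoded to obtain the corresponding data information, the method further includes:
Step S63: the sound receiving device determines whether the data information is basic information or extended information; if it is basic information, proceed to step S64; if it is extended information, proceed to step S65.
The basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
Step S64: when the data information is determined to be basic information, the sound receiving device plays or displays the content of the data information; the process then ends.
Step S65: when the data information is determined to be extended information, the sound receiving device sends a corresponding access instruction to a server.
Step S66: the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage, and sends the corresponding execution result to the sound receiving device. The process then ends.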
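Refer to FIG. 9, a schematic flowchart of a music processing method in the fifth embodiment of the present invention. The method shown in this embodiment is applied to the sound recognition device described above and includes:
Step S70: upon receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction;
Step S71: encode the control instruction to generate a corresponding sound code;
Step S72: apply a forward Fourier transform to the sound code to obtain a sound wave signal;
Step S73: synthesize the sound wave signal with the first audio file to obtain a second audio file containing the sound code, where the first audio file is a high-frequency file; and
Step S74: send the second audio file, so that a sound receiving device identifies the data information corresponding to the sound code contained in the second audio file.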
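Refer to FIG. 10, a schematic flowchart of a sound processing method in the sixth embodiment of the present invention. The method shown in this embodiment is applied to the sound receiving device described above and includes:
Step S81: receive a second audio file and determine whether the second audio file contains a sound wave signal. If so, proceed to step S82; otherwise, the process ends.
The second audio file is a file carrying the corresponding sound code, generated by a sound recognition device according to the received instruction or human speech.
Step S82: parse the second audio file to obtain the corresponding sound wave signal;
Step S83: apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code;
Step S84: decode the sound code to obtain the corresponding data information.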
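Refer to FIG. 11, a schematic flowchart of a sound processing method in the seventh embodiment of the present invention. The method shown in this embodiment is applied to the sound receiving device described above and includes:
Step S91: receive a second audio file and determine whether the second audio file contains a sound wave signal. If so, proceed to step S92; otherwise, proceed to step S95.
The second audio file is a file carrying the corresponding sound code, generated by a sound recognition device according to the received instruction or human speech.
Step S92: parse the second audio file to obtain the corresponding sound wave signal;
Step S93: apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code;
Step S94: decode the sound code to obtain the corresponding data information.
Step S95: determine whether the data information is basic information or extended information; if it is basic information, proceed to step S96; if it is extended information, proceed to step S97.
The basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
Step S96: play or display the content of the data information; the process then ends.
Step S97: send a corresponding access instruction to a server, so that the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage and feeds back the corresponding execution result. The process then ends.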
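In the embodiments of the present invention, the sound recognition device sound-encodes the received instruction or human speech and embeds the result in a high-frequency sound file for output, so that the sound receiving device, on receiving the high-frequency sound file, can recognize the sound code it contains and decode it to obtain the corresponding information or instruction. Instructions and information are thereby transmitted through a high-frequency sound file, avoiding interference from other factors.
In the embodiments provided by the present invention, the disclosed system, terminal, and method may be implemented in other ways. For example, the terminal embodiment described above is illustrative; the division into units is a division by logical function, and an actual implementation may divide them differently.
The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium and includes a number of instructions causing a computer device (which may be a personal computer, a management server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
The above are merely embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.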
Claims (14)
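- A sound processing system, characterized in that the system includes: a sound recognition device configured to, upon receiving an instruction or human speech, recognize the input content to generate a corresponding control instruction, process the control instruction and synthesize it with a first audio file to obtain a second audio file containing a corresponding sound code, and send the second audio file; and a sound receiving device configured to receive the second audio file, detect whether the second audio file contains a sound wave signal, parse the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code, and decode the sound code to obtain corresponding data information.
- The sound processing system according to claim 1, characterized in that the sound recognition device includes: an analysis unit configured to, upon receiving an instruction or human speech, recognize the input content to generate the control instruction; an encoding unit configured to encode the control instruction to generate a corresponding sound code; a first conversion unit configured to apply a forward Fourier transform to the sound code to obtain a sound wave signal; an audio processing unit configured to synthesize the sound wave signal with the first audio file to obtain the second audio file containing the sound code, where the first audio file is a high-frequency file; and a sending unit configured to send the second audio file generated by the audio processing unit.
- The sound processing system according to claim 1, characterized in that the sound receiving device includes: a receiving unit configured to receive the second audio file sent by the sound recognition device; a detection unit configured to analyze and detect whether the second audio file contains a sound wave signal; a second transform unit configured to, when the detection unit confirms that a sound wave signal is contained, parse the second audio file to obtain the corresponding sound wave signal and apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code; and a decoding unit configured to decode the sound code to obtain corresponding data information.
- The sound processing system according to any one of claims 1 to 3, characterized in that the sound receiving device further includes an instruction processing unit configured to determine whether the data information produced by the decoding unit is basic information or extended information: when the data information is determined to be basic information, the content of the data information is played or displayed; when the data information is determined to be extended information, the corresponding address is accessed and the corresponding instruction is executed; where the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link.
- The sound processing system according to claim 4, characterized in that the system further includes a server; when the instruction processing unit determines that the data information is extended information, it sends corresponding instruction information to the server, where the instruction information is an access instruction or a webpage link address; the server is configured to respond to the instruction information by executing the corresponding function or invoking the corresponding webpage, so as to obtain the corresponding extended application; and the sound receiving device is further configured to receive the result of the server's execution in response to the instruction information.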
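- A sound processing method, characterized in that the method includes: upon receiving an instruction or human speech, a sound recognition device recognizes the input content to generate a corresponding control instruction, processes the control instruction and synthesizes it with a first audio file to obtain a second audio file containing a corresponding sound code, and sends the second audio file; and a sound receiving device receives the second audio file, detects whether the second audio file contains a sound wave signal, parses the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code, and decodes the sound code to obtain corresponding data information.
- The sound processing method according to claim 6, characterized in that processing the control instruction and synthesizing it with the first audio file to obtain the second audio file containing the corresponding sound code specifically includes: encoding the control instruction to generate a corresponding sound code; applying a forward Fourier transform to the sound code to obtain a sound wave signal; and synthesizing the sound wave signal with the first audio file to obtain the second audio file containing the sound code, where the first audio file is a high-frequency file.
- The sound processing method according to claim 6, characterized in that parsing the second audio file upon confirming that it contains a sound wave signal to obtain the corresponding sound code specifically includes: when the detection unit confirms that a sound wave signal is contained, parsing the second audio file to obtain the corresponding sound wave signal, and applying an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code.
- The sound processing method according to any one of claims 6 to 8, characterized in that the method further includes: the sound receiving device determines whether the data information is basic information or extended information, where the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link; when the data information is determined to be basic information, the sound receiving device plays or displays the content of the data information; when the data information is determined to be extended information, the sound receiving device sends a corresponding access instruction to a server; and the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage, and sends the corresponding execution result to the sound receiving device.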
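- A sound recognition device, characterized in that the device includes: an analysis unit configured to, upon receiving an instruction or human speech, recognize the input content to generate the control instruction; an encoding unit configured to encode the control instruction to generate a corresponding sound code; a first conversion unit configured to apply a forward Fourier transform to the sound code to obtain a sound wave signal; an audio processing unit configured to synthesize the sound wave signal with the first audio file to obtain a second audio file containing the sound code, where the first audio file is a high-frequency file; and a sending unit configured to send the second audio file generated by the audio processing unit to a sound receiving device, so that the sound receiving device identifies the data information corresponding to the sound code contained in the second audio file.
- A sound receiving device, characterized in that the device includes: a receiving unit configured to receive a second audio file sent by a sound recognition device, where the second audio file is a file containing a corresponding sound code, generated by the sound recognition device according to a received instruction or human speech; a detection unit configured to analyze and detect whether the second audio file contains a sound wave signal; a second transform unit configured to, when the detection unit confirms that a sound wave signal is contained, parse the second audio file to obtain the sound wave signal and apply an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code; and a decoding unit configured to decode the sound code to obtain corresponding data information.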
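- A sound processing method, characterized in that the method includes: upon receiving an instruction or human speech, recognizing the input content to generate a corresponding control instruction; encoding the control instruction to generate a corresponding sound code; applying a forward Fourier transform to the sound code to obtain a sound wave signal; synthesizing the sound wave signal with a first audio file to obtain a second audio file containing the sound code, where the first audio file is a high-frequency file; and sending the second audio file, so that a sound receiving device identifies the data information corresponding to the sound code contained in the second audio file.
- A sound processing method, characterized in that the method includes: receiving a second audio file sent by a sound recognition device and detecting whether the second audio file contains a sound wave signal, where the second audio file is a file containing a corresponding sound code, generated by the sound recognition device according to a received instruction or human speech; upon confirming that a sound wave signal is contained, parsing the second audio file to obtain the corresponding sound wave signal; applying an inverse Fourier transform to the sound wave signal to obtain the corresponding sound code; and decoding the sound code to obtain corresponding data information.
- The sound processing method according to claim 13, characterized in that the method further includes: determining whether the data information is basic information or extended information, where the basic information includes at least the content of the instruction or human speech, and the extended information includes at least a webpage link address, an execution instruction, and an instruction link; when the data information is determined to be basic information, playing or displaying the content of the data information; and when the data information is determined to be extended information, sending a corresponding access instruction to a server, so that the server responds to the access instruction by executing the corresponding function or invoking the corresponding webpage and feeds back the corresponding execution result.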
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711256501.6 | 2017-12-03 | ||
CN201711256501.6A CN107993655A (zh) | 2017-12-03 | 2017-12-03 | Sound processing system and method, sound recognition device, and sound receiving device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019104889A1 (zh) | 2019-06-06 |
Family
ID=62035269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/077237 WO2019104889A1 (zh) | 2017-12-03 | 2018-02-26 | Sound processing system and method, sound recognition device, and sound receiving device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107993655A (zh) |
WO (1) | WO2019104889A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
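CN108777596B (zh) * | 2018-05-30 | 2022-03-08 | 上海惠芽信息技术有限公司 | Sound-wave-based communication method, communication system, and computer-readable storage medium |
CN108922550A (zh) * | 2018-07-04 | 2018-11-30 | 全童科教(东莞)有限公司 | Method and system for controlling robot movement using Morse sound codes |
TWI682322B (zh) * | 2018-09-06 | 2020-01-11 | 廣達電腦股份有限公司 | Instruction processing device and method |
CN112634899A (zh) * | 2021-01-31 | 2021-04-09 | 成都市玄上科技有限公司 | Method for interactive control using sound signals |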
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2375907A (en) * | 2001-05-14 | 2002-11-27 | British Broadcasting Corp | An automated recognition system |
CN1762116A (zh) * | 2003-03-17 | 2006-04-19 | 皇家飞利浦电子股份有限公司 | Method for remotely controlling an audio device |
CN101219266A (zh) * | 2007-01-10 | 2008-07-16 | 刘鹏 | Voice control device for a treadmill |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2523934C2 (ru) * | 2010-03-26 | 2014-07-27 | Филд Систем, Инк. | Transmitter |
CN105847436B (zh) * | 2016-05-26 | 2020-04-17 | 厦门声连网信息科技有限公司 | Information push system and method for a sound-wave Internet of Things |
- 2017-12-03: CN CN201711256501.6A, patent/CN107993655A/zh, active, Pending
- 2018-02-26: WO PCT/CN2018/077237, patent/WO2019104889A1/zh, active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN107993655A (zh) | 2018-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
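CN108133707B (zh) | Content sharing method and system | |
JP6828001B2 (ja) | Voice wake-up method and apparatus | |
WO2019104889A1 (zh) | Sound processing system and method, sound recognition device, and sound receiving device | |
JP6862632B2 (ja) | Voice interaction method, apparatus, device, computer storage medium, and computer program | |
JP6754011B2 (ja) | Method, apparatus, and server for providing a voice service | |
US20170046124A1 (en) | Responding to Human Spoken Audio Based on User Input | |
WO2020253509A1 (zh) | Scene- and emotion-oriented Chinese speech synthesis method, apparatus, and storage medium | |
KR20190077088A (ko) | Voiceprint construction and registration method and apparatus therefor | |
CN110018735A (zh) | Intelligent personal assistant interface system | |
KR102615154B1 (ko) | Electronic device and method for controlling the electronic device | |
JP2019091419A (ja) | Information output method and apparatus | |
US20200019687A1 (en) | Method and apparatus for identity authentication, and computer readable storage medium | |
WO2021237923A1 (zh) | Intelligent dubbing method, apparatus, computer device, and storage medium | |
US20230127787A1 (en) | Method and apparatus for converting voice timbre, method and apparatus for training model, device and medium | |
US20200234181A1 (en) | Implementing training of a machine learning model for embodied conversational agent | |
CN107240396B (zh) | Speaker adaptation method, apparatus, device, and storage medium | |
CN112687286A (zh) | Method and apparatus for adjusting a noise reduction model of an audio device | |
CN105047192A (zh) | Statistical speech synthesis method and apparatus based on hidden Markov models | |
CN115116458B (zh) | Voice data conversion method, apparatus, computer device, and storage medium | |
CN110890098B (zh) | Blind signal separation method, apparatus, and electronic device | |
KR20230020508A (ko) | Text echo cancellation | |
CN112306560B (zh) | Method and apparatus for waking up an electronic device | |
CN108766429B (zh) | Voice interaction method and apparatus | |
CN112037772A (zh) | Multimodal response obligation detection method, system, and apparatus | |
TWM578858U (zh) | Cross-channel artificial intelligence conversational platform |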
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18883810; Country of ref document: EP; Kind code of ref document: A1 |
| DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) | |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.10.2020) |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 18883810; Country of ref document: EP; Kind code of ref document: A1 |