CN115482806A - Voice processing system, method, device, storage medium and computer equipment - Google Patents


Publication number
CN115482806A
Authority
CN
China
Prior art keywords
voice
signal
input source
equipment
radio
Legal status
Granted
Application number
CN202211113080.2A
Other languages
Chinese (zh)
Other versions
CN115482806B (en)
Inventor
杨广煜
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211113080.2A
Publication of CN115482806A
Application granted
Publication of CN115482806B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2803 Home automation networks
    • H04L12/2816 Controlling appliance services of a home automation network by calling their functionalities
    • H04L12/282 Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Automation & Control Theory (AREA)
  • Selective Calling Equipment (AREA)
  • Telephonic Communication Services (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application relates to a voice processing system, a voice processing method, a voice processing apparatus, a storage medium, and a computer device. The system comprises a main control device and at least two sound receiving devices. The at least two sound receiving devices are each configured to collect a first voice signal corresponding to the same voice input source. The main control device is configured to: acquire the first voice signals; process the first voice signals according to a voice processing rule to determine a voice input source signal and the corresponding target sound receiving device; acquire the voice broadcast mode corresponding to the target sound receiving device; generate corresponding voice reply information according to the voice input source signal and the voice broadcast mode; and send the voice reply information to the target sound receiving device. The target sound receiving device among the at least two sound receiving devices is further configured to play the voice reply information in the voice broadcast mode. The solution provided by the application diversifies the ways in which the devices respond.

Description

Voice processing system, method, device, storage medium and computer equipment
The present application is a divisional of the application entitled "Speech processing method, apparatus, computer-readable storage medium, and computer device", filed with the Chinese patent office on June 6, 2019, with application number 2019104916096, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular, to a speech processing system, a speech processing method, a speech processing apparatus, a computer-readable storage medium, and a computer device.
Background
With the development of computer technology, smart devices can establish wireless connections with other devices. Smart devices can also convert voice information into text, so a user can control a smart device by voice and, through it, have other devices perform corresponding operations. However, with current speech processing methods, a device responds in only a single way.
Disclosure of Invention
In view of this, it is desirable to provide a speech processing system, method, apparatus, computer-readable storage medium, and computer device that solve the technical problem that a device responds in only a single way.
A speech processing system comprises a main control device and at least two sound receiving devices, wherein:
the at least two sound receiving devices are each configured to collect a first voice signal corresponding to the same voice input source;
the main control device is configured to acquire the first voice signals; process the first voice signals according to a voice processing rule to determine a voice input source signal and the corresponding target sound receiving device; acquire a voice broadcast mode corresponding to the target sound receiving device; generate corresponding voice reply information according to the voice input source signal and the voice broadcast mode; and send the voice reply information to the target sound receiving device;
and the target sound receiving device among the at least two sound receiving devices is further configured to play the voice reply information in the voice broadcast mode.
A speech processing method, performed by a target sound receiving device among at least two sound receiving devices, comprises:
collecting a first voice signal, wherein the first voice signals collected by the at least two sound receiving devices correspond to the same voice input source;
sending the first voice signal to a main control device, so that the main control device processes the first voice signals according to a voice processing rule to determine a voice input source signal and, when the voice input source signal corresponds to the target sound receiving device, acquires a voice broadcast mode corresponding to the target sound receiving device, generates corresponding voice reply information according to the voice input source signal and the voice broadcast mode, and sends the voice reply information to the target sound receiving device;
and receiving the voice reply information sent by the main control device, and playing the voice reply information in the voice broadcast mode.
A speech processing apparatus comprises:
an acquisition module, configured to collect a first voice signal, wherein the first voice signals collected by at least two sound receiving devices correspond to the same voice input source;
a sending module, configured to send the first voice signal to a main control device, so that the main control device processes the first voice signals according to a voice processing rule to determine a voice input source signal and, when the voice input source signal corresponds to the target sound receiving device, acquires a voice broadcast mode corresponding to the target sound receiving device, generates corresponding voice reply information according to the voice input source signal and the voice broadcast mode, and sends the voice reply information to the target sound receiving device;
and a playing module, configured to receive the voice reply information sent by the main control device and play the voice reply information in the voice broadcast mode.
A computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps:
collecting a first voice signal, wherein the first voice signals collected by at least two sound receiving devices correspond to the same voice input source;
sending the first voice signal to a main control device, so that the main control device processes the first voice signals according to a voice processing rule to determine a voice input source signal and, when the voice input source signal corresponds to the target sound receiving device, acquires a voice broadcast mode corresponding to the target sound receiving device, generates corresponding voice reply information according to the voice input source signal and the voice broadcast mode, and sends the voice reply information to the target sound receiving device;
and receiving the voice reply information sent by the main control device, and playing the voice reply information in the voice broadcast mode.
A computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
collecting a first voice signal, wherein the first voice signals collected by at least two sound receiving devices correspond to the same voice input source;
sending the first voice signal to a main control device, so that the main control device processes the first voice signals according to a voice processing rule to determine a voice input source signal and, when the voice input source signal corresponds to the target sound receiving device, acquires a voice broadcast mode corresponding to the target sound receiving device, generates corresponding voice reply information according to the voice input source signal and the voice broadcast mode, and sends the voice reply information to the target sound receiving device;
and receiving the voice reply information sent by the main control device, and playing the voice reply information in the voice broadcast mode.
In the above speech processing system, method, apparatus, computer-readable storage medium, and computer device, at least two sound receiving devices each collect a first voice signal corresponding to the same voice input source, and the main control device processes the acquired first voice signals according to the voice processing rule to determine the voice input source signal and the corresponding target sound receiving device. Because voice input is accepted over a large area and the target sound receiving device corresponding to the voice input source can be identified, voice signals can be transmitted over long distances. The main control device acquires the voice broadcast mode corresponding to the target sound receiving device, generates voice reply information from the voice input source signal and that mode, and sends it to the target sound receiving device, which plays it in that mode. This enlarges the voice input range, makes voice input more convenient, diversifies the ways in which the sound receiving devices respond, and lets the user receive the voice reply in the area where the target sound receiving device is located.
Drawings
FIG. 1 is a diagram of an application environment of a speech processing method in one embodiment;
FIG. 2 is a diagram of an application environment of a speech processing method in another embodiment;
FIG. 3 is a diagram showing an application environment of a speech processing method in another embodiment;
FIG. 4 is a flow diagram that illustrates a method for speech processing according to one embodiment;
FIG. 5 is a flow diagram illustrating the determination of a speech input source signal and a target sound receiving device in one embodiment;
FIG. 6 is a timing diagram of a speech processing method in one embodiment;
FIG. 7 is a block diagram showing the structure of a speech processing apparatus according to an embodiment;
FIG. 8 is a block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram of an application environment of a speech processing method in one embodiment. The speech processing method is applied to a speech processing system. The speech processing system includes, but is not limited to, a master control device 130, a first sound receiving device 110, a second sound receiving device 120, …. The master control device 130 is connected to the first sound receiving device 110 and the second sound receiving device 120 over a network. The network may be a wireless communication network, for example Bluetooth or a wireless local area network. A voice processing program may run on the master control device 130. The master control device 130 is a terminal on which the voice processing program is installed, and may be a desktop terminal or a mobile terminal. The mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like; the desktop terminal may be a desktop computer, a smart TV box, a set-top box (STB), and the like. The master control device 130 may run an operating system, application programs, and the like. The first sound receiving device 110 and the second sound receiving device 120 each include hardware capable of collecting voice signals, and may be a microphone device, a terminal device, a household appliance with a sound pickup function, and the like.
FIG. 2 is a diagram of an application environment of the speech processing method in another embodiment. The master control device 130 is connected to the first sound receiving device 110 and the second sound receiving device 120 over a network, and to a server 140 over a network. The server 140 may be implemented as an independent server or as a server cluster composed of multiple servers.
FIG. 3 is a diagram of an application environment of the speech processing method in another embodiment. The master control device 130 and the sound receiving devices (the first, second, third, fourth, and further sound receiving devices) may be located at different positions. For example, in a house, the master control device 130 may be located in the living room, the first sound receiving device 110 in bedroom 1, and the second sound receiving device 120 in bedroom 2.
In one embodiment, as shown in FIG. 4, a speech processing method is provided. This embodiment is described mainly using the example of applying the method to the master control device 130 in FIG. 1, FIG. 2, or FIG. 3. Referring to FIG. 4, the speech processing method includes the following steps:
Step 402, acquiring first voice signals collected by at least two sound receiving devices, where the first voice signals collected by the at least two sound receiving devices correspond to the same voice input source.
The first voice signal may be a sound wave signal generated by the vibration of an object. The same voice input source means sound produced by the same object or by the same user. For example, when the user speaks in bedroom 1, the sound receiving device 110 in bedroom 1 collects the first voice signal; because bedroom 2 is close to bedroom 1, the sound receiving device 120 in bedroom 2 may also collect the first voice signal.
Specifically, the at least two sound receiving devices collect first voice signals and send them to the master control device, which thereby acquires first voice signals that correspond to the same voice input source.
Step 404, processing the first voice signals according to the voice processing rule, and determining a voice input source signal and the corresponding target sound receiving device.
The voice processing rule is a rule for processing the first voice signals; it may be established according to at least one of the intensity, phase, energy, frequency spectrum, and sound pressure of the voice signals. The target sound receiving device is the sound receiving device corresponding to the voice input source signal. For example, if the user speaks near the sound receiving device 110 in bedroom 1, then although the sound receiving device 120 in bedroom 2 also receives a first voice signal, the voice processing program determines that the sound receiving device corresponding to the voice input source signal is the sound receiving device 110.
Specifically, the voice processing program processes the first voice signals according to the voice processing rule and determines the voice input source signal and the corresponding target sound receiving device.
In one embodiment, the voice processing program determines, from the energy of the first voice signal collected by each sound receiving device, the first voice signal with the maximum energy. It takes that signal as the voice input source signal, and the sound receiving device that collected it is the target sound receiving device.
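The maximum-energy rule above can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: it assumes each first voice signal arrives as a list of time-aligned samples covering the same window, measures energy as the sum of squared samples (the text does not define the energy measure), and uses hypothetical device identifiers.

```python
from typing import Dict, Sequence, Tuple


def signal_energy(samples: Sequence[float]) -> float:
    """Short-time energy of one first voice signal: sum of squared samples (assumed measure)."""
    return sum(x * x for x in samples)


def select_by_max_energy(signals: Dict[str, Sequence[float]]) -> Tuple[str, Sequence[float]]:
    """Return (device_id, samples) for the first voice signal with the largest energy.

    The returned samples play the role of the voice input source signal and the
    returned device_id identifies the target sound receiving device.
    """
    if not signals:
        raise ValueError("at least one first voice signal is required")
    device_id = max(signals, key=lambda dev: signal_energy(signals[dev]))
    return device_id, signals[device_id]


# Usage with hypothetical data: the device in the room where the user speaks
# captures the louder copy of the same utterance.
signals = {"device 1": [0.5, -0.4, 0.6], "device 2": [0.1, -0.1, 0.05]}
target, source_signal = select_by_max_energy(signals)
```

In practice the comparison would run over a short window around the detected utterance, since summing over recordings of different lengths would bias the result toward longer ones.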
In one embodiment, a sound receiving device may also capture video. The sound receiving device captures images or video, and the master control device acquires them. When the master control device detects that the images or video captured by a sound receiving device show a user speaking, it determines that sound receiving device to be the target sound receiving device, and the first voice signal collected by the target sound receiving device is the voice input source signal.
Step 406, acquiring the voice broadcast mode corresponding to the target sound receiving device.
The voice broadcast mode is the way in which the sound is presented, for example broadcasting with different timbres or in different languages. Specifically, the voice broadcast mode may be a female or male voice; Cantonese, Sichuanese, Henan dialect, English, Japanese, Russian, Spanish, and so on; the voice of a specific person, for example Xiaolin's voice; or a crosstalk (xiangsheng) intonation, a stage-drama style, a Peking opera style, and so on, without being limited thereto. Each sound receiving device corresponds to one voice broadcast mode, and different sound receiving devices may have the same or different voice broadcast modes. The correspondence between sound receiving devices and voice broadcast modes may be stored in the master control device.
Specifically, according to the device identifier of the target sound receiving device, the voice processing program finds the voice broadcast mode identifier associated with that device identifier and acquires the corresponding voice broadcast mode. A device identifier uniquely identifies a sound receiving device; that is, each sound receiving device has a different device identifier. A device identifier consists of at least one of digits, letters, and symbols, for example "device 1", "device 2", …, without being limited thereto. A voice broadcast mode identifier uniquely identifies a voice broadcast mode; that is, each voice broadcast mode has a different identifier, which likewise consists of at least one of digits, letters, and symbols.
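The two-level lookup just described (device identifier, then voice broadcast mode identifier, then voice broadcast mode) can be sketched with two dictionaries held on the master control device. This is a minimal illustration under stated assumptions: the table contents, identifier strings such as `mode_female_gentle`, and the `BroadcastMode` type are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BroadcastMode:
    mode_id: str      # uniquely identifies one voice broadcast mode
    description: str  # human-readable label, e.g. "gentle female voice"


# Hypothetical correspondence tables stored on the master control device.
DEVICE_TO_MODE_ID: Dict[str, str] = {
    "device 1": "mode_female_gentle",
    "device 2": "mode_cantonese",
    "device 3": "mode_female_gentle",  # different devices may share one mode
}

MODES: Dict[str, BroadcastMode] = {
    "mode_female_gentle": BroadcastMode("mode_female_gentle", "gentle female voice"),
    "mode_cantonese": BroadcastMode("mode_cantonese", "Cantonese"),
}


def broadcast_mode_for(device_id: str) -> BroadcastMode:
    """Resolve device identifier -> mode identifier -> voice broadcast mode.

    Raises KeyError if the device or its mode identifier is unknown.
    """
    mode_id = DEVICE_TO_MODE_ID[device_id]
    return MODES[mode_id]
```

Keeping the mode identifiers as a separate indirection means a mode's definition can change in one place while every device that references it picks up the change.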
Step 408, generating the corresponding voice reply information according to the voice input source signal and the voice broadcast mode.
The voice reply information may be presented as audio.
Specifically, the voice processing program looks up the corresponding voice reply information in the voice database associated with the voice broadcast mode, according to the voice input source signal and the voice broadcast mode.
In one embodiment, the voice processing program performs semantic analysis on the voice input source signal to obtain the corresponding text, and then looks up the corresponding voice reply information in the voice database associated with the voice broadcast mode according to that text. For example, the text that the voice processing program parses from the voice input source signal is "the position of gold in the periodic table of elements". This is common-knowledge information, so the voice processing program retrieves, from the voice database associated with the voice broadcast mode, voice reply information stating that gold is element number 79 in the periodic table.
In one embodiment, the voice processing program performs semantic analysis on the voice input source signal to obtain the corresponding text, looks up the corresponding text reply content in a database according to that text, and combines the text reply content with the voice broadcast mode to obtain the voice reply information. For example, the voice processing program determines that the voice broadcast mode corresponding to the voice input source signal is "gentle female voice", and the text parsed from the voice input source signal is "how is the weather today". This is real-time information. The voice processing program then retrieves the text reply content "it is sunny today" from the database and, according to "it is sunny today", retrieves the corresponding voice reply information from the voice database associated with "gentle female voice".
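The two retrieval variants above, looking up reply audio directly in a per-mode voice database or first finding a text reply and then rendering it in the broadcast mode, could be sketched as follows. All table contents are hypothetical, speech recognition and semantic analysis are assumed to have already produced the query text, and `synthesize` stands in for whatever text-to-speech engine a real system would use; it is a hypothetical callable, not a specific library API.

```python
from typing import Callable, Dict

# Variant 1 (hypothetical data): one voice database per broadcast mode, keyed by the
# parsed query text and holding pre-recorded reply audio (file names here).
VOICE_DB: Dict[str, Dict[str, str]] = {
    "gentle female voice": {
        "the position of gold in the periodic table of elements": "gold_element_79_female.wav",
    },
}


def reply_from_voice_db(text: str, mode: str) -> str:
    """Variant 1: direct lookup in the voice database associated with the mode."""
    return VOICE_DB[mode][text]


# Variant 2 (hypothetical data): query text -> text reply content.
TEXT_REPLY_DB: Dict[str, str] = {
    "how is the weather today": "It is sunny today.",
}


def reply_via_text(text: str, mode: str, synthesize: Callable[[str, str], bytes]) -> bytes:
    """Variant 2: find the text reply, then render it in the broadcast mode.

    `synthesize(reply_text, mode)` is a hypothetical TTS hook returning audio bytes.
    """
    reply_text = TEXT_REPLY_DB[text]
    return synthesize(reply_text, mode)


# Usage with a stand-in synthesizer that just tags the text with the mode.
fake_tts = lambda reply, mode: f"[{mode}] {reply}".encode()
audio = reply_via_text("how is the weather today", "gentle female voice", fake_tts)
```

For real-time questions such as the weather, a real system would query a live data source rather than the static `TEXT_REPLY_DB` used here for illustration.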
In one embodiment, the voice processing program may further acquire the ambient sound intensity and adjust the volume of the voice reply information accordingly, the volume being positively correlated with the ambient sound intensity. The ambient sound intensity may also be regarded as noise. When the ambient sound intensity increases, the volume of the voice reply information increases correspondingly; when it decreases, the volume decreases correspondingly. Alternatively, each ambient sound intensity interval may correspond to one volume level of the voice reply information; for example, an ambient sound intensity of 35 to 40 dB (decibels) corresponds to volume level 1, without being limited thereto.
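An interval-based volume mapping like the one described could look like this. Only the pairing "35 to 40 dB corresponds to level 1" comes from the text; every other boundary and level is a hypothetical placeholder, and treating the intervals as half-open (lower bound inclusive, upper bound exclusive) is an assumption.

```python
import bisect
from typing import List

# Lower bounds (dB) of ambient sound intensity intervals. Only [35, 40) -> level 1 is
# taken from the text; the remaining boundaries and levels are hypothetical.
INTERVAL_LOWER_DB: List[float] = [35.0, 40.0, 50.0, 60.0, 70.0]
VOLUME_LEVELS: List[int] = [1, 2, 3, 4, 5]


def volume_level(ambient_db: float) -> int:
    """Map ambient sound intensity (dB) to a reply volume level.

    Levels never decrease as the ambient intensity rises, which preserves the
    positive correlation between noise and reply volume.
    """
    idx = bisect.bisect_right(INTERVAL_LOWER_DB, ambient_db) - 1
    if idx < 0:
        return VOLUME_LEVELS[0]  # quieter than the first interval: minimum level
    return VOLUME_LEVELS[idx]
```

A lookup table keeps the behaviour easy to tune per room; a continuous mapping (for example, a gain proportional to the noise level) would equally satisfy the positive-correlation requirement.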
In one embodiment, when the voice input source signal is determined to be an animal sound, the corresponding voice broadcast mode is looked up according to the animal sound, and the voice reply information is generated according to the animal sound and the corresponding voice broadcast mode. For example, if the voice processing program determines that the text corresponding to the voice input source signal is "Wang Wangwang" (a dog's bark, "woof woof woof"), it finds from the animal sound that the corresponding voice broadcast mode is "dog barking", obtains any voice reply information corresponding to dog barking, and sends that voice reply information to the target sound receiving device.
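A minimal sketch of the animal-sound check, assuming speech recognition has already produced pinyin-style text such as "Wang Wangwang". The syllable table and the rule that the text must consist only of repetitions of one syllable are illustrative assumptions; a production system would more likely classify the audio itself.

```python
from typing import Dict, Optional

# Hypothetical table: onomatopoeic syllable -> animal-sound voice broadcast mode.
ANIMAL_SOUND_MODES: Dict[str, str] = {
    "wang": "dog barking",  # "wang wang" is the Chinese onomatopoeia for a bark
    "miao": "cat meowing",
}


def animal_mode_for(text: str) -> Optional[str]:
    """Return the animal-sound broadcast mode if the text is a repeated animal cry, else None."""
    normalized = "".join(text.lower().split())  # "Wang Wangwang" -> "wangwangwang"
    if not normalized:
        return None
    for syllable, mode in ANIMAL_SOUND_MODES.items():
        repeats, remainder = divmod(len(normalized), len(syllable))
        # The whole text must be the syllable repeated a whole number of times.
        if remainder == 0 and normalized == syllable * repeats:
            return mode
    return None
```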
Step 410, sending the voice reply information to the target sound receiving device, so that the target sound receiving device plays the voice reply information in the voice broadcast mode.
Specifically, the voice processing program sends the voice reply information corresponding to the voice broadcast mode to the target sound receiving device over a 2.4 GHz wireless network, so that the target sound receiving device plays it in that mode. For example, if the voice processing program sends voice reply information corresponding to "gentle female voice", the target sound receiving device plays it in the "gentle female voice" mode.
In the above speech processing method, first voice signals that are collected by at least two sound receiving devices and correspond to the same voice input source are acquired and processed according to the voice processing rule to determine the voice input source signal and the corresponding target sound receiving device. Because voice input is accepted over a large area and the target sound receiving device corresponding to the voice input source can be identified, voice signals can be transmitted over long distances. The voice broadcast mode corresponding to the target sound receiving device is acquired, voice reply information is generated from the voice input source signal and that mode, and the information is sent to the target sound receiving device for playback in that mode. This enlarges the voice input range, makes voice input more convenient, diversifies the ways in which the sound receiving devices respond so as to meet personalized needs, and lets the user receive the voice reply in the area where the target sound receiving device is located.
In one embodiment, processing the first voice signals according to the voice processing rule to determine the voice input source signal and the corresponding target sound receiving device includes: acquiring the intensity of the first voice signal corresponding to each of the at least two sound receiving devices; and when the intensity of a first voice signal is greater than or equal to a preset intensity, determining that first voice signal to be the voice input source signal and the sound receiving device corresponding to it to be the target sound receiving device.
The intensity of the first voice signal is also called sound intensity, which is the time-averaged energy flux density of the sound wave. The preset intensity may be an intensity threshold configured in the voice processing program and stored in the master control device.
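For reference, the standard acoustics definitions behind "sound intensity" are given below. They are textbook relations, not formulas from the patent, and the patent does not say whether the preset intensity is an absolute intensity, a decibel level, or a relative level such as one computed from uncalibrated microphone samples. Here p is sound pressure, u is particle velocity, ρ is the air density, and c is the speed of sound.

```latex
I = \frac{1}{T}\int_{0}^{T} p(t)\,u(t)\,\mathrm{d}t ,
\qquad
I = \frac{p_{\mathrm{rms}}^{2}}{\rho c} \quad \text{(plane progressive wave)} ,
\qquad
L_I = 10\log_{10}\frac{I}{I_0}\ \mathrm{dB},\quad I_0 = 10^{-12}\ \mathrm{W/m^{2}}
```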
Specifically, the voice processing program detects the intensity of the first voice signal corresponding to each of the at least two sound receiving devices and determines whether each intensity is greater than or equal to the preset intensity. When the intensity of one of the first voice signals is greater than or equal to the preset intensity, that first voice signal is determined to be the voice input source signal, and the sound receiving device corresponding to it is determined to be the target sound receiving device.
In one embodiment, when the intensities of at least two first voice signals are greater than or equal to the preset intensity, all of those first voice signals are determined to be voice input source signals, and all of the corresponding sound receiving devices are target sound receiving devices. For example, if first voice signal 1 collected by sound receiving device 1 and first voice signal 2 collected by sound receiving device 2 both reach the preset intensity, both signals are used as voice input source signals and both devices are used as target sound receiving devices. The voice processing program then sends voice reply information to sound receiving devices 1 and 2, each of which plays it in its corresponding voice broadcast mode.
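The threshold rule of this embodiment, in which every device whose signal reaches the preset intensity becomes a target, is a simple filter. The sketch below assumes intensities have already been measured per device in a common unit; the device identifiers and the dB values in the usage line are hypothetical.

```python
from typing import Dict, List


def targets_at_or_above(intensities: Dict[str, float], preset: float) -> List[str]:
    """Return every device whose first voice signal intensity is >= the preset intensity.

    Each returned device is treated as a target sound receiving device and receives
    the voice reply; an empty list means no signal reached the threshold.
    Order follows the insertion order of `intensities`.
    """
    return [device for device, level in intensities.items() if level >= preset]


# Usage with hypothetical measurements (dB) and a hypothetical 55 dB preset intensity.
targets = targets_at_or_above({"device 1": 62.0, "device 2": 58.0, "device 3": 41.0}, 55.0)
```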
In the above speech processing method, the intensity of the first voice signal corresponding to each of the at least two sound receiving devices is acquired. When the intensity of a first voice signal is greater than or equal to the preset intensity, that signal is determined to be the voice input source signal and its sound receiving device is used as the target sound receiving device. The target sound receiving device can therefore be determined quickly, which improves voice processing efficiency.
In one embodiment, determining, when the intensity of a first voice signal is greater than or equal to the preset intensity, that first voice signal to be the voice input source signal and the corresponding sound receiving device to be the target sound receiving device includes: when the intensities of at least two first voice signals are greater than or equal to the preset intensity, determining the first voice signal with the maximum intensity among them; and taking that signal as the voice input source signal and its sound receiving device as the target sound receiving device.
Specifically, when the intensities of at least two first voice signals are greater than or equal to the preset intensity, the voice processing program determines the one with the maximum intensity, takes it as the voice input source signal, and takes the corresponding sound receiving device as the target sound receiving device. In other words, the stronger a voice signal is, the higher its priority. For example, if first voice signal 1 collected by sound receiving device 1 and first voice signal 2 collected by sound receiving device 2 both reach the preset intensity, their intensities are compared; if first voice signal 1 is stronger, it is used as the voice input source signal and sound receiving device 1 is used as the target sound receiving device. The voice processing program then sends the voice reply information to sound receiving device 1, which plays it in its corresponding voice broadcast mode.
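Adding the priority rule (the strongest signal at or above the threshold wins) gives the variant below. Returning `None` when nothing reaches the threshold is an assumption made so that a caller can fall back to merging the signals (step 502); ties are resolved in favour of the first device listed, a detail the patent does not specify. Device identifiers and values are hypothetical.

```python
from typing import Dict, Optional


def target_with_priority(intensities: Dict[str, float], preset: float) -> Optional[str]:
    """Return the device whose signal is strongest among those >= the preset intensity.

    Returns None when no first voice signal reaches the preset intensity.
    """
    candidates = {dev: level for dev, level in intensities.items() if level >= preset}
    if not candidates:
        return None  # all signals below the preset: caller may merge them instead (step 502)
    # max() returns the first key encountered among equal maxima.
    return max(candidates, key=lambda dev: candidates[dev])
```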
In this voice processing method, when the intensities of at least two first voice signals are greater than or equal to the preset intensity, the strongest of them is used as the voice input source signal and its radio device as the target radio device, so the target radio device can be determined quickly and voice processing efficiency is improved.
In one embodiment, as shown in FIG. 5, which is a schematic flowchart of determining the voice input source signal and the target radio device, the voice processing method further includes the following steps:
Step 502: when the intensities of the first voice signals collected by the at least two radio devices are all less than the preset intensity, merge the first voice signals to suppress the noise in them and obtain the voice input source signal.
Specifically, when the voice processing program detects that the intensities of the first voice signals collected by the at least two radio devices are all less than the preset intensity, it can combine all the collected first voice signals through an analog beamformer and enhance the desired component of those signals with an LCMV (Linearly Constrained Minimum Variance) algorithm or an MVDR (Minimum Variance Distortionless Response) algorithm, thereby suppressing the noise in the first voice signals and obtaining the voice input source signal.
The LCMV algorithm works as follows: if the direction of arrival and the bandwidth of the desired signal are known, delay compensation is applied to the data received by the array so that the desired signal is aligned across all array channels. Constraints are then imposed on the array weights, and the output energy of the beamformer is adaptively minimized subject to those constraints. Because the constraints keep the desired signal intact, this is equivalent to minimizing the noise energy arriving from undesired directions in the output, which enhances the signal from the desired direction.
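The source names LCMV and MVDR without giving formulas. As a sketch under stated assumptions: treat the radio devices' time-aligned signals in one frequency bin as the channels of an array, with a known steering vector d toward the speaker and an estimated spatial covariance matrix R. MVDR is then the special case of LCMV with the single constraint w^H d = 1, and minimizing the output power w^H R w under that constraint gives w = R^-1 d / (d^H R^-1 d). The function names and the two-channel numbers below are illustrative, not taken from the patent.

```python
import cmath
from typing import List

Matrix = List[List[complex]]
Vector = List[complex]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b] so the caller's data is not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x: Vector = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def mvdr_weights(cov: Matrix, steering: Vector) -> Vector:
    """MVDR weights w = R^-1 d / (d^H R^-1 d), which satisfy w^H d = 1."""
    r_inv_d = solve(cov, steering)
    denom = sum(d.conjugate() * y for d, y in zip(steering, r_inv_d))
    return [y / denom for y in r_inv_d]


def beamform(weights: Vector, snapshot: Vector) -> complex:
    """Combine one multichannel sample: y = w^H x."""
    return sum(w.conjugate() * x for w, x in zip(weights, snapshot))


# Two channels; the speech reaches channel 2 with a phase lag of 0.3 rad.
d = [1 + 0j, cmath.exp(-0.3j)]
# Hermitian, positive-definite noise-plus-interference covariance (made-up values).
R = [[2 + 0j, 0.5 + 0.5j], [0.5 - 0.5j, 2 + 0j]]
w = mvdr_weights(R, d)
print(abs(beamform(w, d)))  # ~1.0: the desired direction passes undistorted
```

The distortionless constraint holds the speech component at unit gain while the weights adapt to attenuate whatever R describes as correlated noise. In practice R would be estimated per frequency bin from noise-only frames, which this sketch does not attempt.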
Step 504: determine the first voice signal with the maximum intensity among the first voice signals.
Specifically, the voice processing program compares the intensities of all the first voice signals and determines the one with the maximum intensity.
Step 506: take the radio device corresponding to the first voice signal with the maximum intensity as the target radio device.
Specifically, the voice processing program takes the radio device that collected the strongest first voice signal as the target radio device.
In this voice processing method, merging the first voice signals and suppressing their noise yields a more accurate voice input source signal, and taking the radio device corresponding to the strongest first voice signal as the target radio device avoids inaccurate voice reply information caused by lost voice signals, which improves the accuracy of speech recognition.
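Steps 502 to 506 can be combined with the threshold rule of the preceding embodiment into a single decision function. The sketch below is hypothetical: intensity is taken as mean-square power, the signals are assumed to be already time-aligned, and simple channel averaging (delay-and-sum with zero delays) stands in for the LCMV/MVDR beamformer named in the source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Channel:
    device_id: str        # hypothetical radio device identifier
    samples: List[float]  # time-aligned samples of one first voice signal


def intensity(samples: Sequence[float]) -> float:
    """Mean-square power, used here as the signal intensity."""
    return sum(s * s for s in samples) / len(samples)


def merge(channels: Sequence[Channel]) -> List[float]:
    """Average the aligned channels; uncorrelated noise partly cancels, shared speech does not."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*(c.samples for c in channels))]


def apply_voice_rule(channels: Sequence[Channel],
                     preset_intensity: float) -> Tuple[List[float], str]:
    """Return (voice input source signal, target device id)."""
    strongest = max(channels, key=lambda c: intensity(c.samples))
    if intensity(strongest.samples) >= preset_intensity:
        # At least one signal reaches the threshold: use the strongest one directly.
        return strongest.samples, strongest.device_id
    # Every signal is weak: merge them, but reply through the strongest device.
    return merge(channels), strongest.device_id


near = Channel("radio-1", [0.5, -0.4, 0.6, -0.5])
far = Channel("radio-2", [0.2, -0.1, 0.3, -0.2])
signal, target = apply_voice_rule([near, far], preset_intensity=0.5)
print(target, signal)
```

Checking only the strongest channel against the threshold is equivalent to the two-branch rule: if any signal reaches the preset intensity, the strongest one does too.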
In one embodiment, a voice processing method includes: acquiring first voice signals collected by at least two radio devices, where the first voice signals collected by the at least two radio devices correspond to the same voice input source; merging the first voice signals and suppressing the noise in them to obtain the voice input source signal; acquiring the intensity of the first voice signal corresponding to each of the at least two radio devices; determining the first voice signal with the maximum intensity; taking the radio device corresponding to that first voice signal as the target radio device; acquiring the voice broadcast mode corresponding to the target radio device; processing the voice input source signal according to the voice broadcast mode to obtain the corresponding voice reply information; and sending the voice reply information to the target radio device so that it plays the voice reply information in that voice broadcast mode. This method can play the voice reply information in a specific voice broadcast mode, avoids inaccurate voice reply information caused by lost voice signals, improves speech recognition accuracy, and also improves voice processing efficiency.
In one embodiment, processing the voice input source signal and the voice broadcast mode to obtain the corresponding voice reply information includes: sending the voice input source signal and the voice broadcast mode corresponding to the target radio device to a server; and receiving the voice reply information returned by the server, where the server obtains the voice reply information by processing the voice input source signal according to the voice broadcast mode.
Specifically, the voice processing program sends the voice input source signal obtained from the first voice signals, together with an identifier of the voice broadcast mode corresponding to the target radio device, to the server. The server performs semantic analysis on the voice input source signal to obtain the corresponding text information, looks up the corresponding text reply in a database according to that text information, and combines the text reply with the voice broadcast mode to obtain the corresponding voice reply information. The server then sends the voice reply information to the main control device on which the voice processing program runs, and the voice processing program receives it.
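The server-side flow just described (recognize the input, look up a text reply, attach the broadcast mode) might look like this. Everything here is a hypothetical stand-in: speech recognition is injected as a callable because the source does not name an engine, `REPLY_DB` replaces the unnamed database, and the result carries the mode name rather than synthesized audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceReply:
    text: str            # text reply content found in the database
    broadcast_mode: str  # e.g. "gentle female voice"; would select the TTS voice


# Hypothetical reply table standing in for the server database.
REPLY_DB: Dict[str, str] = {
    "hello": "Hello, how can I help?",
    "turn on the television": "OK, turning on the television.",
}
FALLBACK = "Sorry, I did not catch that."


def build_reply(voice_input: bytes, broadcast_mode: str,
                speech_to_text: Callable[[bytes], str]) -> VoiceReply:
    """Recognize the voice input, look up a text reply, and tag it with the broadcast mode."""
    text = speech_to_text(voice_input).strip().lower()
    reply_text = REPLY_DB.get(text, FALLBACK)
    return VoiceReply(reply_text, broadcast_mode)


# A fake recognizer so the sketch runs without a real speech recognition engine.
reply = build_reply(b"\x00\x01", "gentle female voice", lambda _: "Turn on the television ")
print(reply)
```

Keeping recognition behind a callable makes the lookup and mode-tagging logic testable without audio.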
In this voice processing method, the voice input source signal and the voice broadcast mode of the target radio device are sent to the server, and the voice reply information that the server produces from them is received. Because the conversion from the voice input source signal to the voice reply information is performed on the server, the security of voice processing can be improved.
In one embodiment, the voice processing method further includes: parsing the voice input source signal to obtain a control instruction; and executing the corresponding operation according to the control instruction.
The control instruction can be used to control the main control device itself or other devices that communicate with the main control device.
Specifically, the voice processing program parses the voice input source signal to obtain the corresponding text information, converts the text information into a control instruction, and performs the corresponding operation on the device targeted by that instruction. For example, if the text parsed from the voice input source signal is "I want to turn on the television", the corresponding control instruction is "turn on the television", and the voice processing program then turns on the television according to that instruction.
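A toy version of the text-to-instruction step in the example above. The keyword tables, the `(action, device)` instruction format, and the handler mapping are hypothetical; a production system would rely on the server's language understanding rather than substring matching.

```python
from typing import Callable, Dict, Optional, Tuple

Instruction = Tuple[str, str]  # (action, device)

# Hypothetical vocabularies mapping phrases to canonical names.
ACTIONS: Dict[str, str] = {"turn on": "power_on", "turn off": "power_off"}
DEVICES: Dict[str, str] = {"television": "tv", "air conditioner": "ac"}


def parse_control_instruction(text: str) -> Optional[Instruction]:
    """Map recognized text such as "I want to turn on the television" to an instruction."""
    lowered = text.lower()
    action = next((v for k, v in ACTIONS.items() if k in lowered), None)
    device = next((v for k, v in DEVICES.items() if k in lowered), None)
    if action is None or device is None:
        return None  # not a control request
    return action, device


def execute(instruction: Instruction,
            handlers: Dict[Instruction, Callable[[], str]]) -> str:
    """Run the operation registered for the instruction, if any."""
    handler = handlers.get(instruction)
    return handler() if handler else "unsupported instruction"


handlers = {("power_on", "tv"): lambda: "television is on"}
instruction = parse_control_instruction("I want to turn on the television")
print(instruction, execute(instruction, handlers))  # ('power_on', 'tv') television is on
```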
In one embodiment, the voice processing program sends the voice input source signal and the voice broadcast mode corresponding to the target radio device to the server, and receives the voice reply information and a control instruction returned by the server. The server obtains the control instruction by parsing the voice input source signal and obtains the voice reply information by processing the voice input source signal according to the voice broadcast mode; the voice processing program then executes the corresponding operation according to the control instruction. The control instruction may follow a protocol agreed with the main control device: the server looks up the corresponding content and sends the control instruction to the main control device in a packet of the agreed protocol.
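The source says only that the control instruction travels in a packet of an agreed protocol. Purely as an illustration, the sketch below assumes a hypothetical format: a 4-byte big-endian length prefix followed by a UTF-8 JSON body with `seq`, `device`, and `action` fields.

```python
import json
import struct
from typing import Any, Dict

HEADER = struct.Struct(">I")  # 4-byte big-endian body length


def encode_control_packet(seq: int, device: str, action: str) -> bytes:
    """Serialize one control instruction into a length-prefixed JSON packet."""
    body = json.dumps({"seq": seq, "device": device, "action": action},
                      separators=(",", ":")).encode("utf-8")
    return HEADER.pack(len(body)) + body


def decode_control_packet(packet: bytes) -> Dict[str, Any]:
    """Parse a packet produced by encode_control_packet, rejecting truncated input."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    (length,) = HEADER.unpack_from(packet)
    body = packet[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated packet body")
    return json.loads(body.decode("utf-8"))


packet = encode_control_packet(7, "tv", "power_on")
print(decode_control_packet(packet))  # {'seq': 7, 'device': 'tv', 'action': 'power_on'}
```

A length prefix lets the main control device read exactly one packet from a byte stream such as TCP, where message boundaries are not preserved.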
In this voice processing method, a control instruction is obtained by parsing the voice input source signal and the corresponding operation is executed accordingly, so devices can be controlled remotely and voice control becomes more convenient.
In one embodiment, the control instruction may also be used to switch the voice broadcast mode. Specifically, when the voice processing program obtains, by parsing the voice input source signal, a control instruction for switching to a first voice broadcast mode, it switches the voice broadcast mode of the target radio device to the first voice broadcast mode according to that instruction, and then processes the voice input source signal according to the first voice broadcast mode to obtain the corresponding voice reply information.
For example, if the control instruction parsed from the voice input source signal is "switch to gentle female voice", the voice broadcast mode of the target radio device is switched to the gentle female voice.
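Switching the broadcast mode amounts to updating a per-device setting before the reply is generated. In this sketch the "switch to ..." phrasing rule, the default mode name, and the `BroadcastModes` class are hypothetical.

```python
from typing import Dict

DEFAULT_MODE = "standard voice"  # hypothetical default broadcast mode
SWITCH_PREFIX = "switch to "


class BroadcastModes:
    """Voice broadcast mode per radio device, falling back to a default."""

    def __init__(self) -> None:
        self._modes: Dict[str, str] = {}

    def get(self, device_id: str) -> str:
        return self._modes.get(device_id, DEFAULT_MODE)

    def switch(self, device_id: str, mode: str) -> None:
        self._modes[device_id] = mode


def handle_mode_instruction(text: str, target_device: str, modes: BroadcastModes) -> bool:
    """Apply a "switch to <mode>" instruction to the target device; return True if applied."""
    lowered = text.strip().lower()
    if not lowered.startswith(SWITCH_PREFIX):
        return False
    modes.switch(target_device, lowered[len(SWITCH_PREFIX):])
    return True


modes = BroadcastModes()
handle_mode_instruction("Switch to gentle female voice", "radio-1", modes)
print(modes.get("radio-1"), "|", modes.get("radio-2"))  # gentle female voice | standard voice
```

Only the target radio device changes mode; other devices keep their own settings.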
In this voice processing method, the voice broadcast mode of the target radio device is switched to the first voice broadcast mode according to the corresponding control instruction, which gives the device more ways to respond and meets personalized needs.
In one embodiment, before the first voice signals collected by the at least two radio devices are acquired, the voice processing method further includes: acquiring a second voice signal collected by a radio device; and controlling the radio device to perform voice acquisition when a wake-up word is detected in the second voice signal.
The wake-up word is used to wake up the main control device and the radio devices. Each main control device may have its own wake-up word, or main control devices of the same brand may share the same wake-up word. The wake-up word may be a default one or one set by the user; for example, it may be "hello", "xxx", or "good morning".
Specifically, the voice processing program acquires the second voice signal collected by the radio device. If no wake-up word is detected in the second voice signal, the main control device and the radio device keep their current states and perform no operation. If a wake-up word is detected, the voice processing program switches the radio device into the voice acquisition state and the radio device starts collecting voice.
In this embodiment, the voice processing program may acquire the second voice signals collected by the at least two radio devices and, when a wake-up word is detected in them, control all the radio devices to perform voice acquisition.
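The wake-up gating above is a two-state switch per radio device. This sketch assumes the second voice signal has already been turned into a transcript and uses naive substring matching; real systems typically run a dedicated keyword-spotting model on the audio. `RadioDevice` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class RadioDevice:
    device_id: str
    collecting: bool = False  # False: only listening for the wake-up word


def contains_wake_word(transcript: str, wake_words: Iterable[str]) -> bool:
    lowered = transcript.lower()
    return any(word.lower() in lowered for word in wake_words)


def on_second_voice_signal(transcript: str, devices: Sequence[RadioDevice],
                           wake_words: Iterable[str]) -> bool:
    """Switch every device to voice acquisition if a wake-up word is heard."""
    if not contains_wake_word(transcript, wake_words):
        return False  # keep the current state and do nothing
    for device in devices:
        device.collecting = True
    return True


devices = [RadioDevice("radio-1"), RadioDevice("radio-2")]
on_second_voice_signal("what a nice day", devices, ["hello", "good morning"])
print([d.collecting for d in devices])  # [False, False]
on_second_voice_signal("Good morning, assistant", devices, ["hello", "good morning"])
print([d.collecting for d in devices])  # [True, True]
```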
In this voice processing method, the radio device starts voice acquisition only after a wake-up word is detected in the second voice signal. This keeps the radio device from staying in a meaningless acquisition state for long periods and from collecting user information unnecessarily, which saves power on both the radio device and the main control device and improves information security.
In one embodiment, before the first voice signals collected by the at least two radio devices or the second voice signal collected by a radio device is acquired, the voice processing method further includes: acquiring a binding instruction, initiated by a logged-in user account, for the at least two radio devices; and determining the correspondence between the user account and the device identifiers of the at least two radio devices according to the binding instruction.
Specifically, the voice processing program acquires the user account and password entered by the user and, after login, searches for nearby radio devices or finds radio devices connected to the same local area network. It then acquires the binding instruction initiated by the logged-in user account for at least two radio devices, binds the user account to those radio devices, and determines the correspondence between the user account and their device identifiers.
In this embodiment, when the master control device in the scene is replaced, the user can log in to the new master control device with the same user account and password, and the new device obtains the correspondence between the user account and the device identifiers of the at least two radio devices without re-binding.
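The account binding can be pictured as a map from user account to device identifiers. Assuming that map is kept with the account (for example on a server) rather than on the master control device, a replacement device only needs to log in with the same account to recover it. The `BindingRegistry` class and the identifiers are hypothetical, and authentication is omitted.

```python
from typing import Dict, Iterable, Set


class BindingRegistry:
    """Hypothetical store of user account -> bound radio device identifiers."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Set[str]] = {}

    def bind(self, account: str, device_ids: Iterable[str]) -> None:
        """Handle a binding instruction from a logged-in account."""
        self._bindings.setdefault(account, set()).update(device_ids)

    def devices_for(self, account: str) -> Set[str]:
        """Return a copy so callers cannot mutate the stored binding."""
        return set(self._bindings.get(account, set()))


registry = BindingRegistry()
# The original master control device binds two radio devices for the account.
registry.bind("user-42", ["radio-1", "radio-2"])
# A replacement master control device logs in with the same account.
print(sorted(registry.devices_for("user-42")))  # ['radio-1', 'radio-2']
```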
In this voice processing method, a binding instruction initiated by a logged-in user account for at least two radio devices is acquired, and the correspondence between the user account and the device identifiers of those radio devices is determined from it, so the radio devices do not need to be re-bound when the master control device is replaced.
In one embodiment, FIG. 6 shows a timing diagram of the voice processing method. The method is applied in the scenario shown in FIG. 2, where the server 140 includes a background server and an AILab (Artificial Intelligence Laboratory) server, and includes the following steps:
Step 602: the first radio device collects the first voice signal input by the user.
Step 604: the second radio device collects the first voice signal input by the user.
Step 606: the first radio device sends its first voice signal to the main control device.
Step 608: the second radio device sends its first voice signal to the main control device.
Step 610: the main control device processes the first voice signals according to the voice processing rule and determines the voice input source signal and the target radio device.
In this example, the main control device determines that the first radio device is the target radio device.
Step 612: the main control device acquires the voice broadcast mode corresponding to the target radio device.
Step 614: the main control device sends the voice input source signal and the corresponding voice broadcast mode to the background server.
Step 616: the background server sends the voice input source signal to the AILab server.
Step 618: the AILab server processes the voice input source signal according to the voice broadcast mode to obtain the corresponding text information and voice reply information.
Specifically, the AILab server converts the voice input source signal into the corresponding text information and generates the voice reply information from the voice input source signal according to the voice broadcast mode.
Step 620: the AILab server sends the text information and the voice reply information to the background server.
Step 622: the background server parses the text information to obtain a control instruction.
Step 624: the background server sends the voice reply information and the control instruction to the main control device.
Step 626: the main control device sends the voice reply information to the first radio device and executes the operation corresponding to the control instruction.
Step 628: the first radio device plays the voice reply information in the voice broadcast mode.
With this voice processing method, the voice reply information can be played in the corresponding voice broadcast mode, which diversifies how radio devices respond and meets personalized needs, and the user can receive the voice reply within the area covered by each radio device. In addition, a control instruction is obtained by parsing the text information derived from the voice input source signal and the corresponding operation is executed, so devices can be controlled remotely and voice control becomes more convenient.
In one embodiment, a voice processing method includes:
Step (a1): acquiring a binding instruction, initiated by a logged-in user account, for at least two radio devices.
Step (a2): determining the correspondence between the user account and the device identifiers of the at least two radio devices according to the binding instruction.
Step (a3): acquiring a second voice signal collected by a radio device.
Step (a4): controlling the radio device to perform voice acquisition when a wake-up word is detected in the second voice signal.
Step (a5): acquiring first voice signals collected by at least two radio devices, where the first voice signals collected by the at least two radio devices correspond to the same voice input source.
Step (a6): acquiring the intensity of the first voice signal corresponding to each of the at least two radio devices.
Step (a7): when the intensities of at least two first voice signals are greater than or equal to the preset intensity, determining the first voice signal with the maximum intensity among them, taking it as the voice input source signal, and taking its radio device as the target radio device.
Step (a8): when the intensities of the first voice signals collected by the at least two radio devices are all less than the preset intensity, merging the first voice signals and suppressing the noise in them to obtain the voice input source signal.
Step (a9): determining the first voice signal with the maximum intensity among the first voice signals.
Step (a10): taking the radio device corresponding to the first voice signal with the maximum intensity as the target radio device.
Step (a11): acquiring the voice broadcast mode corresponding to the target radio device.
Step (a12): sending the voice input source signal and the voice broadcast mode corresponding to the target radio device to the server.
Step (a13): receiving the voice reply information returned by the server, together with a control instruction obtained by parsing the voice input source signal, where the server obtains the voice reply information by processing the voice input source signal according to the voice broadcast mode.
Step (a14): executing the corresponding operation according to the control instruction.
Step (a15): sending the voice reply information to the target radio device so that the target radio device plays it in the voice broadcast mode.
In this voice processing method, the correspondence between the user account and the device identifiers of the at least two radio devices is determined from the binding instruction, so the radio devices do not need to be re-bound when the master control device is replaced;
the radio device performs voice acquisition only after a wake-up word is detected in the second voice signal, which keeps it from staying in a meaningless acquisition state for long periods and from collecting user information unnecessarily, saves power on the radio device and the main control device, and improves information security;
when the intensity of a first voice signal is greater than or equal to the preset intensity, that first voice signal is determined as the voice input source signal and its radio device as the target radio device, so the target radio device can be determined quickly and voice processing efficiency is improved;
merging the first voice signals and suppressing their noise yields a more accurate voice input source signal, and taking the radio device corresponding to the strongest first voice signal as the target radio device avoids inaccurate voice reply information caused by lost voice signals, improving speech recognition accuracy;
performing the conversion from the voice input source signal to the voice reply information on the server improves the security of voice processing, and playing the voice reply information in the corresponding voice broadcast mode diversifies how radio devices respond, meets personalized needs, and lets the user receive the voice reply within the area covered by each radio device;
and a control instruction is obtained by parsing the text derived from the voice input source signal and the corresponding operation is executed, so devices can be controlled remotely and voice control becomes more convenient.
FIG. 4 and FIG. 5 are flowcharts of the voice processing method according to embodiments. Although the steps in these flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the execution order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 4 and FIG. 5 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times; these sub-steps or stages need not be performed sequentially and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 7, which is a structural block diagram of a voice processing apparatus, the voice processing apparatus includes an obtaining module 702, a first processing module 704, a second processing module 706, and a sending module 708, where:
The obtaining module 702 is configured to acquire first voice signals collected by at least two radio devices, where the first voice signals collected by the at least two radio devices correspond to the same voice input source.
The first processing module 704 is configured to process the first voice signals according to the voice processing rule and determine the voice input source signal and the corresponding target radio device.
The obtaining module 702 is further configured to acquire the voice broadcast mode corresponding to the target radio device.
The second processing module 706 is configured to process the voice input source signal according to the voice broadcast mode to obtain the corresponding voice reply information.
The sending module 708 is configured to send the voice reply information to the target radio device so that the target radio device plays it in the voice broadcast mode.
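One way to wire modules 702 to 708 together is sketched below. The class, the callables passed in, and the simplified source selection (strongest capture only, no threshold or merging) are hypothetical; the patent describes the modules' responsibilities, not their code. Acquisition of the captures by module 702 is also left outside the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Capture:
    device_id: str
    payload: str      # stands in for the audio of one first voice signal
    intensity: float


@dataclass
class VoiceProcessingApparatus:
    modes: Dict[str, str]                     # broadcast mode per radio device
    reply_fn: Callable[[str, str], str]       # used by second processing module 706
    send_fn: Callable[[str, str, str], None]  # used by sending module 708
    default_mode: str = "standard voice"

    def obtain_mode(self, device_id: str) -> str:
        """Obtaining module 702: look up the target device's broadcast mode."""
        return self.modes.get(device_id, self.default_mode)

    def determine_source(self, captures: Sequence[Capture]) -> Tuple[str, str]:
        """First processing module 704, simplified to 'strongest capture wins'."""
        best = max(captures, key=lambda c: c.intensity)
        return best.payload, best.device_id

    def handle(self, captures: Sequence[Capture]) -> str:
        source, target = self.determine_source(captures)
        mode = self.obtain_mode(target)
        reply = self.reply_fn(source, mode)  # second processing module 706
        self.send_fn(target, reply, mode)    # sending module 708
        return target


sent: List[Tuple[str, str, str]] = []
apparatus = VoiceProcessingApparatus(
    modes={"radio-2": "gentle female voice"},
    reply_fn=lambda text, mode: f"reply to '{text}'",
    send_fn=lambda device, reply, mode: sent.append((device, reply, mode)),
)
apparatus.handle([Capture("radio-1", "hello", 0.2), Capture("radio-2", "hello", 0.7)])
print(sent)  # [('radio-2', "reply to 'hello'", 'gentle female voice')]
```

Injecting the reply and send steps as callables keeps the module boundaries visible and lets a server call or a network send be swapped in without touching the selection logic.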
The voice processing apparatus acquires first voice signals collected by at least two radio devices that correspond to the same voice input source, processes them according to the voice processing rule, and determines the voice input source signal and the corresponding target radio device. Because several radio devices pick up voice, the voice input range is larger, the target radio device corresponding to the voice input source can be identified, and voice signals can be captured even from a distance. The apparatus then acquires the voice broadcast mode of the target radio device, processes the voice input source signal according to that mode to obtain the voice reply information, and sends the reply to the target radio device, which plays it in that mode. This widens the range of voice input, makes voice input more convenient, diversifies how radio devices respond, meets personalized needs, and lets the user receive the voice reply within the area covered by each radio device.
In one embodiment, the first processing module 704 is configured to acquire the intensity of the first voice signal corresponding to each of the at least two radio devices; and, when the intensity of a first voice signal is greater than or equal to the preset intensity, determine that first voice signal as the voice input source signal and the radio device corresponding to it as the target radio device.
By acquiring the intensity of the first voice signal corresponding to each of the at least two radio devices and, when a first voice signal reaches the preset intensity, using it as the voice input source signal and its radio device as the target radio device, the voice processing apparatus can determine the target radio device quickly and improves voice processing efficiency.
In one embodiment, the first processing module 704 is configured to, when the intensities of at least two first voice signals are greater than or equal to the preset intensity, determine the first voice signal with the maximum intensity among them; and take that first voice signal as the voice input source signal and its radio device as the target radio device.
When the intensities of at least two first voice signals are greater than or equal to the preset intensity, the voice processing apparatus uses the strongest of them as the voice input source signal and its radio device as the target radio device, so the target radio device can be determined quickly and voice processing efficiency is improved.
In one embodiment, the first processing module 704 is further configured to, when the intensities of the first voice signals collected by the at least two radio devices are all less than the preset intensity, merge the first voice signals and suppress the noise in them to obtain the voice input source signal; determine the first voice signal with the maximum intensity among the first voice signals; and take the radio device corresponding to that first voice signal as the target radio device.
By merging the first voice signals and suppressing their noise, the voice processing apparatus obtains a more accurate voice input source signal; taking the radio device corresponding to the strongest first voice signal as the target radio device avoids inaccurate voice reply information caused by lost voice signals and improves speech recognition accuracy.
In one embodiment, the obtaining module 702 is configured to obtain first voice signals collected by at least two radio equipments, where the first voice signals collected by the at least two radio equipments correspond to the same voice input source.
The first processing module 704 is configured to merge the first voice signals and suppress the noise in the first voice signals collected by the at least two radio devices to obtain the voice input source signal; acquire the intensity of the first voice signal corresponding to each of the at least two radio devices; determine the first voice signal with the maximum intensity among the first voice signals; and take the radio device corresponding to that first voice signal as the target radio device.
The obtaining module 702 is further configured to obtain a voice broadcast mode corresponding to the target radio device.
The second processing module 706 is configured to process the voice input source signal and the voice broadcast mode to obtain corresponding voice reply information.
The sending module 708 is configured to send the voice reply information to the target radio device so that the target radio device plays it in the voice broadcast mode.
This voice processing apparatus can play the voice reply information in a specific voice broadcast mode, avoids inaccurate voice reply information caused by lost voice signals, improves speech recognition accuracy, and also improves voice processing efficiency.
In one embodiment, the voice processing apparatus further includes a receiving module. The sending module 708 is configured to send the voice input source signal and the voice broadcast mode corresponding to the target radio device to the server. The receiving module is configured to receive the voice reply information returned by the server, which the server obtains by processing the voice input source signal according to the voice broadcast mode.
The voice processing apparatus sends the voice input source signal and the voice broadcast mode of the target radio device to the server and receives the voice reply information the server produces from them. Because the conversion from the voice input source signal to the voice reply information is performed on the server, the security of voice processing can be improved.
In one embodiment, the voice processing apparatus further includes a control module. The second processing module 706 is further configured to parse the voice input source signal to obtain a control instruction, and the control module is configured to execute the corresponding operation according to the control instruction.
The voice processing apparatus obtains a control instruction by parsing the voice input source signal and executes the corresponding operation, so devices can be controlled remotely and voice control becomes more convenient.
In one embodiment, the speech processing apparatus further comprises a control module. The control module is configured to acquire a second voice signal collected by the radio device and, when a wake-up word is detected in the second voice signal, to control the radio device to perform voice acquisition.
The voice processing apparatus acquires the second voice signal collected by the radio device and controls the radio device to perform voice acquisition only when a wake-up word is detected in that signal. This keeps the radio device from staying in a meaningless acquisition state for long periods and from collecting user information unnecessarily, which reduces the power consumption of both the radio device and the master control device and improves information security.
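The wake-up gating can be pictured as a two-state machine. This sketch assumes each frame of the second voice signal has already been transcribed to text and that acquisition ends after a fixed number of frames; the class `WakeGate`, the `max_frames` timeout, and the state names are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"              # only listening for the wake-up word
    COLLECTING = "collecting"  # full voice acquisition enabled


class WakeGate:
    """Hypothetical wake-up gate kept by the master control device for one radio device."""

    def __init__(self, wake_word: str, max_frames: int = 3) -> None:
        self.wake_word = wake_word.lower()
        self.max_frames = max_frames  # assumed timeout before returning to IDLE
        self.state = State.IDLE
        self._remaining = 0

    def feed(self, frame_text: str) -> State:
        """Process one transcribed frame of the second voice signal."""
        if self.state is State.IDLE:
            if self.wake_word in frame_text.lower():
                self.state = State.COLLECTING
                self._remaining = self.max_frames
        else:
            self._remaining -= 1
            if self._remaining <= 0:
                self.state = State.IDLE  # stop acquisition to save power
        return self.state
```

Returning to `IDLE` after the timeout is what keeps the device out of the long, meaningless acquisition state described above.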
In one embodiment, the speech processing apparatus further comprises a binding module. The binding module is configured to acquire a binding instruction initiated by a logged-in user account for at least two radio devices, and to determine, according to the binding instruction, the correspondence between the user account and the device identifiers of the at least two radio devices.
The voice processing apparatus acquires the binding instruction initiated by the logged-in user account for the at least two radio devices and determines, according to that instruction, the correspondence between the user account and the device identifiers of the radio devices. As a result, the radio devices do not need to be bound again when the master control device is replaced.
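A minimal sketch of the account-to-device correspondence, assuming it is kept as a mapping keyed by user account (the class `BindingRegistry` and its methods are hypothetical). Because the bindings hang off the account rather than the master control device, a replacement device only needs the logged-in account to recover them.

```python
from collections import defaultdict
from typing import Dict, List, Set


class BindingRegistry:
    """Hypothetical user-account -> device-identifier registry."""

    def __init__(self) -> None:
        self._devices: Dict[str, Set[str]] = defaultdict(set)

    def bind(self, user_account: str, device_ids: List[str]) -> None:
        """Handle a binding instruction covering at least two radio devices."""
        if len(set(device_ids)) < 2:
            raise ValueError("a binding instruction covers at least two radio devices")
        self._devices[user_account].update(device_ids)

    def devices_for(self, user_account: str) -> Set[str]:
        """Return a copy of the device identifiers bound to the account."""
        return set(self._devices.get(user_account, set()))
```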
FIG. 8 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may specifically be the master control device 130 in FIG. 1, FIG. 2, or FIG. 3. As shown in FIG. 8, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the speech processing method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the speech processing method.
Those skilled in the art will appreciate that the structure shown in FIG. 8 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the speech processing apparatus provided in the present application may be implemented as a computer program executable on a computer device such as the one shown in FIG. 8. The memory of the computer device may store the program modules that constitute the voice processing apparatus, such as the acquisition module, the first processing module, the second processing module, and the sending module shown in FIG. 7. The computer program formed by these program modules causes the processor to execute the steps of the speech processing methods of the embodiments described in this specification.
For example, the computer device shown in FIG. 8 may use the acquisition module of the speech processing apparatus shown in FIG. 7 to acquire first voice signals collected by at least two radio devices, the first voice signals corresponding to the same voice input source. The computer device may use the first processing module to process the first voice signals according to a voice processing rule and determine the voice input source signal and the corresponding target radio device. The computer device may use the second processing module to process according to the voice input source signal and the voice broadcast mode of the target radio device and obtain the corresponding voice reply information. The computer device may use the sending module to send the voice reply information to the target radio device, so that the target radio device plays the voice reply information in the voice broadcast mode.
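The flow through the four program modules of FIG. 7 can be sketched as a small pipeline object. The class `SpeechPipeline`, its field names, and the choice to inject each module as a callable are assumptions made for illustration; the embodiment does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Signal = List[float]


@dataclass
class SpeechPipeline:
    """Hypothetical wiring of the acquisition, processing, and sending modules."""
    acquire: Callable[[], Dict[str, Signal]]                   # acquisition module
    select: Callable[[Dict[str, Signal]], Tuple[Signal, str]]  # first processing module
    broadcast_modes: Dict[str, str]                            # device id -> broadcast mode
    reply: Callable[[Signal, str], str]                        # second processing module
    send: Callable[[str, str, str], None]                      # sending module

    def run(self) -> str:
        """Run one interaction and return the target device id."""
        signals = self.acquire()                 # first voice signals, keyed by device id
        source, target = self.select(signals)    # voice input source signal + target device
        mode = self.broadcast_modes[target]      # broadcast mode of the target device
        message = self.reply(source, mode)       # voice reply information
        self.send(target, message, mode)         # target device plays it in that mode
        return target
```

Injecting the modules as callables keeps each step replaceable, for example swapping a local reply generator for the server round trip of the earlier embodiment.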
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described speech processing method. The steps of the speech processing method here may be steps in the speech processing methods of the various embodiments described above.
In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, causes the processor to carry out the steps of the above-mentioned speech processing method. The steps of the speech processing method here may be steps in the speech processing methods of the various embodiments described above.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their descriptions are specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (15)

1. A voice processing system, characterized in that the system comprises a master control device and at least two radio devices, wherein:
the at least two radio devices are each configured to collect first voice signals corresponding to the same voice input source;
the master control device is configured to acquire the first voice signals; process the first voice signals according to a voice processing rule to determine a voice input source signal and a corresponding target radio device; acquire a voice broadcast mode corresponding to the target radio device; process according to the voice input source signal and the voice broadcast mode to obtain corresponding voice reply information; and send the voice reply information to the target radio device; and
the target radio device among the at least two radio devices is further configured to play the voice reply information in the voice broadcast mode.
2. The system of claim 1,
the master control device is further configured to obtain the intensity of the first voice signal corresponding to each of the at least two radio devices; and, when the intensity of a first voice signal is greater than or equal to a preset intensity, determine that first voice signal as the voice input source signal and determine the radio device corresponding to it as the target radio device.
3. The system of claim 2,
the master control device is further configured to, when the intensities of at least two first voice signals are greater than or equal to the preset intensity, determine the first voice signal with the maximum intensity among them, use it as the voice input source signal, and use the radio device corresponding to it as the target radio device.
4. The system of claim 2,
the master control device is further configured to, when the intensities of the first voice signals are less than the preset intensity, combine the first voice signals collected by the at least two radio devices and suppress noise signals in them to obtain the voice input source signal; determine the first voice signal with the maximum intensity among the first voice signals; and use the radio device corresponding to that first voice signal as the target radio device.
5. The system of claim 1,
the master control device is further configured to send the voice input source signal and the voice broadcast mode corresponding to the target radio device to a server, and to receive voice reply information returned by the server, the voice reply information being obtained by the server by processing according to the voice input source signal and the voice broadcast mode.
6. The system of any one of claims 1 to 5,
the master control device is further configured to parse the voice input source signal to obtain a control instruction, and to perform a corresponding operation according to the control instruction.
7. The system according to any one of claims 1 to 5,
the master control device is further configured to acquire a second voice signal collected by the radio device and, when a wake-up word is detected in the second voice signal, control the radio device to perform voice acquisition.
8. The system according to any one of claims 1 to 5,
the master control device is further configured to combine the first voice signals through an analog beamformer, perform enhancement processing on the combined first voice signal, and suppress noise signals in the first voice signals collected by the at least two radio devices to obtain the voice input source signal.
9. The system according to any one of claims 1 to 5,
the master control device is further configured to determine the device identifier of the target radio device, and to acquire the voice broadcast mode corresponding to the voice broadcast mode identifier that corresponds to the device identifier of the target radio device.
10. The system of claim 9,
the master control device is further configured to acquire a binding instruction initiated by a logged-in user account for the at least two radio devices; determine, according to the binding instruction, the correspondence between the user account and the device identifiers of the at least two radio devices; and determine the device identifier of the target radio device based on the correspondence.
11. The system of any one of claims 1 to 5,
the master control device is further configured to acquire the ambient voice intensity and adjust the volume of the voice reply information according to it, the volume of the voice reply information being positively correlated with the ambient voice intensity.
12. A speech processing method, performed by a target radio device among at least two radio devices, the method comprising:
collecting a first voice signal, the first voice signals of the at least two radio devices corresponding to the same voice input source;
sending the first voice signal to a master control device, so that the master control device processes the first voice signal according to a voice processing rule to determine a voice input source signal; when the voice input source signal corresponds to the target radio device, acquires the voice broadcast mode corresponding to the target radio device; processes according to the voice input source signal and the voice broadcast mode to obtain corresponding voice reply information; and sends the voice reply information to the target radio device;
and receiving the voice reply information sent by the master control equipment, and playing the voice reply information in the voice broadcasting mode.
13. A speech processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to collect the first voice signals of at least two radio devices corresponding to the same voice input source;
a sending module, configured to send the first voice signal to a master control device, so that the master control device processes the first voice signal according to a voice processing rule to determine a voice input source signal; when the voice input source signal corresponds to a target radio device, acquires the voice broadcast mode corresponding to the target radio device; processes according to the voice input source signal and the voice broadcast mode to obtain corresponding voice reply information; and sends the voice reply information to the target radio device;
and a playing module, configured to receive the voice reply information sent by the master control device and play it in the voice broadcast mode.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the method as claimed in claim 12.
15. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the method as claimed in claim 12.
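The target-device selection of claims 2 to 4 and the volume rule of claim 11 can be sketched as follows. Several assumptions are made and should be read as such: "intensity" is taken to be the RMS level of the samples; the below-threshold branch uses plain sample-wise averaging of time-aligned signals as a stand-in for the beamforming and noise suppression the claims describe; and the linear volume mapping in `reply_volume` (with hypothetical `base`, `gain`, and `max_volume` parameters) is only one positively correlated choice.

```python
from typing import Dict, List, Tuple

Signal = List[float]


def rms(signal: Signal) -> float:
    """Root-mean-square level, used here as the signal 'intensity' (assumption)."""
    return (sum(x * x for x in signal) / len(signal)) ** 0.5 if signal else 0.0


def select_source(signals: Dict[str, Signal], threshold: float) -> Tuple[Signal, str]:
    """Return (voice input source signal, target device id) following claims 2 to 4."""
    levels = {dev: rms(sig) for dev, sig in signals.items()}
    target = max(levels, key=lambda dev: levels[dev])  # strongest first voice signal
    if levels[target] >= threshold:
        # Claims 2 and 3: at least one signal reaches the preset intensity,
        # so the strongest one is the source and its device is the target.
        return signals[target], target
    # Claim 4: all signals are below the preset intensity. Sample-wise
    # averaging stands in for beamforming/noise suppression; it assumes the
    # signals are already time-aligned.
    length = min(len(sig) for sig in signals.values())
    combined = [sum(sig[i] for sig in signals.values()) / len(signals)
                for i in range(length)]
    return combined, target


def reply_volume(ambient_level: float, base: float = 0.3, gain: float = 0.5,
                 max_volume: float = 1.0) -> float:
    """Claim 11: volume grows with ambient intensity (linear mapping assumed)."""
    return min(max_volume, base + gain * max(0.0, ambient_level))
```

Taking the overall maximum first and then comparing it with the threshold covers both claim 2 (one signal above the threshold) and claim 3 (several above it); if the maximum is below the threshold, every signal is, which is exactly the condition of claim 4.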
CN202211113080.2A 2019-06-06 2019-06-06 Speech processing system, method, apparatus, storage medium and computer device Active CN115482806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211113080.2A CN115482806B (en) 2019-06-06 2019-06-06 Speech processing system, method, apparatus, storage medium and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211113080.2A CN115482806B (en) 2019-06-06 2019-06-06 Speech processing system, method, apparatus, storage medium and computer device
CN201910491609.6A CN110224904B (en) 2019-06-06 2019-06-06 Voice processing method, device, computer readable storage medium and computer equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910491609.6A Division CN110224904B (en) 2019-06-06 2019-06-06 Voice processing method, device, computer readable storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN115482806A true CN115482806A (en) 2022-12-16
CN115482806B CN115482806B (en) 2024-06-25

Family

ID=67815951

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211113080.2A Active CN115482806B (en) 2019-06-06 2019-06-06 Speech processing system, method, apparatus, storage medium and computer device
CN201910491609.6A Active CN110224904B (en) 2019-06-06 2019-06-06 Voice processing method, device, computer readable storage medium and computer equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910491609.6A Active CN110224904B (en) 2019-06-06 2019-06-06 Voice processing method, device, computer readable storage medium and computer equipment

Country Status (1)

Country Link
CN (2) CN115482806B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117037771A (en) * 2023-10-08 2023-11-10 深圳市千贝科技有限公司 Zero cold water control method and device based on voice recognition storage

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114322216A (en) * 2020-09-30 2022-04-12 广州联动万物科技有限公司 Whole-house air field monitoring and interaction system and method and air conditioner

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170242651A1 (en) * 2016-02-22 2017-08-24 Sonos, Inc. Audio Response Playback
CN107622767A (en) * 2016-07-15 2018-01-23 青岛海尔智能技术研发有限公司 The sound control method and appliance control system of appliance system
CN107749305A (en) * 2017-09-29 2018-03-02 百度在线网络技术(北京)有限公司 Method of speech processing and its device
CN109074808A (en) * 2018-07-18 2018-12-21 深圳魔耳智能声学科技有限公司 Sound control method, control device and storage medium
CN109767772A (en) * 2019-03-14 2019-05-17 百度在线网络技术(北京)有限公司 Distributed sound exchange method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910500B (en) * 2016-12-23 2020-04-17 北京小鸟听听科技有限公司 Method and device for voice control of device with microphone array
CN107197090B (en) * 2017-05-18 2020-07-14 维沃移动通信有限公司 Voice signal receiving method and mobile terminal
CN107195305B (en) * 2017-07-21 2021-01-19 合肥联宝信息技术有限公司 Information processing method and electronic equipment
KR102411766B1 (en) * 2017-08-25 2022-06-22 삼성전자주식회사 Method for activating voice recognition servive and electronic device for the same
CN108461084A (en) * 2018-03-01 2018-08-28 广东美的制冷设备有限公司 Speech recognition system control method, control device and computer readable storage medium
CN108469966A (en) * 2018-03-21 2018-08-31 北京金山安全软件有限公司 Voice broadcast control method and device, intelligent device and medium
CN108733341B (en) * 2018-05-18 2021-09-14 出门问问信息科技有限公司 Voice interaction method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117037771A (en) * 2023-10-08 2023-11-10 深圳市千贝科技有限公司 Zero cold water control method and device based on voice recognition storage
CN117037771B (en) * 2023-10-08 2023-12-22 深圳市千贝科技有限公司 Zero cold water control method and device based on voice recognition storage

Also Published As

Publication number Publication date
CN110224904A (en) 2019-09-10
CN115482806B (en) 2024-06-25
CN110224904B (en) 2022-10-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant