CN111312244B - Voice interaction system and method for sand table - Google Patents

Voice interaction system and method for sand table

Info

Publication number
CN111312244B
Authority
CN
China
Prior art keywords
central control
voice
customer
loudspeaker
sound box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010097237.1A
Other languages
Chinese (zh)
Other versions
CN111312244A (en
Inventor
何赛娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202010097237.1A priority Critical patent/CN111312244B/en
Publication of CN111312244A publication Critical patent/CN111312244A/en
Application granted granted Critical
Publication of CN111312244B publication Critical patent/CN111312244B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice interaction system and method for a sand table. The voice interaction system comprises a central control device and a plurality of speaker devices communicatively connected to the central control device. The speaker devices are arranged around the sand table to collect customer voices and send the collected customer voices to the central control device. The central control device determines response content corresponding to the received customer voice and returns the response content to the corresponding speaker device so as to present it to the customer. In embodiments of the invention, the speaker devices distributed around the sand table collect and upload customers' spoken inquiries, and a single central control device controls all of the speaker devices, so that customers can learn about the property on their own without property sales staff, which reduces staffing costs.

Description

Voice interaction system and method for sand table
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a voice interaction system and method for a sand table.
Background
At existing real estate sales venues, a customer who wants to learn more about the sand table must find a sales agent for a one-on-one explanation. When several customers need consultation at the same time, several sales agents must be on hand, which increases costs. In addition, sales agents differ in expertise and in how well they know the property currently on sale, so customers often receive inconsistent consulting services, which harms the user experience.
Disclosure of Invention
An embodiment of the present invention provides a voice interaction system and method for a sand table, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a voice interaction system for a sand table, including a central control device and a plurality of speaker devices communicatively connected to the central control device;
the plurality of speaker devices are arranged around the sand table to collect customer voices and send the collected customer voices to the central control device;
the central control device is configured to determine response content corresponding to the received customer voice and return the response content to the corresponding speaker device so as to present the response content to the customer.
In some embodiments, the response content includes audio content, and presenting the response content to the customer includes broadcasting the audio content to the customer.
In some embodiments, each of the plurality of speaker devices is configured with a display screen, the response content includes image content, and the presenting the response content to the customer includes presenting the image content on the display screen.
In some embodiments, the central control device is further configured to:
when the received customer voice originates from only one speaker device, determine that speaker device as the corresponding speaker device;
when the received customer voice originates from a plurality of speaker devices, compare the audio quality of the customer voice originating from each of those speaker devices;
and determine the speaker device with the highest audio quality as the corresponding speaker device.
In some embodiments, the central control device is further configured to:
when the audio quality of the customer voice received by the central control device from the corresponding speaker device falls below a set threshold, re-determine a new corresponding speaker device;
and determine the response content according to the customer voice from the new corresponding speaker device and the conversation information that is stored by the central control device and was generated through the previous corresponding speaker device.
In a second aspect, an embodiment of the present invention provides a voice interaction method for a sand table, applied to a voice interaction system for a sand table, where the system includes a central control device and a plurality of speaker devices communicatively connected to the central control device; the method includes the following steps:
the central control device receives the customer voices monitored by the plurality of speaker devices, the plurality of speaker devices being arranged around the sand table to monitor customer voices;
the central control device determines response content corresponding to the received customer voice and returns the response content to the corresponding speaker device so as to present the response content to the customer.
In some embodiments, the central control device is further configured to:
when the received customer voice originates from only one speaker device, determine that speaker device as the corresponding speaker device;
when the received customer voice originates from a plurality of speaker devices, compare the audio quality of the customer voice originating from each of those speaker devices;
and determine the speaker device with the highest audio quality as the corresponding speaker device.
In some embodiments, the central control device is further configured to:
when the audio quality of the customer voice received by the central control device from the corresponding speaker device falls below a set threshold, re-determine a new corresponding speaker device;
and determine the response content according to the customer voice from the new corresponding speaker device and the conversation information that is stored by the central control device and was generated through the previous corresponding speaker device.
In a third aspect, an embodiment of the present invention provides a storage medium, where one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above-described voice interaction methods for a sand table of the present invention.
In a fourth aspect, an electronic device is provided, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform any one of the above voice interaction methods for a sand table.
In a fifth aspect, the present invention further provides a computer program product, where the computer program product includes a computer program stored on a storage medium, and the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute any one of the above voice interaction methods for a sand table.
The beneficial effects of the embodiments of the invention are as follows: the speaker devices distributed around the sand table collect and upload customers' spoken inquiries, and a single central control device controls all of the speaker devices, so that customers can learn about the property on their own without property sales staff, which reduces staffing costs.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of an embodiment of a voice interaction system for a sand table according to the present invention;
FIG. 2 is a flowchart of an embodiment of a voice interaction method for a sand table according to the present invention;
FIG. 3 is a flow chart of another embodiment of a voice interaction method for a sand table of the present invention;
FIG. 4 is a flow chart of another embodiment of a voice interaction method for a sand table of the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this application, the terms "module," "apparatus," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
An embodiment of the present invention provides a voice interaction system for a sand table, including a central control device and a plurality of speaker devices communicatively connected to the central control device. The plurality of speaker devices are arranged around the sand table to collect customer voices and send the collected customer voices to the central control device. The central control device is configured to determine response content corresponding to the received customer voice and return the response content to the corresponding speaker device so as to present the response content to the customer.
The central control device is an electronic device with voice recognition and/or voice generation capability. This capability may be implemented locally, online through a server, or through a combination of offline and online processing, which is not limited by the present invention.
For example, the plurality of speaker devices may be speakers equipped with microphones, distributed around the entire sand table either evenly or unevenly depending on actual conditions. There are at least two speakers, and the exact number is determined by the size of the sand table. Each speaker may carry a single microphone or a microphone array. Alternatively, devices with separate microphones and loudspeakers may be used, for example several microphones sharing one loudspeaker, rather than only microphone-equipped speakers.
In embodiments of the invention, the speaker devices distributed around the sand table collect and upload customers' spoken inquiries, and a single central control device controls all of the speaker devices, so that customers can learn about the property on their own without property sales staff, which reduces staffing costs.
Fig. 1 is a schematic diagram of an embodiment of a voice interaction system for a sand table according to the present invention, in which 10 bar-shaped microphone speakers are arranged around the sand table. A customer can interact with the sand table by voice from anywhere around it. The distributed microphones allow wake-up and interaction at close range, which avoids the difficulty of far-field recognition, reduces the interference picked up by the microphones, and improves the interaction experience.
In some embodiments, the response content includes audio content, and presenting the response content to the customer includes broadcasting the audio content to the customer.
In some embodiments, each of the plurality of speaker devices is configured with a display screen, the response content includes image content, and the presenting the response content to the customer includes presenting the image content on the display screen.
In some embodiments, the central control device is further configured to:
when the received customer voice originates from only one speaker device, determine that speaker device as the corresponding speaker device;
when the received customer voice originates from a plurality of speaker devices, compare the audio quality of the customer voice originating from each of those speaker devices;
and determine the speaker device with the highest audio quality as the corresponding speaker device.
Illustratively, the closer the customer is to a speaker device, the better the audio quality of the customer voice that device captures. When only one speaker device is idle (the others are already serving other customers' voice inquiries), only that idle speaker device captures the current customer's voice. It is therefore determined to be the corresponding speaker device, and once the response content is determined, the response content is returned directly to it for presentation to the current customer.
In addition, when several speaker devices are idle, all of them can capture the current customer's voice, so the central control device receives multiple channels of customer voice. Since one customer does not need to occupy several speaker devices at once, the central control device determines the best speaker device as the corresponding speaker device according to the audio quality of those channels.
The audio quality of the multiple channels of customer voice can be determined indirectly from the distance between the customer and each speaker device (the shorter the distance, the higher the quality), directly from the energy of each channel (the higher the energy, the higher the quality), or from the signal-to-noise ratio of each channel (the higher the signal-to-noise ratio, the higher the quality).
Illustratively, as shown in fig. 1, when the customer is near microphone speaker 1, multiple channels of customer voice collected by microphone speaker 1, microphone speaker 2, microphone speaker 10, and so on may be received at the same time, but microphone speaker 1 is determined to be the corresponding speaker device because the customer is closest to it.
Illustratively, as shown in fig. 1, when the customer is located roughly midway between microphone speaker 4 and microphone speaker 5, the corresponding speaker device can be determined according to the signal-to-noise ratios of the two channels of customer voice from those two speakers.
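The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`energy`, `snr_db`, `select_device`), the device IDs, and the per-device noise-energy estimates are all hypothetical, and the patent does not say how energy or signal-to-noise ratio is actually computed.

```python
import math
from typing import Dict, List, Optional


def energy(samples: List[float]) -> float:
    """Mean squared amplitude of one audio channel (0.0 for an empty channel)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def snr_db(samples: List[float], noise_energy: float) -> float:
    """Signal-to-noise ratio in dB, given an assumed noise-energy estimate."""
    noise = max(noise_energy, 1e-12)
    signal = max(energy(samples) - noise, 1e-12)
    return 10.0 * math.log10(signal / noise)


def select_device(
    channels: Dict[str, List[float]],
    noise_energy: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Pick the corresponding speaker device for one customer utterance.

    channels maps a (hypothetical) device ID to the samples it captured.
    If noise estimates are supplied, devices are ranked by SNR;
    otherwise they are ranked by raw energy.
    """
    if not channels:
        return None
    if len(channels) == 1:
        # Only one device heard the customer: it is the corresponding device.
        return next(iter(channels))
    if noise_energy is None:
        return max(channels, key=lambda d: energy(channels[d]))
    return max(channels, key=lambda d: snr_db(channels[d], noise_energy[d]))
```

Ranking by energy and ranking by signal-to-noise ratio can disagree: a loud channel captured next to a noisy source may lose to a quieter but cleaner one, which is one reason a deployment might prefer the signal-to-noise criterion when noise estimates are available.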
In some embodiments, the central control device is further configured to:
when the audio quality of the customer voice received by the central control device from the corresponding speaker device falls below a set threshold, re-determine a new corresponding speaker device;
and determine the response content according to the customer voice from the new corresponding speaker device and the conversation information that is stored by the central control device and was generated through the previous corresponding speaker device.
In this embodiment, a new corresponding speaker device can be determined dynamically as the customer's position changes, which maintains the quality of voice collection. Even if the customer walks around the sand table and talks to it from different positions, high-quality customer voice is collected throughout, which supports accurate recognition and understanding of the user's intent and more accurate determination of the response content.
Further, even though the best speaker device has been re-determined, the earlier rounds of dialogue that the customer conducted through the previous best speaker device are still taken into account when determining the current response content, which helps the central control device understand the user's intent more accurately in light of the preceding context.
Illustratively, the central control device determines and stores voiceprint information from the customer voice, and stores the historical dialogue information corresponding to that voiceprint information. After a new corresponding speaker device has been determined, when the central control device receives a new customer voice, it first extracts the voiceprint information of the new customer voice and checks whether corresponding historical dialogue information is stored; if so, it determines the response content according to that historical dialogue information and the new customer voice.
Illustratively, the customer's voiceprint serves as an ID by which the same person is recognized, so that the customer's voice commands can be understood more accurately based on the conversation context already established with that customer.
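The hand-off behaviour described above, in which the corresponding speaker device changes while the dialogue history follows the customer, might be sketched as below. Everything here is an assumption for illustration: the `Session` and `CentralControl` classes, the `QUALITY_THRESHOLD` value, and the idea that voiceprint extraction has already produced a string ID and that each device reports a scalar quality score are not specified by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed value; the patent only refers to "a set threshold".
QUALITY_THRESHOLD = 0.05


@dataclass
class Session:
    device_id: str                                     # current corresponding speaker device
    history: List[str] = field(default_factory=list)   # earlier dialogue turns


class CentralControl:
    def __init__(self) -> None:
        # Sessions are keyed by a voiceprint ID (assumed to be extracted elsewhere).
        self.sessions: Dict[str, Session] = {}

    def handle(
        self, voiceprint_id: str, text: str, quality_by_device: Dict[str, float]
    ) -> Session:
        """Record one utterance and re-select the device if quality has dropped."""
        if not quality_by_device:
            raise ValueError("no speaker device captured this utterance")
        best = max(quality_by_device, key=lambda d: quality_by_device[d])
        session = self.sessions.get(voiceprint_id)
        if session is None:
            # First utterance from this voiceprint: start a new session.
            session = Session(device_id=best)
            self.sessions[voiceprint_id] = session
        elif quality_by_device.get(session.device_id, 0.0) < QUALITY_THRESHOLD:
            # Quality on the current device fell below the threshold:
            # switch devices but keep the stored history.
            session.device_id = best
        session.history.append(text)
        return session
```

Keying the session by voiceprint rather than by device is what lets the stored context survive the switch; a response generator (not shown) would read `session.history` together with the new utterance.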
As shown in fig. 2, an embodiment of the present invention further provides a voice interaction method for a sand table, applied to a voice interaction system for a sand table, where the system includes a central control device and a plurality of speaker devices communicatively connected to the central control device; the method includes the following steps:
S10, the central control device receives the customer voices monitored by the plurality of speaker devices, the plurality of speaker devices being arranged around the sand table to monitor customer voices;
S20, the central control device determines the response content corresponding to the received customer voice and returns the response content to the corresponding speaker device so as to present the response content to the customer.
Fig. 3 shows another embodiment of the voice interaction method for a sand table according to the present invention, in which the central control device is further configured to perform the following:
S30, when the received customer voice originates from only one speaker device, determining that speaker device as the corresponding speaker device;
S40, when the received customer voice originates from a plurality of speaker devices, comparing the audio quality of the customer voice originating from each of those speaker devices;
S50, determining the speaker device with the highest audio quality as the corresponding speaker device.
Fig. 4 shows another embodiment of the voice interaction method for a sand table according to the present invention, in which the central control device is further configured to perform the following:
S60, when the audio quality of the customer voice received by the central control device from the corresponding speaker device falls below a set threshold, re-determining a new corresponding speaker device;
S70, determining the response content according to the customer voice from the new corresponding speaker device and the conversation information that is stored by the central control device and was generated through the previous corresponding speaker device.
Illustratively, the specific processing flow of the voice interaction method for the sand table of the present invention is as follows:
1) Signal collection by the microphones: the microphones of the plurality of loudspeakers form a microphone network and pick up signals simultaneously. If a sound box carries a microphone array made up of several microphones, multiple channels of audio are collected on each sound box. The collected signals include the customer's useful voice signal, noise and other interfering sounds in the environment, the speech-synthesis broadcast played by the sound box itself, and the like.
2) Signal processing: to obtain a cleaner useful voice signal from the user and improve the interaction experience, the signals collected by the microphones need to be processed to enhance the useful signal. The speech-synthesis output of the sound box is suppressed with an echo cancellation algorithm; noise in the environment is handled by noise reduction; and if the sound box has a microphone array, speech enhancement can be performed with array signal processing such as beamforming.
3) Device selection: the customer selects the device to interact with by speaking the wake-up word. When the customer speaks the wake-up word, at least one device is woken up, and one device is then selected from among the woken devices according to energy. Suppose a customer is standing at the position of loudspeaker 1: when he speaks a wake-up word such as "hello", loudspeaker 1, which is closest to him, responds and carries out the subsequent series of interactions with him. If instead a customer standing between loudspeaker 4 and loudspeaker 5 speaks the wake-up word, loudspeaker 4 and loudspeaker 5 are likely to be woken up at the same time. The voice energy collected by each of them is then calculated, and the one with the larger energy, for example loudspeaker 4, is selected to interact with the customer.
4) Recognition and interaction: once the interacting device has been determined, interaction can proceed. The microphone of the selected sound box collects the customer's voice signal, which is enhanced and then passed to speech recognition. For example, the customer says "I want to know what house types building 1 has." After recognition and semantic understanding, the device answers, for example, "There is an 89-square-meter type with three bedrooms, two living rooms and one bathroom, and a 115-square-meter type with three bedrooms, two living rooms and two bathrooms. Which one would you like to know about?" Multiple rounds of interaction are carried out in this way. When the customer walks to another position, a nearby device is woken up and the next round of interaction can be carried out.
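The energy-based choice in step 3 can be sketched as follows. This is only an illustrative sketch, not the implementation described by the patent: the function names `frame_energy` and `select_device`, the use of mean squared amplitude as the energy measure, and the sample values are all assumptions made for the example.

```python
from typing import Dict, Optional, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one device's wake-word audio (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_device(woken: Dict[str, Sequence[float]]) -> Optional[str]:
    """Return the id of the woken device whose captured audio has the largest energy.

    `woken` maps a device id to the samples that device captured while the
    wake-up word was spoken. Returns None if no device was woken.
    """
    if not woken:
        return None
    return max(woken, key=lambda device_id: frame_energy(woken[device_id]))


# Hypothetical captures: the customer stands between loudspeaker 4 and
# loudspeaker 5, slightly closer to loudspeaker 4.
captures = {
    "loudspeaker 4": [0.30, -0.42, 0.38, -0.35],
    "loudspeaker 5": [0.12, -0.15, 0.10, -0.11],
}
chosen = select_device(captures)
```

With these made-up samples, `chosen` is `"loudspeaker 4"`, since its capture has the larger mean squared amplitude. A real system would compare energy over time-aligned, echo-cancelled wake-word segments rather than raw samples.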
The distributed microphone layout avoids the drop in recognition rate caused by far-field pickup, reduces the workload of the sales staff, and lets customers learn about all aspects of the building on their own.
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The voice interaction system for a sand table in the embodiments of the present invention may be used to execute the voice interaction method for a sand table in the embodiments of the present invention, and accordingly achieves the technical effects of that method, which are not described again here. In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.
In some embodiments, the present invention provides a non-transitory computer readable storage medium, in which one or more programs including executable instructions are stored, and the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any of the above-described voice interaction methods for a sand table of the present invention.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above-described voice interaction methods for a sand table.
In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a voice interaction method for a sand table.
In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, wherein the computer program is used for implementing a voice interaction method for a sand table when being executed by a processor.
Fig. 5 is a schematic hardware structure diagram of an electronic device for executing a voice interaction method for a sand table according to another embodiment of the present application, and as shown in fig. 5, the electronic device includes:
one or more processors 510 and memory 520, with one processor 510 being an example in fig. 5.
The apparatus for performing the voice interaction method for the sand table may further include: an input device 530 and an output device 540.
The processor 510, the memory 520, the input device 530, and the output device 540 may be connected by a bus or other means, and the bus connection is exemplified in fig. 5.
The memory 520, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the voice interaction method for a sand table in the embodiments of the present application. The processor 510 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory 520, that is, it implements the voice interaction method for the sand table of the above method embodiment.
The memory 520 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the voice interactive apparatus for a sand table, and the like. Further, the memory 520 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 520 may optionally include memory remotely located from processor 510, which may be connected over a network to a voice interaction device for a sand table. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 530 may receive input numeric or character information and generate signals related to user settings and function control of the voice interaction device for the sand table. The output device 540 may include a display device such as a display screen.
The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the voice interaction method for a sand table in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: a server is similar in architecture to a general-purpose computer but, because it must provide highly reliable services, has higher requirements for processing capability, stability, reliability, security, scalability, manageability, and the like.
(5) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a general hardware platform, and may also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.

Claims (8)

1. A voice interaction system for a sand table, comprising: a central control device and a plurality of loudspeaker box devices, wherein the loudspeaker box devices are in communication connection with the central control device;
the plurality of loudspeaker box devices are arranged around the sand table to collect customer voices and send the collected customer voices to the central control device;
the central control equipment is used for determining the response content corresponding to the received customer voice and returning the response content to the corresponding loudspeaker box equipment so as to present the response content to the customer,
wherein the central control device is further configured to:
when the received customer voice originates from only one sound box device, determining the only one sound box device as the corresponding sound box device;
comparing the audio quality of the client voice originating from each of the plurality of loudspeaker devices when the received client voice originates from the plurality of loudspeaker devices;
and determining the sound box equipment with the highest audio quality as the corresponding sound box equipment.
2. The system of claim 1, wherein the response content comprises audio content, and wherein presenting the response content to a customer comprises broadcasting the audio content to the customer.
3. The system of claim 1, wherein the plurality of speaker devices are each configured with a display screen, the responsive content includes image content, and the presenting the responsive content to the customer includes presenting the image content on the display screen.
4. The system of any of claims 1-3, wherein the central control device is further to:
when the audio quality of the customer voice received by the central control equipment from the corresponding sound box equipment is lower than a set threshold value, re-determining new corresponding sound box equipment;
and determining the response content according to the client voice from the new corresponding loudspeaker box device and the conversation information which is stored by the central control device and is generated based on the previous corresponding loudspeaker box device.
5. A voice interaction method for a sand table, applied to a voice interaction system for the sand table, the system comprising: a central control device and a plurality of loudspeaker box devices, wherein the loudspeaker box devices are in communication connection with the central control device; the method comprises the following steps:
the central control equipment receives the client voices monitored by the loudspeaker box equipment, and the loudspeaker box equipment is used for being arranged around the sand table and monitoring the client voices;
the central control device determines response content corresponding to the received customer voice and returns the response content to the corresponding sound box device so as to present the response content to the customer, wherein the central control device is further used for:
when the received customer voice originates from only one sound box device, determining the only one sound box device as the corresponding sound box device;
comparing the audio quality of the client voice originating from each of the plurality of loudspeaker devices when the received client voice originates from the plurality of loudspeaker devices;
and determining the sound box equipment with the highest audio quality as the corresponding sound box equipment.
6. The method of claim 5, wherein the central control device is further configured to:
when the audio quality of the customer voice received by the central control equipment from the corresponding sound box equipment is lower than a set threshold value, re-determining new corresponding sound box equipment;
and determining the response content according to the client voice from the new corresponding loudspeaker box device and the conversation information which is stored by the central control device and is generated based on the previous corresponding loudspeaker box device.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 5-6.
8. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 5 to 6.
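The device re-determination recited in claims 4 and 6 can be illustrated with a short sketch. It is a hedged example under stated assumptions, not the claimed implementation: the patent does not define how audio quality is measured, so the 0-to-1 quality score, the default threshold of 0.5, and the names `Session` and `route_utterance` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    """State kept by the central control device for one customer (hypothetical)."""
    active_device: Optional[str] = None
    history: List[str] = field(default_factory=list)  # stored conversation information


def route_utterance(session: Session,
                    qualities: Dict[str, float],
                    transcripts: Dict[str, str],
                    threshold: float = 0.5) -> str:
    """Choose the loudspeaker box device to answer through and record the utterance.

    `qualities` maps each device that heard the customer to an assumed quality
    score, and `transcripts` maps the same devices to the recognized text.
    """
    if not qualities:
        raise ValueError("no device reported customer voice")
    current = session.active_device
    # Re-determine the device when none is active yet, or when the active
    # device's quality is missing or below the set threshold.
    if current is None or qualities.get(current, 0.0) < threshold:
        session.active_device = max(qualities, key=lambda d: qualities[d])
    # The history survives the switch, so the dialogue can continue.
    session.history.append(transcripts[session.active_device])
    return session.active_device


session = Session()
first = route_utterance(session, {"box 4": 0.9, "box 5": 0.6},
                        {"box 4": "which house types", "box 5": "which house types"})
# The customer walks toward box 5; the quality at box 4 drops below the threshold.
second = route_utterance(session, {"box 4": 0.2, "box 5": 0.8},
                         {"box 4": "the larger", "box 5": "the larger one"})
```

Here `first` is `"box 4"` and `second` is `"box 5"`, while `session.history` holds both utterances, so the response after the switch can still draw on the earlier round.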
CN202010097237.1A 2020-02-17 2020-02-17 Voice interaction system and method for sand table Active CN111312244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010097237.1A CN111312244B (en) 2020-02-17 2020-02-17 Voice interaction system and method for sand table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010097237.1A CN111312244B (en) 2020-02-17 2020-02-17 Voice interaction system and method for sand table

Publications (2)

Publication Number Publication Date
CN111312244A (en) 2020-06-19
CN111312244B (en) 2022-05-17

Family

ID=71161685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010097237.1A Active CN111312244B (en) 2020-02-17 2020-02-17 Voice interaction system and method for sand table

Country Status (1)

Country Link
CN (1) CN111312244B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735462B (en) * 2020-12-30 2024-05-31 科大讯飞股份有限公司 Noise reduction method and voice interaction method for distributed microphone array

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105553114A (en) * 2016-03-04 2016-05-04 国网冀北电力有限公司廊坊供电公司 Four-dimensional digital sand table system and power monitoring method
CN106346487A (en) * 2016-08-25 2017-01-25 威仔软件科技(苏州)有限公司 Interactive VR sand table show robot
CN109257590A (en) * 2018-08-30 2019-01-22 杭州行开科技有限公司 A kind of naked eye 3D sand table display system and its method
CN110019683A (en) * 2017-12-29 2019-07-16 同方威视技术股份有限公司 Intelligent sound interaction robot and its voice interactive method

Also Published As

Publication number Publication date
CN111312244A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
CN110288997B (en) Device wake-up method and system for acoustic networking
CN109461449B (en) Voice wake-up method and system for intelligent device
CN108962263B (en) A kind of smart machine control method and system
CN108470034B (en) A kind of smart machine service providing method and system
CN111049996B (en) Multi-scene voice recognition method and device and intelligent customer service system applying same
CN108922553B (en) Direction-of-arrival estimation method and system for sound box equipment
CN112017681B (en) Method and system for enhancing directional voice
CN108962240A (en) A kind of sound control method and system based on earphone
CN110827858B (en) Voice endpoint detection method and system
CN109524013B (en) Voice processing method, device, medium and intelligent equipment
CN112562742B (en) Voice processing method and device
CN112735398B (en) Man-machine conversation mode switching method and system
CN111142833B (en) Method and system for developing voice interaction product based on contextual model
CN107799113B (en) Audio processing method and device, storage medium and mobile terminal
CN112687286A (en) Method and device for adjusting noise reduction model of audio equipment
JP2024507916A (en) Audio signal processing method, device, electronic device, and computer program
CN112767916A (en) Voice interaction method, device, equipment, medium and product of intelligent voice equipment
CN111312244B (en) Voice interaction system and method for sand table
CN113241085B (en) Echo cancellation method, device, equipment and readable storage medium
CN110890104B (en) Voice endpoint detection method and system
CN112700767B (en) Man-machine conversation interruption method and device
CN111161734A (en) Voice interaction method and device based on designated scene
CN110517682A (en) Audio recognition method, device, equipment and storage medium
CN112466305B (en) Voice control method and device of water dispenser
CN112786031B (en) Man-machine conversation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
Applicant after: Sipic Technology Co.,Ltd.
Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
Applicant before: AI SPEECH Ltd.
GR01 Patent grant