CN114783429A

CN114783429A - Man-machine interaction system, server, interaction terminal, interaction method and electronic equipment

Info

Publication number: CN114783429A
Application number: CN202210194596.8A
Authority: CN
Inventors: 刘少耿; 王炜炀; 冯德贞; 冯文茜; 李佳择; 文成; 王鹏; 王佳
Original assignee: Apollo Zhilian Beijing Technology Co Ltd
Current assignee: Apollo Zhilian Beijing Technology Co Ltd
Priority date: 2022-03-01
Filing date: 2022-03-01
Publication date: 2022-07-22

Abstract

The disclosure provides a human-computer interaction system, a server, an interaction terminal, an interaction method, and an electronic device, and relates to the technical field of computers, in particular to the technical field of human-computer interaction. The human-computer interaction system comprises a server, a first interaction terminal, and a second interaction terminal. The first interaction terminal is configured to: receive a voice input; generate a first request instruction by performing voice recognition on the voice input; and send the first request instruction to the server. The server is configured to store, in response to receiving the first request instruction, a control instruction corresponding to the first request instruction in a cache. The second interaction terminal is configured to: call, based on a second request instruction, an interface in the server to read the control instruction from the cache of the server through the interface; and run the read control instruction to display the content corresponding to the control instruction.

Description

Man-machine interaction system, server, interaction terminal, interaction method and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a human-computer interaction system, a server, an interaction terminal, an interaction method, an electronic device, a computer-readable storage medium, and a computer program product.

Background

The man-machine interaction technology is a technology for realizing human-computer conversation in an effective mode through computer input and output equipment. With the development and widespread application of voice recognition technology, human-computer interaction by using the voice recognition technology is more and more common. There is a demand for human-computer interaction by voice in a scene using, for example, a road information presentation screen.

The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been acknowledged in any prior art, unless otherwise indicated.

Disclosure of Invention

The present disclosure provides a human-computer interaction system, a server, an interaction terminal, an interaction method, an electronic device, a computer-readable storage medium, and a computer program product.

According to one aspect of the disclosure, a human-computer interaction system is provided, which includes a server, a first interaction terminal, and a second interaction terminal. The first interaction terminal is configured to: receive a voice input of a user; generate a first request instruction by performing voice recognition on the voice input; and send the first request instruction to the server. The server is configured to: in response to receiving the first request instruction, store a control instruction corresponding to the first request instruction in a cache. The second interaction terminal is configured to: call, based on a second request instruction, an interface in the server to read the control instruction from the cache of the server through the interface; and run the read control instruction to display the content corresponding to the control instruction.

According to another aspect of the present disclosure, a server for a human-computer interaction system is provided, the system further comprising a first interaction terminal and a second interaction terminal. The server is configured to: store a control instruction corresponding to a first request instruction in a cache based on the first request instruction, wherein the first request instruction is generated by performing voice recognition on a voice input of a user; configure a corresponding interface for the control instruction stored in the cache; and provide the control instruction to the second interaction terminal via the interface in response to receiving a call from the second interaction terminal.

According to another aspect of the present disclosure, an interaction terminal for a human-computer interaction system is provided, the system further comprising a server. The interaction terminal is configured to: call an interface in the server based on a second request instruction to read a control instruction from a cache of the server through the interface, wherein the second request instruction is generated by performing voice recognition on a voice input of a user; and run the read control instruction to display the content corresponding to the control instruction.

According to another aspect of the present disclosure, an interaction method performed by a server for a human-computer interaction system is provided, wherein the system further comprises a first interaction terminal and a second interaction terminal. The method comprises the following steps: storing a control instruction corresponding to a first request instruction in a cache based on the first request instruction, wherein the first request instruction is generated by performing voice recognition on voice input of a user; configuring a corresponding interface for the control instruction stored in the cache; and providing a control instruction to the second interactive terminal via the interface in response to receiving the call of the second interactive terminal.

According to another aspect of the present disclosure, an interaction method performed by an interaction terminal for a human-computer interaction system is provided, wherein the system further comprises a server. The method comprises the following steps: calling an interface in the server based on a second request instruction to read the control instruction from a cache of the server through the interface, wherein the second request instruction is generated by performing voice recognition on voice input of a user; and operating the read control instruction to display the content corresponding to the control instruction.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform any one of the methods described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform any of the methods described above.

According to another aspect of the disclosure, a computer program product is provided, comprising a computer program, wherein the computer program realizes any of the above methods when executed by a processor.

According to one or more embodiments of the disclosure, resource occupation in a human-computer interaction system can be reduced.

It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.

FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with embodiments of the present disclosure;

FIG. 2 shows a block diagram of a human-computer interaction system according to an embodiment of the present disclosure;

FIG. 3 illustrates an application scenario diagram of a human-computer interaction system according to an embodiment of the present disclosure;

FIG. 4 shows a flow chart of an interaction method performed by a server for a human-computer interaction system according to an embodiment of the present disclosure;

FIG. 5 shows a flow chart of an interaction method performed by an interaction terminal for a human-computer interaction system, according to an embodiment of the present disclosure; and

FIG. 6 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present disclosure, unless otherwise specified, the use of the terms "first", "second", and the like to describe various elements is not intended to limit the positional relationship, the temporal relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in some cases they may refer to different instances based on the context of the description.

The terminology used in the description of the various described examples in this disclosure is for the purpose of describing the particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.

As mentioned above, there is a demand for human-computer interaction by voice in a scene such as using a road information presentation screen. For example, an interactive terminal (e.g., an interactive terminal including a road information presentation screen) may be provided in a highway information presentation hall to perform road information presentation, and a server communicatively connected to the interactive terminal may be provided to process voice data.

In the related art, a user may interact by voice with an interactive terminal provided in an exhibition hall. The interactive terminal may receive a voice input of the user and may communicate with the server using, for example, a long connection based on the WebSocket protocol, which is a protocol for full-duplex communication over a single TCP (Transmission Control Protocol) connection. After the server carries out a series of operations based on the voice input, the server pushes a control instruction corresponding to the content of the voice input to the interactive terminal over this connection. However, in this application scenario the user does not need to interact with the interactive terminal at all times, and keeping such a long connection open usually occupies considerable system resources.

The present disclosure provides a data processing method in which a server stores a control instruction corresponding to a first request instruction in a cache of the server, and a second interactive terminal calls an interface in the server based on a second request instruction, thereby reading the control instruction from the cache of the server via the called interface. Long connection communication between the server and the second interactive terminal can therefore be eliminated: the second interactive terminal calls the interface in the server only when a user needs to interact with the interactive terminal, which reduces the human-computer interaction system resources that long connection communication would otherwise occupy.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 illustrates a schematic diagram of an example system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120.

Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the interaction methods according to embodiments of the present disclosure to be performed.

In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.

In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein, and is not intended to be limiting.

A user may interact with client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.

Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptops), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems, such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.

Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.

The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 can also run any of a variety of additional server applications and/or mid-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.

In some implementations, the server 120 can include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and/or 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and/or 106.

In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or a smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of conventional physical hosts and Virtual Private Server (VPS) services.

The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The database 130 may be of different types. In certain embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.

In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or conventional stores supported by a file system.

The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.

Fig. 2 shows a block diagram of a human-computer interaction system 200 according to an embodiment of the present disclosure.

As shown in fig. 2, the human-computer interaction system 200 includes a first interaction terminal 210, a second interaction terminal 220, and a server 230.

The first interactive terminal 210 is configured to: receive a voice input of a user; generate a first request instruction by performing voice recognition on the voice input; and send the first request instruction to the server 230.

The server 230 is configured to: in response to receiving the first request instruction, store a control instruction corresponding to the first request instruction in a cache.

The second interactive terminal 220 is configured to: call, based on a second request instruction, an interface in the server 230 to read the control instruction from the cache of the server 230 via the interface; and run the read control instruction to display the content corresponding to the control instruction.

The server 230 stores the control instruction corresponding to the first request instruction in its cache, and the second interactive terminal 220 calls the interface in the server 230 based on the second request instruction, so that the control instruction in the cache of the server 230 is read via the called interface. Long connection communication between the server 230 and the second interactive terminal 220 can therefore be eliminated: the second interactive terminal 220 calls the interface in the server 230 only when a user needs to interact with an interactive terminal (210 or 220), which reduces the system resources occupied by long connection communication.
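The store-then-read flow described above can be illustrated with a minimal in-memory sketch. This is not the implementation of the disclosure: the class names `Server` and `SecondTerminal`, the lookup table `instruction_table`, the request text, and the instruction string are all hypothetical, and an ordinary method call stands in for the on-demand interface call that replaces the long connection.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Server:
    """Caches control instructions and exposes a read interface (hypothetical)."""

    # Assumed mapping from a recognized request to its control instruction.
    instruction_table: Dict[str, str]
    cache: Dict[str, str] = field(default_factory=dict)

    def receive_first_request(self, request: str) -> None:
        """On a first request instruction, cache the matching control instruction."""
        control = self.instruction_table.get(request)
        if control is not None:
            self.cache[request] = control

    def read_interface(self, request: str) -> Optional[str]:
        """Interface called by the second terminal; None if nothing is cached."""
        return self.cache.get(request)


@dataclass
class SecondTerminal:
    """Pulls the control instruction on demand instead of holding a connection."""

    server: Server
    displayed: Optional[str] = None

    def handle_second_request(self, request: str) -> bool:
        control = self.server.read_interface(request)
        if control is None:
            return False
        # Stand-in for running the control instruction on the large screen.
        self.displayed = control
        return True


server = Server(
    instruction_table={"play highway road conditions": "PLAY highway_traffic.mp4"}
)
screen = SecondTerminal(server)

server.receive_first_request("play highway road conditions")
ok = screen.handle_second_request("play highway road conditions")
print(ok, screen.displayed)
```

In this sketch the second terminal contacts the server only inside `handle_second_request`, which mirrors the idea that no connection needs to stay open between interactions.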

In one example, the second interaction terminal 220 may be an interaction terminal including a large screen. The large screen can be used for displaying pictures to audiences in the exhibition hall, so that man-machine interaction is realized. For example, the large screen may execute a control instruction for displaying an image or a video read by the second interactive terminal 220, so as to display the corresponding image or video to the user.

According to some embodiments, the first interaction terminal 210 may be a recognition robot capable of following the movements of the user. For example, the first interactive terminal 210 may be a voice recognition robot capable of following the movement of the user. In some scenarios, the user may need to move around in front of the second interactive terminal 220 comprising a large screen to view the picture displayed by the second interactive terminal 220 in all directions; in other scenarios, the user may not be able to approach the second interaction terminal 220 comprising a large screen. In the above scenario, the first interactive terminal 210 may follow the user, so as to obtain the voice input of the user in time and perform corresponding voice recognition, thereby providing a smoother human-computer interaction experience.

In some examples, the recognition robot may be awakened by the user inputting particular voice content and may twist the head of the robot toward the user, thereby indicating that the user may proceed with voice input for interaction.

In some examples, the recognition robot may be a recognition robot with a proximity sensor that may wake up when a user approaches the recognition robot and twist the head of the robot towards the user, thereby indicating that the user may proceed with voice input for interaction.

In addition, it should be understood that communication transmission can be performed between the first interactive terminal 210 and the second interactive terminal 220, and between the first interactive terminal 210 and the server 230 by using wireless communication, which is not described herein.

According to some embodiments, the second request instruction is received from the first interactive terminal 210. In other words, the second interactive terminal 220 may obtain (e.g., by way of wireless transmission) the first request instruction generated by the first interactive terminal 210, use it as the second request instruction, and call a corresponding interface in the server 230 based on that request instruction, so as to read the control instruction corresponding to the request instruction from the cache of the server 230 via the interface. Voice recognition of the user's voice input therefore needs to be performed only by the first interactive terminal 210, and both the first interactive terminal 210 and the second interactive terminal 220 obtain the request instruction generated by that recognition. The request instructions held by the two terminals are thus consistent, which improves the accuracy of voice interaction.

According to some embodiments, the second interaction terminal 220 may be further configured to: receiving the voice input of a user; and generating a second request instruction by performing voice recognition on the received voice input. In other words, the above mentioned voice input of the user can also be received by the second interactive terminal 220 (for example, the second interactive terminal 220 may comprise a sound collecting unit, such as a microphone), and the second interactive terminal 220 can also perform voice recognition on the received voice input to generate the second request instruction, just like the first interactive terminal 210 described above. The second interactive terminal 220 further calls an interface corresponding to the request instruction in the server 230 based on the second request instruction generated by the second interactive terminal, so as to read the control instruction stored in the cache of the server 230. Therefore, the first interactive terminal 210 and the second interactive terminal 220 both collect the voice input of the user and perform voice recognition, so that channels for collecting the voice input of the user can be increased, and the user experience is further improved.

According to some embodiments, the second interactive terminal 220 may be further configured to send feedback to the first interactive terminal 210 in response to successfully running the control instruction. The first interactive terminal 210 may be further configured to make a voice announcement based on the feedback in response to receiving the feedback. Therefore, after the second interactive terminal 220 successfully runs the control instruction, the user can obtain voice feedback from the first interactive terminal 210 and thereby learn that the interaction was carried out successfully, which improves the user experience. For example, when the user asks by voice for the large screen of the second interactive terminal 220 to play a video and the second interactive terminal 220 successfully runs the control instruction, i.e., successfully plays the video, feedback is sent to the first interactive terminal 210 (e.g., a recognition robot that follows the user), and the first interactive terminal 210 makes a voice announcement informing the user that the video has been played successfully.
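The feedback path can be sketched as a simple callback, under stated assumptions: `RobotTerminal`, `ScreenTerminal`, and the message text are hypothetical names, and appending to a list stands in for actual speech output.

```python
from typing import Callable, List


class RobotTerminal:
    """Stand-in for the first interactive terminal (hypothetical)."""

    def __init__(self) -> None:
        self.announcements: List[str] = []

    def on_feedback(self, message: str) -> None:
        # A real terminal would synthesize speech; here the text is recorded.
        self.announcements.append(message)


class ScreenTerminal:
    """Stand-in for the second interactive terminal (hypothetical)."""

    def __init__(self, send_feedback: Callable[[str], None]) -> None:
        self.send_feedback = send_feedback

    def run(self, control_instruction: Callable[[], None], description: str) -> bool:
        try:
            control_instruction()
        except Exception:
            # No feedback is sent when the control instruction fails.
            return False
        self.send_feedback(f"{description} has been played successfully")
        return True


def failing_instruction() -> None:
    raise RuntimeError("playback failed")


robot = RobotTerminal()
screen = ScreenTerminal(send_feedback=robot.on_feedback)

played = screen.run(lambda: None, "The video")
failed = screen.run(failing_instruction, "The video")
print(robot.announcements)
```

Feedback is sent only on success in this sketch, so the absence of an announcement is what signals a failed interaction.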

In some scenarios, the second interactive terminal 220 may fail to execute the corresponding control instruction due to unstable signal transmission or the system itself, which means that the interaction desired by the user fails to be completed.

According to some embodiments, the first interaction terminal 210 may be further configured to: in response to determining that the first request instruction has been sent, starting timing; and in response to not receiving feedback from the second interactive terminal 220 within the duration threshold, sending the first request instruction to the server 230 again. If the first interactive terminal 210 fails to receive the feedback from the second interactive terminal 220 within a certain time period after sending the first request instruction, it means that the interaction is not completed, and by sending the first request instruction to the server 230 again, it can be ensured that the second interactive terminal 220 successfully completes the operation desired by the user, and the user experience is improved.
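A minimal sketch of this timeout-based resend follows. The function name, the polling interval, and the cap `max_attempts` are assumptions added for illustration; the text above specifies only a duration threshold and a resend.

```python
import time
from typing import Callable


def send_with_timeout_retry(
    send: Callable[[], None],
    feedback_received: Callable[[], bool],
    threshold_s: float,
    max_attempts: int = 3,  # assumption: the source does not bound the resends
    poll_s: float = 0.01,
) -> int:
    """Send the request, wait up to threshold_s for feedback, resend on timeout.

    Returns the number of sends performed; raises TimeoutError if every
    attempt expires without feedback.
    """
    for attempt in range(1, max_attempts + 1):
        send()
        # Timing starts once the request has been sent.
        deadline = time.monotonic() + threshold_s
        while time.monotonic() < deadline:
            if feedback_received():
                return attempt
            time.sleep(poll_s)
    raise TimeoutError(f"no feedback after {max_attempts} attempts")


# Hypothetical stand-ins: feedback only arrives after the second send.
sent = []


def fake_send() -> None:
    sent.append("first request instruction")


def fake_feedback() -> bool:
    return len(sent) >= 2


attempts = send_with_timeout_retry(fake_send, fake_feedback, threshold_s=0.05)
print(attempts)
```

`time.monotonic` is used rather than wall-clock time so that system clock adjustments cannot shorten or lengthen the threshold.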

According to some embodiments, the first interactive terminal 210 may be further configured to: in response to determining that the first request instruction has been sent, capture an image currently displayed by the second interactive terminal 220; perform image recognition on the captured image; and in response to not recognizing a preset identifier in the image, send the first request instruction to the server 230 again. If the preset identifier is not recognized in the captured image, the interaction may not have been completed; sending the first request instruction to the server 230 again helps ensure that the second interactive terminal 220 completes the operation desired by the user, which improves the user experience. The preset identifier may be, for example, a preset two-dimensional code: when the image displayed by the second interactive terminal 220 and captured by the first interactive terminal 210 does not include the preset two-dimensional code, the operation desired by the user may not have been performed successfully.
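The image-based check can be sketched as follows. Camera capture and two-dimensional code decoding are replaced by injected callables, because the source does not name a recognition library; `capture_image`, `recognize_identifiers`, and the identifier value `"HALL-OK"` are hypothetical.

```python
from typing import Callable, Iterator, List, Set


def check_display_and_resend(
    capture_image: Callable[[], bytes],
    recognize_identifiers: Callable[[bytes], Set[str]],
    preset_identifier: str,
    resend_first_request: Callable[[], None],
) -> bool:
    """Return True if the preset identifier is visible; otherwise resend."""
    image = capture_image()
    found = preset_identifier in recognize_identifiers(image)
    if not found:
        resend_first_request()
    return found


# Hypothetical stand-ins for the camera and the image recognizer.
frames: Iterator[bytes] = iter([b"blank-frame", b"frame-with-code"])
resent: List[str] = []


def fake_capture() -> bytes:
    return next(frames)


def fake_recognize(image: bytes) -> Set[str]:
    return {"HALL-OK"} if image == b"frame-with-code" else set()


def fake_resend() -> None:
    resent.append("first request instruction")


first_check = check_display_and_resend(fake_capture, fake_recognize, "HALL-OK", fake_resend)
second_check = check_display_and_resend(fake_capture, fake_recognize, "HALL-OK", fake_resend)
print(first_check, second_check, resent)
```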

In the following, with continued reference to fig. 3, the human-computer interaction system 200 according to an embodiment of the present disclosure will be further described. FIG. 3 shows an application scenario diagram of a human-computer interaction system according to an embodiment of the present disclosure. In FIG. 3, first interactive terminal 310, second interactive terminal 320, and server 330 are similar to first interactive terminal 210, second interactive terminal 220, and server 230 described above with respect to FIG. 2. Here, a recognition robot is taken as the first interactive terminal 310, and an interactive terminal including a large screen is taken as the second interactive terminal 320. In addition, FIG. 3 also shows a user 340 in the scene.

As shown in fig. 3, the user 340 may stand in front of the large screen of the second interactive terminal 320, and the first interactive terminal 310 may follow near the user 340. The user 340 may perform voice input; for example, the user 340 may say "play the highway road conditions". The first interactive terminal 310 receives the voice input of the user 340 and performs voice recognition on it to generate a first request instruction (for example, by recognizing the speech "play the highway road conditions" as the corresponding text and using that text as the first request instruction). The first interactive terminal 310 then sends the first request instruction to the server 330 (step S302).

In response to receiving the first request instruction (e.g., "play the highway road conditions"), the server 330 stores the corresponding control instruction (e.g., a program for playing the highway road conditions and a corresponding source file) in the cache of the server 330.

The second interactive terminal 320 may call an interface in the server 330 based on the second request instruction to read the control instruction from the cache of the server 330 via the interface (step S303); and the read control instruction is executed to display the content corresponding to the control instruction (for example, the large screen of the second interactive terminal 320 plays the road condition of the expressway).

In some examples, the second interactive terminal 320 may obtain the first request instruction generated by the first interactive terminal 310 from the first interactive terminal 310 (step S304) as the second request instruction. In some examples, the second interactive terminal 320 may also receive a voice input of the user 340 (step S305), and generate a corresponding second request instruction by performing voice recognition on the received voice input, and invoke the interface of the server 330 based on the second request instruction generated by the second interactive terminal 320 itself.

In some examples, the second interactive terminal 320 may send feedback to the first interactive terminal 310 in response to successfully playing the highway road conditions (step S306). The first interactive terminal 310 may be further configured to, in response to receiving the feedback, perform a voice broadcast to the user 340 based on the feedback.

In some examples, the first interactive terminal 310 may start timing in response to determining that the first request instruction has been sent and, in response to not receiving feedback from the second interactive terminal 320 within a duration threshold, send the first request instruction to the server 330 again.
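The timed resend described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the disclosure: the function name `send_with_retry`, the use of a `queue.Queue` to deliver feedback, and the `max_attempts` cap are assumptions introduced here (the disclosure does not state how many times the instruction may be resent).

```python
import queue
from typing import Callable


def send_with_retry(
    send: Callable[[], None],
    feedback: "queue.Queue[str]",
    timeout_s: float,
    max_attempts: int = 3,  # assumed cap; not specified in the disclosure
) -> bool:
    """Send the first request instruction; resend if no feedback arrives in time.

    Returns True once feedback is received, False after max_attempts sends.
    """
    for _ in range(max_attempts):
        send()  # send (or resend) the first request instruction to the server
        try:
            # Timing starts after the send: wait for feedback up to the threshold.
            feedback.get(timeout=timeout_s)
            return True
        except queue.Empty:
            continue  # threshold elapsed without feedback, so send again
    return False
```

In this sketch the second interactive terminal would put a message on `feedback` after it has successfully run the control instruction.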

In some examples, the first interactive terminal 310 may, in response to determining that the first request instruction has been sent, capture an image currently displayed by the second interactive terminal 320 and perform image recognition on the captured image; and, in response to not recognizing the preset identifier in the image, send the first request instruction to the server 330 again.
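The image-based check can be sketched in the same way. The names `capture`, `has_marker`, and `resend` are hypothetical callables introduced for illustration; the image recognition itself (for example, detecting a two-dimensional code) is outside the scope of this sketch and is passed in by the caller.

```python
from typing import Callable


def resend_if_marker_missing(
    capture: Callable[[], bytes],
    has_marker: Callable[[bytes], bool],
    resend: Callable[[], None],
) -> bool:
    """Return True if the first request instruction was sent again."""
    image = capture()      # capture what the second terminal currently displays
    if has_marker(image):  # preset identifier found: operation likely completed
        return False
    resend()               # identifier missing: send the first request again
    return True
```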

According to an aspect of the present disclosure, there is also provided a server for a human-computer interaction system, the system further including a first interaction terminal and a second interaction terminal, wherein the server is configured to: store a control instruction corresponding to a first request instruction in a cache based on the first request instruction, wherein the first request instruction is generated by performing voice recognition on voice input of a user; configure a corresponding interface for the control instruction stored in the cache; and in response to receiving a call from the second interactive terminal, provide the control instruction to the second interactive terminal via the interface.

Because the server stores the control instruction corresponding to the first request instruction in its cache, where the second interactive terminal can retrieve it by calling the interface, a long-lived connection between the server and the second interactive terminal is not required, which reduces the system resources occupied by communication.

According to an aspect of the present disclosure, there is also provided an interactive terminal for a human-computer interaction system, the system further including a server, wherein the interactive terminal is configured to: calling an interface in the server to read a control instruction from a cache of the server through the interface based on a second request instruction, wherein the second request instruction is generated by performing voice recognition on voice input of a user; and operating the read control instruction to display the content corresponding to the control instruction.

Thus, the second interactive terminal calls an interface in the server based on the second request instruction and reads the control instruction from the cache of the server via the called interface. A long-lived connection between the server and the second interactive terminal is therefore not required, which reduces the system resources occupied by communication.

Fig. 4 shows a flow diagram of an interaction method 400 performed by a server for a human-computer interaction system according to an embodiment of the present disclosure.

As shown in fig. 4, the method 400 includes: step S410, storing a control instruction corresponding to a first request instruction in a cache based on the first request instruction, wherein the first request instruction is generated by performing voice recognition on voice input of a user; step S420, configuring a corresponding interface for the control instruction stored in the cache; and step S430, in response to receiving a call from the second interactive terminal, providing the control instruction to the second interactive terminal via the interface. Because the server stores the control instruction corresponding to the first request instruction in its cache, where the second interactive terminal can retrieve it by calling the interface, a long-lived connection between the server and the second interactive terminal is not required, which reduces the system resources occupied by communication.
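Steps S410 to S430 can be illustrated with a minimal in-memory sketch. The class `InstructionServer`, its method names, the lookup table from request text to control instruction, and the choice to key each interface by the request text are all assumptions made for illustration; the disclosure does not specify how the interface is addressed or how the cache is implemented.

```python
from typing import Callable, Dict, Optional


class InstructionServer:
    """Illustrative server: caches control instructions and exposes them on demand."""

    def __init__(self, instruction_table: Dict[str, str]) -> None:
        # Hypothetical mapping: recognized request text -> control instruction.
        self._table = dict(instruction_table)
        self._cache: Dict[str, str] = {}
        # One reader per cached entry, standing in for "the interface".
        self._interfaces: Dict[str, Callable[[], Optional[str]]] = {}

    def on_first_request(self, request: str) -> bool:
        """S410 and S420: cache the control instruction and configure its interface."""
        if request not in self._table:
            return False  # no control instruction known for this request
        self._cache[request] = self._table[request]                   # S410
        self._interfaces[request] = lambda: self._cache.get(request)  # S420
        return True

    def call_interface(self, request: str) -> Optional[str]:
        """S430: answer a call from the second terminal via the configured interface."""
        reader = self._interfaces.get(request)
        return reader() if reader is not None else None
```

Each call to `call_interface` is a short request and response, so the server keeps no open connection to the second interactive terminal between calls.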

Fig. 5 shows a flow chart of an interaction method 500 performed by an interaction terminal for a human-computer interaction system according to an embodiment of the present disclosure.

As shown in fig. 5, the method 500 includes: step S510, based on a second request instruction, calling an interface in the server to read a control instruction from a cache of the server through the interface, wherein the second request instruction is generated by performing voice recognition on voice input of a user; and step S520, running the read control instruction to display the content corresponding to the control instruction.

Thus, the second interactive terminal calls the interface in the server based on the second request instruction and reads the control instruction from the cache of the server via the called interface. A long-lived connection between the server and the second interactive terminal is therefore not required, which reduces the system resources occupied by communication.
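On the terminal side, steps S510 and S520 might look like the sketch below. The function `run_second_request`, the `call_interface` callable standing in for the server interface, and the `handlers` table that maps a control instruction to a display action are hypothetical names introduced here, not part of the disclosure.

```python
from typing import Callable, Dict, Optional


def run_second_request(
    request: str,
    call_interface: Callable[[str], Optional[str]],
    handlers: Dict[str, Callable[[], str]],
) -> Optional[str]:
    """S510 and S520: read the control instruction, then run it to display content."""
    # S510: one short call to the server interface, no persistent connection.
    instruction = call_interface(request)
    if instruction is None:
        return None  # nothing cached for this request yet
    handler = handlers.get(instruction)
    if handler is None:
        return None  # the terminal does not know how to run this instruction
    return handler()  # S520: run the instruction; the result describes what is shown
```

A caller could invoke this once per recognized utterance, or repeat it at intervals if the control instruction may not yet be cached when the second request instruction is generated.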

According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above method.

According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the above method.

According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program when executed by a processor implements the above method.

FIG. 6 illustrates a block diagram of an exemplary electronic device 600 that can be used to implement embodiments of the present disclosure.

Referring to fig. 6, a block diagram of the structure of an electronic device 600 will now be described. The electronic device 600 may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 6, the electronic device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data necessary for the operation of the electronic device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Various components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608 and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the electronic device 600. The input unit 606 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 608 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 601 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above. For example, in some embodiments, the methods described herein may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described herein.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical aspects of the present disclosure can be achieved.

Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples and that the scope of the present invention is not limited by these embodiments or examples, but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims

1. A man-machine interaction system, comprising a server, a first interaction terminal and a second interaction terminal, wherein,

the first interactive terminal is configured to:

receiving a voice input of a user;

generating a first request instruction by performing voice recognition on the voice input; and

sending the first request instruction to the server, and wherein,

the server is configured to:

in response to receiving the first request instruction, storing a control instruction corresponding to the first request instruction in a cache, and wherein,

the second interactive terminal is configured to:

calling an interface in the server based on a second request instruction to read the control instruction from a cache of the server through the interface; and

running the read control instruction to display the content corresponding to the control instruction.

2. The system of claim 1, wherein the second interactive terminal is further configured to:

receiving the voice input of a user; and

generating the second request instruction by performing voice recognition on the received voice input.

3. The system of claim 1, wherein the second request instruction is received from the first interactive terminal.

4. The system of any one of claims 1 to 3, wherein the second interactive terminal is further configured to send feedback to the first interactive terminal in response to successful execution of the control instructions, and

wherein the first interactive terminal is further configured to, in response to receiving the feedback, perform a voice broadcast based on the feedback.

5. The system of any of claims 1-3, wherein the first interactive terminal is further configured to:

in response to determining that the first request instruction has been sent, starting timing; and

in response to not receiving the feedback from the second interactive terminal within a duration threshold, sending the first request instruction to the server again.

6. The system of any of claims 1-3, wherein the first interactive terminal is further configured to:

in response to determining that the first request instruction has been sent, capturing an image currently displayed by the second interactive terminal;

performing image recognition on the captured image; and

in response to not recognizing a preset identifier in the image, sending the first request instruction to the server again.

7. The system of any one of claims 1 to 3, wherein the first interactive terminal is a recognition robot that is movable following a user.

8. A server for a human-computer interaction system, the system further comprising a first interaction terminal and a second interaction terminal, wherein the server is configured to:

storing a control instruction corresponding to a first request instruction in a cache based on the first request instruction, wherein the first request instruction is generated by performing voice recognition on voice input of a user;

configuring a corresponding interface for the control instruction stored in the cache; and

in response to receiving a call from the second interactive terminal, providing the control instruction to the second interactive terminal via the interface.

9. An interactive terminal for a human-computer interaction system, the system further comprising a server, wherein the interactive terminal is configured to:

calling an interface in the server to read a control instruction from a cache of the server through the interface based on a second request instruction, wherein the second request instruction is generated by performing voice recognition on voice input of a user; and

running the read control instruction to display the content corresponding to the control instruction.

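10. An interaction method performed by a server for a human-computer interaction system, wherein the system further comprises a first interaction terminal and a second interaction terminal, the method comprising:

storing a control instruction corresponding to a first request instruction in a cache based on the first request instruction, wherein the first request instruction is generated by performing voice recognition on voice input of a user;

configuring a corresponding interface for the control instruction stored in the cache; and

in response to receiving a call from the second interactive terminal, providing the control instruction to the second interactive terminal via the interface.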

11. An interaction method performed by an interaction terminal for a human-computer interaction system, wherein the system further comprises a server, the method comprising:

calling an interface in the server based on a second request instruction to read a control instruction from a cache of the server through the interface, wherein the second request instruction is generated by performing voice recognition on voice input of a user; and

running the read control instruction to display the content corresponding to the control instruction.

12. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 10-11.

13. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 10-11.

14. A computer program product comprising a computer program, wherein the computer program realizes the method of any one of claims 10-11 when executed by a processor.