CN114179083A - Method and device for generating voice information of leading robot and leading robot - Google Patents

Method and device for generating voice information of leading robot and leading robot

Info

Publication number
CN114179083A
Authority
CN
China
Prior art keywords
voice signal
robot
visitor
language type
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111506393.XA
Other languages
Chinese (zh)
Other versions
CN114179083B (en)
Inventor
张卫芳
李旭
支涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunji Technology Co Ltd
Original Assignee
Beijing Yunji Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunji Technology Co Ltd
Priority to CN202111506393.XA
Publication of CN114179083A
Application granted
Publication of CN114179083B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J 9/161 Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 11/00 Manipulators not otherwise provided for
    • B25J 11/0005 Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1661 Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1679 Programme controls characterised by the tasks executed
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Robotics (AREA)
  • Acoustics & Sound (AREA)
  • Automation & Control Theory (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Manipulator (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)

Abstract

The disclosure relates to the technical field of leading robots, and provides a method and a device for generating voice information of a leading robot, and a leading robot. The method comprises the following steps: when the leading robot detects that a visitor needs guiding help, acquiring a voice signal of the visitor; recognizing the language type of the voice signal; determining a generation mode of a response voice signal of the leading robot based on the language type, wherein the generation mode comprises a local mode and a server mode; and generating a response voice signal corresponding to the language type of the visitor based on the generation mode, and playing the response voice signal on the leading robot. By generating the response voice signal for the visitor on the leading robot or on the server according to the language type, the method enables the leading robot to respond to more visitors speaking different languages, enlarging its service range.

Description

Method and device for generating voice information of leading robot and leading robot
Technical Field
The present disclosure relates to the field of leading robot technologies, and in particular, to a method and an apparatus for generating voice information of a leading robot, and a leading robot.
Background
At present, in some shopping mall or office building scenarios, robots are used to provide lead services for visitors. A visitor can tell the robot the target location he or she wants to reach and obtain the corresponding navigation information, or the robot can explain the navigation information to the visitor by voice, which is convenient and intelligent. However, these existing robots generally support only Chinese or English; if a visitor speaks another language or a dialect, the robot may fail to recognize what the visitor says and therefore cannot complete the lead service.
Disclosure of Invention
In view of this, the disclosed embodiments provide a method and an apparatus for generating voice information of a lead robot, and a lead robot, so as to solve the problem that lead robots in the prior art support few language types and cannot provide lead services for visitors speaking other language types.
In a first aspect of the embodiments of the present disclosure, a method for generating voice information of a lead robot is provided, including: when the guiding robot detects that a visitor needs guiding help, acquiring a voice signal of the visitor; recognizing the language type of the voice signal; determining a generation mode of a response voice signal of the leading robot based on the language type, wherein the generation mode comprises a local mode and a server mode; and generating a response voice signal corresponding to the language type of the visitor based on the generation mode, and playing the response voice signal on the leading robot.
In a second aspect of the embodiments of the present disclosure, there is provided a device for generating voice information of a lead robot, including: the voice acquisition module is configured to acquire a voice signal of a visitor when the guiding robot detects that the visitor needs guiding help; a speech recognition module configured to recognize a language type of the speech signal; the mode confirmation module is configured to determine a generation mode of a response voice signal of the leading robot based on the language type, wherein the generation mode comprises a local mode and a server mode; a voice generating module configured to generate a response voice signal corresponding to the language type of the visitor based on the generating manner and play the response voice signal on the lead robot.
In a third aspect of the disclosed embodiments, there is provided a lead robot comprising a speaker, a display, and a computing device connected to the speaker and the display, respectively, the computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor, implements the steps of the above-mentioned method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: when the guiding robot detects that a visitor needs guiding help, a voice signal of the visitor is acquired; the language type of the voice signal is recognized; a generation mode of a response voice signal of the leading robot is determined based on the language type, the generation mode comprising a local mode and a server mode; and a response voice signal corresponding to the language type of the visitor is generated based on the generation mode and played on the leading robot. Because the response voice signal is generated on the leading robot or on the server according to the language type, the leading robot can respond to more visitors speaking different languages, and its service range is enlarged.
Drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a method for generating voice information of a lead robot according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a device for generating lead robot voice information according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a computing device provided in an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A method and an apparatus for generating speech information of a lead robot according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include a robot 1, a user 2, a server 3, and a network 4.
The robot 1 is connected to the server 3 through the network 4. In the application scenario of the present disclosure, the robot 1 may be a lead robot that provides navigation or lead services for the user 2. Specifically, the robot 1 may include a speaker 11 and a display 12, where the speaker 11 may be used to play guidance-related voice information, such as navigation information, and the display 12 may be used to display information related to navigation or guidance; for example, the display 12 may display an information input interface for the user 2 to input a target location.
The server 3 may be a server that provides various services, for example, a backend server that receives requests sent by the robot 1 with which a communication connection has been established; the backend server may receive and analyze a request sent by the robot 1 and generate a processing result. The server 3 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center, which is not limited in the embodiments of the present disclosure.
The server 3 may be hardware or software. When the server 3 is hardware, it may be any of various electronic devices that provide services to the robot 1. When the server 3 is software, it may be implemented as one or more pieces of software or software modules providing services for the robot 1, which is not limited in the embodiments of the present disclosure.
The network 4 may be a wired network using coaxial cable, twisted pair, or optical fiber, or a wireless network that interconnects communication devices without wiring, for example Wi-Fi, which is not limited in the embodiments of the present disclosure.
The user 2 can speak to the robot 1 a voice signal containing the target location he or she wants to reach. After receiving the voice signal of the user 2, the robot 1 identifies its language type. If the language type is the default language type of the lead robot, the robot 1 locally generates the corresponding response voice signal for the target location; if not, the robot 1 sends the voice signal of the user 2 to the server 3, and the server 3 generates a response voice signal in the language of that voice signal.
It should be noted that the specific types, numbers and combinations of the robot 1, the server 3 and the network 4 may be adjusted according to the actual requirements of the application scenario, and the embodiment of the present disclosure does not limit this.
Fig. 2 is a flowchart of a method for generating leading robot voice information according to an embodiment of the present disclosure. The method of generating the lead robot voice information of fig. 2 may be performed by the robot 1 of fig. 1. As shown in fig. 2, the method for generating the voice information of the lead robot includes:
s201, when the guiding robot detects that a visitor needs guiding help, acquiring a voice signal of the visitor;
s202, identifying the language type of the voice signal;
s203, determining a generation mode of a response voice signal of the leading robot based on the language type, wherein the generation mode comprises a local mode and a server mode;
s204, generating a response voice signal corresponding to the language type of the visitor based on the generation mode, and playing the response voice signal on the leading robot.
Specifically, in the local manner, the response voice signal for the visitor is generated directly on the lead robot. For example, if the lead robot's default language type is Chinese (Mandarin) and the visitor addresses it in Chinese with a voice signal containing a target location, the lead robot can locally generate the navigation voice information, i.e., the response voice signal, for that target location. In the server manner, the lead robot only needs to transmit the visitor's voice signal to the server and receive the response voice signal from it; that is, the server generates the response voice signal for the visitor. For example, if the default language type is Chinese and the visitor speaks English, the lead robot forwards the voice signal to the server; the server recognizes its content to obtain the target location, generates response information for that location, encodes the response information into a corresponding response voice signal, and returns the response voice signal to the lead robot, which plays it. Because the server has stronger information processing capability, it can analyze and process voice signals of language types other than the default one.
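Steps S201 to S204 can be pictured as a small dispatch routine. The following Python sketch is illustrative only: the helper callables (record_voice, identify_language, generate_locally, generate_via_server, play) and the default-language constant are hypothetical stand-ins, not interfaces named by this disclosure.

    # Illustrative sketch of steps S201-S204; every callable passed in is a
    # hypothetical stand-in for the robot's real perception/ASR/TTS modules.
    DEFAULT_LANGUAGE = "zh"  # assumed preset default language type of the lead robot

    def handle_visitor(record_voice, identify_language,
                       generate_locally, generate_via_server, play):
        audio = record_voice()               # S201: acquire the visitor's voice signal
        language = identify_language(audio)  # S202: recognize its language type
        # S203: determine the generation manner based on the language type
        if language == DEFAULT_LANGUAGE:
            response = generate_locally(audio)               # local manner
        else:
            response = generate_via_server(audio, language)  # server manner
        play(response)                       # S204: play the response voice signal

Passing the modules in as callables keeps the dispatch logic testable independently of any particular ASR or TTS stack.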
According to the embodiments of the present disclosure, the language type of the voice signal uttered by the visitor is identified, and the response voice signal for the visitor is generated on the lead robot or on the server depending on that language type, so the lead robot can serve more visitors speaking different languages, enlarging its service range.
In some embodiments, the determining, based on the language type, a generation manner of the response voice signal of the lead robot, the generation manner including a local manner and a server manner, includes: detecting whether the language type of the voice signal is a default language type; if yes, determining to generate a response voice signal of the leading robot in a local mode; if not, determining to generate a response voice signal of the leading robot in a server mode.
Specifically, the default language type may be one or more language types set by the user according to the usage scenario, or a new default language type obtained after the user adjusts the previously set one, which is not limited by this disclosure. In the embodiments of the present disclosure, a default language type, for example Chinese, is preferably preset for the lead robot.
This embodiment is directed at the case where the visitor speaks in the robot's default language type, so that the lead robot can quickly produce the corresponding response voice signal.
In some embodiments, in a case where it is determined that the response voice signal of the lead robot is generated in a local manner, the generating of the response voice signal corresponding to the language type of the visitor based on the generation manner includes: recognizing the voice signal of the visitor to obtain translation information of the voice recognition; inputting the translation information into a pre-trained navigation response model to obtain response information output by the navigation response model; a response speech signal of a default language type is generated based on the response information.
Specifically, the navigation response model may be a machine learning model trained in advance on dialog template samples, and it can intelligently answer the visitor's questions. For example, if the translation information corresponding to the visitor's voice signal is "I want to go to target location A", inputting the translation information into the navigation response model automatically yields the corresponding navigation information, which is then rendered as a response voice signal in the default language type, so that the visitor learns the navigation information for the visited target location from the response voice signal played by the lead robot.
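As a concrete illustration of the local manner, the following sketch chains speech recognition, the navigation response model, and default-language speech synthesis; the asr, nav_model, and tts objects are assumed interfaces, not components specified by this disclosure.

    def generate_locally(audio, asr, nav_model, tts):
        # Recognize the visitor's voice signal into translation information
        text = asr.transcribe(audio)    # e.g. "I want to go to target location A"
        # Feed the translation information to the pre-trained navigation response model
        reply = nav_model.answer(text)  # response information, e.g. a route description
        # Synthesize a response voice signal in the default language type
        return tts.synthesize(reply, language="zh")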
By establishing the navigation response model in advance, a response voice signal for the visitor is generated intelligently, so that the lead robot is more intelligent and can respond quickly to the visitor's voice signal.
In some embodiments, in a case where it is determined that the response voice signal of the lead robot is generated in a server manner, the generating of the response voice signal corresponding to the language type of the visitor based on the generation manner includes: sending the voice signal of the visitor to a server and requesting the server to generate a response voice signal consistent with the language type of the visitor's voice signal, the server being connected to the lead robot through a network; and receiving the response voice signal returned by the server.
Specifically, the server has richer computing resources and stronger computing capability than the lead robot, so it can handle voice signals of non-default language types and produce the response voice signal quickly.
In the application scenario of fig. 1, if the robot 1 is a lead robot, the server 3 may be connected to one or more robots 1 through the network 4. When the robot 1 detects that the voice signal uttered by the user 2 is not of the default language type, it sends the voice signal to the server 3; the server 3 recognizes the voice signal to obtain the corresponding translation information, may use a pre-trained navigation response model or another machine learning model to generate the corresponding response information intelligently, encodes the response information into a response voice signal in the same language type as the visitor's voice signal, and returns it to the robot 1. In the embodiments of the present disclosure, to increase the communication speed between the robot 1 and the server 3, a wired network connection may be used between them, that is, the robot 1 is served within a local area network.
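A minimal sketch of the server manner follows, using the Python requests library for the network hop; the endpoint URL, header name, and raw-bytes payload are assumptions made for illustration, since the disclosure only requires that the robot and server be network-connected.

    import requests

    def generate_via_server(audio_bytes, language):
        # Forward the visitor's voice signal to the LAN server (hypothetical endpoint)
        resp = requests.post(
            "http://192.168.0.10:8080/lead/answer",  # assumed server address
            data=audio_bytes,
            headers={"Content-Type": "application/octet-stream",
                     "X-Language-Type": language},   # assumed header name
            timeout=10,
        )
        resp.raise_for_status()
        # The response body is the response voice signal in the visitor's language
        return resp.content

Keeping the robot and server on the same wired local area network, as suggested above, keeps this round trip short.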
The embodiment of the disclosure generates the response voice signal of the visitor through the server, and can further expand the language service range of the lead robot.
In some embodiments, the obtaining the voice signal of the visitor comprises: and if the voice signal is not received within the preset time, displaying an information input interface on a display screen of the leading robot, and prompting the visitor to input a target place to be inquired on the information input interface.
Specifically, when the lead robot detects that a visitor needs guidance, the visitor may not utter a voice signal in time, or may not be able to utter one at all. For example, a visitor with a speech disability may be unable to speak, so the lead robot cannot acquire a voice signal.
In the embodiment of the present disclosure, if the lead robot does not receive the voice signal within the preset time, an information input interface may be displayed on the lead robot for the visitor to manually input the lead information. In this way, the visitor can use the voice to seek the guidance service, and can also use the manual information input mode to acquire the guidance service.
Specifically, the preset time may be a time threshold set by the user according to empirical data, or may be a new time threshold obtained after the user adjusts the set preset time, which is not limited in this embodiment of the disclosure.
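The timeout fallback can be sketched as below, assuming the robot's microphone pipeline delivers utterances through a queue; the queue, the interface callback, and the 8-second threshold are all illustrative assumptions rather than values fixed by this disclosure.

    import queue

    PRESET_TIMEOUT_S = 8.0  # assumed empirically chosen preset time

    def acquire_voice_or_fallback(audio_queue, show_input_interface):
        try:
            # Wait up to the preset time for a voice signal from the visitor
            return audio_queue.get(timeout=PRESET_TIMEOUT_S)
        except queue.Empty:
            # No voice signal arrived: show the information input interface instead
            show_input_interface()  # prompts the visitor to type the target location
            return None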
This embodiment provides an additional information input channel for the case where the visitor utters no voice signal, or none within the specified time, so that the lead robot can handle more visitors in different situations and provide a more humanized and considerate lead service.
In some embodiments, the recognizing the language type of the speech signal includes: determining a language type of answering the lead information based on the lead information input by the visitor on the information input interface.
Specifically, if the visitor uses the information input interface to send the information he or she wants to query to the lead robot, for example by entering a target location, the lead robot can determine the language type the visitor uses from the entered content, so that the subsequently generated response voice information is in the same language type.
Further, when the user enters the target location through the information input interface, the lead robot can display the text corresponding to the response voice information on the interface while generating it, so that a visitor with impaired hearing can still obtain the lead information from the displayed content.
For the case where the user manually enters the information to be queried, this embodiment thus provides a corresponding way of identifying the language type, and displays the content on screen while the response voice information plays, so that the visitor obtains the lead information in time.
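One off-the-shelf way to infer the language type of typed lead information is a text language detector. The sketch below uses the open-source langdetect package as an example; the disclosure does not name any particular detector, so this choice is an assumption.

    from langdetect import detect  # pip install langdetect; assumed third-party detector

    def language_of_typed_query(text):
        # detect() returns a language code such as "zh-cn" or "en", which the
        # robot can reuse when generating the response voice information
        return detect(text)

    # e.g. language_of_typed_query("Where is meeting room A?") -> "en"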
In some embodiments, the recognizing the language type of the speech signal includes: and under the condition that the language type of the voice signal cannot be identified, controlling the leading robot to open a remote video window, and manually carrying out video call with the visitor through the remote video window.
Specifically, in practice the language type of the visitor's voice signal may fall outside the preset range of the lead robot and the server, the visitor's pronunciation may be non-standard, or the visitor may speak a dialect; in such cases the lead robot may be unable to recognize the language type of the voice signal. For these scenarios, the lead robot can open a remote video window to provide manual service, i.e., a human agent makes a video call with the visitor to complete the lead service.
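The escalation path can be expressed as a guard around language identification; in this sketch, identify_language returning None for an unrecognizable signal and the open_remote_video_window call are hypothetical conventions.

    def respond_or_escalate(audio, identify_language, generate_response,
                            open_remote_video_window):
        language = identify_language(audio)
        if language is None:
            # Neither the robot nor the server recognized the language type:
            # open a remote video window so a human agent can serve the visitor
            open_remote_video_window()
            return None
        return generate_response(audio, language)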
By providing the remote video window for cases beyond the service capability of the lead robot and the server, this embodiment enables the lead robot to handle visitors in many different scenarios and avoids, as far as possible, the situation where no lead service can be provided to a visitor.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of a device for generating leading robot voice information according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus for generating the voice information of the lead robot includes:
the voice acquisition module 301 is configured to acquire a voice signal of a visitor when the guidance robot detects that the visitor needs guidance help;
a speech recognition module 302 configured to recognize a language type of the speech signal;
a mode confirmation module 303 configured to determine a generation mode of a response voice signal of the lead robot based on the language type, where the generation mode includes a local mode and a server mode;
a voice generating module 304 configured to generate a response voice signal corresponding to the language type of the visitor based on the generating manner and play the response voice signal on the lead robot.
According to the embodiments of the present disclosure, the language type of the voice signal uttered by the visitor is identified, and the response voice signal for the visitor is generated on the lead robot or on the server depending on that language type, so the lead robot can serve more visitors speaking different languages, enlarging its service range.
In some embodiments, the mode determination module 303 in fig. 3 detects whether the language type of the speech signal is a default language type; if yes, determining to generate a response voice signal of the leading robot in a local mode; if not, determining to generate a response voice signal of the leading robot in a server mode.
In some embodiments, in the case that it is determined that the answer voice signal of the lead robot is generated in a local manner, the voice generation module 304 in fig. 3 recognizes the voice signal of the visitor, and obtains translation information of the voice recognition; inputting the translation information into a pre-trained navigation response model to obtain response information output by the navigation response model; a response speech signal of a default language type is generated based on the response information.
In some embodiments, in the case where it is determined that the response voice signal of the lead robot is generated in a server manner, the voice generation module 304 in fig. 3 sends the visitor's voice signal to a server and requests the server to generate a response voice signal in the same language type as the visitor's voice signal, the server being connected to the lead robot through a network; and receives the response voice signal returned by the server.
In some embodiments, the voice acquiring module 301 in fig. 3 is configured to display an information input interface on the display screen of the lead robot if no voice signal is received within a preset time, and prompt the visitor to input a target location to be queried on the information input interface.
In some embodiments, the speech recognition module 302 in fig. 3 determines the language type of answering the lead information based on the lead information entered by the visitor on the information input interface.
In some embodiments, in the case that the speech recognition module 302 in fig. 3 cannot recognize the language type of the speech signal, the apparatus for generating the speech information of the lead robot further includes: and a video call module 305 configured to control the lead robot to open a remote video window through which a video call is manually made with the visitor.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of a computing device 400 provided by embodiments of the present disclosure. The computing device 400 in fig. 4 may be applied in the robot 1 in the application scenario of fig. 1; for example, the robot 1 may include a speaker, a display, and the computing device 400, with the computing device connected to the speaker and the display, respectively. As shown in fig. 4, the computing device 400 of this embodiment includes: a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above-described apparatus embodiments when executing the computer program 403.
Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to accomplish the present disclosure. One or more modules/units may be a series of computer program instruction segments capable of performing certain functions that are used to describe the execution of computer program 403 in computing device 400.
The computing device 400 may be an electronic device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computing device 400 may include, but is not limited to, the processor 401 and the memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of the computing device 400 and does not limit it; the computing device may include more or fewer components than those shown, combine some components, or use different components, and may, for example, also include input/output devices, network access devices, buses, and the like.
The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the computing device 400, such as a hard disk or memory of the computing device 400. The memory 402 may also be an external storage device of the computing device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computing device 400. Further, the memory 402 may include both an internal storage unit and an external storage device of the computing device 400. The memory 402 is used to store the computer program and other programs and data required by the computing device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/computing device and method may be implemented in other ways. For example, the above-described apparatus/computing device embodiments are merely illustrative, and for example, a division of modules or units is merely one logical division, and an actual implementation may have another division, multiple units or components may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods in the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the above method embodiments. The computer program may comprise computer program code in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be subject to appropriate additions or deletions as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals.
The above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for generating voice information of a leading robot is characterized by comprising the following steps:
when the guiding robot detects that a visitor needs guiding help, acquiring a voice signal of the visitor;
recognizing the language type of the voice signal;
determining a generation mode of a response voice signal of the leading robot based on the language type, wherein the generation mode comprises a local mode and a server mode;
and generating a response voice signal corresponding to the language type of the visitor based on the generation mode, and playing the response voice signal on the leading robot.
2. The method according to claim 1, wherein the determining a generation manner of the response voice signal of the lead robot based on the language type, the generation manner including a local manner and a server manner, comprises:
detecting whether the language type of the voice signal is a default language type;
if yes, determining to generate a response voice signal of the leading robot in a local mode;
if not, determining to generate a response voice signal of the leading robot in a server mode.
3. The method according to claim 2, wherein in a case where it is determined that the answer voice signal of the lead robot is generated in a local manner, the generating the answer voice signal corresponding to the language type of the visitor based on the generation manner includes:
recognizing the voice signal of the visitor to obtain translation information of the voice recognition;
inputting the translation information into a pre-trained navigation response model to obtain response information output by the navigation response model;
a response speech signal of a default language type is generated based on the response information.
4. The method according to claim 2, wherein in a case where it is determined that the response voice signal of the lead robot is generated in a server manner, the generating a response voice signal corresponding to a language type of the visitor based on the generation manner includes:
sending the voice signal of the visitor to a server, requesting the server to generate a response voice signal consistent with the language type of the voice signal of the visitor, wherein the server is connected with the leading robot through a network;
and receiving a response voice signal returned by the server.
5. The method of claim 1, wherein the obtaining the voice signal of the visitor comprises:
and if the voice signal is not received within the preset time, displaying an information input interface on a display screen of the leading robot, and prompting the visitor to input a target place to be inquired on the information input interface.
6. The method of claim 5, wherein the identifying the language type of the speech signal comprises: determining a language type of answering the lead information based on the lead information input by the visitor on the information input interface.
7. The method according to any one of claims 1-6, wherein said identifying a linguistic type of the speech signal comprises:
and under the condition that the language type of the voice signal cannot be identified, controlling the leading robot to open a remote video window, and manually carrying out video call with the visitor through the remote video window.
8. An apparatus for generating speech information of a lead robot, comprising:
the voice acquisition module is configured to acquire a voice signal of a visitor when the guiding robot detects that the visitor needs guiding help;
a speech recognition module configured to recognize a language type of the speech signal;
the mode confirming module is configured to determine a generation mode of a response voice signal of the leading robot based on the language type, wherein the generation mode comprises a local mode and a server mode;
a voice generating module configured to generate a response voice signal corresponding to a language type of the visitor based on the generation manner and play the response voice signal on the lead robot.
9. A lead robot comprising a speaker, a display and a computing device connected to the speaker and the display, respectively, the computing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111506393.XA 2021-12-10 2021-12-10 Leading robot voice information generation method and device and leading robot Active CN114179083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111506393.XA CN114179083B (en) 2021-12-10 2021-12-10 Leading robot voice information generation method and device and leading robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111506393.XA CN114179083B (en) 2021-12-10 2021-12-10 Leading robot voice information generation method and device and leading robot

Publications (2)

Publication Number Publication Date
CN114179083A (en) 2022-03-15
CN114179083B CN114179083B (en) 2024-03-15

Family

ID=80543087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111506393.XA Active CN114179083B (en) 2021-12-10 2021-12-10 Leading robot voice information generation method and device and leading robot

Country Status (1)

Country Link
CN (1) CN114179083B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598443A (en) * 2013-10-31 2015-05-06 腾讯科技(深圳)有限公司 Language service providing method, device and system
CN105068661A (en) * 2015-09-07 2015-11-18 百度在线网络技术(北京)有限公司 Man-machine interaction method and system based on artificial intelligence
CN108182098A (en) * 2017-12-07 2018-06-19 北京康力优蓝机器人科技有限公司 Receive speech selection method, system and reception robot
CN109416701A (en) * 2016-04-26 2019-03-01 泰康机器人公司 The robot of a variety of interactive personalities
US20190073358A1 (en) * 2017-09-01 2019-03-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Voice translation method, voice translation device and server
CN110019683A (en) * 2017-12-29 2019-07-16 同方威视技术股份有限公司 Intelligent sound interaction robot and its voice interactive method
CN110473531A (en) * 2019-09-05 2019-11-19 腾讯科技(深圳)有限公司 Audio recognition method, device, electronic equipment, system and storage medium
CN112860064A (en) * 2021-02-03 2021-05-28 华南师范大学 Intelligent interaction system and equipment based on AI technology
CN113627196A (en) * 2021-07-21 2021-11-09 前海企保科技(深圳)有限公司 Multi-language conversation robot system based on context and Transformer and conversation method thereof

Also Published As

Publication number Publication date
CN114179083B (en) 2024-03-15

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: Room 702, 7th floor, NO.67, Beisihuan West Road, Haidian District, Beijing 100089

Applicant after: Beijing Yunji Technology Co.,Ltd.

Address before: Room 702, 7th floor, NO.67, Beisihuan West Road, Haidian District, Beijing 100089

Applicant before: BEIJING YUNJI TECHNOLOGY Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant